Encyclopaedia Britannica and its Merriam-Webster subsidiary have initiated legal action in Manhattan federal court, accusing Microsoft-backed OpenAI of using their online reference materials to train the large language models behind ChatGPT.
The complaint, filed on Friday, contends that OpenAI incorporated Britannica's online articles, encyclopedia entries and dictionary definitions into the dataset used to teach ChatGPT how to respond to human prompts. Britannica says OpenAI has "cannibalized" its web traffic by returning AI-generated summaries that replicate the publisher's content.
According to the filing, OpenAI copied nearly 100,000 Britannica articles to train its GPT family of large language models. Britannica alleges that ChatGPT can produce "near-verbatim" copies of encyclopedia entries, dictionary definitions and other material, diverting users who otherwise would have visited Britannica's sites.
In addition to claims of copyright infringement, the complaint accuses OpenAI of trademark violations. Britannica says OpenAI has implied it had permission to reproduce Britannica content and has wrongfully cited Britannica in false AI "hallucinations," creating an impression of endorsement or authorization.
The lawsuit requests an unspecified amount in monetary damages and asks the court for an order blocking the alleged infringement. The filing also highlights that this case is part of a broader wave of high-stakes litigation brought by copyright owners - including authors and news organizations - who allege that technology companies used their work to train AI systems without authorization.
Britannica notes it previously filed a related suit against a different artificial intelligence startup last year - a case that remains active. The complaint against OpenAI places that prior litigation in context but does not add new factual claims about that separate matter beyond its ongoing status.
Spokespeople for the companies did not immediately respond to requests for comment on Monday.
OpenAI and other AI companies have argued in other proceedings that their use of copyrighted material can be protected as transformative fair use. The Britannica complaint presents a contrasting theory - that the copying was unlawful and caused diversion of traffic and reputational harm. The case will test how courts apply existing copyright and trademark laws to training data and AI-generated outputs.
Legal and market implications: The dispute underscores growing legal scrutiny of how AI models are trained and the commercial effects on content owners. For publishers and digital reference services, the case highlights concerns about lost traffic and the misuse of trusted brands. For technology firms, the litigation adds to a series of intellectual property challenges that could affect data practices and product disclosures.