Encyclopaedia Britannica Sues OpenAI, Alleging Unauthorized Use of Reference Content

Encyclopaedia Britannica and its Merriam-Webster subsidiary filed a lawsuit in Manhattan federal court accusing OpenAI of using their online encyclopedia and dictionary content without permission to train the models behind ChatGPT. The complaint alleges that OpenAI copied nearly 100,000 articles, produced "near-verbatim" reproductions and "cannibalized" Britannica's web traffic. It also asserts trademark infringement tied to false AI citations. Britannica seeks unspecified monetary damages and an injunction to stop the alleged copying.

Key Points

Encyclopaedia Britannica and Merriam-Webster filed suit in Manhattan federal court alleging OpenAI used their online content without permission to train ChatGPT.
The complaint claims OpenAI copied nearly 100,000 articles and that ChatGPT can produce "near-verbatim" reproductions, allegedly "cannibalizing" Britannica's web traffic - impacting digital publishing and online reference services.
Britannica also alleges trademark infringement linked to false AI "hallucinations" and seeks unspecified monetary damages plus a court order to stop the alleged copying - raising regulatory and legal uncertainty for AI and tech companies.

Encyclopaedia Britannica and its Merriam-Webster subsidiary have initiated legal action in Manhattan federal court, accusing Microsoft-backed OpenAI of using their online reference materials without permission to train the large language models behind ChatGPT.

The complaint, filed on Friday, contends that OpenAI incorporated Britannica's online articles, encyclopedia entries and dictionary definitions into the dataset used to teach ChatGPT how to respond to human prompts. Britannica says OpenAI has "cannibalized" its web traffic by returning AI-generated summaries that replicate the publisher's content.

According to the filing, OpenAI copied nearly 100,000 Britannica articles to train its GPT family of large language models. Britannica alleges that ChatGPT can produce "near-verbatim" copies of encyclopedia entries, dictionary definitions and other material, diverting users who otherwise would have visited Britannica's sites.

In addition to claims of copyright infringement, the complaint accuses OpenAI of trademark violations. Britannica says OpenAI has implied it had permission to reproduce Britannica content and has wrongfully cited Britannica in false AI "hallucinations," creating an impression of endorsement or authorization.

The lawsuit requests an unspecified amount in monetary damages and asks the court for an order blocking the alleged infringement. The filing also highlights that this case is part of a broader wave of high-stakes litigation brought by copyright owners - including authors and news organizations - who allege that technology companies used their work to train AI systems without authorization.

Britannica notes it previously filed a related suit against a different artificial intelligence startup last year - a case that remains active. The complaint against OpenAI places that prior litigation in context but does not add new factual claims about that separate matter beyond its ongoing status.

Spokespeople for the parties did not immediately respond to requests for comment on Monday, according to the complaint's filing notes.

OpenAI and other AI companies have argued in other proceedings that their use of copyrighted material can be protected as transformative fair use. The Britannica complaint presents a contrasting theory - that the copying was unlawful and caused diversion of traffic and reputational harm. The case will test how courts apply existing copyright and trademark laws to training data and AI-generated outputs.

Legal and market implications - The dispute underscores growing legal scrutiny of how AI models are trained and the commercial effects on content owners. For publishers and digital reference services, the case highlights concerns about lost traffic and the misuse of trusted brands. For technology firms, the litigation adds to a series of intellectual property challenges that could affect data practices and product disclosures.

Risks

Ongoing litigation may create legal and operational uncertainty for AI developers, potentially affecting model training practices and compliance costs - impacting technology and cloud services sectors.
Content owners and publishers face revenue and traffic risks if AI-generated outputs replicate their material, which could alter monetization models for digital publishing and reference services.
Trademark and reputation risks for both content owners and AI firms arise from incorrect AI citations or implied permissions, potentially affecting consumer trust in digital information platforms.
