OpenAI and Microsoft move to dismiss newspaper publishers’ copyright lawsuit

MANHATTAN (CN) — OpenAI and Microsoft filed motions to partially dismiss claims brought by a coalition of eight newspapers that accuse the ChatGPT makers of using the publishers’ articles, without permission or payment, to fuel the commercialization of their artificial intelligence products.

The publishers, owned by MediaNews Group and Tribune Publishing, sued OpenAI and Microsoft in April, claiming the AI developers drew from large swaths of copyrighted articles to “train” the large language models that enhance the ability of ChatGPT and Copilot to generate text in a variety of styles.

The publishers include Tribune Publishing’s Chicago Tribune, Orlando Sentinel and South Florida Sun Sentinel and the New York Daily News, as well as MediaNews Group-owned Mercury News, Denver Post, Orange County Register and St. Paul Pioneer Press.

In motions to dismiss filed in the U.S. District Court for the Southern District of New York late Tuesday night, OpenAI and Microsoft claim the newspaper publishers fail to assert tangible claims of copyright infringement.

“Microsoft and OpenAI’s tools neither exploit the protected expression in the plaintiffs’ digital content nor replace it — they extract and share elements of language, culture, ideas and knowledge that belong to all of us,” Microsoft says in its motion.

For instance, the AI developers take issue with the publishers’ claims that “given the right prompt” the AI products will “repeat large portions” of newspaper articles used to train their language models.

According to Microsoft and OpenAI, the publishers fail to state a claim that the AI developers contributed to “end-user copyright infringement,” or encouraged users to prompt the GPT-based products to produce content similar to the publishers’ articles.

The AI developers add that the mere possibility that a user could prompt the GPT-based products to produce work that infringes on the publishers’ articles is not enough to state a claim for copyright infringement.

“The bare theoretical possibility that someone somewhere might engage in the same acrobatics plaintiffs did here is not enough to plausibly allege direct infringement,” Microsoft says.

Similarly, OpenAI says the publishers’ complaint fails to fully consider whether using copyrighted content to train a generative AI model is fair use under copyright law.

“At the end of the day, the truth will emerge, and it will be clear that ChatGPT is not in fact some highly inefficient way to access, via one out of every thousand or so impermissible attempts, snippets of old newspaper articles that are freely available in full elsewhere online,” OpenAI says.

Kristelia Garcia, a law professor at Georgetown University, says that the hypothetical possibility that users of the GPT products could prompt an output reproducing the publishers’ newspaper articles could be enough to show copyright infringement.

“It proves that it could happen, it’s just a matter of time and they shouldn’t have to wait until they suffer some sort of loss,” Garcia told Courthouse News.

Garcia added that the outcome of the case depends on whether the court finds the publishers’ allegations sufficient to support a copyright claim.

“It depends on if the court decides they want to wait around for genuine infringement, meaning it sort of organically happened through normal or expected prompting, or if manufactured infringement is enough to go forward,” Garcia said.

OpenAI claims the publishers also fail to state a claim of injury because in any instance in which its GPT products produce excerpts of the copyrighted articles, the publication name and a link are provided.

“Any user who encountered the outputs identified would have no doubt as to the provenance of the text and could easily find it on plaintiffs’ websites,” OpenAI says.

The publishers also claim that collectively, content from their websites accounts for at least 124 million basic pieces of text included in the Common Crawl depository of data used to train the AI developers’ large language models.

But the AI developers say that using the publishers’ articles as a training tool cannot be considered copyright infringement because it “takes place out of public view.”

“The complaint does not plausibly allege how the training or development of the models could somehow facilitate defendants’ or anyone else’s alleged infringement,” Microsoft says.

“The complaint provides no information about whom removal of copyright management information would conceal infringement from, or how removal could conceivably make it easier for anyone to train a model using copyrighted works when training takes place outside of public view,” Microsoft adds.

Follow @NikaSchoonover
