AI and Copyright: Legal Battles on the Rise
AI and copyright don’t go hand in hand, at least not legally. In a report, OpenAI stated that it would be impossible to train AI systems without accessing protected content. This hasn’t stopped copyright holders from filing numerous lawsuits against the start-up.
Disputes between providers of generative AI tools and copyright holders continue to multiply. Yet OpenAI argues that building such tools is unfeasible without using protected content to train them. In a report submitted to the House of Lords Communications and Digital Select Committee, the start-up asserts that it would be impossible to train large language models (LLMs) like GPT-4, the technology underlying ChatGPT, without resorting to protected content. “Given that copyright covers virtually all forms of human expression today – including blog articles, photographs, forum messages, snippets of software code, and government documents – it would be impossible to train the best current AI models without using copyrighted documents,” the report states.
GenAI applications like ChatGPT or the Stable Diffusion image generation tool are built using vast amounts of data collected from the Internet, most of which is covered by intellectual property rights. This situation has led to backlash from publishers and authors, who claim that their work is being used without credit or compensation.
Concerns about Copyrighted Code
“Developers have been using resources like Google and StackOverflow for decades,” said Daniel Li, CEO of Plus Docs, a company whose software uses genAI to design, create, and edit presentations. In his view, ChatGPT simply makes coding a bit easier. “It is important to keep in mind that developers still need to understand their code. ChatGPT doesn’t change that requirement,” he added. Li acknowledges that “companies must be very careful not to use copyrighted code or other texts,” and points out that “this is already a major issue in software acquisitions for large tech companies, and it will only grow in importance.”
OpenAI’s position comes as the company faces a series of lawsuits. Last week, The New York Times filed a complaint against OpenAI and Microsoft, a significant investor in OpenAI that integrates the start-up’s tools into several of its own products. The complaint accuses both companies of illegally using the newspaper’s content to build OpenAI’s tools. In response, the start-up argued that copyright law does not prohibit training genAI models.
Last year, OpenAI was the target of a federal class action lawsuit in California, in which the plaintiffs accuse the company of illegally using personal data to train its models. The suit, filed in the Northern District of California, alleges 15 violations, including breaches of the Computer Fraud and Abuse Act and the Electronic Communications Privacy Act, as well as of various state consumer protection laws. At the heart of the California complaint is the allegation that OpenAI “illegally collected” the plaintiffs’ private data and used it without providing compensation. According to the complaint, “OpenAI used this misappropriated data to refine and advance ChatGPT through extended language models and advanced language algorithms, enabling it to produce and understand human-like language, a capability that can be applied to a multitude of uses.”
The California case is part of an increasingly active legal battle to limit the rampant collection of data by generative AI tools. Separately, a group of authors filed a class action against OpenAI and Microsoft, accusing both companies of infringing their copyrights by using their writings and academic works to train ChatGPT without authorization. The lead plaintiff is Julian Sancton, author of “Madhouse at the End of the Earth: The Belgica’s Journey Into the Dark Antarctic.”
In this case, OpenAI and Microsoft are accused of blatantly ignoring copyright laws to create “a multi-billion dollar enterprise using humanity’s collective works without permission.” Rather than pay for the intellectual property they use, the plaintiffs argue, “they act as if copyright laws don’t exist.” John Licato, an associate professor in Computer Science and Engineering at the University of South Florida, believes OpenAI’s stance could amount to copyright infringement. “The line between adapting existing ideas and creating something truly new is already blurry, and AI forces us to see how ill-defined this distinction is,” he said.