Adobe Faces Lawsuit Over Alleged Use of Pirated Books to Train AI Model
Adobe, like many tech companies, has invested heavily in AI technology in recent years. The software giant has introduced various AI services, including Firefly, its AI-powered media-generation suite. Now, however, a lawsuit claims that Adobe used pirated books to train one of its AI models.
The proposed class-action lawsuit, filed on behalf of author Elizabeth Lyon, alleges that Adobe used pirated versions of multiple books, including Lyon’s own work, to train its SlimLM program.
According to Adobe, SlimLM is a small language model series optimized for document assistance tasks on mobile devices. The program was pre-trained on the SlimPajama-627B dataset, a deduplicated, multi-corpora, open-source dataset released by Cerebras in June 2023. Lyon asserts that her works were included in the pretraining dataset used by Adobe.
Lyon’s lawsuit contends that the SlimPajama dataset was created by manipulating the RedPajama dataset, which includes copyrighted works, such as Lyon’s, that were used without authorization. The lawsuit states that SlimPajama, as a derivative copy of RedPajama, infringes the copyrights of Lyon and other authors.
The use of the “Books3” collection, which consists of 191,000 books, to train AI systems has been the subject of repeated legal disputes in the tech industry. Similar cases involving RedPajama have been filed against companies such as Apple and Salesforce, alleging copyright infringement in their AI training datasets.
These lawsuits highlight a recurring issue in the tech sector, where AI algorithms are trained on large datasets that may contain copyrighted material. In a recent settlement, Anthropic agreed to pay $1.5 billion to authors who accused the company of using pirated content to train its chatbot, underscoring the challenges of navigating copyright issues in AI development.

