Meta accused of training its AI on pirated content from torrents

A new day, a new controversy around artificial intelligence. This time, Meta has been accused of using pirated content downloaded via torrents to train Llama, the large language model (LLM) that powers Meta AI. The case is one of the first copyright lawsuits filed against a technology company over AI training.
The documents reveal that Meta AI was trained with pirated content.
According to Wired, Meta was sued in 2023 for allegedly training Llama, its language model, on pirated content. The case, known as "Kadrey et al. v. Meta Platforms", was brought by authors Richard Kadrey and Christopher Golden, who claimed that Meta had used their copyrighted works without authorization.
Until now, the documents Meta had submitted to the court were heavily redacted. However, US District Judge Vince Chhabria ordered the unredacted versions to be made public, and they have now been released.
The documents reveal internal conversations between Meta employees about Meta AI and Llama. In one of them, an engineer wrote: "Torrenting from a [Meta-owned] corporate laptop doesn't feel right," which suggests that the company downloaded pirated content via torrents to train its AI. Another conversation suggests that "MZ" (Mark Zuckerberg) approved the use of pirated material.
The evidence indicates that Meta used content from LibGen, a vast library of pirated books, magazines, and academic articles. LibGen was created in Russia in 2008 and has faced several copyright lawsuits since then, even though no one knows who actually runs this "piracy hub". Meta is also rumored to have used content from other "shadow" libraries for AI training.
The company argues that its use of publicly available materials falls under the legal doctrine of "fair use", which allows copyrighted material to be used without authorization in certain circumstances, assessed on a case-by-case basis. Meta also states that it is simply "using text to statistically model language and generate original expressions".

This is not the first time major tech companies have been accused of training AI models on copyrighted content. Last year, an investigation revealed that Apple's OpenELM model was trained on a dataset containing subtitles from more than 170,000 YouTube videos.
Although this initially led people to believe that Apple was using copyrighted material to train Apple Intelligence, the company later explained that OpenELM was an open-source model created for research purposes and that its training data is not used to power Apple Intelligence.
According to Apple, the AI features available on iOS and macOS are trained "on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web crawler."
Note that many major publishers, such as The New York Times and The Atlantic, have opted out of having their content used for Apple's AI training.