AI accused of being trained on pirated content from torrents

14/01/2025 05:48:49
Meta is accused of using pirated content to train its Llama language model, in one of the first lawsuits brought against a technology company over the training of an AI. Court documents show that Meta AI was trained on pirated content. Conversations between Meta employees reveal that the content was downloaded via torrents and that Mark Zuckerberg allegedly approved the use of pirated material. The company used content from LibGen, an online library of pirated books, magazines, and academic articles. Meta says it uses public materials under the principle of "fair use" to statistically model language and generate original expression. The company also says it plans to bring its AI features to 21 additional countries.

A new day, a new controversy around artificial intelligence. This time, Meta has been accused of using pirated content from torrents to train its large language model (LLM) Llama, which powers Meta AI. The case is one of the first copyright lawsuits filed against a technology company over the training of an AI.

The documents reveal that Meta AI was trained with pirated content.

According to Wired, Meta was sued in 2023 for allegedly training Llama, the company's language model, on pirated content. The case is known as "Kadrey et al. v. Meta Platforms" and was brought by authors Richard Kadrey and Christopher Golden, who claim that Meta used copyrighted material without authorization.

Until now, Meta had submitted only redacted documents to the court. However, US District Judge Vince Chhabria ordered the unredacted documents to be made public, and they now have been.

The documents reveal conversations between Meta employees about Meta AI and Llama. In one of these conversations, an engineer said: "Torrenting from a [Meta-owned] corporate laptop doesn't feel right," suggesting that the company used pirated content to train its AI. Another conversation suggests that "MZ" (Mark Zuckerberg) authorized the use of pirated material.

The evidence suggests that Meta used content from LibGen, a vast library of pirated books, magazines, and academic articles. LibGen was created in Russia in 2008 and has faced several copyright lawsuits since then, even though no one knows who actually operates this "piracy hub". Meta is also rumored to have used content from other "shadow libraries" for AI training.

The company claims to have used public materials under the legal doctrine of "fair use", which allows the use of copyrighted material without authorization in certain circumstances, analyzed on a case-by-case basis. Meta also states that it is simply "using text to statistically model language and generate original expressions".

This is not the first time that major technology companies have been accused of training AI models on copyrighted content. Last year, an investigation revealed that Apple's OpenELM model had been trained on subtitles from over 170,000 YouTube videos.

Although this initially led people to believe that Apple was using copyrighted material to train Apple Intelligence, the company later explained that OpenELM is an open-source model created for research purposes and that its dataset is not used to power Apple Intelligence.

According to Apple, its AI features available on iOS and macOS are trained "on licensed data, including selected data to improve specific features, as well as publicly collected data by our web crawler."

Note that many major publishers, such as The New York Times and The Atlantic, have opted out of sharing their content for Apple's AI training.
