Meta accused of training AI using pirated content from torrents

14/01/2025 05:48:49
Meta has been accused of using pirated content to train Llama, the large language model that powers Meta AI. The case is one of the first copyright lawsuits brought against a technology company over AI training. Meta had submitted redacted filings to the court, but the judge ordered the unredacted originals to be made public. The documents show conversations between Meta employees about Meta AI and Llama, one of which suggests that the use of pirated content was authorized by Mark Zuckerberg. Meta has stated that it used content from LibGen, a large collection of books, journals, and academic articles, and argues that its use of publicly available materials is covered by "fair use". This is not the first time that major technology companies have been accused of training AI models on copyrighted content.
A new day, another controversy surrounding artificial intelligence. This time, Meta is accused of using pirated content from torrents to train Llama, the large language model (LLM) that powers Meta AI. The case is one of the first copyright lawsuits filed against a technology company over AI training.

The documents reveal that Meta AI was trained with pirated content.

As reported by Wired, Meta was sued in 2023 for allegedly training Llama, its LLM, on pirated content. The case, known as "Kadrey et al. v. Meta Platforms", was brought by writers Richard Kadrey and Christopher Golden, who claim that Meta used copyrighted material without authorization.

Until now, Meta had submitted only redacted documents to the court. However, Judge Vince Chhabria of the United States District Court for the Northern District of California ordered that the unredacted originals be made public, and they now have been.

The documents reveal conversations between Meta employees regarding Meta AI and Llama. In one of the conversations, an engineer states that "downloading from a [Meta] company computer doesn't make sense," which indicates that the company used pirated content to train its AI. Another conversation suggests that "MZ" (Mark Zuckerberg) authorized the use of pirated materials.

The filings suggest that Meta used content from LibGen, a large library of books, journals, and academic articles. LibGen was created in Russia in 2008 and has since faced numerous legal actions brought on behalf of copyright holders, although no one knows who actually operates the "piracy hub". Meta also stated that it used content from other "shadow libraries" for AI training.

The company claims that its use of publicly available materials falls under the legal doctrine of "fair use," which allows copyrighted material to be used without permission in certain circumstances, assessed on a case-by-case basis. Meta also states that it is simply "using text to statistically model language and generate original expressions."

This is not the first time that major technology companies have been accused of training AI models on copyrighted content. Last year, an investigation found that the training data for Apple's OpenELM model included subtitles from over 170,000 YouTube videos.

Although this initially led to the belief that Apple was using copyrighted content to train Apple Intelligence, the company later explained that OpenELM is an open-source model created for research purposes and that its data is not used to power Apple Intelligence.

According to Apple, its AI features available on iOS and macOS are trained "on licensed data, including data selected to improve specific functionalities, as well as publicly available data collected by our web-crawler."

It's important to note that many major publishers like The New York Times and The Atlantic have chosen not to share their content for the training of Apple Intelligence.
