Anthropic pays $1.5 billion and reopens the debate over copyright in the AI era

The legal battle over data used in training generative AI models remains fierce. This time, the spotlight is on Anthropic, the creator of the Claude assistant, which has reached a $1.5 billion settlement to resolve a lawsuit accusing it of training its system with millions of pirated books.

The case, brought by writers Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, alleged that the company illicitly downloaded more than 7 million works. The settlement works out to roughly $3,000 per book for an estimated 500,000 covered works, with an additional $3,000 owed for each book beyond that figure, and it requires Anthropic to delete all pirated copies.

Not an isolated case: OpenAI, Stability AI, and Midjourney under scrutiny

Anthropic’s controversy adds to a long list of lawsuits challenging major players in AI:

  • OpenAI: sued by the New York Times and dozens of writers, who claim their articles and books have been used to train GPT without permission.
  • Stability AI: accused of powering Stable Diffusion with works by artists without authorization.
  • Midjourney: criticized for using graphic material by professional authors and entertainment catalogs to train its image generators.

These lawsuits aim not only for monetary compensation but also to establish a precedent regarding what data can legally be used for AI training.

The technical-legal dilemma: fair use or mass infringement?

The core issue is the difference between learning from a dataset and copying a work. Companies argue that training is a statistical process protected under the U.S. doctrine of fair use, while plaintiffs contend that models can reproduce identical text fragments or mimic artistic styles, constituting direct copyright infringement.

In Europe, the situation is even more complex: the 2019 copyright directive permits text and data mining of protected works for research purposes, while commercial uses are subject to rightsholders' opt-outs. In practice, companies like OpenAI or Anthropic would need licenses to train their models on opted-out works within the EU.

Impact on the industry: rising costs and slowdown

The Anthropic settlement introduces a crucial factor: the real cost of training models with protected data. If companies must pay for licenses or multimillion-dollar compensation, developing new models will become more expensive, which could:

  • Hinder innovation and limit the emergence of new competitors.
  • Favor large tech corporations with significant financial resources, at the expense of startups.
  • Create a new market for dataset licenses that might become standard practice.

What’s next?

The future of generative AI will depend on how these legal battles are resolved. One potential scenario is a hybrid model: combining public and open-domain datasets with licensed catalogs from publishers, media, and production companies.

Meanwhile, the Anthropic case sends a clear signal: creators are not willing to stay on the sidelines. Balancing innovation with copyright will likely be the greatest regulatory and technical challenge for AI in this decade.


Frequently Asked Questions

1. Why did Anthropic pay $1.5 billion?
To settle a lawsuit accusing it of using millions of pirated books in training its Claude model.

2. Which other companies are involved in similar proceedings?
OpenAI, Stability AI, and Midjourney face lawsuits for using works by writers, artists, and media without authorization.

3. How do the U.S. and Europe differ on this issue?
In the U.S., the debate centers on fair use, whereas in Europe, regulations require explicit licenses for protected works.

4. How does this affect the future of generative AI?
It could increase training costs, slow innovation, and establish a market for dataset licensing.


via: Legal News
