Two recent judgments – decided just days apart – from different judges of the United States District Court for the Northern District of California determined that using copyrighted books to train large language models (LLMs) was fair use.1 Critical to both decisions were findings that the use of the works to train the LLMs was highly transformative and that the LLMs did not make any meaningful amount of the original works available to the public.

While both judges ultimately found that fair use applied, their analyses suggest differing opinions on three issues in the fair use analysis: (i) whether training an LLM is for a distinct transformative purpose; (ii) the significance of use of pirated copies; and (iii) the effect of the reproduction on the market. These early decisions may create a divergence in how courts analyze fair use in the context of AI training. 

In Part 1 of this series, we summarize the key findings in Judge Alsup’s decision in Bartz v Anthropic. In Part 2, we will discuss Judge Chhabria’s decision in Kadrey v Meta. In Part 3, we will contrast the two decisions and draw connections with Canada’s fair dealing exception in the context of training AI.


Background

In Anthropic, three book authors brought a class action claiming that their copyrights had been infringed by Anthropic’s training of its AI software service named Claude. To train Claude, Anthropic downloaded millions of pirated books. It also purchased millions of physical books, scanned them into digital form, and destroyed the originals. Anthropic used these collective copies to create a central library of the books. From that library, Anthropic then selected various subsets of both the pirated and digitized works to train the LLMs used in Claude. It retained all the copies in the central library for other potential uses that might arise.

Prior to class certification, Anthropic brought a motion for summary judgment on its fair use defence. 

On the motion, Anthropic argued it copied the authors’ books for only one use: to train LLMs. The authors argued that Anthropic copied the books for two uses: to train the LLMs and to build a vast central library of potentially useful content. Judge Alsup ultimately considered both uses and both sets of works (the pirated books and the digitized hardcopy books). For each, he applied the four fair use factors set out in section 107 of the Copyright Act.

Judge Alsup’s three main findings are discussed below.

The use of the works for training the LLM was fair use 

Judge Alsup found that the use of works for training the LLM was fair use. Critical to his analysis was that the use of the works to train the AI was highly transformative and no copies of the books were made available to the public. As recognized by Judge Alsup: “If the outputs seen by users had been infringing, Authors would have a different case.”2 

Judge Alsup also found that use of the works to train LLMs would not affect the market for the works. First, he rejected the argument that training LLMs would result in an explosion of competing works. In his view, the “Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works.”3 Second, he rejected the authors’ assertion that the infringement would reduce their market for licensing works to train LLMs, finding that such a potential licensing market is not one the Copyright Act entitles authors to exploit.4

The use of digitized books to create a central library was fair use

Judge Alsup found that Anthropic’s digitizing of the authors’ books for its central library was fair use. The critical factor was that Anthropic legitimately bought these copies before changing their format from print to digital and therefore already had the right to keep them in its library.

The use of pirated works to create the central library was not fair use 

Judge Alsup distinguished between using pirated works and using digitized works to build a central library. He found that creating a library from digitized works was fair use, but using pirated works for the same purpose was not: “Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use — and not a transformative one.”5 That the pirated copies “plainly displaced demand for Authors’ books – copy for copy” and could have been obtained legitimately featured prominently in Judge Alsup’s analysis.6

Conclusion

Judge Alsup’s conclusion that use of the works to train LLMs was fair use did not distinguish between the digitized copies and the pirated copies, as his analysis of the central library use did. This suggests that he found the use of both pirated and non-pirated copies to train LLMs to be fair use. However, elsewhere in the decision, Judge Alsup also commented that he doubts using pirated works for any purpose (including to train an LLM) could ever be fair use.7 Ultimately, he focused on the fact that, in any event, the pirated copies were also used to create a central library (which was not fair use). The issue of the use of pirated copies to create the central library is proceeding to trial in the next stage of the case.

In sum, there is some uncertainty as to how pirated copies will be treated in the fair use analysis if they are used only for the narrow purpose of training an LLM. In our next update in this series, we will examine Kadrey v Meta, a decision released days after Anthropic that also engages with the issue of whether using unauthorized copies to train an LLM is fair use.


Footnotes

1 Bartz et al v Anthropic, 3:24-cv-05417-WHA; Kadrey v Meta Platforms, Inc., 3:23-cv-03417-VC.

2 Anthropic, p. 12.

3 Anthropic, p. 28.

4 Anthropic, p. 28.

5 Anthropic, p. 19.

6 Anthropic, p. 29.

7 Anthropic, pp. 18-19.


