Case 3:24-cv-05417-WHA Document 231 Filed 06/23/25 Page 1 of 32
UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA

ANDREA BARTZ, CHARLES GRAEBER, and KIRK WALLACE JOHNSON,
Plaintiffs,
v.
ANTHROPIC PBC,
Defendant.

No. C 24-05417 WHA
ORDER ON FAIR USE

INTRODUCTION
An artificial intelligence firm downloaded for free millions of copyrighted books in digital form from pirate sites on the internet. The firm also purchased copyrighted books (some overlapping with those acquired from the pirate sites), tore off the bindings, scanned every page, and stored them in digitized, searchable files. All the foregoing was done to amass a central library of “all the books in the world” to retain “forever.” From this central library, the AI firm selected various sets and subsets of digitized books to train various large language models under development to power its AI services. Some of these books were written by plaintiff authors, who now sue for copyright infringement. On summary judgment, the issue is the extent to which any of the uses of the works in question qualify as “fair uses” under Section 107 of the Copyright Act.
STATEMENT
Defendant Anthropic PBC is an AI software firm founded by former OpenAI employees in January 2021. Its core offering is an AI software service called Claude. When a user prompts Claude with text, Claude quickly responds with text — mimicking human reading and writing. Claude can do so because Anthropic trained Claude — or rather trained large language models or LLMs underlying various versions of Claude — using books and other texts selected from a central library Anthropic had assembled. Claude was first released publicly in March 2023. Seven successive versions of Claude have been released since. Users may ask Claude some questions for free. Demanding users and corporate clients pay to use Claude, generating over one billion dollars in annual revenue (Opp. Exh. 18).
Plaintiffs Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson are authors of books that Anthropic copied from pirated and purchased sources. Anthropic assembled these copies into a central library of its own, copied further various sets and subsets of those library copies to include in various “data mixes,” and used these mixes to train various LLMs. Anthropic kept the library copies in place as a permanent, general-purpose resource even after deciding it would not use certain copies to train LLMs or would never use them again to do so. All of Anthropic’s copying was without plaintiffs’ authorization.
Author Bartz wrote four novels Anthropic copied and used: The Lost Night: A Novel, The Herd, We Were Never Here, and The Spare Room. Author Graeber wrote two non-fiction books likewise at issue: The Good Nurse: A True Story of Medicine, Madness, and Murder, and The Breakthrough: Immunotherapy and the Race to Cure Cancer. And, Author Johnson penned three non-fiction books also copied and used: To Be A Friend Is Fatal: The Fight to Save the Iraqis America Left Behind, The Feather Thief: Beauty, Obsession, and the Natural History Heist of the Century, and The Fishermen and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast. Plaintiffs Bartz Inc. and MJ + KJ Inc. are corporate entities that Author Bartz and Author Johnson respectively set up to market their works. Between them, these five plaintiffs (“Authors”) own all the copyrights in the above-listed works.
From the start, Anthropic “ha[d] many places from which” it could have purchased books, but it preferred to steal them to avoid “legal/practice/business slog,” as cofounder and chief executive officer Dario Amodei put it (see Opp. Exh. 27). So, in January or February 2021, another Anthropic cofounder, Ben Mann, downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books — that is, pirated. Anthropic’s next pirated acquisitions involved downloading distributed, reshared copies of other pirate libraries. In June 2021, Mann downloaded in this way at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated. And, in July 2022, Anthropic likewise downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated (Opp. Exh. 6 at 4; Opp. Expert Zhao ¶¶ 17–29; see Class Cert. (“CC”) Opp. Expert Iyyer ¶¶ 45–46). Although what was downloaded and later duplicated from these sources was sometimes referred to as data or datasets, at bottom they contained full-text “ebooks or scans of books” saved in individual files in formats like .pdf, .txt, and .epub (see, e.g., Opp. Exh. 12 at -0391318). For Books3, most filenames identified the book inside. For LibGen and PiLiMi, Anthropic downloaded a separate catalog of bibliographic metadata for each collection, with fields like title, author, and ISBN (see, e.g., ibid.; Opp. Exh. 16 -0533972–73). Anthropic thereby pirated over seven million copies of books, including copies of at least two works at issue for each Author.¹
As Anthropic trained successive LLMs, it became convinced that using books was the most cost-effective means to achieve a world-class LLM. During this time, however, Anthropic became “not so gung ho about” training on pirated books “for legal reasons” (Opp. Exh. 19). It kept them anyway (e.g., Opp. Exh. 17 at 93–94; CC Opp. Exh. 35 at -0273474).
¹ Specifically, those works were (see Opp. Expert Zhao ¶ 36; CC Br. Expert Zhao ¶ 66):
    Author Bartz’s The Herd (five copies total) (in LibGen and PiLiMi);
    Author Bartz’s The Lost Night (three copies total) (in Books3, LibGen, and PiLiMi);
    Author Graeber’s The Breakthrough (four copies) (in Books3, LibGen, and PiLiMi);
    Author Graeber’s The Good Nurse (five copies total) (in Books3 and LibGen);
    Author Johnson’s To Be A Friend Is Fatal (one copy) (in Books3); and
    Author Johnson’s The Feather Thief (four copies total) (in Books3, LibGen, PiLiMi).
Some evidence suggests Anthropic downloaded still more copies before culling empty files, duplicates, and so on to reach the numbers kept in the central library and counted here.
To find a new way to get books, in February 2024, Anthropic hired the former head of partnerships for Google’s book-scanning project, Tom Turvey. He was tasked with obtaining “all the books in the world” while still avoiding as much “legal/practice/business slog” as possible (Opp. Exhs. 21, 27). So, in spring 2024, Turvey sent an email or two to major publishers to inquire into licensing books for training AI. Had Turvey kept up those conversations, he might have reached agreements to license copies for AI training from publishers — just as another major technology company soon did with one major publisher (e.g., Opp. Expert Malackowski ¶¶ 50, 64). But Turvey let those conversations wither.
Instead, Turvey and his team emailed major book distributors and retailers about bulk-purchasing their print copies for the AI firm’s “research library” (Opp. Exh. 22 at 145; Opp. Exh. 31 at -035589). Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form — discarding the paper originals. Each print book resulted in a PDF copy containing images of the scanned pages with machine-readable text (including front and back cover scans for softcover books). Anthropic created its own catalog of bibliographic metadata for the books it was acquiring. It acquired copies of millions of books, including of all works at issue for all Authors.²
Anthropic may have copied portions of Authors’ books on other occasions, too — such as while copying book reviews, academic papers, internet blogposts, or the like for its central library. And, Anthropic’s scanning service providers may have copied Authors’ print books along the way to delivering the final digital copies to Anthropic. But neither side here specifically raises legal issues implicated by any such copies. Nor will this order.
² In other words, within the scanned books were one or more copies of the following works:
    Author Bartz’s The Herd;
    Author Bartz’s The Lost Night;
    Author Bartz’s We Were Never Here;
    Author Bartz’s The Spare Room;
    Author Graeber’s The Breakthrough;
    Author Graeber’s The Good Nurse;
    Author Johnson’s To Be A Friend Is Fatal;
    Author Johnson’s The Feather Thief; and,
    Author Johnson’s The Fishermen.
From all the above sources, Anthropic created a general “research library” or “generalized data area.” What was this for? As Turvey said, this was a “way of creating information that would be voluminous and that we would use for research,” or otherwise to “inform our — our products” (Opp. Exh. 22 at 145–46, 194). The copies were kept in the original “version of the underlying” book files Anthropic had “obtained or created,” that is, pirated or scanned (Opp. Exh. 30 at 3, 4). Anthropic planned to “store everything forever; we might separate out books into categories[, but t]here [wa]s no compelling reason to delete a book” — even if not used for training LLMs. Over time, Anthropic invested in building more tools for searching its “general purpose” library and for accessing books or sets of books for further uses (see CC Br. Exh. 12 at -0144509; CC Reply Exh. 45 at -0365931–32, -0365939–42 (reviewing and seeking to improve “[w]hat [ ] researchers do today if they want to search for a book,” including improving bibliographic metadata and consolidating varied resources)).
One further use was training LLMs. As a preliminary step towards training, engineers browsed books and bibliographic metadata to learn what languages the books were written in, what subjects they concerned, whether they were by famous authors or not, and so on — sometimes by “open[ing] any of the books” and sometimes using software. From the library copies, engineers copied the sets or subsets of books they believed best for training and “iterate[d]” on those selections over time. For instance, two different subsets of print-sourced books were included in “data mixes” for training two different LLMs. Each was just a fraction of all the print-sourced books. Similarly, different sets or “subsets” or “parts of” or “portions” of the collections sourced from Books3, LibGen, and PiLiMi were used to train different LLMs. Anthropic analyzed the consequences of using more books, fewer books, different books. The goal was to improve the “data mix” to improve each LLM and, ultimately, Claude’s performance for paying customers.³
³ (See, e.g., Opp. Exh. 12 at -0391318 (engineers were able to “open any of the books”); CC Reply Exh. 45 at -0365941 (some engineers “want[ed] to search for a book” and get its “scanned book file[ ]”); Opp. Exh. 30 at 3 (made copies of “each such dataset or portions thereof” for training); Opp. Exh. 6 at 3–4 (trained on “portions of datasets,” with at least two such portions from LibGen and four from PiLiMi); Opp. Expert Zhao ¶¶ 27–28, 30–31 (plus two more from PiLiMi, and at least three from scanned books); CC Opp. Exh. 35 at -0273477–82 (tested subsets of pirated and purchased-and-scanned books to see consequences for training); CC Br. Exh. 12 at -0144508–09 (“iterate[d]” selections from library and “train[ed] new models on the best data”); Br. Expert Kaplan ¶¶ 42–45 (explained goals of improving data mixes); Br. Expert Peterson ¶ 14 (similar)).
Over time, Anthropic came to value most highly for its data mixes books like the ones Authors had written, and it valued them because of the creative expressions they contained. Claude’s customers wanted Claude to write as accurately and as compellingly as Authors. So, it was best to train the LLMs underlying Claude on works just like the ones Authors had written, with well-curated facts, well-organized analyses, and captivating fictional narratives — above all with “good writing” of the kind “an editor would approve of” (Opp. Exh. 3 at -03433). Anthropic could have trained its LLMs without using such books or any books at all. That would have required spending more on, say, staff writers to create competing exemplars of good writing, engineers to revise bad exemplars into better ones, energy bills to power more rounds of training and fine-tuning, and so on. Having canonical texts to draw upon helped (e.g., Opp. Expert Zhao ¶ 81).
Each work selected for training any given LLM was copied in four main ways — and in fact so many times that Anthropic admits it would be impractical even to estimate.
First, each work selected was copied from the central library to create a working copy for the training set.
Second, each work was cleaned to remove a small amount of lower-valued or repeating text (like headers, footers, or page numbers), with a “cleaned” copy resulting. If the same book appeared twice, or if while looking across the entire provisional training set it became clear there was some other reason to cull a book or category, Anthropic had the capability to delete relevant copy(ies) from the set at this step (see CC Br. Expert Zhao ¶¶ 71–72).
Third, each cleaned copy was translated into a “tokenized” copy. Some words were “stemmed” or “lemmatized” into simpler forms (e.g., “studying” to “study”). And, all characters were grouped into short sequences and translated into corresponding number sequences or “tokens” according to an Anthropic-made dictionary. The resulting tokenized copies were then copied repeatedly during training. By one account, this process involved the iterative, trial-and-error discovery of contingent statistical relationships between each word fragment and all other word fragments both within any work and across trillions of word fragments from other copied books, copied websites, and the like. Other steps in training are not at issue here (id. ¶¶ 73–76; see Opp. Expert Zhao ¶ 38 & n.6).
Fourth, each fully trained LLM itself retained “compressed” copies of the works it had trained upon, or so Authors contend and this order takes for granted. In essence, each LLM’s mapping of contingent relationships was so complete it mapped or indeed simply “memorized” the works it trained upon almost verbatim. So, if each completed LLM had been asked to recite works it had trained upon, it could have done so (e.g., Opp. Expert Zhao ¶ 74). Further steps refining the LLM are not at issue here.
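By way of illustration only, and not drawn from the record, the second and third kinds of copying described above (cleaning and tokenizing) can be sketched in a few lines of Python. Everything in the sketch is a hypothetical stand-in: the names `clean`, `tokenize`, `TOY_VOCAB`, and `UNKNOWN_ID`, the six-entry dictionary, and the greedy longest-match rule are assumptions chosen for brevity and say nothing about how Anthropic's actual tools or dictionary work. Stemming and the training step itself are omitted.

```python
import re

# Hypothetical toy dictionary mapping text fragments to numbers. A real
# tokenizer dictionary is far larger; this one is invented for illustration.
TOY_VOCAB = {"the": 0, "her": 1, "d": 2, " ": 3, "lost": 4, "night": 5}
UNKNOWN_ID = -1  # assumed placeholder for characters the toy dictionary lacks


def clean(text: str) -> str:
    """Drop lines that are bare page numbers, collapse whitespace, lowercase."""
    kept = [ln for ln in text.splitlines() if not re.fullmatch(r"\s*\d+\s*", ln)]
    return " ".join(" ".join(kept).split()).lower()


def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match: group characters into fragments, emit their numbers."""
    longest = max(len(fragment) for fragment in vocab)
    ids, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                ids.append(vocab[piece])
                i += size
                break
        else:
            # No fragment matched at this position: record a placeholder
            # and advance by one character.
            ids.append(UNKNOWN_ID)
            i += 1
    return ids


working_copy = "The Herd\n12\nThe Lost Night"    # stand-in for a working copy
cleaned_copy = clean(working_copy)               # the "cleaned" copy
tokenized_copy = tokenize(cleaned_copy, TOY_VOCAB)  # the "tokenized" copy
```

Each stage in the sketch produces a new object derived from the one before it, which mirrors the point made above that a selected work was copied at each successive step.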
However, that was as far as the training copies propagated towards the outside world. When each LLM was put into a public-facing version of Claude, it was complemented by other software that filtered user inputs to the LLM and filtered outputs from the LLM back to the user (id. ¶¶ 75–77). As a result, Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service. Yes, Claude could help less capable writers create works as well-written as Authors’ and competing in the same categories. But Claude created no exact copy, nor any substantial knock-off. Nothing traceable to Authors’ works. Such allegations are simply not part of plaintiffs’ amended complaint, nor in our record.
Neither side puts directly at issue any copies of any works that might have been used for the filtering software. Nor will this order.
In sum, the copies of books pirated or purchased-and-destructively-scanned were placed into a central “research library” or “generalized data area,” sets or subsets were copied again to create training copies for data mixes, the training copies were successively copied to be cleaned, tokenized, and compressed into any given trained LLM, and once trained an LLM did not output through Claude to the public any further copies. Finally, once Anthropic decided a copy of a pirated or scanned book in the library would not be used for training at all or ever again, Anthropic still retained that work as a “hard resource” for other uses or future uses. At least one work from each Author was present in every phase described above.
* * *
In August 2024, the three individual authors brought this putative class action complaining that Anthropic had infringed their federal copyrights by pirating copies for its library and by reproducing them to train its LLMs (Compl. ¶¶ 45–46, 71; see Amd. Compl. ¶¶ 47–48, 75). In October 2024, a scheduling order required that any motion for class certification be brought by March 6, 2025 (Dkt. No. 49).
The individual authors soon amended their complaint to include affiliated corporate entities as named plaintiffs, with consent. And, Anthropic chose not to move to dismiss the amended complaint, as it earlier had planned (see Dkt. No. 37). Instead, Anthropic moved to allow an early motion for summary judgment on fair use, even before class certification (Dkt. No. 88; see Feb. 25, 2025 Tr. 15). Permission was granted.
Anthropic now moves for summary judgment on fair use only. Fair use is a legal question for the judge with underlying fact questions, if any, for the jury. To prevail on summary judgment, Anthropic must rely on undisputed facts and/or factual inferences favoring the opposing side. Anthropic thus bears the burdens of production and persuasion in this motion. See Google LLC v. Oracle Am., Inc., 593 U.S. 1, 23–24 (2021); Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508, 547 n.21 (2023); Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 590 & n.20, 594 (1994); see also Nissan Fire & Marine Ins. Co. v. Fritz Cos., 210 F.3d 1099, 1102–03 (9th Cir. 2000).
Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs — and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs. For example, at oral argument, Anthropic asserted that if a purported fair user had retained pirated copies for uses beyond the fair use, then her piracy would not be excused by the fair use (Tr. 53, 56). But when Authors earlier interrogated Anthropic in discovery about what library copies (the original copies “obtained or created” by Anthropic) Anthropic had recopied for further uses, Anthropic responded that providing information about any copies made for uses beyond training commercially released LLMs would be overbroad, and that it could not count up all its copying even for LLMs in any case (e.g., Opp. Exh. 30 at 3). We know that Anthropic has more information about what it in fact copied for training LLMs (or not). Anthropic earlier produced a spreadsheet that showed the composition of various data mixes used for training various LLMs — yet it clawed back that spreadsheet in April (Opp. Fredricks Decl. ¶¶ 2–3). A discovery dispute regarding that spreadsheet remains pending. But Anthropic did not need a court order to offer up what it possessed in support of its motion. All deficiencies must be held against Anthropic and not the other way around.
This is the first substantive order in this case. A contemporaneous motion for class certification remains pending. It proposes one class related to works that were pirated (whether or not used to train LLMs), and a second class related to works that were purchased, scanned, and used in training LLMs. This order follows full briefing, a hearing, and supplemental briefing.
To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.
ANALYSIS
Section 107 of the Copyright Act identifies four factors for determining whether a given use of a copyrighted work is a fair use:
    [T]he fair use of a copyrighted work . . . for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —
    (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
    (2) the nature of the copyrighted work;
    (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
    (4) the effect of the use upon the potential market for or value of the copyrighted work.
These factors presuppose a “use.” So, at the threshold, a court must decide whether a “copyrighted [work] has been used in multiple ways,” then evaluate each. Warhol, 598 U.S. at 533. Uses do not turn on “the subjective intent of the user” but on “an objective inquiry into what use was made, i.e., what the user d[id] with the original work.” Id. at 544–45. A “use” should be construed narrowly enough to not “swallow” distinguishable infringing uses, much less categories of exclusive rights in toto. Id. at 541, 543 n.18, 546–48. Sometimes, the challenged copying involves just one use: In Perfect 10, Inc. v. Amazon.com, Inc., Google visited websites having full-sized images, made only reduced-sized copies, and incorporated those directly into its search engine — the sole use of the thumbnails being as “pointer[s]” to the images themselves. 508 F.3d 1146, 1157, 1160, 1165 (9th Cir. 2007). Sometimes, the copying involves many uses: In the Google Books cases, Google borrowed books from libraries, made both full-image and text-only copies, and incorporated different copies into different tools — one use being to reveal information “about those books,” another use being to provide the books to print-disabled patrons, and still another being to back up the print books if lost. Authors Guild v. Google, Inc., 804 F.3d 202, 217 (2d Cir. 2015) (quoted); Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 97, 101, 103 (2d Cir. 2014) (other cited uses).
Our parties debate an instructive decision. In American Geophysical Union v. Texaco Inc., Texaco employees used scientific articles in a central library, used copies of them in personal desk libraries, and used selected copies again in the scientific laboratory — the first use paid for, the second infringing, and the third plausibly fair but in fact a rare occurrence. 802 F. Supp. 1, 4–5, 14 (S.D.N.Y. 1992) (Judge Pierre Leval), aff’d, 60 F.3d 913, 918–19, 926 (2d Cir. 1994).
Here, our parties contest what use or uses are at issue. Anthropic contends it copied Authors’ books only for one use: Only to train LLMs. By contrast, Authors contend it did so for at least two uses: First to build a vast, central library of potentially useful content, and second to train specific LLMs using shifting sets and subsets of that content — over time selecting the more well-organized and well-expressed works for training. Authors also complain that the print-to-digital format change was itself an infringement not abridged as a fair use (Opp. 15, 25). Authors do not allege, however, that any LLM outputs infringing upon their works ever reached users of the public-facing Claude service.
This order addresses each of the four factors in turn, pointing out how each applies to the training copies and to the purchased and pirated library copies. It concludes with an integrated analysis.
1. THE PURPOSE AND CHARACTER OF THE USE.
For a given use at issue, the first factor addresses “the purpose and character of th[at] use, including whether [it] is of a commercial nature or is for nonprofit educational purposes.” 17 U.S.C. § 107(1).
A. THE COPIES USED TO TRAIN SPECIFIC LLMS.
All agree that one use at issue was training LLMs to receive text inputs and return text outputs. More specifically, Anthropic used copies of Authors’ copyrighted works to iteratively map statistical relationships between every text-fragment and every sequence of text-fragments so that a completed LLM could receive new text inputs and return new text outputs as if it were a human reading prompts and writing responses. Authors further argue — and this order takes for granted — that such training entailed “memoriz[ing]” works by “compress[ing]” copies of those works into the LLM (Opp. 16–17; see Opp. Expert Zhao ¶ 74). The LLMs “memorize[d] A LOT, like A LOT” (Opp. Exh. 35 at -029109). Regardless, the “purpose and character” of using works to train LLMs was transformative — spectacularly so.
To repeat and be clear: Authors do not allege that any LLM output provided to users infringed upon Authors’ works. Our record shows the opposite. Users interacted only with the Claude service, which placed additional software between the user and the underlying LLM to ensure that no infringing output ever reached the users. This was akin to the limits Google imposed on how many snippets of text from any one book could be seen by any one user through its Google Books service, preventing its search tool from devolving into a reading tool. Google, 804 F.3d at 222. Here, if the outputs seen by users had been infringing, Authors would have a different case. And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case.
Instead, Authors challenge only the inputs, not the outputs, of these LLMs. They point to the fully trained LLMs and the Claude service only to shed light on how training itself uses copies of their works and the ways the Claude service could be used to produce still other works that would compete with their works. This order does the same. Authors’ arguments that the training use is not transformative are unavailing.
First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries, we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.
Second, to that last point, Authors further argue that the training was intended to memorize their works’ creative elements — not just their works’ non-protectable ones (Opp. 17). But this is the same argument. Again, Anthropic’s LLMs have not reproduced to the public a given work’s creative elements, nor even one author’s identifiable expressive style (assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not. Copyright does not extend to “method[s] of operation, concept[s], [or] principle[s]” “illustrated[ ] or embodied in [a] work.” 17 U.S.C. § 102(b); see, e.g., Nichols v. Universal Pictures Corp., 45 F.2d 119, 120–22 (2d Cir. 1930) (Judge Learned Hand) (stage properties and storytelling elements); Apple Comput., Inc. v. Microsoft Corp., 35 F.3d 1435, 1445 (9th Cir. 1994) (“user-friendly” design principles and elements); Swirsky v. Carey, 376 F.3d 841, 848 (9th Cir. 2004) (music theory principles and chord progressions).
Third, Authors next argue that computers nonetheless should not be allowed to do what people do.
Authors cite a decision seeming to say as much (Opp. 16–17). But the judge there twice emphasized while discussing “purpose and character” of the use that what was trained was “not generative AI (AI that writes new content itself).” Rather, what was trained — using a proprietary system for finding court opinions in response to a given legal topic — was a competing AI tool for finding court opinions in response to a given legal topic. That was not transformative. Thomson Reuters Enter. Centre GmbH v. Ross Intell. Inc., 765 F. Supp. 3d 382, 398 (D. Del. 2025) (Judge Stephanos Bibas), appeal docketed, No. 25-8018 (3d Cir. Apr. 14, 2025).
A better analogue to our facts would be an AI tool trained — using court opinions, and briefs, law review articles, and the like — to receive legal prompts and respond with fresh legal writing. And, on facts much like those, a different court came out the other way. It found fair use. White v. W. Pub. Corp., 29 F. Supp. 3d 396, 400 (S.D.N.Y. 2014) (Judge Jed Rakoff).
22 The latter use stood sufficiently “orthogonal” to anything that any copyright owner
23 rightly could expect to control. See Warhol, 598 U.S. at 538–40. It could thus be freed up for
24 the copyist to use, “promot[ing] the progress of science and the arts, without diminishing the
25 incentive to create.” Id. at 531 (emphasis added); see U.S. CONST. art. I, § 8, cl. 8.
26 In short, the purpose and character of using copyrighted works to train LLMs to generate
27 new text was quintessentially transformative. Like any reader aspiring to be a writer,
28 Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but
1 to turn a hard corner and create something different. If this training process reasonably
2 required making copies within the LLM or otherwise, those copies were engaged in a
3 transformative use.
4 The first factor favors fair use for the training copies.
5 B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.
6 But that is not the only use at issue. Recall that Anthropic purchased millions of print
7 books for its central library and pirated millions of digital books for its central library, too. It
8 used specific sets and subsets of books for training specific LLMs. And, it then retained all the
9 copies in its central library for other uses that might arise even after deciding it would not use
10 them to train any LLM (at all or ever again). Anthropic seems to believe that because some of
11 the works it copied were sometimes used in training LLMs, Anthropic was entitled to take for
12 free all the works in the world and keep them forever with no further accounting. There is no
13 carveout, however, from the Copyright Act for AI companies.
14 Because the legal issues differ between the library copies Anthropic purchased and
15 pirated, this order takes them in turn.
16 (i) The Purchased Library Copies Converted from Print to Digital.
17 Anthropic purchased millions of print copies to “build a research library” (Opp. Exh. 22
18 at 145, 148). It destroyed each print copy while replacing it with a digital copy for use in its
19 library (not for sharing nor sale outside the company). As to these copies, Authors do not
20 complain that Anthropic failed to pay to acquire a library copy. Authors only complain that
21 Anthropic changed each copy’s format from print to digital (see Opp. 15, 25 & n.14). On the
22 facts here, that format change itself added no new copies, eased storage and enabled
23 searchability, and was not done for purposes trenching upon the copyright owner’s rightful
24 interests — it was transformative.
25 Anthropic purchased its print copies fair and square. With each purchase came
26 entitlement for Anthropic to “dispose[ ]” each copy as it saw fit. 17 U.S.C. § 109(a). So,
27 Anthropic was entitled to keep the copies in its central library for all the ordinary uses. Yes,
28
1 Anthropic changed the format of these library copies from print to digital — giving rise to the
2 issue here.
3 All agree on the facts of the format change. Anthropic “destructively scan[ned]” the
4 print copies to create the digital ones. Anthropic or its vendors stripped the bindings from the
5 print books, cut the pages to workable dimensions, and scanned those pages — discarding each
6 print copy while creating a digital one in its place. The digital copy was then housed in the
7 “research library” or “generalized data area” in place of the print copy (Opp. Exh. 22 at 145–
8 46, 193–94). Authors do not allege and our record does not show that Anthropic provided its
9 converted digital copies of print books to anyone outside Anthropic.
10 The parties disagree about the legal consequences of the format change. Was scanning
11 the print copies to create digital replacements transformative? Anthropic argues it was because
12 it was reasonably necessary to training LLMs. Authors argue it was a distinguishable step
13 requiring independent justification.
14 Here, for reasons narrower than Anthropic offers, the mere format change was a fair use.
15 Storage and searchability are not creative properties of the copyrighted work itself but
16 physical properties of the frame around the work or informational properties about the work.
17 See Texaco, 802 F. Supp. at 14 (physical), aff’d, 60 F.3d at 919; Google, 804 F.3d at 225
18 (informational); Sony Corp. of Am. v. Universal City Studios, Inc. (“Sony Betamax”), 464 U.S.
19 417, 447 (1984) (rightful interests). In Texaco, the court reasoned that if a purchased scientific
20 journal article had been copied “onto microfilm to conserve space, this might [have been] a
21 persuasive transformative use.” 802 F. Supp. at 14 (Judge Pierre Leval), aff’d, 60 F.3d at 919
22 (reducing “bulk[ ]” “might suffice to tilt the first fair use factor in favor of Texaco if these
23 purposes were dominant”). In Google Books, the court reasoned that a print-to-digital change
24 to expose information about the work was transformative. Google, 804 F.3d at 225 (Judge
25 Pierre Leval). And, in Sony Betamax, the Supreme Court held that making a recording of a
26 television show in order to instead watch it at a later time was copying but did not usurp any
27 rightful interest of the copyright owner. 464 U.S. at 447, 455. Important to the Supreme
28 Court’s reasoning was the expectation that most such copiers would not distribute the
1 permanent copies of the work. Finally, in A&M Records, Inc. v. Napster, Inc., our court of
2 appeals recognized the reasoning just explained, and therefore rejected by contrast a
3 digitization effort that was touted as space-shifting but in fact resulted in the multiplication of
4 copies shared with outsiders through a file-sharing service. 239 F.3d 1004, 1019 (9th Cir.
5 2001), aff’g in this part 114 F. Supp. 2d 896, 912–13, 915–16 (N.D. Cal. 2000) (Judge Marilyn
6 Hall Patel) (citing Sony Betamax and Texaco).
7 Here, every purchased print copy was copied in order to save storage space and to enable
8 searchability as a digital copy. The print original was destroyed. One replaced the other. And,
9 there is no evidence that the new, digital copy was shown, shared, or sold outside the company.
10 This use was even more clearly transformative than those in Texaco, Google, and Sony
11 Betamax (where the number of copies went up by at least one), and, of course, more
12 transformative than those uses rejected in Napster (where the number went up by “millions” of
13 copies shared for free with others).
14 Yes, Anthropic is a commercial outfit. And, this order takes for granted that Anthropic in
15 fact benefited from the print-to-digital format change — or it would not have gone to all the
16 trouble. But the crux of the first fair use factor’s concern for “commercial” use is in protecting
17 the copyright owners and their entitlements to exploit their copyright as they see fit (or not).
18 See, e.g., Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 562 (1985). That
19 the accused is a commercial entity is indicative, not dispositive. That the accused stands to
20 benefit is likewise indicative. But what matters most is whether the format change exploits
21 anything the Copyright Act reserves to the copyright owner. Anthropic already had purchased
22 permanent library copies (print ones). It did not create new copies to share or sell outside.
23 Yes, Authors also might have wished to charge Anthropic more for digital than for print
24 copies. And, this order takes for granted that Authors could have succeeded if Anthropic had
25 been barred from the format change. “But the Constitution’s language [in Clause 8] nowhere
26 suggests that [the copyright owner’s] limited exclusive right should include a right to divide
27 markets or a concomitant right to charge different purchasers different prices for the same
28 book, [merely] say to increase or to maximize gain.” See Kirtsaeng v. John Wiley & Sons,
1 Inc., 568 U.S. 519, 552 (2013); see also U.S. CONST. art. I, § 8, cl. 8. Nor does the Copyright
2 Act itself. Section 106 sets out exclusive rights that fair uses under Section 107 abridge.
3 Section 106(1) reserves to the copyright owner the right to make reproductions. But on our
4 facts we face the unusual situation where one copy entirely replaced the other. And,
5 Section 106(2) reserves to the copyright owner the right to make derivative works that add or
6 subtract creative material — as occurs in a “translation, musical arrangement, dramatization,
7 fictionalization, motion picture version, sound recording, art reproduction, abridgment, [or]
8 condensation” of a book, 17 U.S.C. § 101 (definitions). For some “other modification[ ]” of a
9 book to constitute a “derivative work,” it must itself “represent an original work of
10 authorship.” Ibid. But on our facts the format was changed but no content was added or
11 subtracted. See Mirage Editions, Inc. v. Albuquerque A.R.T. Co., 856 F.2d 1341, 1342, 1343–
12 44 (9th Cir. 1988) (yes where elements added to create new decorative ceramic).4
13 Section 106(3) further reserves to the copyright owner the right to distribute copies. But again,
14 the replacement copy here was kept in the central library, not distributed. Cf. Fox News
15 Network, LLC v. TVEyes, Inc., 883 F.3d 169, 176–78 (2d Cir. 2018) (enabling searching for
16 “information about the material” can be transformative use, even if some distribution results);
17 Lewis Galoob Toys, Inc. v. Nintendo of Am., Inc., 964 F.2d 965, 968, 971 (9th Cir. 1992)
18 (using nifty converter to “merely enhance[ ]” audiovisual displays emitted from purchased
19 videogame cartridge was fair use of those displays partly because no surplus copies of
20 cartridge or displays were ever created).
21 As a result, Anthropic’s format-change from print library copies to digital library copies
22 was transformative under fair use factor one. Anthropic was entitled to retain a copy of these
23 works in a print format. It retained them instead in a digital format, easing storage and
24
25 4
Even if print-to-digital format change did infringe the right to prepare derivative works,
26 Authors have conceded that “Plaintiffs’ infringement claims are predicated on Anthropic’s
unauthorized reproduction (17 U.S.C. § 106(1)); Plaintiffs are not alleging infringement by
27 Anthropic of any right to prepare derivative works (id. at § 106(2))” (Dkt. No. 203 at 2 (citations
original)). Whether this concession had consequence for copies tokenized and used for training or
28 “compressed” into the trained LLMs is not reached by this order because Anthropic does not rely
on Authors’ concession and those copies were here used transformatively.
1 searchability. And, the further copies made therefrom for purposes of training LLMs were
2 themselves transformative for that further reason, as above.
3 To be clear, this print-to-digital conversion involved a different and narrower form of
4 transformative use than the broader one advanced by Anthropic. Anthropic argues that the
5 central library use was part and parcel of the LLM training use and therefore transformative.
6 This order disagrees. However, this order holds that the mere conversion of a print book to a
7 digital file to save space and enable searchability was transformative for that reason alone.
8 Therefore, the digital copy should be treated just as if the purchased print copy had been placed
9 in the central library.
10 In sum, the first fair use factor favors fair use for the digital library copies converted from
11 purchased print library copies — but these do not excuse the pirated library copies.
12 (ii) The Pirated Library Copies.
13 Before buying books for its central library, Anthropic downloaded over seven million
14 pirated copies of books, paid nothing, and kept these pirated copies in its library even after
15 deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic
16 should have paid for these pirated library copies (e.g., Tr. 24–25, 65; Opp. 7, 12–13). This
17 order agrees.
18 The basic problem here was well-stated by Anthropic at oral argument: “You can’t just
19 bless yourself by saying I have a research purpose and, therefore, go and take any textbook you
20 want. That would destroy the academic publishing market if that were the case” (Tr. 53). Of
21 course, the person who purchases the textbook owes no further accounting for keeping the
22 copy. But the person who copies the textbook from a pirate site has infringed already, full
23 stop. This order further rejects Anthropic’s assumption that the use of the copies for a central
24 library can be excused as fair use merely because some will eventually be used to train LLMs.
25 This order doubts that any accused infringer could ever meet its burden of explaining
26 why downloading source copies from pirate sites that it could have purchased or otherwise
27 accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no
28 decision holding or requiring that pirating a book that could have been bought at a bookstore
1 was reasonably necessary to writing a book review, conducting research on facts in the book,
2 or creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably
3 infringing even if the pirated copies are immediately used for the transformative use and
4 immediately discarded.
5 But this order need not decide this case on that rule. Anthropic did not use these copies
6 only for training its LLM. Indeed, it retained pirated copies even after deciding it would not
7 use them or copies from them for training its LLMs ever again. They were acquired and
8 retained, as a central library of all the books in the world.
9 Building a central library of works to be available for any number of further uses was
10 itself the use for which Anthropic acquired these copies. One further use was making further
11 copies for training LLMs. But not every book Anthropic pirated was used to train LLMs.
12 And, every pirated library copy was retained even if it was determined it would not be so used.
13 Pirating copies to build a research library without paying for it, and to retain copies should they
14 prove useful for one thing or another, was its own use — and not a transformative one (see
15 Tr. 24–25, 35, 65; Opp. 4–10, 12 n.6; CC Br. Exh. 12 at -0144509 (“everything forever”)).
16 Napster, 239 F.3d at 1015; BMG Music v. Gonzalez, 430 F.3d 888, 890 (7th Cir. 2005).
17 Anthropic’s briefing contains other reasons why it believes its pirated library copies are
18 irrelevant to our fair use analysis, notwithstanding its own statements at our oral argument.
19 First, Anthropic accepts in this posture that it acted in bad faith but argues that its bad
20 faith in pirating copies cannot “somehow short-circuit[ ]” the fair use analysis (Reply 6
21 (downplaying Atari Games Corp. v. Nintendo of Am., Inc., 975 F.2d 832, 843 (Fed. Cir. 1992)
22 (applying law of Ninth Circuit))). But its bad faith is not the basis for this decision. Each use
23 of a work must be analyzed objectively. Warhol, 598 U.S. at 544–45. The objective analysis
24 here shows the initial copies were pirated to create a central, general-purpose library, as a
25 substitute for paid copies to do the same thing. (Of course, if infringement is found, bad faith
26 would matter for determining willfulness. 17 U.S.C. § 504(c)(2).)
27 Second, Anthropic argues that its goal to put the copies eventually “to a highly
28 transformative use” requires that each copy and use along the way be justified as having a
1 transformative use, too (Reply 14). But now Anthropic seeks to take the shortcut Anthropic
2 just said cannot be taken. Again, the Supreme Court tasks us with looking past the “subjective
3 intent of the user” to the objective use made of each copy. Warhol, 598 U.S. at 544–45
4 (emphasis added). Put another way, what a copyist says or thinks or feels matters only to the
5 extent it shows what a copyist in fact does with the work. Indeed, the same copy can be used
6 one way, then another, each with a different result. Id. at 533. Here, what Anthropic said
7 about its acquisitions at the time — that they were made to “build[ ] a research library” while
8 avoiding a “huge legal/practice/business slog” — is relevant in this regard. And, Anthropic’s
9 actual use of these pirated copies was to create its central library of texts that, like any
10 university or corporate library, stored the works’ well-organized facts, analyses, and expressive
11 examples for various contingent uses, one being training.5
12 Third, Anthropic argues that Texaco — the case involving copies used in a central
13 library, copies used in desk libraries, and copies used in the laboratory — is inapposite.
14 Anthropic argues that the disputed copies in Texaco were never used in the laboratory but
15 instead in personal desk libraries for a use “identical to the original purpose and use” of the
16 central library copies, and so not for a transformative use (Reply 8 (summarizing 60 F.3d at
17 922–23)). By contrast, says Anthropic, here it did use copies in the laboratory to train
18 LLMs — a very transformative use. But this is a fast glide over thin ice. Like Texaco,
19 Anthropic possessed copies it did not put into use in the laboratory and it kept those copies in a
20 central library even after its transformative use had been completed. But, unlike Texaco,
21 which bought those copies, Anthropic never paid for the central library copies stolen off the
22
23
24 5
Our court of appeals has not yet reappraised how bad faith (or good faith) figures in fair use
25 after Warhol. Its prior appraisal applied the Supreme Court’s statement that “[f]air use
presupposes good faith and fair dealing,” Harper & Row, 471 U.S. at 562 (cleaned up). See
26 Perfect 10, 508 F.3d at 1164 n.8. Since then, the Supreme Court has renewed its “skepticism about
whether bad faith has any role.” Oracle, 593 U.S. at 32–33 (reiterating doubts of Campbell, 510
27 U.S. at 585 n.18). And, recently, the Supreme Court has held squarely that it is not the “subjective
intent” of a copyist that counts, but the “objective . . . use” of the copy. Warhol, 598 U.S. at 544–
28 45. This order applies this most recent analysis. Miller v. Gammie, 335 F.3d 889, 900 (9th Cir.
2003) (en banc).
1 internet. Texaco also shows why Anthropic is wrong to suppose that so long as you create an
2 exciting end product, every “back-end step, invisible to the public,” is excused (Br. 10).
3 Notably, this is not a case where source copies were unavailable for separate purchase or
4 loan. See, e.g., NXIVM Corp. v. Ross Inst., 364 F.3d 471, 475–76, 478–79 (2d Cir. 2004)
5 (using selections of training manual — otherwise available only to cult’s trainees subject to
6 NDAs — to expose cult in critical review); Time Inc. v. Bernard Geis Assocs., 293 F. Supp.
7 130, 135–36, 138, 146 (S.D.N.Y. 1968) (Judge Inzer Bass Wyatt) (making charcoal drawings
8 of photographs taken of originals otherwise not on sale or loan out to illustrate a history
9 book).6 Nor were the copies made only incidentally and necessarily from pirated copies. See,
10 e.g., Perfect 10, 508 F.3d at 1164 n.8 (copies of images that had been pirated by third-party
11 websites were used to index those same websites while indexing the entire web). Here, piracy
12 was the point: To build a central library that one could have paid for, just as Anthropic later
13 did, but without paying for it.
14 Nor were the initial copies made immediately transformed into a significantly altered
15 form. In Perfect 10, images were copied by the search engine in thumbnail form only and
16 deployed immediately into the transformative use of identifying the full-sized images and the
17 pages from which they came. 508 F.3d at 1160, 1165, 1167. And, in Kelly v. Arriba Software
18 Corp., images were copied at full size and then into thumbnails for immediate use in building a
19 search engine, after which the full-sized copies were immediately deleted. 336 F.3d 811, 815
20 (9th Cir. 2003). Not here. The full-text copies of books were downloaded and maintained
21 “forever.”
22 Nor does the initial copying here even resemble the full-text copying in the Google Books
23 cases. There, libraries of authorized copies already had been assembled, and all copies
24
6
25 Anthropic repeats the misleading characterization of the copyright holder in Oracle that the
initial copies were there purloined (Reply 5). Not so. “All agree[d] that Google was and
26 remain[ed] free to use the Java language itself. All agree[d] that Google’s virtual machine [wa]s
free of any copyright issues. All agree[d] that the six-thousand-plus method implementations by
27 Google [we]re free of copyright issues. The copyright issue, rather,” was the use of Java for
purposes of creating competing software having the same familiar, functional schema. Oracle
28 Am., Inc. v. Google Inc., 872 F. Supp. 2d 974, 978 (N.D. Cal. 2012), aff’d and rev’d in part, 750
F.3d 1339 (Fed. Cir. 2014).
1 therefrom were made for direct employment in a one-to-one further fair use — whether the
2 transformative use of pointing to the works themselves, the use of providing the works in
3 formats for print-disabled patrons, or the use of insuring against going out of print, getting lost,
4 and becoming otherwise unavailable. HathiTrust, 755 F.3d at 97, 101, 103; Google, 804 F.3d
5 at 206, 216–18, 228 (further distinguishing search and snippet uses, which “test[ed] the
6 boundaries of fair use”). Not so here concerning the pirated copies. No authorized copies
7 existed from which Anthropic made its first copies. No full-text copy therefrom was put
8 immediately into use training LLMs. Not every copy was even necessary nor used for training
9 LLMs. No initial copy was ever deleted, even if never used or no longer used.7 The university
10 libraries and Google went to exceedingly great lengths to ensure that all copies were secured
11 against unauthorized uses — both through technical measures and through legal agreements
12 among all participants. Not so here. The library copies lacked internal controls limiting access
13 and use.
14 Nor do the decisions on intermediate copying require anything less than the analysis
15 applied here. Anthropic argues that our court of appeals in Sega Enterprises Ltd. v. Accolade,
16 Inc. looked only at the “ultimate use” and “did not analyze a series of atomized acts of
17 ‘infringement’ distinct from that overall purpose” (Reply 3). To the contrary, the appeals court
18 examined the initial, intermediate, and ultimate copies used by the copyist. The court
19 explained that the copyist initially purchased commercially available copies of game
20 cartridges and then made further copies necessarily and “solely in order to discover the
21 functional requirements for compatibility.” 977 F.2d 1510, 1522 (9th Cir. 1992). Thus, it
22 reached only one result because on those facts there was only one “overall purpose” for the
23 unauthorized copies. Indeed, the court reaffirmed prior caselaw holding that “intermediate
24
25 7
Training LLMs was not a use where perpetually maintaining a library copy was intrinsic to the
26 proffered fair use (e.g., for a plagiarism-checker service). Nor is this an instance where retaining
at least one copy was authorized by contract with the copyright owners (e.g., by agreement to
27 express terms upon submission to a plagiarism-checker service, notwithstanding proposed terms
scrawled on a paper prior to submission). A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d
28 630, 635–36 & n.5, 645 n.8 (4th Cir. 2009), aff’g in relevant parts 544 F. Supp. 2d 473, 480 (E.D.
Va. 2008) (Judge Claude Hilton). Anthropic mischaracterizes this case.
1 copying of [a work] may infringe the exclusive rights granted to the copyright owner in
2 [S]ection 106 of the Copyright Act regardless of whether the end product of the copying also
3 infringes those rights.” Id. at 1518–19 (reaffirming Walker v. Univ. Books, 602 F.2d 859, 864
4 (9th Cir. 1979)).
5 Similarly, in Sony Computer Entertainment, Inc. v. Connectix Corp., our appeals court
6 applied the same law to similarly focused conduct. Another copyist allegedly had purchased
7 an authorized copy and then made further copies solely and necessarily to reverse-engineer
8 compatibility requirements. 203 F.3d 596, 601, 602–03 (9th Cir. 2000).
9 Both Sega and Sony avoided imposing an “artificial hurdle” to fair use by generously
10 construing the intermediate copying necessary to the fair use. As one example, Sega stated
11 that an engineer should be permitted to reboot her computer while undertaking to reverse-
12 engineer software loaded onto it — even if doing so creates another digital copy of the
13 software and is not strictly necessary to reverse-engineering. Id. at 605. But neither Sega nor
14 Sony fathomed gifting an “artificial head start” to a fair user, either, by treating even the initial
15 copy as an intermediate one.
16 And, yes, some courts have “not inquire[d]” into intermediate or initial copying at all
17 (Reply 2 (citing Campbell as not inquiring into surplus copies in the studio)). But if a “close
18 reading of those cases [ ] reveals that in none of them was the legality of the [initial or]
19 intermediate copying at issue,” then it was not raised and not necessarily decided. Sega, 977
20 F.2d at 1519; see Webster v. Fall, 266 U.S. 507, 511 (1925). It was expressly decided
21 elsewhere: Our analysis must attend to different uses of different copies, and even to different
22 uses of the same copies. Warhol, 598 U.S. at 533.
23 Finally, Anthropic argues that even if the initial copies served a different use than the
24 intermediate and ultimate copies, it was not a use for which Anthropic necessarily would have
25 needed to pay Authors for a copy. In theory, argues Anthropic, it could have done as Google
26 did in Google Books — find an existing reference library willing to loan its copies for free as
27 source copies. Or, in theory, it could have done as Anthropic did later — go buy used copies
28 without having to pay Authors at all. See 17 U.S.C. § 109(a). But Anthropic did not do those
1 things — instead it stole the works for its central library by downloading them from pirated
2 libraries.
3 In sum, the first factor points against fair use for the central library copies made from
4 pirated sources — and no damages from pirating copies could be undone by later paying for
5 copies of the same works.
6 2. THE NATURE OF THE COPYRIGHTED WORK.
7 The second fair use factor is “the nature of the copyrighted work.” 17 U.S.C. § 107(2).
8 This factor “calls for recognition that some works are closer to the core of intended copyright
9 protection than others, with the consequence that fair use is more difficult to establish when the
10 former works are copied.” Campbell, 510 U.S. at 586. For one thing, less protection is due
11 published works than unpublished ones. For another, less protection is due “factual works than
12 works of fiction or fantasy.” Harper & Row, 471 U.S. at 563. But less protection is not no
13 protection. Even the arrangement of otherwise unprotectable facts surpasses the low bar for a
14 protectable original work of authorship. Google, 804 F.3d at 220.
15 Here, Anthropic accepts that all of Authors’ books — all published, whether non-fiction
16 or fiction — contained expressive elements (Reply 9). And, as set out above, this order
17 accepts Authors’ view of the evidence that their works were chosen for their expressive
18 qualities in building a central library and then in training specific LLMs (Opp. 11, 17 (citing,
19 e.g., Opp. Exh. 3 at -03433)).
20 The main function of the second factor is to help assess the other factors: to reveal
21 differences between the nature of the works at issue and the nature of their secondary use
22 (above), and to reveal any relation between the amount and substantiality of each work taken
23 and the secondary use (next). E.g., Campbell, 510 U.S. at 586; Kelly, 336 F.3d at 820; Google,
24 804 F.3d at 220; HathiTrust, 755 F.3d at 98; Bill Graham Archives v. Dorling Kindersley Ltd.,
25 448 F.3d 605, 612–13 (2d Cir. 2006).
26 The second factor points against fair use for all copies alike.
27
28
1 3. THE AMOUNT AND SUBSTANTIALITY OF THE PORTION USED.
2 The third fair use factor is “the amount and substantiality of the portion” of the
3 copyrighted work used by the accused. 17 U.S.C. § 107(3). The crux of this factor is whether
4 the amount was “reasonable in relation to the purpose of the copying.” Campbell, 510 U.S. at
5 586. Thus, the amount of copying is considered first against the work itself, then more
6 importantly against the proposed transformative purpose. See Warhol, 598 U.S. at 543 & n.18.
7 A. THE COPIES USED TO TRAIN SPECIFIC LLMS.
8 Copies selected for inclusion in training sets were selected because they were complete
9 and because they contained rich protectible expression, or so this order accepts the record
10 shows for Authors. Was all this copying reasonably necessary to the transformative use?
11 Yes.
12 “What matters [ ] is not so much ‘the amount and substantiality of the portion used’ in
13 making a copy, but rather the amount and substantiality of what is thereby made accessible to a
14 public [in the purported secondary use] for which it may serve as a competing substitute [for
15 the primary use].” Google, 804 F.3d at 222. Here, once again, there is no allegation of any
16 traceable connection between the Claude service’s outputs and Authors’ works. The copying
17 used to train the LLMs underlying Claude was thus especially reasonable.
18 In response, Authors object primarily that the copying used in training was both
19 extremely extensive and not strictly necessary.
20 As to extensive copying, it is true that entire works were copied. And, “copying [ ] entire
21 work[s] ‘militate[s] against a finding of fair use.’” Worldwide Church of God v. Philadelphia
22 Church of God, Inc., 227 F.3d 1110, 1118 (9th Cir. 2000) (quoting Hustler Mag. Inc. v. Moral
23 Majority Inc., 796 F.2d 1148, 1155 (9th Cir. 1986)); see Campbell, 510 U.S. at 587. But we
24 just addressed why Authors’ argument is misdirected. The copies that count for this factor are
25 those that would merely serve the same use as the work’s ordinary one. Authors do not allege
26 such copying. The accused use here of the incremental copies is as orthogonal as can be
27 imagined to the ordinary use of a book.
1 As to strict necessity, Authors make a stronger point. When a productive use is made
2 possible only by borrowing from a specific work, fair use climbs towards its zenith. When a
3 productive use is possible without that borrowing, fair use falls to its nadir — and the
4 borrowing deserves a particularly compelling justification. See Warhol, 598 U.S. at 543 &
5 n.18, 547. Here, it is true that Anthropic could have used some other books or no books at all
6 for training its LLMs — or so this order accepts the record shows for Authors. But Anthropic
7 has presented a compelling explanation for why it was reasonably necessary to use them
8 anyway.
9 For one thing, all agree Anthropic needed billions of words to train any given LLM. If
10 using only books, Anthropic would have needed millions of books per model. If using a set
11 comprising only a small fraction of books and a larger fraction of other texts, Anthropic still
12 would have needed hundreds of thousands of books. Authors contend that because Anthropic
13 showed it could use such smaller sets of books, it surely could have used no books at all — or
14 at least not their books (Opp. 23). But Authors forget that “reasonably necessary” does not
15 mean “strictly necessary.” Authors do not contest that the volume of text required to train an
16 LLM is monumental. Because using so many works was reasonably necessary, using any one
17 work for actually training LLMs was about as reasonable as the next.
18 For another thing, no output to the public was even alleged to be infringing. So, yes,
19 Authors’ works were chosen as the strongest examples of writing. But the compelling benefits
20 of training the LLMs on strong examples were not offset by revelations to the public of any
21 portion of the works themselves. What was copied was therefore especially reasonable and
22 compelling.
23 The third factor thus favors fair use for the training copies.
24 B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.
25 But again, there was a separate use — a distinction that makes some difference as to
26 whether the amount and substantiality of the copying was “reasonable in relation to the
27 purpose of the copying” for the library copies. Campbell, 510 U.S. at 586.
1 (i) The Purchased Library Copies Converted from Print to Digital.
2 For the print library copies that Anthropic purchased and then converted into digital
3 library copies, Anthropic already enjoyed entitlement to keep the copies in its library. The
4 purpose of the copying was to keep them in its library but with more favorable storage and
5 searchability properties. Copying the entire work was exactly what this purpose required.
6 There was no surplus copying. The source copy was destroyed.
7 The third fair use factor favors fair use for the purchased library copies converted from
8 print to digital.
9 (ii) The Pirated Library Copies.
10 For the pirated library copies, however, Anthropic lacked any entitlement to hold copies
11 of the books at all. Its purpose, it says, was to train LLMs. But its objective conduct was to
12 seek “all the books in the world” and then retain them even after deciding it would not make
13 further copies from them for training — indicating there were other further uses. Against the
14 purpose of acquiring all the books one could on the chance some might prove useful for
15 training LLMs and maybe other stuff too, almost any unauthorized copying would have been
16 too much. Anthropic copied millions of books in toto, Authors’ works among them.
17 The third factor points against fair use for the pirated library copies.
18
4. THE EFFECT OF THE USE UPON THE MARKET FOR OR VALUE OF THE
19 COPYRIGHTED WORK.
20 The final factor is “the effect of the use upon the potential market for or value of the
21 copyrighted work.” 17 U.S.C. § 107(4). This factor points against fair use when a copyist
22 makes copies available that displace demand for copies the copyright owner already makes
23 available or readily could. Texaco, 60 F.3d at 926–28 (reproduced copies); Dr. Seuss Enters.,
24 L.P. v. ComicMix LLC, 983 F.3d 443, 461 (9th Cir. 2020) (derivative copies). “While the first
25 factor considers whether and to what extent an original work and secondary use [in principle
26 could] have substitutable purposes, the fourth factor focuses on actual or potential market
27 substitution.” Warhol, 598 U.S. at 536 n.12 (emphasis added).
1 A. THE COPIES USED TO TRAIN SPECIFIC LLMS.
2 The copies used to train specific LLMs did not and will not displace demand for copies
3 of Authors’ works, or not in the way that counts under the Copyright Act.
4 Again, Authors concede that training LLMs did not result in any exact copies nor even
5 infringing knockoffs of their works being provided to the public. If that were not so, this
6 would be a different case. Authors remain free to bring that case in the future should such
7 facts develop.
8 Instead, Authors contend generically that training LLMs will result in an explosion of
9 works competing with their works — such as by creating alternative summaries of factual
10 events, alternative examples of compelling writing about fictional events, and so on. This
11 order assumes that is so (Opp. 22–23 (citing, e.g., Opp. Exh. 38)). But Authors’ complaint is
12 no different than it would be if they complained that training schoolchildren to write well
13 would result in an explosion of competing works. This is not the kind of competitive or
14 creative displacement that concerns the Copyright Act. The Act seeks to advance original
15 works of authorship, not to protect authors against competition. Sega, 977 F.2d at 1523–24.
16 Authors next contend that training LLMs displaced (or will) an emerging market for
17 licensing their works for the narrow purpose of training LLMs (Opp. 21–22). Anthropic
18 argues that transactional costs would exceed Anthropic’s expected benefit from any such
19 bargain, prompting it to cease dealing with any rightsholders or else to cease developing such
20 technology altogether (Br. 22–23). Our record could support either account — so this order
21 must assume Authors are correct. A market could develop (Opp. 19–21 (citing record)). Even
22 so, such a market for that use is not one the Copyright Act entitles Authors to exploit.
23 None of the cases cited by Authors requires a different result. All contemplated losses of
24 something the Copyright Act properly protected — not the kinds of fair uses for which a
25 copyright owner cannot rightly expect to control. See TVEyes, Inc., 883 F.3d at 181 (use of a
26 right legally reserved to and factually already being licensed by copyright owner); Texaco, 60
27 F.3d at 931 (same); Ringgold v. BET, Inc., 126 F.3d 70, 80–81 (2d Cir. 1997) (use of a right
28 legally reserved to and factually likely to be marketable by copyright owner — displaying
1 images of her artistic work in television shows); cf. Seltzer v. Green Day, Inc., 725 F.3d 1170,
2 1179 (9th Cir. 2013) (no evidence use could be or “was likely to” be marketable).
3 The fourth factor thus favors fair use for the training copies.
4 B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.
5 (i) The Purchased Library Copies Converted from Print to Digital.
6 For these copies, this order assumes Anthropic’s format change from print to digital
7 displaced purchases of new digital copies that Anthropic would have made directly from
8 Authors (had it not been able to purchase print copies in used condition). But for reasons
9 stated under the first factor, such losses did not relate to something the Copyright Act reserves
10 for Authors to exploit. It was a format change.
11 Authors’ next argument, it seems, is that the format change nonetheless exposed them to
12 usurpation of the opportunity to sell rightful copies because Anthropic might transmit
13 additional unauthorized digital copies more readily than it could have transmitted additional
14 unauthorized print copies — and that the same would be true for all format converters (cf. Opp.
15 25 n.14; Opp. Expert Malackowski ¶ 52). But after much discovery, there is no inkling in our
16 record of intent to redistribute library copies once acquired nor of inability to secure that
17 valuable library against outside actors. And, if the internal, central library copies did or do in
18 fact lead to further reproduction or distribution, those further copies remain redressable
19 separately by Authors. The format change did not itself usurp the Authors’ rightful
20 entitlements.
21 This factor is thus neutral for the purchased library copies converted from print to digital.
22 (ii) The Pirated Library Copies.
23 The copies used to build a central library and that were obtained from pirated sources
24 plainly displaced demand for Authors’ books — copy for copy. Not every person who merely
25 intends to make a fair use of a work is thereby entitled to a full copy in the meantime, nor even
26 to steal a copy so that achieving this fair use is especially simple or cost-effective. Here, the
27 copies employed in training LLMs were one thing, but the copies acquired to assemble a
1 convenient, general-purpose library of works for various uses the company might
2 have for them, if any, were a different use altogether.
3 Anthropic has almost no rebuttal on these points. First, Anthropic argues that “Claude’s
4 services do not reduce [or usurp] the value of Plaintiffs’ works through substitution in their
5 traditional markets” (see Br. Expert Peterson ¶ 33). But stealing pirated copies of Authors’
6 works plainly did. Second, Anthropic argues that it may have been able to purchase some
7 books on the open market (and some other texts), but not other texts it copied (cf. id. ¶ 48 (re
8 licensing)). But this case does not concern those other texts it could not have purchased. It
9 could have purchased Authors’ books (and many others). In fact it later did. Finally,
10 Anthropic argues that the effect on these texts from one book foregone was too small to be
11 considered (see id. ¶ 77). But the test requires that we contemplate the likely result were the
12 conduct to be condoned as a fair use — namely to steal a work you could otherwise buy (a
13 book, millions of books) so long as you at least loosely intend to make further copies for a
14 purportedly transformative use (writing a book review with excerpts, training LLMs, etc.),
15 without any accountability. As Anthropic itself suggested, “That would destroy the [entire]
16 publishing market if that were the case” (see Tr. 53; see also Tr. 32, 41; Opp. Expert
17 Malackowski ¶¶ 31–34, 38).
18 The fourth factor points against fair use for the pirated library copies.
19 5. OVERALL ANALYSIS.
20 After the four factors and any others deemed relevant are “explored, [ ] the results [are]
21 weighed together, in light of the purposes of copyright.” Campbell, 510 U.S. at 578.
22 The copies used to train specific LLMs were justified as a fair use. Every factor but the
23 nature of the copyrighted work favors this result. The technology at issue was among the most
24 transformative many of us will see in our lifetimes.
25 The copies used to convert purchased print library copies into digital library copies were
26 justified, too, though for a different fair use. The first factor strongly favors this result, and the
27 third favors it, too. The fourth is neutral. Only the second slightly disfavors it. On balance, as
1 the purchased print copy was destroyed and its digital replacement not redistributed, this was a
2 fair use.
3 The downloaded pirated copies used to build a central library were not justified by a fair
4 use. Every factor points against fair use. Anthropic employees said copies of works (pirated
5 ones, too) would be retained “forever” for “general purpose” even after Anthropic determined
6 they would never be used for training LLMs. A separate justification was required for each
7 use. None is even offered here except for Anthropic’s pocketbook and convenience.
8 And, as for any copies made from central library copies but not used for training, this
9 order does not grant summary judgment for Anthropic. On this record in this posture, the
10 central library copies were retained even when no longer serving as sources for training copies,
11 “hundreds of engineers” could access them to make copies for other uses, and engineers did
12 make other copies. Anthropic has dodged discovery on these points (e.g., Opp. Exh. 17 at 93–
13 94 (retained); Opp. Exh. 22 at 196 (no limits); Opp. Exh. 30 at 3, 4 (no accounting); see also
14 Opp. 15). We cannot determine the right answer concerning such copies because the record is
15 too poorly developed as to them. Anthropic is not entitled to an order blessing all copying
16 “that Anthropic has ever made after obtaining the data,” to use its words (Opp. Exh. 30 at 3, 4).
17 CONCLUSION
18 With respect to the training copies and the print-to-digital converted copies, this order has
19 drawn all ambiguities and inferences in favor of the opposing side, namely Authors. With
20 respect to the pirated copies, this order has also accepted the Authors’ version of the facts.
21 Authors did not move for summary judgment but if they had, then we would have been
22 obligated to accept all reasonable views given the evidence in defendant’s favor instead.
23 This order grants summary judgment for Anthropic that the training use was a fair use.
24 And, it grants that the print-to-digital format change was a fair use for a different reason. But it
25 denies summary judgment for Anthropic that the pirated library copies must be treated as
26 training copies.
27 We will have a trial on the pirated copies used to create Anthropic’s central library and
28 the resulting damages, actual or statutory (including for willfulness). That Anthropic later
1 bought a copy of a book it earlier stole off the internet will not absolve it of liability for the
2 theft but it may affect the extent of statutory damages. Nothing is foreclosed as to any other
3 copies flowing from library copies for uses other than for training LLMs.
4 IT IS SO ORDERED.
5
6 Dated: June 23, 2025.
7
8
WILLIAM ALSUP
9 UNITED STATES DISTRICT JUDGE
