Bartz vs. Anthropic PBC: "Training Use Is Fair Use"

Case 3:24-cv-05417-WHA Document 231 Filed 06/23/25 Page 1 of 32

UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA

ANDREA BARTZ, CHARLES GRAEBER, and KIRK WALLACE JOHNSON,
Plaintiffs,
v.
ANTHROPIC PBC,
Defendant.

No. C 24-05417 WHA

ORDER ON FAIR USE

INTRODUCTION

An artificial intelligence firm downloaded for free millions of copyrighted books in digital form from pirate sites on the internet. The firm also purchased copyrighted books (some overlapping with those acquired from the pirate sites), tore off the bindings, scanned every page, and stored them in digitized, searchable files. All the foregoing was done to amass a central library of “all the books in the world” to retain “forever.” From this central library, the AI firm selected various sets and subsets of digitized books to train various large language models under development to power its AI services. Some of these books were written by plaintiff authors, who now sue for copyright infringement. On summary judgment, the issue is the extent to which any of the uses of the works in question qualify as “fair uses” under Section 107 of the Copyright Act.

STATEMENT

Defendant Anthropic PBC is an AI software firm founded by former OpenAI employees in January 2021. Its core offering is an AI software service called Claude. When a user prompts Claude with text, Claude quickly responds with text — mimicking human reading and writing. Claude can do so because Anthropic trained Claude — or rather trained large language models or LLMs underlying various versions of Claude — using books and other texts selected from a central library Anthropic had assembled. Claude was first released publicly in March 2023. Seven successive versions of Claude have been released since.
Users may ask Claude some questions for free. Demanding users and corporate clients pay to use Claude, generating over one billion dollars in annual revenue (Opp. Exh. 18).

Plaintiffs Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson are authors of books that Anthropic copied from pirated and purchased sources. Anthropic assembled these copies into a central library of its own, copied further various sets and subsets of those library copies to include in various “data mixes,” and used these mixes to train various LLMs. Anthropic kept the library copies in place as a permanent, general-purpose resource even after deciding it would not use certain copies to train LLMs or would never use them again to do so. All of Anthropic’s copying was without plaintiffs’ authorization.

Author Bartz wrote four novels Anthropic copied and used: The Lost Night: A Novel, The Herd, We Were Never Here, and The Spare Room. Author Graeber wrote two non-fiction books likewise at issue: The Good Nurse: A True Story of Medicine, Madness, and Murder, and The Breakthrough: Immunotherapy and the Race to Cure Cancer. And, Author Johnson penned three non-fiction books also copied and used: To Be A Friend Is Fatal: The Fight to Save the Iraqis America Left Behind, The Feather Thief: Beauty, Obsession, and the Natural History Heist of the Century, and The Fishermen and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast. Plaintiffs Bartz Inc. and MJ + KJ Inc. are corporate entities that Author Bartz and Author Johnson respectively set up to market their works. Between them, these five plaintiffs (“Authors”) own all the copyrights in the above-listed works.

From the start, Anthropic “ha[d] many places from which” it could have purchased books, but it preferred to steal them to avoid “legal/practice/business slog,” as cofounder and chief executive officer Dario Amodei put it (see Opp. Exh. 27). So, in January or February 2021, another Anthropic cofounder, Ben Mann, downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books — that is, pirated. Anthropic’s next pirated acquisitions involved downloading distributed, reshared copies of other pirate libraries. In June 2021, Mann downloaded in this way at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated. And, in July 2022, Anthropic likewise downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated (Opp. Exh. 6 at 4; Opp. Expert Zhao ¶¶ 17–29; see Class Cert. (“CC”) Opp. Expert Iyyer ¶¶ 45–46). Although what was downloaded and later duplicated from these sources was sometimes referred to as data or datasets, at bottom they contained full-text “ebooks or scans of books” saved in individual files in formats like .pdf, .txt, and .epub (see, e.g., Opp. Exh. 12 at -0391318). For Books3, most filenames identified the book inside. For LibGen and PiLiMi, Anthropic downloaded a separate catalog of bibliographic metadata for each collection, with fields like title, author, and ISBN (see, e.g., ibid.; Opp. Exh. 16 -0533972–73). Anthropic thereby pirated over seven million copies of books, including copies of at least two works at issue for each Author.[1]

As Anthropic trained successive LLMs, it became convinced that using books was the most cost-effective means to achieve a world-class LLM.
During this time, however, Anthropic became “not so gung ho about” training on pirated books “for legal reasons” (Opp. Exh. 19). It kept them anyway (e.g., Opp. Exh. 17 at 93–94; CC Opp. Exh. 35 at -0273474).

[1] Specifically, those works were (see Opp. Expert Zhao ¶ 36; CC Br. Expert Zhao ¶ 66):
• Author Bartz’s The Herd (five copies total) (in LibGen and PiLiMi);
• Author Bartz’s The Lost Night (three copies total) (in Books3, LibGen, and PiLiMi);
• Author Graeber’s The Breakthrough (four copies) (in Books3, LibGen, and PiLiMi);
• Author Graeber’s The Good Nurse (five copies total) (in Books3 and LibGen);
• Author Johnson’s To Be A Friend Is Fatal (one copy) (in Books3); and
• Author Johnson’s The Feather Thief (four copies total) (in Books3, LibGen, PiLiMi).
Some evidence suggests Anthropic downloaded still more copies before culling empty files, duplicates, and so on to reach the numbers kept in the central library and counted here.

To find a new way to get books, in February 2024, Anthropic hired the former head of partnerships for Google’s book-scanning project, Tom Turvey. He was tasked with obtaining “all the books in the world” while still avoiding as much “legal/practice/business slog” as possible (Opp. Exhs. 21, 27). So, in spring 2024, Turvey sent an email or two to major publishers to inquire into licensing books for training AI. Had Turvey kept up those conversations, he might have reached agreements to license copies for AI training from publishers — just as another major technology company soon did with one major publisher (e.g., Opp. Expert Malackowski ¶¶ 50, 64). But Turvey let those conversations wither.

Instead, Turvey and his team emailed major book distributors and retailers about bulk-purchasing their print copies for the AI firm’s “research library” (Opp. Exh. 22 at 145; Opp. Exh. 31 at -035589).
Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form — discarding the paper originals. Each print book resulted in a PDF copy containing images of the scanned pages with machine-readable text (including front and back cover scans for softcover books). Anthropic created its own catalog of bibliographic metadata for the books it was acquiring. It acquired copies of millions of books, including of all works at issue for all Authors.[2]

[2] In other words, within the scanned books were one or more copies of the following works:
• Author Bartz’s The Herd;
• Author Bartz’s The Lost Night;
• Author Bartz’s We Were Never Here;
• Author Bartz’s The Spare Room;
• Author Graeber’s The Breakthrough;
• Author Graeber’s The Good Nurse;
• Author Johnson’s To Be A Friend Is Fatal;
• Author Johnson’s The Feather Thief; and,
• Author Johnson’s The Fishermen.

Anthropic may have copied portions of Authors’ books on other occasions, too — such as while copying book reviews, academic papers, internet blogposts, or the like for its central library. And, Anthropic’s scanning service providers may have copied Authors’ print books along the way to delivering the final digital copies to Anthropic. But neither side here specifically raises legal issues implicated by any such copies. Nor will this order.

From all the above sources, Anthropic created a general “research library” or “generalized data area.” What was this for? As Turvey said, this was a “way of creating information that would be voluminous and that we would use for research,” or otherwise to “inform our — our products” (Opp. Exh. 22 at 145–46, 194).
The copies were kept in the original “version of the underlying” book files Anthropic had “obtained or created,” that is, pirated or scanned (Opp. Exh. 30 at 3, 4). Anthropic planned to “store everything forever; we might separate out books into categories[, but t]here [wa]s no compelling reason to delete a book” — even if not used for training LLMs. Over time, Anthropic invested in building more tools for searching its “general purpose” library and for accessing books or sets of books for further uses (see CC Br. Exh. 12 at -0144509; CC Reply Exh. 45 at -0365931–32, -0365939–42 (reviewing and seeking to improve “[w]hat [ ] researchers do today if they want to search for a book,” including improving bibliographic metadata and consolidating varied resources)).

One further use was training LLMs. As a preliminary step towards training, engineers browsed books and bibliographic metadata to learn what languages the books were written in, what subjects they concerned, whether they were by famous authors or not, and so on — sometimes by “open[ing] any of the books” and sometimes using software. From the library copies, engineers copied the sets or subsets of books they believed best for training and “iterate[d]” on those selections over time. For instance, two different subsets of print-sourced books were included in “data mixes” for training two different LLMs. Each was just a fraction of all the print-sourced books. Similarly, different sets or “subsets” or “parts of” or “portions” of the collections sourced from Books3, LibGen, and PiLiMi were used to train different LLMs. Anthropic analyzed the consequences of using more books, fewer books, different books. The goal was to improve the “data mix” to improve each LLM and, ultimately, Claude’s performance for paying customers.[3]

[3] (See, e.g., Opp. Exh.
12 at -0391318 (engineers were able to “open any of the books”); CC Reply Exh. 45 at -0365941 (some engineers “want[ed] to search for a book” and get its “scanned book file[ ]”); Opp. Exh. 30 at 3 (made copies of “each such dataset or portions thereof” for training); Opp. Exh. 6 at 3–4 (trained on “portions of datasets,” with at least two such portions from LibGen and four from PiLiMi); Opp. Expert Zhao ¶¶ 27–28, 30–31 (plus two more from PiLiMi, and at least three from scanned books); CC Opp. Exh. 35 at -0273477–82 (tested subsets of pirated and purchased-and-scanned books to see consequences for training); CC Br. Exh. 12 at -0144508–09 (“iterate[d]” selections from library and “train[ed] new models on the best data”); Br. Expert Kaplan ¶¶ 42–45 (explained goals of improving data mixes); Br. Expert Peterson ¶ 14 (similar)).

Over time, Anthropic came to value most highly for its data mixes books like the ones Authors had written, and it valued them because of the creative expressions they contained. Claude’s customers wanted Claude to write as accurately and as compellingly as Authors. So, it was best to train the LLMs underlying Claude on works just like the ones Authors had written, with well-curated facts, well-organized analyses, and captivating fictional narratives — above all with “good writing” of the kind “an editor would approve of” (Opp. Exh. 3 at -03433). Anthropic could have trained its LLMs without using such books or any books at all. That would have required spending more on, say, staff writers to create competing exemplars of good writing, engineers to revise bad exemplars into better ones, energy bills to power more rounds of training and fine-tuning, and so on. Having canonical texts to draw upon helped (e.g., Opp. Expert Zhao ¶ 81).

Each work selected for training any given LLM was copied in four main ways — and in fact so many times that Anthropic admits it would be impractical even to estimate.

First, each work selected was copied from the central library to create a working copy for the training set.

Second, each work was cleaned to remove a small amount of lower-valued or repeating text (like headers, footers, or page numbers), with a “cleaned” copy resulting. If the same book appeared twice, or if while looking across the entire provisional training set it became clear there was some other reason to cull a book or category, Anthropic had the capability to delete relevant copy(ies) from the set at this step (see CC Br. Expert Zhao ¶¶ 71–72).

Third, each cleaned copy was translated into a “tokenized” copy. Some words were “stemmed” or “lemmatized” into simpler forms (e.g., “studying” to “study”). And, all characters were grouped into short sequences and translated into corresponding number sequences or “tokens” according to an Anthropic-made dictionary. The resulting tokenized copies were then copied repeatedly during training. By one account, this process involved the iterative, trial-and-error discovery of contingent statistical relationships between each word fragment and all other word fragments both within any work and across trillions of word fragments from other copied books, copied websites, and the like. Other steps in training are not at issue here (id. ¶¶ 73–76; see Opp. Expert Zhao ¶ 38 & n.6).

Fourth, each fully trained LLM itself retained “compressed” copies of the works it had trained upon, or so Authors contend and this order takes for granted.
In essence, each LLM’s mapping of contingent relationships was so complete it mapped or indeed simply “memorized” the works it trained upon almost verbatim. So, if each completed LLM had been asked to recite works it had trained upon, it could have done so (e.g., Opp. Expert Zhao ¶ 74). Further steps refining the LLM are not at issue here.

However, that was as far as the training copies propagated towards the outside world. When each LLM was put into a public-facing version of Claude, it was complemented by other software that filtered user inputs to the LLM and filtered outputs from the LLM back to the user (id. ¶¶ 75–77). As a result, Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service. Yes, Claude could help less capable writers create works as well-written as Authors’ and competing in the same categories. But Claude created no exact copy, nor any substantial knock-off. Nothing traceable to Authors’ works. Such allegations are simply not part of plaintiffs’ amended complaint, nor in our record.

Neither side puts directly at issue any copies of any works that might have been used for the filtering software. Nor will this order.

In sum, the copies of books pirated or purchased-and-destructively-scanned were placed into a central “research library” or “generalized data area,” sets or subsets were copied again to create training copies for data mixes, the training copies were successively copied to be cleaned, tokenized, and compressed into any given trained LLM, and once trained an LLM did not output through Claude to the public any further copies. Finally, once Anthropic decided a copy of a pirated or scanned book in the library would not be used for training at all or ever again, Anthropic still retained that work as a “hard resource” for other uses or future uses.
At least one work from each Author was present in every phase described above.

* * *

In August 2024, the three individual authors brought this putative class action complaining that Anthropic had infringed their federal copyrights by pirating copies for its library and by reproducing them to train its LLMs (Compl. ¶¶ 45–46, 71; see Amd. Compl. ¶¶ 47–48, 75). In October 2024, a scheduling order required that any motion for class certification be brought by March 6, 2025 (Dkt. No. 49).

The individual authors soon amended their complaint to include affiliated corporate entities as named plaintiffs, with consent. And, Anthropic chose not to move to dismiss the amended complaint, as it earlier had planned (see Dkt. No. 37). Instead, Anthropic moved to allow an early motion for summary judgment on fair use, even before class certification (Dkt. No. 88; see Feb. 25, 2025 Tr. 15). Permission was granted.

Anthropic now moves for summary judgment on fair use only. Fair use is a legal question for the judge with underlying fact questions, if any, for the jury. To prevail on summary judgment, Anthropic must rely on undisputed facts and/or factual inferences favoring the opposing side. Anthropic thus bears the burdens of production and persuasion in this motion. See Google LLC v. Oracle Am., Inc., 593 U.S. 1, 23–24 (2021); Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508, 547 n.21 (2023); Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 590 & n.20, 594 (1994); see also Nissan Fire & Marine Ins. Co. v. Fritz Cos., 210 F.3d 1099, 1102–03 (9th Cir. 2000).

Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs — and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs. For example, at oral argument, Anthropic asserted that if a purported fair user had retained pirated copies for uses beyond the fair use, then her piracy would not be excused by the fair use (Tr. 53, 56). But when Authors earlier interrogated Anthropic in discovery about what library copies (the original copies “obtained or created” by Anthropic) Anthropic had recopied for further uses, Anthropic responded that providing information about any copies made for uses beyond training commercially released LLMs would be overbroad, and that it could not count up all its copying even for LLMs in any case (e.g., Opp. Exh. 30 at 3). We know that Anthropic has more information about what it in fact copied for training LLMs (or not). Anthropic earlier produced a spreadsheet that showed the composition of various data mixes used for training various LLMs — yet it clawed back that spreadsheet in April (Opp. Fredricks Decl. ¶¶ 2–3). A discovery dispute regarding that spreadsheet remains pending. But Anthropic did not need a court order to offer up what it possessed in support of its motion. All deficiencies must be held against Anthropic and not the other way around.

This is the first substantive order in this case. A contemporaneous motion for class certification remains pending. It proposes one class related to works that were pirated (whether or not used to train LLMs), and a second class related to works that were purchased, scanned, and used in training LLMs.
This order follows full briefing, a hearing, and supplemental briefing.

To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.

ANALYSIS

Section 107 of the Copyright Act identifies four factors for determining whether a given use of a copyrighted work is a fair use:

    [T]he fair use of a copyrighted work . . . for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —

    (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

    (2) the nature of the copyrighted work;

    (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

    (4) the effect of the use upon the potential market for or value of the copyrighted work.

These factors presuppose a “use.” So, at the threshold, a court must decide whether a “copyrighted [work] has been used in multiple ways,” then evaluate each. Warhol, 598 U.S. at 533. Uses do not turn on “the subjective intent of the user” but on “an objective inquiry into what use was made, i.e., what the user d[id] with the original work.” Id. at 544–45. A “use” should be construed narrowly enough to not “swallow” distinguishable infringing uses, much less categories of exclusive rights in toto. Id. at 541, 543 n.18, 546–48. Sometimes, the challenged copying involves just one use: In Perfect 10, Inc. v. Amazon.com, Inc., Google visited websites having full-sized images, made only reduced-sized copies, and incorporated those directly into its search engine — the sole use of the thumbnails being as “pointer[s]” to the images themselves. 508 F.3d 1146, 1157, 1160, 1165 (9th Cir. 2007). Sometimes, the copying involves many uses: In the Google Books cases, Google borrowed books from libraries, made both full-image and text-only copies, and incorporated different copies into different tools — one use being to reveal information “about those books,” another use being to provide the books to print-disabled patrons, and still another being to back up the print books if lost. Authors Guild v. Google, Inc., 804 F.3d 202, 217 (2d Cir. 2015) (quoted); Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 97, 101, 103 (2d Cir. 2014) (other cited uses).

Our parties debate an instructive decision. In American Geophysical Union v. Texaco Inc., Texaco employees used scientific articles in a central library, used copies of them in personal desk libraries, and used selected copies again in the scientific laboratory — the first use paid for, the second infringing, and the third plausibly fair but in fact a rare occurrence. 802 F. Supp. 1, 4–5, 14 (S.D.N.Y.
1992) (Judge Pierre Leval), aff’d, 60 F.3d 913, 918–19, 926 (2d Cir. 1994).

Here, our parties contest what use or uses are at issue. Anthropic contends it copied Authors’ books only for one use: Only to train LLMs. By contrast, Authors contend it did so for at least two uses: First to build a vast, central library of potentially useful content, and second to train specific LLMs using shifting sets and subsets of that content — over time selecting the more well-organized and well-expressed works for training. Authors also complain that the print-to-digital format change was itself an infringement not abridged as a fair use (Opp. 15, 25). Authors do not allege, however, that any LLM outputs infringing upon their works ever reached users of the public-facing Claude service.

This order addresses each of the four factors in turn, pointing out how each applies to the training copies and to the purchased and pirated library copies. It concludes with an integrated analysis.

1. THE PURPOSE AND CHARACTER OF THE USE.

For a given use at issue, the first factor addresses “the purpose and character of th[at] use, including whether [it] is of a commercial nature or is for nonprofit educational purposes.” 17 U.S.C. § 107(1).

A. THE COPIES USED TO TRAIN SPECIFIC LLMS.

All agree that one use at issue was training LLMs to receive text inputs and return text outputs. More specifically, Anthropic used copies of Authors’ copyrighted works to iteratively map statistical relationships between every text-fragment and every sequence of text-fragments so that a completed LLM could receive new text inputs and return new text outputs as if it were a human reading prompts and writing responses.
Authors further argue — and this order takes for granted — that such training entailed “memoriz[ing]” works by “compress[ing]” copies of those works into the LLM (Opp. 16–17; see Opp. Expert Zhao ¶ 74). The LLMs “memorize[d] A LOT, like A LOT” (Opp. Exh. 35 at -029109). Regardless, the “purpose and character” of using works to train LLMs was transformative — spectacularly so.

To repeat and be clear: Authors do not allege that any LLM output provided to users infringed upon Authors’ works. Our record shows the opposite. Users interacted only with the Claude service, which placed additional software between the user and the underlying LLM to ensure that no infringing output ever reached the users. This was akin to the limits Google imposed on how many snippets of text from any one book could be seen by any one user through its Google Books service, preventing its search tool from devolving into a reading tool. Google, 804 F.3d at 222. Here, if the outputs seen by users had been infringing, Authors would have a different case. And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case.

Instead, Authors challenge only the inputs, not the outputs, of these LLMs. They point to the fully trained LLMs and the Claude service only to shed light on how training itself uses copies of their works and the ways the Claude service could be used to produce still other works that would compete with their works. This order does the same. Authors’ arguments that the training use is not transformative are unavailing.

First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16).
But Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries, we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.

Second, to that last point, Authors further argue that the training was intended to memorize their works’ creative elements — not just their works’ non-protectable ones (Opp. 17). But this is the same argument. Again, Anthropic’s LLMs have not reproduced to the public a given work’s creative elements, nor even one author’s identifiable expressive style (assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not. Copyright does not extend to “method[s] of operation, concept[s], [or] principle[s]” “illustrated[ ] or embodied in [a] work.” 17 U.S.C. § 102(b); see, e.g., Nichols v. Universal Pictures Corp., 45 F.2d 119, 120–22 (2d Cir. 1930) (Judge Learned Hand) (stage properties and storytelling elements); Apple Comput., Inc. v. Microsoft Corp., 35 F.3d 1435, 1445 (9th Cir. 1994) (“user-friendly” design principles and elements); Swirsky v. Carey, 376 F.3d 841, 848 (9th Cir.
2004) (music theory principles and chord progressions).

Third, Authors next argue that computers nonetheless should not be allowed to do what people do.

Authors cite a decision seeming to say as much (Opp. 16–17). But the judge there twice emphasized while discussing “purpose and character” of the use that what was trained was “not generative AI (AI that writes new content itself).” Rather, what was trained — using a proprietary system for finding court opinions in response to a given legal topic — was a competing AI tool for finding court opinions in response to a given legal topic. That was not transformative. Thomson Reuters Enter. Centre GmbH v. Ross Intell. Inc., 765 F. Supp. 3d 382, 398 (D. Del. 2025) (Judge Stephanos Bibas), appeal docketed, No. 25-8018 (3d Cir. Apr. 14, 2025).

A better analogue to our facts would be an AI tool trained — using court opinions, and briefs, law review articles, and the like — to receive legal prompts and respond with fresh legal writing. And, on facts much like those, a different court came out the other way. It found fair use. White v. W. Pub. Corp., 29 F. Supp. 3d 396, 400 (S.D.N.Y. 2014) (Judge Jed Rakoff). The latter use stood sufficiently “orthogonal” to anything that any copyright owner rightly could expect to control. See Warhol, 598 U.S. at 538–40. It could thus be freed up for the copyist to use, “promot[ing] the progress of science and the arts, without diminishing the incentive to create.” Id. at 531 (emphasis added); see U.S. CONST. art. I, § 8, cl. 8.

In short, the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative.
Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use.

The first factor favors fair use for the training copies.

B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.

But that is not the only use at issue. Recall that Anthropic purchased millions of print books for its central library and pirated millions of digital books for its central library, too. It used specific sets and subsets of books for training specific LLMs. And, it then retained all the copies in its central library for other uses that might arise even after deciding it would not use them to train any LLM (at all or ever again). Anthropic seems to believe that because some of the works it copied were sometimes used in training LLMs, Anthropic was entitled to take for free all the works in the world and keep them forever with no further accounting. There is no carveout, however, from the Copyright Act for AI companies.

Because the legal issues differ between the library copies Anthropic purchased and pirated, this order takes them in turn.

(i) The Purchased Library Copies Converted from Print to Digital.

Anthropic purchased millions of print copies to “build a research library” (Opp. Exh. 22 at 145, 148). It destroyed each print copy while replacing it with a digital copy for use in its library (not for sharing nor sale outside the company). As to these copies, Authors do not complain that Anthropic failed to pay to acquire a library copy. Authors only complain that Anthropic changed each copy’s format from print to digital (see Opp. 15, 25 & n.14).
On the facts here, that format change itself added no new copies, eased storage and enabled searchability, and was not done for purposes trenching upon the copyright owner’s rightful interests — it was transformative.

Anthropic purchased its print copies fair and square. With each purchase came entitlement for Anthropic to “dispose[ ]” each copy as it saw fit. 17 U.S.C. § 109(a). So, Anthropic was entitled to keep the copies in its central library for all the ordinary uses. Yes, Anthropic changed the format of these library copies from print to digital — giving rise to the issue here.

All agree on the facts of the format change. Anthropic “destructively scan[ned]” the print copies to create the digital ones. Anthropic or its vendors stripped the bindings from the print books, cut the pages to workable dimensions, and scanned those pages — discarding each print copy while creating a digital one in its place. The digital copy was then housed in the “research library” or “generalized data area” in place of the print copy (Opp. Exh. 22 at 145–46, 193–94). Authors do not allege and our record does not show that Anthropic provided its converted digital copies of print books to anyone outside Anthropic.

The parties disagree about the legal consequences of the format change. Was scanning the print copies to create digital replacements transformative? Anthropic argues it was because it was reasonably necessary to training LLMs. Authors argue it was a distinguishable step requiring independent justification.

Here, for reasons narrower than Anthropic offers, the mere format change was a fair use.

Storage and searchability are not creative properties of the copyrighted work itself but physical properties of the frame around the work or informational properties about the work.
See Texaco, 802 F. Supp. at 14 (physical), aff’d, 60 F.3d at 919; Google, 804 F.3d at 225 (informational); Sony Corp. of Am. v. Universal City Studios, Inc. (“Sony Betamax”), 464 U.S. 417, 447 (1984) (rightful interests). In Texaco, the court reasoned that if a purchased scientific journal article had been copied “onto microfilm to conserve space, this might [have been] a persuasive transformative use.” 802 F. Supp. at 14 (Judge Pierre Leval), aff’d, 60 F.3d at 919 (reducing “bulk[ ]” “might suffice to tilt the first fair use factor in favor of Texaco if these purposes were dominant”). In Google Books, the court reasoned that a print-to-digital change to expose information about the work was transformative. Google, 804 F.3d at 225 (Judge Pierre Leval). And, in Sony Betamax, the Supreme Court held that making a recording of a television show in order to instead watch it at a later time was copying but did not usurp any rightful interest of the copyright owner. 464 U.S. at 447, 455. Important to the Supreme Court’s reasoning was the expectation that most such copiers would not distribute the permanent copies of the work. Finally, in A&M Records, Inc. v. Napster, Inc., our court of appeals recognized the reasoning just explained, and therefore rejected by contrast a digitization effort that was touted as space-shifting but in fact resulted in the multiplication of copies shared with outsiders through a file-sharing service. 239 F.3d 1004, 1019 (9th Cir. 2001), aff’g in this part 114 F. Supp. 2d 896, 912–13, 915–16 (N.D. Cal. 2000) (Judge Marilyn Hall Patel) (citing Sony Betamax and Texaco).

Here, every purchased print copy was copied in order to save storage space and to enable searchability as a digital copy. The print original was destroyed. One replaced the other.
And, there is no evidence that the new, digital copy was shown, shared, or sold outside the company. This use was even more clearly transformative than those in Texaco, Google, and Sony Betamax (where the number of copies went up by at least one), and, of course, more transformative than those uses rejected in Napster (where the number went up by “millions” of copies shared for free with others).

Yes, Anthropic is a commercial outfit. And, this order takes for granted that Anthropic in fact benefited from the print-to-digital format change — or it would not have gone to all the trouble. But the crux of the first fair use factor’s concern for “commercial” use is in protecting the copyright owners and their entitlements to exploit their copyright as they see fit (or not). See, e.g., Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 562 (1985). That the accused is a commercial entity is indicative, not dispositive. That the accused stands to benefit is likewise indicative. But what matters most is whether the format change exploits anything the Copyright Act reserves to the copyright owner. Anthropic already had purchased permanent library copies (print ones). It did not create new copies to share or sell outside.

Yes, Authors also might have wished to charge Anthropic more for digital than for print copies. And, this order takes for granted that Authors could have succeeded if Anthropic had been barred from the format change. “But the Constitution’s language [in Clause 8] nowhere suggests that [the copyright owner’s] limited exclusive right should include a right to divide markets or a concomitant right to charge different purchasers different prices for the same book, [merely] say to increase or to maximize gain.” See Kirtsaeng v. John Wiley & Sons, Inc., 568 U.S.
519, 552 (2013); see also U.S. CONST. art. I., § 8, cl. 8. Nor does the Copyright Act itself. Section 106 sets out exclusive rights that fair uses under Section 107 abridge. Section 106(1) reserves to the copyright owner the right to make reproductions. But on our facts we face the unusual situation where one copy entirely replaced the other. And, Section 106(2) reserves to the copyright owner the right to make derivative works that add or subtract creative material — as occurs in a “translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, [or] condensation” of a book, 17 U.S.C. § 101 (definitions). For some “other modification[ ]” of a book to constitute a “derivative work,” it must itself “represent an original work of authorship.” Ibid. But on our facts the format was changed but no content was added or subtracted. See Mirage Editions, Inc. v. Albuquerque A.R.T. Co., 856 F.2d 1341, 1342, 1343–44 (9th Cir. 1988) (yes where elements added to create new decorative ceramic).4 Section 106(3) further reserves to the copyright owner the right to distribute copies. But again, the replacement copy here was kept in the central library, not distributed. Cf. Fox News Network, LLC v. TVEyes, Inc., 883 F.3d 169, 176–78 (2d Cir. 2018) (enabling searching for “information about the material” can be transformative use, even if some distribution results); Lewis Galoob Toys, Inc. v. Nintendo of Am., Inc., 964 F.2d 965, 968, 971 (9th Cir. 1992) (using nifty converter to “merely enhance[ ]” audiovisual displays emitted from purchased videogame cartridge was fair use of those displays partly because no surplus copies of cartridge or displays were ever created).
As a result, Anthropic’s format-change from print library copies to digital library copies was transformative under fair use factor one. Anthropic was entitled to retain a copy of these works in a print format. It retained them instead in a digital format, easing storage and searchability. And, the further copies made therefrom for purposes of training LLMs were themselves transformative for that further reason, as above.

Footnote 4: Even if print-to-digital format change did infringe the right to prepare derivative works, Authors have conceded that “Plaintiffs’ infringement claims are predicated on Anthropic’s unauthorized reproduction (17 U.S.C. § 106(1)); Plaintiffs are not alleging infringement by Anthropic of any right to prepare derivative works (id. at § 106(2))” (Dkt. No. 203 at 2 (citations original)). Whether this concession had consequence for copies tokenized and used for training or “compressed” into the trained LLMs is not reached by this order because Anthropic does not rely on Authors’ concession and those copies were here used transformatively.

To be clear, this print-to-digital conversion involved a different and narrower form of transformative use than the broader one advanced by Anthropic. Anthropic argues that the central library use was part and parcel of the LLM training use and therefore transformative. This order disagrees. However, this order holds that the mere conversion of a print book to a digital file to save space and enable searchability was transformative for that reason alone. Therefore, the digital copy should be treated just as if the purchased print copy had been placed in the central library.

In sum, the first fair use factor favors fair use for the digital library copies converted from purchased print library copies — but these do not excuse the pirated library copies.

(ii) The Pirated Library Copies.
Before buying books for its central library, Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic should have paid for these pirated library copies (e.g., Tr. 24–25, 65; Opp. 7, 12–13). This order agrees.

The basic problem here was well-stated by Anthropic at oral argument: “You can’t just bless yourself by saying I have a research purpose and, therefore, go and take any textbook you want. That would destroy the academic publishing market if that were the case” (Tr. 53). Of course, the person who purchases the textbook owes no further accounting for keeping the copy. But the person who copies the textbook from a pirate site has infringed already, full stop. This order further rejects Anthropic’s assumption that the use of the copies for a central library can be excused as fair use merely because some will eventually be used to train LLMs.

This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no decision holding or requiring that pirating a book that could have been bought at a bookstore was reasonably necessary to writing a book review, conducting research on facts in the book, or creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.

But this order need not decide this case on that rule. Anthropic did not use these copies only for training its LLM.
Indeed, it retained pirated copies even after deciding it would not use them or copies from them for training its LLMs ever again. They were acquired and retained, as a central library of all the books in the world.

Building a central library of works to be available for any number of further uses was itself the use for which Anthropic acquired these copies. One further use was making further copies for training LLMs. But not every book Anthropic pirated was used to train LLMs. And, every pirated library copy was retained even if it was determined it would not be so used. Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use — and not a transformative one (see Tr. 24–25, 35, 65; Opp. 4–10, 12 n.6; CC Br. Exh. 12 at -0144509 (“everything forever”)). Napster, 239 F.3d at 1015; BMG Music v. Gonzalez, 430 F.3d 888, 890 (7th Cir. 2005).

Anthropic’s briefing contains other reasons why it believes its pirated library copies are irrelevant to our fair use analysis, notwithstanding its own statements at our oral argument.

First, Anthropic accepts in this posture that it acted in bad faith but argues that its bad faith in pirating copies cannot “somehow short-circuit[ ]” the fair use analysis (Reply 6 (downplaying Atari Games Corp. v. Nintendo of Am., Inc., 975 F.2d 832, 843 (Fed. Cir. 1992) (applying law of Ninth Circuit))). But its bad faith is not the basis for this decision. Each use of a work must be analyzed objectively. Warhol, 598 U.S. at 544–45. The objective analysis here shows the initial copies were pirated to create a central, general-purpose library, as a substitute for paid copies to do the same thing. (Of course, if infringement is found, bad faith would matter for determining willfulness. 17 U.S.C. § 504(c)(2).)
Second, Anthropic argues that its goal to put the copies eventually “to a highly transformative use” requires that each copy and use along the way be justified as having a transformative use, too (Reply 14). But now Anthropic seeks to take the shortcut Anthropic just said cannot be taken. Again, the Supreme Court tasks us with looking past the “subjective intent of the user” to the objective use made of each copy. Warhol, 598 U.S. at 544–45 (emphasis added). Put another way, what a copyist says or thinks or feels matters only to the extent it shows what a copyist in fact does with the work. Indeed, the same copy can be used one way, then another, each with a different result. Id. at 533. Here, what Anthropic said about its acquisitions at the time — that they were made to “build[ ] a research library” while avoiding a “huge legal/practice/business slog” — are relevant in this regard. And, Anthropic’s actual use of these pirated copies was to create its central library of texts that, like any university or corporate library, stored the works’ well-organized facts, analyses, and expressive examples for various contingent uses, one being training.5

Third, Anthropic argues that Texaco — the case involving copies used in a central library, copies used in desk libraries, and copies used in the laboratory — is inapposite. Anthropic argues that the disputed copies in Texaco were never used in the laboratory but instead in personal desk libraries for a use “identical to the original purpose and use” of the central library copies, and so not for a transformative use (Reply 8 (summarizing 60 F.3d at 922–23)). By contrast, says Anthropic, here it did use copies in the laboratory to train LLMs — a very transformative use. But this is a fast glide over thin ice. Like Texaco,
Like Texaco, 19 Anthropic possessed copies it did not put into use in the laboratory and it kept those copies in a 20 central library even after its transformative use had been completed. But, unlike Texaco, 21 which bought those copies, Anthropic never paid for the central library copies stolen off the 22 23 24 5 Our court of appeals has not yet reappraised how bad faith (or good faith) figures in fair use 25 after Warhol. Its prior appraisal applied the Supreme Court’s statement that “[f]air use presupposes good faith and fair dealing,” Harper & Row, 471 U.S. at 562 (cleaned up). See 26 Perfect 10, 508 F.3d at1164 n.8. Since then, the Supreme Court has renewed its “skepticism about whether bad faith has any role.” Oracle, 593 U.S. at 32–33 (reiterating doubts of Campbell, 510 27 U.S. at 585 n.18). And, recently, the Supreme Court has held squarely that it is not the “subjective intent” of a copyist that counts, but the “objective . . . use” of the copy. Warhol, 598 U.S. at 544– 28 45. This order applies this most recent analysis. Miller v. Gammie, 335 F.3d 889, 900 (9th Cir. 2003) (en banc). 20 Case 3:24-cv-05417-WHA Document 231 Filed 06/23/25 Page 21 of 32 1 internet. Texaco also shows why Anthropic is wrong to suppose that so long as you create an 2 exciting end product, every “back-end step, invisible to the public,” is excused (Br. 10). 3 Notably, this is not a case where source copies were unavailable for separate purchase or 4 loan. See, e.g., NXIVM Corp. v. Ross Inst., 364 F.3d 471, 475–76, 478–79 (2d Cir. 2004) 5 (using selections of training manual — otherwise available only to cult’s trainees subject to 6 NDAs — to expose cult in critical review); Time Inc. v. Bernard Geis Assocs., 293 F. Supp. 7 130, 135–36, 138, 146 (S.D.N.Y. 
1968) (Judge Inzer Bass Wyatt) (making charcoal drawings of photographs taken of originals otherwise not on sale or loan out to illustrate a history book).6 Nor were the copies made only incidentally and necessarily from pirated copies. See, e.g., Perfect 10, 508 F.3d at 1164 n.8 (copies of images that had been pirated by third-party websites were used to index those same websites while indexing the entire web). Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.

Footnote 6: Anthropic repeats the misleading characterization of the copyright holder in Oracle that the initial copies were there purloined (Reply 5). Not so. “All agree[d] that Google was and remain[ed] free to use the Java language itself. All agree[d] that Google’s virtual machine [wa]s free of any copyright issues. All agree[d] that the six-thousand-plus method implementations by Google [we]re free of copyright issues. The copyright issue, rather,” was the use of Java for purposes of creating competing software having the same familiar, functional schema. Oracle Am., Inc. v. Google Inc., 872 F. Supp. 2d 974, 978 (N.D. Cal. 2012), aff’d and rev’d in part, 750 F.3d 1339 (Fed. Cir. 2014).

Nor were the initial copies made immediately transformed into a significantly altered form. In Perfect 10, images were copied by the search engine in thumbnail form only and deployed immediately into the transformative use of identifying the full-sized images and the pages from which they came. 508 F.3d at 1160, 1165, 1167. And, in Kelly v. Arriba Software Corp., images were copied at full size and then into thumbnails for immediate use in building a search engine, after which the full-sized copies were immediately deleted. 336 F.3d 811, 815 (9th Cir. 2003). Not here. The full-text copies of books were downloaded and maintained “forever.”

Nor does the initial copying here even resemble the full-text copying in the Google Books cases. There, libraries of authorized copies already had been assembled, and all copies therefrom were made for direct employment in a one-to-one further fair use — whether the transformative use of pointing to the works themselves, the use of providing the works in formats for print-disabled patrons, or the use of insuring against going out of print, getting lost, and becoming otherwise unavailable. HathiTrust, 755 F.3d at 97, 101, 103; Google, 804 F.3d at 206, 216–18, 228 (further distinguishing search and snippet uses, which “test[ed] the boundaries of fair use”). Not so here concerning the pirated copies. No authorized copies existed from which Anthropic made its first copies. No full-text copy therefrom was put immediately into use training LLMs. Not every copy was even necessary nor used for training LLMs. No initial copy was ever deleted, even if never used or no longer used.7 The university libraries and Google went to exceedingly great lengths to ensure that all copies were secured against unauthorized uses — both through technical measures and through legal agreements among all participants. Not so here. The library copies lacked internal controls limiting access and use.

Nor do the decisions on intermediate copying require anything less than the analysis applied here. Anthropic argues that our court of appeals in Sega Enterprises Ltd. v. Accolade, Inc. looked only at the “ultimate use” and “did not analyze a series of atomized acts of ‘infringement’ distinct from that overall purpose” (Reply 3).
To the contrary, the appeals court examined the initial, intermediate, and ultimate copies used by the copyist. The court explained that the copyist initially purchased commercially available copies of game cartridges and then made further copies necessarily and “solely in order to discover the functional requirements for compatibility.” 977 F.2d 1510, 1522 (9th Cir. 1992). Thus, it reached only one result because on those facts there was only one “overall purpose” for the unauthorized copies. Indeed, the court reaffirmed prior caselaw holding that “intermediate copying of [a work] may infringe the exclusive rights granted to the copyright owner in [S]ection 106 of the Copyright Act regardless of whether the end product of the copying also infringes those rights.” Id. at 1518–19 (reaffirming Walker v. Univ. Books, 602 F.2d 859, 864 (9th Cir. 1979)).

Footnote 7: Training LLMs was not a use where perpetually maintaining a library copy was intrinsic to the proffered fair use (e.g., for a plagiarism-checker service). Nor is this an instance where retaining at least one copy was authorized by contract with the copyright owners (e.g., by agreement to express terms upon submission to a plagiarism-checker service, notwithstanding proposed terms scrawled on a paper prior to submission). A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 635–36 & n.5, 645 n.8 (4th Cir. 2009), aff’g in relevant parts 544 F. Supp. 2d 473, 480 (E.D. Va. 2008) (Judge Claude Hilton). Anthropic mischaracterizes this case.

Similarly, in Sony Computer Entertainment, Inc. v. Connectix Corp., our appeals court applied the same law to similarly focused conduct. Another copyist allegedly had purchased an authorized copy and then made further copies solely and necessarily to reverse-engineer compatibility requirements. 203 F.3d 596, 601, 602–03 (9th Cir. 2000).
Both Sega and Sony avoided imposing an “artificial hurdle” to fair use by generously construing the intermediate copying necessary to the fair use. As one example, Sega stated that an engineer should be permitted to reboot her computer while undertaking to reverse-engineer software loaded onto it — even if doing so creates another digital copy of the software and is not strictly necessary to reverse-engineering. Id. at 605. But neither Sega nor Sony fathomed gifting an “artificial head start” to a fair user, either, by treating even the initial copy as an intermediate one.

And, yes, some courts have “not inquire[d]” into intermediate or initial copying at all (Reply 2 (citing Campbell as not inquiring into surplus copies in the studio)). But if a “close reading of those cases [ ] reveals that in none of them was the legality of the [initial or] intermediate copying at issue,” then it was not raised and not necessarily decided. Sega, 977 F.2d at 1519; see Webster v. Fall, 266 U.S. 507, 511 (1925). It was expressly decided elsewhere: Our analysis must attend to different uses of different copies, and even to different uses of the same copies. Warhol, 598 U.S. at 533.

Finally, Anthropic argues that even if the initial copies served a different use than the intermediate and ultimate copies, it was not a use for which Anthropic necessarily would have needed to pay Authors for a copy. In theory, argues Anthropic, it could have done as Google did in Google Books — find an existing reference library willing to loan its copies for free as source copies. Or, in theory, it could have done as Anthropic did later — go buy used copies without having to pay Authors at all. See 17 U.S.C. § 109(a).
But Anthropic did not do those things — instead it stole the works for its central library by downloading them from pirated libraries.

In sum, the first factor points against fair use for the central library copies made from pirated sources — and no damages from pirating copies could be undone by later paying for copies of the same works.

2. THE NATURE OF THE COPYRIGHTED WORK.

The second fair use factor is “the nature of the copyrighted work.” 17 U.S.C. § 107(2). This factor “calls for recognition that some works are closer to the core of intended copyright protection than others, with the consequence that fair use is more difficult to establish when the former works are copied.” Campbell, 510 U.S. at 586. For one thing, less protection is due published works than unpublished ones. For another, less protection is due “factual works than works of fiction or fantasy.” Harper & Row, 471 U.S. at 563. But less protection is not no protection. Even the arrangement of otherwise unprotectable facts surpasses the low bar for a protectable original work of authorship. Google, 804 F.3d at 220.

Here, Anthropic accepts that all of Authors’ books — all published, whether non-fiction or fiction — contained expressive elements (Reply 9). And, as set out above, this order accepts Authors’ view of the evidence that their works were chosen for their expressive qualities in building a central library and then in training specific LLMs (Opp. 11, 17 (citing, e.g., Opp. Exh. 3 at -03433)).

The main function of the second factor is to help assess the other factors: to reveal differences between the nature of the works at issue and the nature of their secondary use (above), and to reveal any relation between the amount and substantiality of each work taken and the secondary use (next).
E.g., Campbell, 510 U.S. at 586; Kelly, 336 F.3d at 820; Google, 804 F.3d at 220; HathiTrust, 755 F.3d at 98; Bill Graham Archives v. Dorling Kindersley Ltd., 448 F.3d 605, 612–13 (2d Cir. 2006).

The second factor points against fair use for all copies alike.

3. THE AMOUNT AND SUBSTANTIALITY OF THE PORTION USED.

The third fair use factor is “the amount and substantiality of the portion” of the copyrighted work used by the accused. 17 U.S.C. § 107(3). The crux of this factor is whether the amount was “reasonable in relation to the purpose of the copying.” Campbell, 510 U.S. at 586. Thus, the amount of copying is considered first against the work itself, then more importantly against the proposed transformative purpose. See Warhol, 598 U.S. at 543 & n.18.

A. THE COPIES USED TO TRAIN SPECIFIC LLMS.

Copies selected for inclusion in training sets were selected because they were complete and because they contained rich protectible expression, or so this order accepts the record shows for Authors. Was all this copying reasonably necessary to the transformative use? Yes.

“What matters [ ] is not so much ‘the amount and substantiality of the portion used’ in making a copy, but rather the amount and substantiality of what is thereby made accessible to a public [in the purported secondary use] for which it may serve as a competing substitute [for the primary use].” Google, 804 F.3d at 222. Here, once again, there is no allegation of any traceable connection between the Claude service’s outputs and Authors’ works. The copying used to train the LLMs underlying Claude was thus especially reasonable.

In response, Authors object primarily that the copying used in training was both extremely extensive and not strictly necessary.
As to extensive copying, it is true that entire works were copied. And, “copying [ ] entire work[s] ‘militate[s] against a finding of fair use.’” Worldwide Church of God v. Philadelphia Church of God, Inc., 227 F.3d 1110, 1118 (9th Cir. 2000) (quoting Hustler Mag. Inc. v. Moral Majority Inc., 796 F.2d 1148, 1155 (9th Cir. 1986)); see Campbell, 510 U.S. at 587. But we just addressed why Authors’ argument is misdirected. The copies that count for this factor are those that would merely serve the same use as the work’s ordinary one. Authors do not allege such copying. The accused use here of the incremental copies is as orthogonal as can be imagined to the ordinary use of a book.

As to strict necessity, Authors make a stronger point. When a productive use is made possible only by borrowing from a specific work, fair use climbs towards its zenith. When a productive use is possible without that borrowing, fair use falls to its nadir — and the borrowing deserves a particularly compelling justification. See Warhol, 598 U.S. at 543 & n.18, 547. Here, it is true that Anthropic could have used some other books or no books at all for training its LLMs — or so this order accepts the record shows for Authors. But Anthropic has presented a compelling explanation for why it was reasonably necessary to use them anyway.

For one thing, all agree Anthropic needed billions of words to train any given LLM. If using only books, Anthropic would have needed millions of books per model. If using a set comprising only a small fraction of books and a larger fraction of other texts, Anthropic still would have needed hundreds of thousands of books.
Authors contend that because Anthropic showed it could use such smaller sets of books, it surely could have used no books at all — or at least not their books (Opp. 23). But Authors forget that “reasonably necessary” does not mean “strictly necessary.” Authors do not contest that the volume of text required to train an LLM is monumental. Because using so many works was reasonably necessary, using any one work for actually training LLMs was about as reasonable as the next.

For another thing, no output to the public was even alleged to be infringing. So, yes, Authors’ works were chosen as the strongest examples of writing. But the compelling benefits of training the LLMs on strong examples were not offset by revelations to the public of any portion of the works themselves. What was copied was therefore especially reasonable and compelling.

The third factor thus favors fair use for the training copies.

B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.

But again, there was a separate use — a distinction that makes some difference as to whether the amount and substantiality of the copying was “reasonable in relation to the purpose of the copying” for the library copies. Campbell, 510 U.S. at 586.

(i) The Purchased Library Copies Converted from Print to Digital.

For the print library copies that Anthropic purchased and then converted into digital library copies, Anthropic already enjoyed entitlement to keep the copies in its library. The purpose of the copying was to keep them in its library but with more favorable storage and searchability properties. Copying the entire work was exactly what this purpose required. There was no surplus copying. The source copy was destroyed.

The third fair use factor favors fair use for the purchased library copies converted from print to digital.
(ii) The Pirated Library Copies.

For the pirated library copies, however, Anthropic lacked any entitlement to hold copies of the books at all. Its purpose, it says, was to train LLMs. But its objective conduct was to seek “all the books in the world” and then retain them even after deciding it would not make further copies from them for training — indicating there were other further uses. Against the purpose of acquiring all the books one could on the chance some might prove useful for training LLMs and maybe other stuff too, almost any unauthorized copying would have been too much. Anthropic copied millions of books in toto, Authors’ works among them.

The third factor points against fair use for the pirated library copies.

4. THE EFFECT OF THE USE UPON THE MARKET FOR OR VALUE OF THE COPYRIGHTED WORK.

The final factor is “the effect of the use upon the potential market for or value of the copyrighted work.” 17 U.S.C. § 107(4). This factor points against fair use when a copyist makes copies available that displace demand for copies the copyright owner already makes available or readily could. Texaco, 60 F.3d at 926–28 (reproduced copies); Dr. Seuss Enters., L.P. v. ComicMix LLC, 983 F.3d 443, 461 (9th Cir. 2020) (derivative copies). “While the first factor considers whether and to what extent an original work and secondary use [in principle could] have substitutable purposes, the fourth factor focuses on actual or potential market substitution.” Warhol, 598 U.S. at 536 n.12 (emphasis added).

A. THE COPIES USED TO TRAIN SPECIFIC LLMS.

The copies used to train specific LLMs did not and will not displace demand for copies of Authors’ works, or not in the way that counts under the Copyright Act.
Again, Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public. If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop.

Instead, Authors contend generically that training LLMs will result in an explosion of works competing with their works — such as by creating alternative summaries of factual events, alternative examples of compelling writing about fictional events, and so on. This order assumes that is so (Opp. 22–23 (citing, e.g., Opp. Exh. 38)). But Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition. Sega, 977 F.2d at 1523–24.

Authors next contend that training LLMs displaced (or will) an emerging market for licensing their works for the narrow purpose of training LLMs (Opp. 21–22). Anthropic argues that transactional costs would exceed Anthropic’s expected benefit from any such bargain, prompting it to cease dealing with any rightsholders or else to cease developing such technology altogether (Br. 22–23). Our record could support either account — so this order must assume Authors are correct. A market could develop (Opp. 19–21 (citing record)). Even so, such a market for that use is not one the Copyright Act entitles Authors to exploit.

None of the cases cited by Authors requires a different result. All contemplated losses of something the Copyright Act properly protected — not the kinds of fair uses for which a copyright owner cannot rightly expect to control.
See TVEyes, Inc., 883 F.3d at 181 (use of a right legally reserved to and factually already being licensed by copyright owner); Texaco, 60 F.3d 931 (same); Ringgold v. BET, Inc., 126 F.3d 70, 80–81 (2d Cir. 1997) (use of a right legally reserved to and factually likely to be marketable by copyright owner — displaying images of her artistic work in television shows); cf. Seltzer v. Green Day, Inc., 725 F.3d 1170, 1179 (9th Cir. 2013) (no evidence use could be or “was likely to” be marketable).

The fourth factor thus favors fair use for the training copies.

B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.

(i) The Purchased Library Copies Converted from Print to Digital.

For these copies, this order assumes Anthropic’s format change from print to digital displaced purchases of new digital copies that Anthropic would have made directly from Authors (had it not been able to purchase print copies in used condition). But for reasons stated under the first factor, such losses did not relate to something the Copyright Act reserves for Authors to exploit. It was a format change.

Authors’ next argument, it seems, is that the format change nonetheless exposed it to usurpation of the opportunity to sell rightful copies because Anthropic might transmit additional unauthorized digital copies more readily than it could have transmitted additional unauthorized print copies — and that the same would be true for all format converters (cf. Opp. 25 n.14; Opp. Expert Malackowski ¶ 52). But after much discovery, there is no inkling in our record of intent to redistribute library copies once acquired nor of inability to secure that valuable library against outside actors.
And, if the internal, central library copies did or do in fact lead to further reproduction or distribution, those further copies remain redressable separately by Authors. The format change did not itself usurp the Authors’ rightful entitlements.

This factor is thus neutral for the purchased library copies converted from print to digital.

(ii) The Pirated Library Copies.

The copies used to build a central library and that were obtained from pirated sources plainly displaced demand for Authors’ books — copy for copy. Not every person who merely intends to make a fair use of a work is thereby entitled to a full copy in the meantime, nor even to steal a copy so that achieving this fair use is especially simple or cost-effective. Here, the copies employed in training LLMs were one thing, but the copies acquired to assemble a convenient, general-purpose library of works for various uses for which the company might have of them, if any, was a different use altogether.

Anthropic has almost no rebuttal on these points. First, Anthropic argues that “Claude’s services do not reduce [or usurp] the value of Plaintiffs’ works through substitution in their traditional markets” (see Br. Expert Peterson ¶ 33). But stealing pirated copies of Authors’ works plainly did. Second, Anthropic argues that it may have been able to purchase some books on the open market (and some other texts), but not other texts it copied (cf. id. ¶ 48 (re licensing)). But this case does not concern those other texts it could not have purchased. It could have purchased Authors’ books (and many others). In fact it later did. Finally, Anthropic argues that the effect on these texts from one book foregone was too small to be considered (see id. ¶ 77).
But the test requires that we contemplate the likely result were the conduct to be condoned as a fair use — namely to steal a work you could otherwise buy (a book, millions of books) so long as you at least loosely intend to make further copies for a purportedly transformative use (writing a book review with excerpts, training LLMs, etc.), without any accountability. As Anthropic itself suggested, “That would destroy the [entire] publishing market if that were the case” (see Tr. 53; see also Tr. 32, 41; Opp. Expert Malackowski ¶¶ 31–34, 38).

The fourth factor points against fair use for the pirated library copies.

5. OVERALL ANALYSIS.

After the four factors and any others deemed relevant are “explored, [ ] the results [are] weighed together, in light of the purposes of copyright.” Campbell, 510 U.S. at 578.

The copies used to train specific LLMs were justified as a fair use. Every factor but the nature of the copyrighted work favors this result. The technology at issue was among the most transformative many of us will see in our lifetimes.

The copies used to convert purchased print library copies into digital library copies were justified, too, though for a different fair use. The first factor strongly favors this result, and the third favors it, too. The fourth is neutral. Only the second slightly disfavors it. On balance, as the purchased print copy was destroyed and its digital replacement not redistributed, this was a fair use.

The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained “forever” for “general purpose” even after Anthropic determined they would never be used for training LLMs.
A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience.

And, as for any copies made from central library copies but not used for training, this order does not grant summary judgment for Anthropic. On this record in this posture, the central library copies were retained even when no longer serving as sources for training copies, “hundreds of engineers” could access them to make copies for other uses, and engineers did make other copies. Anthropic has dodged discovery on these points (e.g., Opp. Exh. 17 at 93–94 (retained); Opp. Exh. 22 at 196 (no limits); Opp. Exh. 30 at 3, 4 (no accounting); see also Opp. 15). We cannot determine the right answer concerning such copies because the record is too poorly developed as to them. Anthropic is not entitled to an order blessing all copying “that Anthropic has ever made after obtaining the data,” to use its words (Opp. Exh. 30 at 3, 4).

CONCLUSION

With respect to the training copies and the print-to-digital converted copies, this order has drawn all ambiguities and inferences in favor of the opposing side, namely Authors. With respect to the pirated copies, this order has also accepted the Authors’ version of the facts. Authors did not move for summary judgment but if they had, then we would have been obligated to accept all reasonable views given the evidence in defendant’s favor instead.

This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason. But it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies.
We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness). That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages. Nothing is foreclosed as to any other copies flowing from library copies for uses other than for training LLMs.

IT IS SO ORDERED.

Dated: June 23, 2025.

WILLIAM ALSUP
UNITED STATES DISTRICT JUDGE