Google Books, Fair Uses, and “Copyright” as Misnomer

March 24th, 2011 · 19 Comments

Tim Lee has a great analysis at Ars Technica of this week’s ruling invalidating the controversial Google Books settlement. Tim, like the court, focuses on aspects of the agreement that seem to give Google a unique advantage in the online book market—and hopes that instead Google will now simply defend its copying of books for indexing purposes as a fair use. Somewhat to my surprise, Siva Vaidhyanathan seems hostile to this approach in his writeup at Slate:

Back in 2004 Google shocked the publishing world by announcing that it had been for some time secretly scanning books from major university libraries. Some of these libraries were allowing Google to scan in books that were clearly still covered by copyright. Google tried to convince everyone that this was all just fine under U.S. copyright law by asserting that we readers would only get to experience “snippets” of the entire book that sat in Google’s servers.

This, to Google, was an example of “fair use,” a defense used against an accusation of copyright infringement that gives the public a way to deploy portions (or sometimes all) of a copyrighted work for some publicly valuable use. Critics, journalists, teachers, and scholars rely on fair use every day to quote from copyrighted sources. But what Google proposed was wholesale and revolutionary: It would have turned the copyright system on its head and redefined fair use in ways that were never intended.

I fear Siva’s general wariness of Google is overpowering his normally sound instincts on copyright in the digital era. It’s true, of course, that Google must scan the whole book in order to be able to provide users with search results. But that’s not exactly unprecedented—it’s how search engines work. Google’s computers need to make a copy of the text on all those Web pages in order to index them—how else would they know which pages contain text that matches your search query? And, of course, most of that text is copyright-protected by default. The same goes for image indexing, which courts have given a green light so far. I’d suggest that one reason people have a different intuition about Google Books is that Web content is already in digital form—which means everyone who reads it is “making a copy”—whereas Google was making digital copies of analog works. But as a policy matter, it’s not clear why this should make such an important difference.

Suppose I tweet that I’m trying to remember which Borges story has that line about how “mirrors and copulation are abominable, because they increase the number of men.” Some of my diligent friends hurry to their libraries, flip through their Borges collections, and tweet back the answer—along with a few sentences of the surrounding context. Clearly there’s nothing intrinsically objectionable about the search function, and a quotation of a sufficiently limited portion of the whole work in reply would normally be protected by fair use. The problem is just that Google’s search—and indeed, any computer search—technically requires that a copy be made. But to my mind, this just underscores how increasingly maladaptive it is to make “copying” the primary locus of regulation in our system of intellectual property.

Technology even complicates the question of just what constitutes a “copy”—an intriguing issue I explored in a few articles back in my days at Ars Technica. Imagine, for instance, that Google took a different approach to indexing in hopes of avoiding thorny copyright questions. Instead of storing “copies” of each book, suppose they created a huge database called Google Concordance, consisting of an enormous catalog of every word or short phrase someone might want to look up, followed by a long list, like a kind of super-index, specifying the location on every page of every book in which that word or phrase appears. (“Aardvark: Blackwell Guide to the Philosophy of Computing and Information, Page 221, Line 17, word 3…”) Obviously, the Google Concordance would be a very valuable and useful reference text, and nowhere in the database would you find anything resembling a “copy” of any of the cataloged works. But just as obviously, it would contain all the information a clever programmer would need to reconstruct an arbitrary portion of the original text on the fly, assuming the database could be queried fast enough. You can imagine someone creating certain kinds of “derivative works” in a similar way: If you don’t want the RIAA taking down your mashup, you might try to offer it as an algorithm specifying time segments of component tracks to be combined in a particular manner… an algorithm that might produce gibberish or Girl Talk depending on what files you feed it.
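The hypothetical Google Concordance is essentially what programmers call a positional inverted index. A minimal sketch in Python, assuming each book is just a string of whitespace-separated words and that a single flat word position stands in for the page/line/word coordinates described above (both simplifications; `build_concordance` and `reconstruct` are illustrative names, not any real Google system):

```python
from collections import defaultdict


def build_concordance(books):
    """Map each word to every (book title, word position) where it occurs.

    The result stores no contiguous text from any book, only a catalog of
    where each word appears, in the spirit of the hypothetical Concordance.
    """
    concordance = defaultdict(list)
    for title, text in books.items():
        for position, word in enumerate(text.split()):
            concordance[word].append((title, position))
    return dict(concordance)


def reconstruct(concordance, title, start, end):
    """Rebuild words [start, end) of one book using only the concordance."""
    by_position = {}
    # Scan the whole index, keeping entries for this book that fall in range.
    for word, locations in concordance.items():
        for book, position in locations:
            if book == title and start <= position < end:
                by_position[position] = word
    # Reassemble the span in order, skipping positions past the end of the book.
    return " ".join(by_position[p] for p in range(start, end) if p in by_position)


# Hypothetical single-entry "library" using the Borges line quoted earlier.
books = {
    "borges_example": "mirrors and copulation are abominable, "
    "because they increase the number of men"
}
index = build_concordance(books)
del books  # the original text is discarded; only the index remains

print(index["mirrors"])                            # [('borges_example', 0)]
print(reconstruct(index, "borges_example", 0, 5))  # mirrors and copulation are abominable,
```

Once the index exists, the source text can be thrown away, yet `reconstruct` recovers any span on request. Its only cost is a scan of the whole index, which is the “queried fast enough” caveat in concrete form.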

In a sense, it’s always the processing algorithm that determines whether a particular binary string is a “copy” of a work or not. Open an MP3 of a Lady Gaga track in a text editor and you’ll get a wholly original work of experimental literature—though not one anybody (except possibly Lady Gaga) is likely to be interested in reading. For that matter, Google’s database is just an enormous collection of ones and zeroes until some program processes it to generate human-readable output. I distinguished my hypothetical Google Concordance database from a collection of copied books, but if you point to a particular file and ask whether it contains the Concordance or copies of the books, there’s a very literal sense in which there just is no fact of the matter until you know what algorithm will be used to render it as alphanumeric text. This may sound like airy metaphysical hairsplitting, but the power of computers to rapidly aggregate and process dispersed information on a global network is likely to create genuine practical complications for a legal framework that takes discrete, physically contiguous chunks called “copies” as its fundamental unit of analysis. Legally speaking, it would seem to make an enormous difference whether books are scanned and stored as books, or as a comprehensive concordance database maintained by Google, or as a series of hundreds or thousands of complementary partial concordances dispersed across many servers (or even individual hard drives linked by a p2p network). Given sufficient bandwidth and processing speed, it might make no difference at all in practice. Maybe we should take that as a hint to reexamine our categories.
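To make the point about processing algorithms concrete, here is a small Python sketch in which one stored byte string yields gibberish, a few numbers, or a recognizable word depending purely on which function reads it. The single-byte XOR key and the reading as 16-bit little-endian samples are arbitrary illustrative choices, and the function names are hypothetical:

```python
import struct


def as_latin1_text(blob: bytes) -> str:
    """Render every byte as one character (Latin-1 maps all 256 byte values)."""
    return blob.decode("latin-1")


def as_pcm_samples(blob: bytes) -> tuple:
    """Render bytes as 16-bit little-endian signed samples, dropping an odd trailing byte."""
    count = len(blob) // 2
    return struct.unpack(f"<{count}h", blob[: 2 * count])


def as_xor_text(blob: bytes, key: int) -> str:
    """Render bytes as text after undoing a single-byte XOR encoding."""
    return bytes(b ^ key for b in blob).decode("latin-1")


# One stored blob (hypothetical), three different "readings" of it.
blob = bytes(b ^ 0x2A for b in b"mirrors")

print(as_latin1_text(blob))     # GCXXEXY  (looks like gibberish)
print(as_pcm_samples(blob))     # (17223, 22616, 22597)  (numbers a player could treat as sound)
print(as_xor_text(blob, 0x2A))  # mirrors  (the original word)
```

Nothing in `blob` itself says which of these it “is”; the answer changes with the rendering function, which is exactly the ambiguity a copy-centered legal test has to resolve somehow.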

The Constitution doesn’t explicitly use the term “copyright”: It empowers Congress to grant creators certain “exclusive rights” in their works, leaving open just which rights should be made exclusive. And indeed, many of the privileges lumped together under the rubric of “copyright”—such as public performance and the creation of “derivative works”—need not involve making a “copy” in any ordinary sense of the word. But as the word itself suggests, the way our system has traditionally worked is that if you owned a copy of a protected work, you could pretty much do what you wanted with it—read or view it when and where you like, sell it to someone else, chop it into bits and eat it with Hollandaise if that’s what turns you on—as long as you weren’t making a copy. And for a long time, this wasn’t a very restrictive limitation, because making copies of a work was generally an expensive and labor-intensive business that ordinary people had neither the capability nor, indeed, any compelling reason to engage in. Pretty much the only people with motive to invest the substantial resources needed to print books or press vinyl were pirates who planned to sell a bootleg edition for profit. So we got a regime where most uses of a protected work were unregulated, but copying was infringing by default—with “fair use” exceptions to cover the legitimate uses that did require some amount of copying. These tended to be more partial and limited forms of copying—like quotation for commentary and criticism—that were open to people without printing presses, and fairly clearly not meant as market substitutes for the complete original work.

But now, as many writers on copyright have observed, just about every use of a work in digital form requires making a copy—because that’s how computers work. When you play a song stored on your hard drive—whether through local speakers or by streaming it to a mobile device—copies are created. We talk about “visiting” Web pages, as though there’s a text out there we’re all “going” to look at—but of course, every time you read something online, you’re making a copy of it. And again, the only reason we’re able to find content online relatively efficiently is because various companies make copies for indexing purposes—for profit, even!—without seeking permission from every person who’s got content online. (I hope it’s clear that it would beg the question to say that posting something online to be read by the public implies consent to copying for indexing purposes.) What was once a rare activity has become a ubiquitous concomitant of all sorts of ordinary uses of content. All these acts of copying are still presumptively regulated, but we assume they’re either tacitly permitted by the rights holder or shoehorned by the courts into one or another of the fair use “exceptions” that have been defined to accommodate the changing technological reality. (Ordinary people had no occasion or capability to “time shift” or “place shift” content until the past few decades.)

Instead of ginning up exceptions to a general prohibition on copying just to permit publicly valuable use of content, maybe we should just admit that “copying” no longer makes sense as a primary locus of intellectual property regulation. Fair use analysis typically employs a four-factor test, but the upshot is usually to see how a particular type of copying would affect the market for the original work—which makes sense, given that the purpose of copyright is to give creators a financial incentive to produce and distribute new works. If that’s fundamentally what we care about, though, a default property-like right of control over copying, which now has to be riddled with exceptions to allow almost any ordinary use of content, looks like an increasingly circuitous Rube Goldberg mechanism for achieving that goal. I’m not sure what the alternative would be—or even whether rejiggering the basic categories would alter the underlying analysis much. But—just off the top of my head—you could imagine a system where the core offense was not “copyright infringement” but some kind of tort of unfair competition with an original work. In many cases it would yield the same practical result, but at least we’d reorient the public discourse around “copyright” to focus on measurable harms to creators’ earnings—and ideally get away from the confused notion that copying without permission is somehow equivalent to “stealing” by default unless it fits some pre-established exception.

Addendum: This is implicit in much of the discussion above, but probably worth spelling out explicitly. Until the advent of consumer computing—and especially the Internet—almost all “processing” or “use” of copyrighted content took place in human brains, because… well, where else? An external “copy” or “derivative work” based on that content would generally appear (if at all) as the end product of whatever operations you performed on the temporary copy stored in your brain—which, mercifully, no government has yet held to be susceptible to takedown notices. Your mental copy of the public library book didn’t count for copyright purposes, and if you copied down a few passages in a notebook (the results of your search query), they’d pretty clearly be fair use in the unlikely event the owner even became aware of them. If you wrote a song with a bass line inspired by “Superstition,” the transformation would happen in your head rather than ProTools, and if it didn’t sound too similar, people would call it “influence” rather than “sampling.” Myriad McLuhans manqué have written about how technology and the cognitive outsourcing it enables are transforming our habits of thought and learning—spurring a shift from the acquisition of memorized information to the acquisition of information search skills.

One way to frame my argument above is in terms of what Larry Lessig calls “translation”: Laws establish a balance between competing interests—intellectual autonomy versus cultural control, say—against a specific technological and legal background context. If you have legal rules that protect privacy mostly by means of property boundaries, and then technology (wiretapping, long-range mics) makes it possible to collect intimate information from within a home without physical intrusion, the balance will shift dramatically even if the formal rule remains exactly the same. In fact, if you want to preserve the previous balance between law enforcement and privacy interests, you may need to adopt an entirely different legal paradigm. (Which, indeed, is what we did—though it took 40 years.) Cognitive outsourcing, like long-distance communications, may change the balance in undesirable ways by shifting formerly unregulated mental tasks into the regulated space of digital copies, just as wiretapping temporarily shifted government searches out of the regulated space of property intrusions. The thing to bear in mind is that the particular regulatory trigger is usually a proxy for some more complicated underlying set of interests. If a technological change means a set of activities that were previously unregulated are now effectively highly regulated (or vice versa), we should think very hard about whether we want to preserve the existing formal architecture or—as I think will usually be the case—the same balance of interests.

Addendum II: As I mention above, I’m scarcely the first person to whom most of this has occurred: Ernest Miller in the comments points to a paper he and Joan Feigenbaum wrote way back in 2001 called “Taking the Copy Out of Copyright,” which I’m looking forward to reading. At first glance, I’m reminded (and curse myself for having forgotten as I wrote this post) that American “copyright” law did not initially mention “copying” as such, but rather focused on publication, printing, and sale.

Tags: Law · Tech and Tech Policy

19 responses so far

1 Larry Downes // Mar 24, 2011 at 4:58 pm

Exceptionally sound and sensible analysis, Julian.

It seems unlikely that Google will go back to litigating the case, but it would be great to get a “clean” restart of fair use (still very much a common law concept even though codified in the Copyright Act) in the digital age from the courts.
2 Adrian Ratnapala // Mar 24, 2011 at 6:08 pm

…the end product of whatever operations you performed on the temporary copy stored in your brain—which, mercifully, no government has yet held to be susceptible to takedown notices.

Tell that to Leon Trotsky!

I’m serious: governments try to regulate this all the time. But republics have developed ways to frustrate their efforts (freedom of expression, religion, association, etc.)
3 Adrian Ratnapala // Mar 24, 2011 at 6:09 pm

(I hope it’s clear that it would beg the question to say that posting something online to be read by the public implies consent to copying for indexing purposes.)

Can you make this clearer?
4 Julian Sanchez // Mar 24, 2011 at 6:43 pm

So, it’s fairly clear that if you post something on a generally accessible webpage, you mean to make it available for people to read (and to create the local copy this entails) because… well, why else would you post something? Equally clearly, many people do not intend to authorize copying for ANY purpose (e.g. inclusion in a magazine or compilation sold for profit) when they put a work online. Many sites copy and “mirror” entire blog posts or articles, but we don’t say people “assume the risk” that their work will be copied in this way just because it’s a common practice. As lawsuits over image indexing show, at least some people emphatically do not intend that their work be copied for this purpose. So if you think the law should permit index copying without explicit advance permission, it won’t do to say it’s because people tacitly consent. Rather, you assume tacit consent BECAUSE we think index copying is a reasonable and socially beneficial thing to do, and therefore the onus is on someone who wants to exclude their online work from an index to take affirmative steps to prevent it. Hope that’s clearer…
5 Why Focus On Copying? | Sinting Link // Mar 24, 2011 at 6:55 pm

[…] Books is facing new trouble in the courts, which prompts Julian Sanchez to ponder the purpose of […]
6 Ernest Miller // Mar 24, 2011 at 7:07 pm

See “Taking the Copy Out of Copyright,” http://cs-www.cs.yale.edu/homes/jf/MF.pdf. Others have also been exploring this idea in recent years, though the paper was written in 2001.
7 David Sanger // Mar 25, 2011 at 1:07 am

Also, of course, the regime under which Google reads and indexes web pages is ‘opt out’ rather than ‘opt in’, so unless they hear otherwise from the webpage owner (via robots.txt), they read and index the work.
8 op73 // Mar 25, 2011 at 9:46 am

In India, copyright is widely interpreted as ‘the right to copy’.
9 Linkdump. « Signals Crossed // Mar 25, 2011 at 12:30 pm

[…] prescient summary of many of the issues that will (should?) direct the future of intellectual property law, written […]
10 Mark Dionne // Mar 25, 2011 at 9:20 pm

The owner of a web page can prevent it from being indexed by search engines by adding meta information telling “robots” to keep out. If books had a similar marking inside the front cover, most of the current problem would not be happening. Unfortunately, nobody foresaw this 50 or 100 years ago.
11 Julian Sanchez // Mar 26, 2011 at 2:26 pm

Mark-
Sure, though that’s a request depending on the voluntary compliance of the spider. And in general, you can’t limit fair use rights by sticking a notice in the cover that way (“All quotation for parody or criticism must be approved by the publisher!”)

Still, defaults matter! Even if we were to regard that as legally binding, there’s a big practical difference between “permitted unless the owner takes explicit steps to disallow” and “forbidden without explicit permission.”
12 Dan // Mar 27, 2011 at 3:03 am

This formalistic fixation is the kind of thinking that led to cases such as Cartoon Network v. CSC, where Cablevision wanted to let customers record shows on “DVRs in the cloud,” and it was held legal because each customer’s recording was made as a separate copy (so Cablevision would end up with thousands of copies of American Idol on its servers, for example). See also Zediva, which is trying to avoid paying the license fees to stream movies by streaming them from individual physical DVD players.

Yes, they’re good because they’re loopholes in the overly restrictive copyright regime. But it’s just inefficient to require workarounds like this. (Perhaps the inefficiency is the point, though–perfectly efficient copyright-free streaming would be so cheap that it would not sufficiently reward rightsholders and producers).
13 Ridiculousness – Michael Alan Miller // Mar 29, 2011 at 7:58 pm

[…] shows such a fundamental non-understanding of how computers work that it is just […]
14 Not a ute – Michael Alan Miller // Mar 30, 2011 at 1:37 am

[…] Republican war on uteruses and those who possess them. But what seems to have thrown everyone — save for a handful of embittered and neglected […]
15 hxa7241 // Apr 2, 2011 at 11:04 am

All very perceptive and right. But I wonder if the implicit underlying gist (that we need more clarity and sense) leads away from appreciating the *actualities*; or at least, there is a complementary angle worth considering here.

If we look at copyright as it actually works — that is, as a tool for corporations — clarity and sense do not serve that.

The intent of those copyright owners is control: avoidance of the market, effectively a free subsidy from the government. Clarity is no use here, in fact the opposite. If you have *control* you do not need clarity: you are better served by obscurity and complexity.

If you look at various corporate crime — think Enron — you will probably find the deliberate creation of obfuscation. If people understand what you are doing, your control is endangered. And reading about TRIPS history, obscurity was one of the various tactics employed to get the corporate interests their desired results — i.e. to support their control.

When we have a few powerful organisations that yet still depend on compliance of the many (as with actual copyright), it seems likely that the system will naturally evolve to produce complexity and obscurity.
18 Minor Field Readings: Digital Historical Method & Pedagogy // Feb 10, 2014 at 9:09 pm

[…] Threatens Creativity (2003). 5 Ways The Google Book Settlement Will Change The Future of Reading Google Books, Fair Uses, and “Copyright” as Misnomer Miguel Helft, “Judge Rejects Google’s Deal to Digitize Books,” New York Times, March […]
19 ARL Policy Notes // Nov 5, 2014 at 6:57 pm

[…] Sanchez argues, contra Siva Vaidhyanathan, that Google Book Search is fair use. March 24, 2011 Leave a […]