Tim Lee has a great analysis at Ars Technica of this week’s ruling invalidating the controversial Google Books settlement. Tim, like the court, focuses on aspects of the agreement that seem to give Google a unique advantage in the online book market—and hopes that instead Google will now simply defend its copying of books for indexing purposes as a fair use. Somewhat to my surprise, Siva Vaidhyanathan seems hostile to this approach in his writeup at Slate:
Back in 2004 Google shocked the publishing world by announcing that it had been for some time secretly scanning books from major university libraries. Some of these libraries were allowing Google to scan in books that were clearly still covered by copyright. Google tried to convince everyone that this was all just fine under U.S. copyright law by asserting that we readers would only get to experience “snippets” of the entire book that sat in Google’s servers.
This, to Google, was an example of “fair use,” a defense used against an accusation of copyright infringement that gives the public a way to deploy portions (or sometimes all) of a copyrighted work for some publicly valuable use. Critics, journalists, teachers, and scholars rely on fair use every day to quote from copyrighted sources. But what Google proposed was wholesale and revolutionary: It would have turned the copyright system on its head and redefined fair use in ways that were never intended.
I fear Siva’s general wariness of Google is overpowering his normally sound instincts on copyright in the digital era. It’s true, of course, that Google must scan the whole book in order to be able to provide users with search results. But that’s not exactly unprecedented—it’s how search engines work. Google’s computers need to make a copy of the text on all those Web pages in order to index them—how else would they know which pages contain text that matches your search query? And, of course, most of that text is copyright-protected by default. The same goes for image indexing, which courts have given a green light so far. I’d suggest that one reason people have a different intuition about Google Books is that Web content is already in digital form—which means everyone who reads it is “making a copy”—whereas Google was making digital copies of analog works. But as a policy matter, it’s not clear why this should make such an important difference.
Suppose I tweet that I’m trying to remember which Borges story has that line about how “mirrors and copulation are abominable, because they increase the number of men.” Some of my diligent friends hurry to their libraries, flip through their Borges collections, and tweet back the answer—along with a few sentences of the surrounding context. Clearly there’s nothing intrinsically objectionable about the search function, and a quotation of a sufficiently limited portion of the whole work in reply would normally be protected by fair use. The problem is just that Google’s search—and indeed, any computer search—technically requires that a copy be made. But to my mind, this just underscores how increasingly maladaptive it is to make “copying” the primary locus of regulation in our system of intellectual property.
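The search my diligent friends performed by hand can be described mechanically. A minimal sketch (the function and the sample library are my own placeholders, not anything Google runs): note that the searcher necessarily holds a copy of each text in order to scan it.

```python
# Hypothetical sketch of what "search" requires mechanically.  The
# function and sample library below are illustrative placeholders; the
# point is that the searcher must hold a copy of each text to scan it.

def snippet_search(library, query, context=40):
    """Return (title, snippet) pairs for texts containing the query."""
    hits = []
    for title, text in library.items():
        pos = text.find(query)
        if pos != -1:
            start = max(0, pos - context)
            hits.append((title, text[start:pos + len(query) + context]))
    return hits

library = {
    "Collected Fictions": ("One of the heresiarchs of Uqbar had declared "
                           "that mirrors and copulation are abominable, "
                           "because they increase the number of men."),
}
for title, snippet in snippet_search(library, "mirrors and copulation"):
    print(title, "->", snippet)
```

The quoted snippet in the result is exactly the kind of limited excerpt fair use has always protected; the legally fraught step is only that `library` is itself a copy.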
Technology even complicates the question of just what constitutes a “copy”—an intriguing issue I explored in a few articles back in my days at Ars Technica. Imagine, for instance, that Google took a different approach to indexing in hopes of avoiding thorny copyright questions. Instead of storing “copies” of each book, suppose they created a huge database called Google Concordance, consisting of an enormous catalog of every word or short phrase someone might want to look up, followed by a long list, like a kind of super-index, specifying the location on every page of every book in which that word or phrase appears. (“Aardvark: Blackwell Guide to the Philosophy of Computing and Information, Page 221, Line 17, word 3…”) Obviously, the Google Concordance would be a very valuable and useful reference text, and nowhere in the database would you find anything resembling a “copy” of any of the cataloged works. But just as obviously, it would contain all the information a clever programmer would need to reconstruct an arbitrary portion of the original text on the fly, assuming the database could be queried fast enough. You can imagine someone creating certain kinds of “derivative works” in a similar way: If you don’t want the RIAA taking down your mashup, you might try to offer it as an algorithm specifying time segments of component tracks to be combined in a particular manner… an algorithm that might produce gibberish or Girl Talk depending on what files you feed it.
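To make the thought experiment concrete, here is a toy sketch of the idea (all names and the sample text are illustrative): the concordance stores only word-to-position mappings, never a contiguous copy, yet it can regenerate any passage on demand.

```python
# Toy sketch of the hypothetical "Google Concordance": an index that
# contains no contiguous copy of any book, but enough information to
# reconstruct any passage.  Names and the sample text are illustrative.

def build_concordance(texts):
    """Map each word to every (doc_id, word_index) where it appears."""
    concordance = {}
    for doc_id, text in texts.items():
        for i, word in enumerate(text.split()):
            concordance.setdefault(word, []).append((doc_id, i))
    return concordance

def reconstruct(concordance, doc_id, start, end):
    """Rebuild words [start, end) of a document from the index alone."""
    slots = {}
    for word, positions in concordance.items():
        for d, i in positions:
            if d == doc_id and start <= i < end:
                slots[i] = word
    return " ".join(slots[i] for i in range(start, end))

books = {"borges": "mirrors and copulation are abominable because "
                   "they increase the number of men"}
index = build_concordance(books)
print(reconstruct(index, "borges", 0, 4))  # mirrors and copulation are
```

Nothing in `index` looks like a copy of the book—it is just a dictionary of words and coordinates—but given fast enough queries, `reconstruct` yields one anyway.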
In a sense, it’s always the processing algorithm that determines whether a particular binary string is a “copy” of a work or not. Open an MP3 of a Lady Gaga track in a text editor and you’ll get a wholly original work of experimental literature—though not one anybody (except possibly Lady Gaga) is likely to be interested in reading. For that matter, Google’s database is just an enormous collection of ones and zeroes until some program processes it to generate human-readable output. I distinguished my hypothetical Google Concordance database from a collection of copied books, but if you point to a particular file and ask whether it contains the Concordance or copies of the books, there’s a very literal sense in which there just is no fact of the matter until you know what algorithm will be used to render it as alphanumeric text. This may sound like airy metaphysical hairsplitting, but the power of computers to rapidly aggregate and process dispersed information on a global network is likely to create genuine practical complications for a legal framework that takes discrete, physically contiguous chunks called “copies” as its fundamental unit of analysis. Legally speaking, it would seem to make an enormous difference whether books are scanned and stored as books, or as a comprehensive concordance database maintained by Google, or as a series of hundreds or thousands of complementary partial concordances dispersed across many servers (or even individual hard-drives linked by a p2p network). Given sufficient bandwidth and processing speed, it might make no difference at all in practice. Maybe we should take that as a hint to reexamine our categories.
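The point that there is “no fact of the matter” until a processing algorithm is chosen takes only a few lines to demonstrate (a toy illustration, not a claim about any real file format): one byte string yields three different “works” depending on how you decode it.

```python
# One byte string, three different "works" depending on the decoding
# algorithm applied to it.  A toy illustration of the point that the
# interpretation, not the bits, determines what a file "contains".

data = bytes([72, 105, 33])

as_text = data.decode("ascii")   # read as ASCII text: 'Hi!'
as_numbers = list(data)          # read as integers: [72, 105, 33]
as_hex = data.hex()              # read as hexadecimal: '486921'

print(as_text, as_numbers, as_hex)
```

Which of these the file “is” depends entirely on the program you point at it—just as with the MP3 opened in a text editor.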
The Constitution doesn’t explicitly use the term “copyright”: It empowers Congress to grant creators certain “exclusive rights” in their works, leaving open just which rights should be made exclusive. And indeed, many of the privileges lumped together under the rubric of “copyright”—such as public performance and the creation of “derivative works”—need not involve making a “copy” in any ordinary sense of the word. But as the word itself suggests, the way our system has traditionally worked is that if you owned a copy of a protected work, you could pretty much do what you wanted with it—read or view it when and where you like, sell it to someone else, chop it into bits and eat it with Hollandaise if that’s what turns you on—as long as you weren’t making a copy. And for a long time, this wasn’t a very restrictive limitation, because making copies of a work was generally an expensive and labor-intensive business that ordinary people had neither the capability nor, indeed, any compelling reason to engage in. Pretty much the only people with motive to invest the substantial resources needed to print books or press vinyl were pirates who planned to sell a bootleg edition for profit. So we got a regime where most uses of a protected work were unregulated, but copying was infringing by default—with “fair use” exceptions to cover the legitimate uses that did require some amount of copying. These tended to be more partial and limited forms of copying—like quotation for commentary and criticism—that were open to people without printing presses, and fairly clearly not meant as market substitutes for the complete original work.
But now, as many writers on copyright have observed, just about every use of a work in digital form requires making a copy—because that’s how computers work. When you play a song stored on your hard drive—whether through local speakers or by streaming it to a mobile device—copies are created. We talk about “visiting” Web pages, as though there’s a text out there we’re all “going” to look at—but of course, every time you read something online, you’re making a copy of it. And again, the only reason we’re able to find content online relatively efficiently is because various companies make copies for indexing purposes—for profit, even!—without seeking permission from every person who’s got content online. (I hope it’s clear that it would beg the question to say that posting something online to be read by the public implies consent to copying for indexing purposes.) What was once a rare activity has become a ubiquitous concomitant of all sorts of ordinary uses of content. All these acts of copying are still presumptively regulated, but we assume they’re either tacitly permitted by the rights holder or shoehorned by the courts into one or another of the fair use “exceptions” that have been defined to accommodate the changing technological reality. (Ordinary people had no occasion or capability to “time shift” or “place shift” content until the past few decades.)
Instead of ginning up exceptions to a general prohibition on copying just to permit publicly valuable use of content, maybe we should just admit that “copying” no longer makes sense as a primary locus of intellectual property regulation. Fair use analysis typically employs a four-factor test, but the upshot is usually to see how a particular type of copying would affect the market for the original work—which makes sense, given that the purpose of copyright is to give creators a financial incentive to produce and distribute new works. If that’s fundamentally what we care about, though, a default property-like right of control over copying, which now has to be riddled with exceptions to allow almost any ordinary use of content, looks like an increasingly circuitous Rube Goldberg mechanism for achieving that goal. I’m not sure what the alternative would be—or even whether rejiggering the basic categories would alter the underlying analysis much. But—just off the top of my head—you could imagine a system where the core offense was not “copyright infringement” but some kind of tort of unfair competition with an original work. In many cases it would yield the same practical result, but at least we’d reorient the public discourse around “copyright” to focus on measurable harms to creators’ earnings—and ideally get away from the confused notion that copying without permission is somehow equivalent to “stealing” by default unless it fits some pre-established exception.
Addendum: This is implicit in much of the discussion above, but probably worth spelling out explicitly. Until the advent of consumer computing—and especially the Internet—almost all “processing” or “use” of copyrighted content took place in human brains, because… well, where else? An external “copy” or “derivative work” based on that content would generally appear (if at all) as the end product of whatever operations you performed on the temporary copy stored in your brain—which, mercifully, no government has yet held to be susceptible to takedown notices. Your mental copy of the public library book didn’t count for copyright purposes, and if you copied down a few passages in a notebook (the results of your search query), they’d pretty clearly be fair use in the unlikely event the owner even became aware of them. If you wrote a song with a bassline inspired by “Superstition,” the transformation would happen in your head rather than in ProTools, and if it didn’t sound too similar, people would call it “influence” rather than “sampling.” Myriad McLuhans manqués have written about how technology and the cognitive outsourcing it enables are transforming our habits of thought and learning—spurring a shift from the acquisition of memorized information to the acquisition of information search skills.
One way to frame my argument above is in terms of what Larry Lessig calls “translation”: Laws establish a balance between competing interests—intellectual autonomy versus cultural control, say—against a specific technological and legal background context. If you have legal rules that protect privacy mostly by means of property boundaries, and then technology (wiretapping, long-range mics) makes it possible to collect intimate information from within a home without physical intrusion, the balance will shift dramatically even if the formal rule remains exactly the same. In fact, if you want to preserve the previous balance between law enforcement and privacy interests, you may need to adopt an entirely different legal paradigm. (Which, indeed, is what we did—though it took 40 years.) Cognitive outsourcing, like long-distance communications, may change the balance in undesirable ways by shifting formerly unregulated mental tasks into the regulated space of digital copies, just as wiretapping temporarily shifted government searches out of the regulated space of property intrusions. The thing to bear in mind is that the particular regulatory trigger is usually a proxy for some more complicated underlying set of interests. If a technological change means a set of activities that were previously unregulated are now effectively highly regulated (or vice versa), we should think very hard about whether we want to preserve the existing formal architecture or—as I think will usually be the case—the same balance of interests.
Addendum II: As I mention above, I’m scarcely the first person to whom most of this has occurred: Ernest Miller in the comments points to a paper he and
Joan Feigenbaum wrote way back in 2001 called “Taking the Copy Out of Copyright,” which I’m looking forward to reading. At first glance, I’m reminded (and curse myself for having forgotten as I wrote this post) that American “copyright” law did not initially mention “copying” as such, but rather focused on publication, printing, and sale.