Google is Reimagining Writing and Publishing (if the Supreme Court Will Let It)

3.8.2016

UPDATE: By a short order, without explanation, on April 18, 2016 the U.S. Supreme Court refused to hear the Authors Guild's appeal of the Second Circuit's decision in this case. For that reason, and barring a request for rehearing, which is seldom granted, the Second Circuit's decision stands and is final. Google is free to set information free, after all.

ORIGINAL POST:

Google, or its parent company Alphabet, is the most valuable company in the world. It doesn't want for much and, least of all, for ambition and imagination. So its revolutionary Google Library project, to catalog and archive the world's published works, and to make them searchable to anyone who wants to look, is no surprise. Google Books is the apotheosis of Stewart Brand's claim that "information wants to be free."

In a few months, the Supreme Court of the United States will decide if Google will be allowed to make good on that promise.

In a world where we're all accustomed to quick and easy access to the internet, what's so revolutionary about Google Library? In short, it proposes to make searchable, for free, every published work ever written. All of it. Completely. Forever. In Google's own words: "Our ultimate goal is to work with publishers and libraries to create a comprehensive, searchable, virtual card catalog of all books in all languages that helps users discover new books and publishers discover new readers."

So, yeah. All the books.

Google attempts to realize this promise through the integration of three brilliant technologies. First, it's devised a hardware solution to digitally scan millions and millions of books. Once books are loaded into them, Google's machines turn the pages and scan them auto-magically. (The details of how they do that aren't clear, but people sure would like to know.) Since 2004, Google has scanned 20 million books. Second, it stores this massive pile of data on servers it operates and controls, with security protocols consistent with its own internal security requirements. Third, having digitized the pages of these paper books, it uses software to make the text searchable.
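For the curious, here's roughly what "making the text searchable" can mean once the pages are digitized. This is a toy sketch, not Google's actual system (whose details aren't public). It assumes the scanned pages have already been turned into plain text, and it builds a simple inverted index that maps each word to the pages where it appears. The function names and the page numbers in the sample data are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages):
    """Map each word to the set of (book, page) locations where it appears."""
    index = defaultdict(set)
    for location, text in pages.items():
        for word in tokenize(text):
            index[word].add(location)
    return index


def search(index, query):
    """Return the locations that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample data: (book title, page number) -> page text.
pages = {
    ("Moby-Dick", 1): "Call me Ishmael.",
    ("Moby-Dick", 2): "the watery part of the world",
    ("Walden", 1): "I went to the woods because I wished to live deliberately",
}
index = build_index(pages)
print(search(index, "the world"))
```

The index is built once, and each search is then a handful of set lookups rather than a re-read of every book, which is the basic reason a scan of millions of volumes can still be searched quickly.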

If Google knows anything, it knows search.

But you don't get up to 20 million books without copying some that are protected by copyright. How does Google get to do that?

To answer that question, you have to know where Google got 20 million books. It didn't buy them. It's partnered with 40 libraries around the world to scan their collections. They include the Austrian National Library, the libraries at Columbia, Harvard, and the University of Virginia, and Keio University Library in Japan. The University of Michigan was an early leader in digitizing its own collection, but, at the rate of a paltry 5,000 volumes a year, it would have taken the university 1,000 years to scan just its own collection.

Enter Google.

The scanned books, if they're out of copyright, are available to the public for viewing or download in their entirety. The problem the Supreme Court will soon address is what to do about copyrighted books.

Google attempts to skirt copyright limitations in a few ways. For copyright-protected works, users who search for a book may in some instances get only a description of the book. For others, searchers may see only a "snippet," a few sentences before and after the search term they used. For example, a search for "establishment" and "free exercise" might turn up the whole First Amendment to the U.S. Constitution. Finally, if a publisher has granted Google permission, the search might show as much as a few pages of a book's text.
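To make the "snippet" idea concrete, here's a minimal sketch of how a snippet view could work: find the first sentence containing the search term and return it with a sentence of context on either side. This is an assumption-laden illustration, not Google's implementation. The `snippet` function, its `context` parameter, and the sample page are all invented for the example.

```python
import re


def snippet(text, term, context=1):
    """Return the first sentence containing `term`, plus `context`
    sentences before and after it, or None if the term never appears."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences):
        if term.lower() in sentence.lower():
            start = max(0, i - context)
            return " ".join(sentences[start:i + context + 1])
    return None


# Invented sample page, for illustration only.
page = (
    "The ship left port at dawn. The crew was uneasy. "
    "A white whale had been sighted nearby. The captain said nothing. "
    "By noon the wind had died."
)
print(snippet(page, "whale"))
```

The searcher sees the term in context, while the rest of the page stays out of view.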

Why isn't this massive copying, storage, and re-publishing illegal? Why doesn't copyright law protect the writers and publishers of this work from Google's massive copying effort? Two words: "fair use."

"Fair use," under copyright law permits people to use copyrighted work for purposes such as criticism, teaching, scholarship, or research. To determine if a use is "fair use," a term the copyright statute does not itself define (thanks, Congress) the factors courts consider include the nature and purpose of the use, the nature of the copyrighted work, the amount of the whole work being used, and the effect of the use on the market.

The Authors Guild and a host of publishers sued Google to stop Google Library. The U.S. Court of Appeals for the Second Circuit declined the invitation.

The Court ruled that scanning copyrighted books to permit them to be searched was transformative of the original works. The digital images, available for keyword searching, weren't just copies of the original works. By being used as a search tool, the images became a device for finding work, not reading the entire thing. In the same vein, Google's "snippet" views of copyrighted works are "fair use" because they permit searchers to see how keywords fit, in context, into a work's text without seeing the whole published work. The Court also concluded that Google's status as a for-profit company doesn't, by itself, render the Library Project un-fair use. Lots of fair use work, like parody or satire or book reviews or criticism, is done with a profit motive.

Of course, the biggest hurdle for Google's fair use claim is the fact that it is copying not just parts of books, but whole books and all books. Google's copying is indiscriminate by design. It intends to capture the entirety of someone else's work in its unedited, unparsed, unannotated completeness. Google wants every word and comma from the cover to the index at the end.

The Second Circuit met this concern by relying on the technology limitations Google places on its treasure trove of data. Specifically, for copyrighted works where it doesn't have the author's or publisher's permission, Google will keep the whole digital work but will only permit searchers limited access – snippet views or just a bibliographic record of the work. Google promises that its data security is strong and that it won't negligently let entire works leak onto the internet. Plaintiffs countered that when they used "snippet view," they were able to reconstruct 16% of a protected work. The Court was unimpressed. Even that amount, collected out of order and out of context, was no "substantial" revelation of the work.

The Second Circuit's decision is on appeal to the U.S. Supreme Court. The American Society of Journalists and Malcolm Gladwell, among others, have joined the Authors Guild's request for the High Court to review the Second Circuit's decision. The Supreme Court will likely decide sometime this spring or summer whether Google Library is free to continue in its "fair use" of copyrighted works, or if the matter requires another year or so of litigation.

I started to write that when the Google Library project is done, the world will have extraordinary access to an ocean's worth of information. That sentence is wrong for two reasons. First, the world already has that access today. Google Books is online right now. Search away, fellow seekers.

Second, and more importantly, the project won't ever be done. It can't ever be done. Every moment of every day there's someone pecking out a few more pixels. A reader or researcher influenced by yesterday's work is making something new today which may, butterfly effect-like, affect someone's work tomorrow. Every book that's read ripples through a reader like a stone ripples a pond. Google Books is creating millions of stones and a continent's worth of ponds.

Indexing the world's accumulated knowledge is an endless, impossible, futile task. Here's hoping the Supreme Court agrees to let Google try anyway.