Thursday, December 04, 2008

Google and Fair Use

There's some background to the Google/AAP settlement that I believe is key to understanding the subtext around it. This won't be news to most folks, but I thought it would be good to re-articulate it in the context of the settlement, lest we forget.

Google's first business is indexing resources on the web. I'll talk about them as if they were all texts because it's easier, but the same applies to images and other resources.

To do the indexing, Google must make a copy of the web page or document, and it uses this copy to add the page to its search engine. As a good citizen, Google honors the robots.txt file and does not index pages whose site owners have opted out of being included in search engines.
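The robots.txt opt-out is a convention that the crawler enforces on itself: before copying a page, it checks the site owner's rules. As a minimal sketch, Python's standard urllib.robotparser module can perform that check. The robots.txt contents, the crawler name ExampleBot, and the example.com URLs are hypothetical, and Google's crawler uses its own parser, so details such as rule precedence may differ from this module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner blocks every crawler from /private/
# and opts a hypothetical crawler named "ExampleBot" out of the whole site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_index(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True unless the site owner has opted this URL out for this crawler."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(may_index(rp, "Googlebot", "https://example.com/articles/1.html"))    # True
    print(may_index(rp, "Googlebot", "https://example.com/private/notes.html"))  # False
    print(may_index(rp, "ExampleBot", "https://example.com/articles/1.html"))   # False
```

Nothing forces a crawler to run this check; the opt-out works only because crawlers choose to honor it, which is the reverse of the opt-in default that copyright law sets.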

This is all fine and unremarkable until you look at it from the point of view of copyright law. Copyright is specifically about... making copies, and it gives the right to make copies, or to authorize the making of copies, to the copyright holder. That can be the author, or someone to whom the author has passed along the right. Copyright holders must opt in to the making of copies: they have to give permission. The default in copyright law is that copies cannot be made unless the copyright holder gives approval.

So the big question is: Is Google violating copyright law by making copies of web pages without the permission of the copyright holders? There are two main ways of looking at this:
  1. The web is different from the print environment. Anyone who has put their works out on the web has agreed to copying, because no one can even view the work without making a copy. If they don't want people copying, they need to hide their works behind a security screen. However, copyright law contains no exception or wording that supports this.
  2. The web is not different from the print environment, but Google is just producing an index, and nothing in copyright law prevents someone from producing an index of words in texts. The incidental copies that Google makes in order to produce the index are allowed under the fair use provisions of copyright law.
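Argument #2 rests on the difference between the copy needed to build an index and the index itself. A minimal inverted-index sketch in Python illustrates that distinction; the tokenizer, the function names, and the sample texts are hypothetical, and real search engines store considerably more than this (word positions and ranking data, for instance).

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the IDs of the documents that contain it.

    Building the index requires reading a full copy of every text, but the
    finished index keeps only words and document IDs, not the texts themselves.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    # Hypothetical sample texts standing in for scanned books.
    docs = {
        "book-a": "Copyright is about making copies.",
        "book-b": "An index of words is not a copy of the text.",
    }
    idx = build_index(docs)
    print(search(idx, "copy"))              # {'book-b'}
    print(search(idx, "copyright copies"))  # {'book-a'}
```

The index answers "which texts contain these words?" without reproducing any text, yet it could not have been built without first copying each text in full.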
So then we move on to the Google Books project. Initially, Google claimed that it was doing the same thing with books as it does with the web: making incidental copies in order to create keyword indexes to the texts. In terms of copyright law, argument #1 is pretty much out because these works can be read without making a copy, so the copyright holders haven't agreed to let their works be copied. This leaves us with argument #2: it must be fair use.

In fact, Google did and does make the fair use argument. The libraries that partnered with Google also came to the fair use conclusion in at least some cases. The CIC project FAQ says:

The University of Michigan said this in 2007:

Does this project comply with copyright law?

Yes. This project was undertaken with careful attention to the law and to the rights and responsibilities of the various parties involved. The purpose of copyright law is to promote progress in society. We are confident that the Books Library project is fully consistent with the fair use doctrine under U.S. copyright law and the principles underlying copyright law itself. Copyright law strikes a balance between rewarding creators of intellectual property for their creations and facilitating public access to these works in ways that do not create a business harm. For books, this means ensuring authors write books, publishers sell them and libraries lend them. By making books more discoverable, Google is enhancing the ability of authors and publishers to sell books to an audience beyond the traditional book market.

What was at stake in the AAP lawsuit was exactly this fair use question. If copying books for the purpose of indexing were found not to be fair use, that ruling could bleed over into the web. And of course it would mean the end of Google Book Search (which has now become Google Book Store). Although Google has always presented a confident posture to the public, declaring unwaveringly that what it does as a search engine is perfectly within copyright law, going to court over the issue would have put its entire operation at risk.

Now back to libraries. Fair use is not a list of things you can do but a judgment call that weighs several complex factors. Key factors include whether your use is commercial in nature and whether it could compete with the copyright holder's own exploitation of the work. In addition, copyright law contains exceptions relating to research and study, and special exceptions for libraries. In fact, libraries and educational institutions get considerably more latitude under copyright law than commercial enterprises do. For example, a teacher can make copies of an article for her students as part of a lesson, and that is generally considered fair use. A company manager who wants his staff to read an article cannot rely on fair use to make copies, but must apply to the copyright holder (usually through an intermediary such as the Copyright Clearance Center, or CCC) and pay a fee. (See the Texaco case.)

What happened with Google Book Search and the AAP is that the digitization of the libraries' books, and the subsequent use of those copies, was judged not by the criteria normally applied to libraries but, of course, by the criteria applied to a commercial entity. That's entirely logical: although Google had partnered with the libraries, the primary use of the materials was to fuel Google Book Search, an obviously for-profit activity.

Libraries have gotten the short end of the stick because their use of their own materials became commercialized through their partnership with Google. If libraries had instead managed to digitize the books on their own, the outcome would likely have been entirely different (assuming any lawsuit had been brought at all). I believe that libraries could be found to have a fair use case for digitizing their works for the purposes of searching, and could be allowed to use those digitized copies under the exceptions spelled out in the copyright law (such as replacing deteriorated originals under section 108, or providing access to the sight impaired under section 121). Unfortunately, the digitization of library collections has now taken on an air of commercialization and has earned the wrath of publishers and authors. The Google/AAP settlement has created a mechanism that ignores the inherent rights of libraries and also makes it more difficult for them to justify undertaking their own digitization projects.

This is why I disagree heartily when I hear statements like:

We're delighted that this agreement creates new opportunities for libraries and universities to offer their patrons and students access to millions of books beyond their own collections. (from Google)
The settlement might look good from the point of view of a commercial entity facing copyright law, but it binds the non-profit educational and cultural heritage community to legal decisions designed for the for-profit sector. Not only is this not a win for libraries; it will hinder them in their efforts to use current technologies to further the arts and sciences.
