Access and Storage of Knowledge in the New Millennium: The Google Book Search Library Project and the Future of Libraries
by Jacob Rooksby - January 11, 2007
The search engine Google causes ripples of controversy within the academic and publishing communities every time it announces a new institutional partner in its Google Book Search Library Project--a massive undertaking aimed at digitizing millions of holdings at academic libraries across the world. A review of the history of libraries, however, reveals that Google's project may just be the next logical step in addressing the persistent problem libraries face: how best to store and access knowledge.
In December of 2004, the publicly traded search engine giant Google announced that it had completed deals with five major libraries to digitize all or parts of their collections.1 The Google 5, as these libraries came to be known, include four university libraries (Michigan, Stanford, Harvard, and Oxford) as well as one public library (the New York Public Library). Since the initial announcement, four other university libraries have joined the Google 5, including libraries within the University of California system, the Complutense University of Madrid, the University of Wisconsin at Madison, and the University of Virginia.2 The purpose of the mass digitization, or Google Book Search Library Project as Google calls it, is to give anyone with Internet access the ability to search for and locate books online. Google's ultimate goal is to add over 15 million library volumes to its electronic index over the next decade, at an estimated cost of $150 million.3
Google's announcement of the Library Project immediately caused controversy in the academic and publishing communities. Even though searches for books under copyright will only display a snippet of text from the copyrighted work, detractors still claim that the mere act of digitizing entire books under copyright, regardless of the amount of text actually displayed, constitutes copyright infringement.4 Less than a year after the Library Project was announced, the Authors Guild, the largest society of published writers in the United States, filed a lawsuit against Google seeking damages and an injunction.5 The complaint alleged that the Library Project constitutes massive copyright infringement.6 A month later, the Association of American Publishers (AAP) sued Google for copyright infringement as well, this time on behalf of McGraw-Hill Companies, Penguin Group, Simon & Schuster, and other such publishers.7 Instead of an award of damages for the alleged copyright infringement, the AAP seeks a judgment declaring that Google's actions are unlawful.8 In advance of whatever decisions the federal courts in these two cases might make, legal commentators seem to agree that the legality of Google's Library Project is questionable.9 Although Google claims that its copying is protected by the fair use doctrine of copyright law, most legal scholars who have conducted the fair use analysis that courts employ when this defense is raised conclude that Google's claim to fair use is tenuous at best.10 After all, Google's Library Project involves the reproduction, distribution, and display of copyrighted works, all three of which are exclusive rights granted to the author (or publisher, if the rights are assigned) under the bundle of rights known as copyright.
Despite these two prominent lawsuits, the digitization of books at the eight academic institutions now participating in the Library Project has gone forward undeterred, and some materials are now available online through the beta version of the Google Books site.11 In addition, many professors and administrators at these institutions have been vocal in their support of the project. The University of Michigan has been the chief defender of the Library Project from the outset, as its library is the first and only university library to allow Google complete access to its entire collection (in this case, 7.8 million volumes).12 Mary Sue Coleman, the President of the University of Michigan, has been a strong supporter of the Library Project, stating that the ability to perform Boolean searches of all the world's texts will revolutionize research and scholarship.13 She has also advanced what has become a divisive argument in publishing circles, stating her belief that the book-scanning project will promote book sales, as it all but guarantees increased exposure for scholarly presses with narrow audiences.14 Others echo Coleman's enthusiasm. An associate university librarian at Michigan called the Library Project "an important moment in the history of libraries, and an important moment in the history of scholarship," while the president of the New York Public Library has said that the project will be a huge benefit to researchers because it will make the process of finding materials more efficient.15 Meanwhile, the provost for academic, information, and instructional technology affairs at the University of Michigan believes that the project has enormous benefits for today's students, for "[i]f you can't find it online, it won't be read."16
However controversial the Library Project may be among publishers, lawyers, universities, and librarians, a historical perspective reveals that the problems it presents are as old as libraries themselves. Since the beginning of libraries in America, questions have arisen regarding how best to warehouse and access knowledge. These problems of storage and access are only slightly different for academic librarians today as they face the digital and online age. As library historian Fred Lerner aptly put it:
The anarchy of the Internet may be daunting for the neophyte, but it differs little from the bibliographical chaos that is the result of five and a half centuries of the printing press. The same science that produced the Anglo-American Cataloguing Rules and the Universal Decimal Classification can be applied to the World Wide Web.17
As Lerner's work suggests, these two chief problems of library science, storage and access, intertwine into one salient question that has persisted and will continue to persist through the ages: how does one access what is stored?
As is to be expected, university libraries and librarians have played important roles in addressing this question. College libraries before 1850 generally consisted of small and disorganized collections of books; the books were seldom used, but strictly guarded.18 After 1850, the growth and development of university libraries followed the broader influx of money and interest in higher education in the United States. In 1862, with the passage of the Morrill Land Grant Act, many new institutions of higher learning were created, and their libraries soon became some of the finest in the land.19 As more universities became research centers along the lines of the German model, the role of the library in the university became increasingly important. The introduction of the seminar style of teaching, where students were asked to conduct independent study and research, caused library resources to be in high demand.20 Michael Harris, the authority on library history in the Western world, stated that book collections grew with such speed during this time period that it was soon accepted as a fact of life that these libraries could be expected to double in size every 16 years.21
The increase in collection sizes and the overall interest in using library resources created problems of organization for librarians before the turn of the century. Given the seemingly steady influx of new books, it was not uncommon for entire libraries to be re-catalogued and new classification systems implemented with some frequency.22 Melvil Dewey, the John Adams of the American library movement, emerged during this period of widespread growth of university libraries as the foremost leader dedicated to solving these new problems of storage and access.23 As a student at Amherst in 1876, Dewey wrote Decimal Classification, the fountainhead for his system of classification and organization that would soon become the standard for all American libraries.24 In that same year, Dewey, along with a few others, founded the American Library Association in order to give a professional backing to their efforts to overhaul and unify library methods in this period of heightened interest and growth.
When Dewey moved to Columbia University, he founded there, in 1887, the first school of what was then called "library economy." The peculiar name bespoke Dewey's philosophy of the growing field. Librarianship was, in his view, a mechanical art; he often referred to the academic library as a machine and sought to have it work with similar efficiency.25 Although Dewey's school was short-lived (it lasted only two years), his work in establishing library economy as a serious field of study influenced generations of librarians who would go on to use his decimal classification system and other models of efficiency to help make knowledge collections both organized and accessible. The graduate school of library science (now defunct) at the University of Chicago, founded in 1928, furthered Dewey's vision of training librarians in best practices, although many small colleges did not have a trained librarian until the 1920s.26 As Harris writes, out of these changes, the library ceased being a museum and became a more active part of the academic program at colleges and universities in the twentieth century.27
The 1930s and '40s ushered in new questions of access for librarians. Debates flourished as to whether collections should be stored in one central location or whether individual departments could house a collection of the works they used most often. Generally speaking, proponents of the departmental system prevailed.28 With fewer funds being spent on new acquisitions during the Depression and war years, many libraries started to rely more heavily on interlibrary loans (before, a merely nascent idea) in order to solve problems of access presented by dwindling budgets.29 After the war, with thousands of new students going to college on the GI Bill, colleges and universities again had to make more room in libraries, either by expanding existing structures or by building new ones. These challenges often resulted in the reorganization of library procedures and the reclassification of book collections. As new forms of media such as tapes, discs, films, and filmstrips entered the curriculum, new classifications had to be made in order to make these materials accessible as well.30 Faced with a lack of space for old newspapers and other periodicals, libraries in the 1950s turned to the new and more space-efficient technology of microfilm to store these materials and make them available to users.31
Although the Google Library Project's threat to traditional printed materials may seem novel, as early as the 1960s some scholars were already predicting that libraries would be completely paperless by the year 2000.32 Their predictions stemmed from a concern over the increasing costs of acquisitions and the dwindling shelf space in which to store new materials. New technologies would be developed and implemented, these scholars hypothesized, to further enhance accessibility. The advent of the computer soon made these formerly fanciful ideas appear achievable. In 1967, the presidents of the colleges and universities in Ohio founded the Ohio College Library Center (OCLC) to develop a computerized system in which the libraries of Ohio academic institutions could share resources and reduce costs.33 In 10 years, the organization changed its name to the Online Computer Library Center, marking its shift from a statewide initiative to a worldwide network with vast implications. With its development of WorldCat, a comprehensive catalog index of the holdings of its constituent libraries, OCLC established the first vehicle through which intra- and interlibrary knowledge could be archived and accessed. Now approximately 42,000 libraries across the globe use OCLC's services, and WorldCat is an indispensable tool for researchers of all kinds.34
The computer age also heralded other initiatives that sought to streamline access to knowledge, one of which occurred at two libraries now participating in the Library Project. In 1974, the New York Public Library and the libraries at Harvard, Columbia, and Yale banded together to form a sharing consortium called the Research Libraries Information Network (RLIN) that soon established a bibliographic database of holdings not available on OCLC. Through the establishment of OCLC and RLIN, which have since merged, the interlibrary loan process was immensely simplified and popularized.
Whereas OCLC and RLIN were developed chiefly to help libraries confront access issues in the face of space limitations and the complexities of archiving millions of volumes, Project Gutenberg is more closely analogous to Google's Library Project because of its concern with making text not only accessible, but also retrievable. Project Gutenberg began in 1971 when a computer scientist at the University of Illinois was given $100,000,000 of computer time on the Xerox mainframe computer at Illinois's materials research lab.35 He used his free time to encode the Declaration of Independence and send it to other computers on the network; thus, the first e-text was born.36 Similar in part to Google's aspirations, Project Gutenberg formed around the idea of freely providing to anyone electronic copies of classic texts in the public domain.37 To date, Project Gutenberg contains 19,000 books in over 50 languages in its online catalog, although unlike Google's Library Project, the works in Gutenberg's database are all in the public domain and appear in plain text as opposed to original form.38 However, the Internet Archive, a combined access and storage initiative founded in 1996, allows users to view public domain books in their original, as-printed form.39 Currently, the Internet Archive offers nearly 36,000 texts in its database, 5,000 of which were scanned by Yahoo! (one of Google's rivals) from American libraries.40
Another space-saving initiative began in the mid-1990s, when the Andrew W. Mellon Foundation launched an effort to ease the increasing problems faced by libraries seeking to provide adequate stack space for the extensive back-files of scholarly journals.41 Using a process developed at the University of Michigan, over 750,000 pages of issues from 10 different journals were converted into an electronic format of bit-mapped images.42 Out of this effort, the digital archive now known as JSTOR developed. This new service allowed libraries to save space by discarding old journals after they had been posted to JSTOR.
As this foray into the history of knowledge storage and access issues shows, Google's Library Project is perhaps not as groundbreaking as either its champions or detractors have proclaimed. Academic libraries have changed in various ways since the 1850s in expanding their structures, adopting new methods, and helping to develop new technologies that increase their efficiency and expand their utility. What the Library Project adds to this history of innovation is its chutzpah and scale. No initiative before Google's had ever considered it lawful to make electronic copies (and in Google's case, exact copies) of works still under copyright. JSTOR came the closest, but it decided to pay publishers fees in order to copy the journals it puts in its database, which it then offers to end users for a fee. Furthermore, no project to date has combined both goals of storage and access in such a widespread and far-reaching way. Although the OCLC and RLIN initiatives provided solutions to the persistent access question, neither even attempted to unify storage. The Google Library Project aims to do both, and on a scale that far surpasses the previous efforts of Project Gutenberg, JSTOR, and the Internet Archive in this field.
The Google Library Project is also different from the access and storage initiatives that have come before it in that the Library Project, although carried out in conjunction with university libraries, is spearheaded and nearly entirely funded by a private entity. Questions naturally arise as to whether society should cede control of such an ambitious undertaking to a private entity. As one detractor has put it: "We want a public library system in the digital age, but what we are getting is a private library system controlled by a single corporation."43 What if a hacker were able to steal copyrighted works from the database, or if Google, of its own accord, decided to push the limits of fair use even further by making larger portions of copyrighted works available for free on the Internet? Furthermore, what happens if Google goes bankrupt and liquidates its assets? For the time being, these lingering fears seem to have been quelled (at least at participating institutions) by the hard economic fact that without Google's private funding, such an ambitious project could never have been undertaken. As one academic publisher and supporter of the project puts it: "Google's model is more likely to help more people find library resources and publishers' works than anything else on the horizon."44
The Google Library Project also tests the boundaries of copyright fair use in a way that the access and storage initiatives before it did not. By using an opt-out policy, whereby Google leaves it up to publishers to affirmatively contact Google should they not want their copyrighted works included in the project, Google has appeared almost blasé about charges of copyright infringement. Until a court declares its mass digitization efforts unlawful, however, Google will continue its work with academic libraries to make the contents of millions of books searchable to anyone with an Internet connection. Although the Library Project may be controversial at this stage of its development, years from now it might rightfully be seen as the next logical step in the history of libraries' struggle to warehouse knowledge in such a way as to make information both freely and easily accessible to others.
1 Press Release, Google, Google Checks Out Library Books, December 14, 2004, http://www.google.com/press/pressrel/print_library.html (last visited October 28, 2006).
2 Scott Carlson, U. of California Will Provide Up to 3,000 Books a Day for Google to Scan, Chronicle of Higher Education, September 8, 2006, A32 (hereafter U. of California); Stu Woo, Google's Book Project Gets Its First Non-English Partner, Chronicle of Higher Education, October 6, 2006, A29; Scott Carlson, U. of Wisconsin Joins Google Library Project, Chronicle of Higher Education, October 27, 2006, A37; Sarah Myers, Google to Digitize U.Va. Collections, The Cavalier Daily, November 15, 2006, http://www.cavalierdaily.com/CVArticle.asp?ID=28611&pid=1516 (last visited November 19, 2006).
3 The Next Great Library, Los Angeles Times, December 18, 2004, B22.
4 Andrea L. Foster, Google Counters Critics of Library Project, Chronicle of Higher Education, March 10, 2006, A32.
5 See Authors Guild v. Google, No. 05 CV 8136 (S.D.N.Y. Sept. 20, 2005).
7 Association of American Publishers, Press Release, Publishers Sue Google Over Plans to Digitize Books, October 19, 2005, available at http://www.publishers.org/press/releases.cfm?PressReleaseArticleID=292 (last visited October 28, 2006).
9 Emily Anne Proskine, Google's Technicolor Dreamcoat: A Copyright Analysis of the Google Book Search Library Project, 21 Berkeley Technology Law Journal 213 (2006). See also Michael Goldstein, Google's Literary Quest in Peril, 2005 Boston College Intellectual Property & Technology Forum 110301 [sic]; Elizabeth Hanratty, Google Library: Beyond Fair Use? 2005 Duke Law & Technology Review 10.
11 Jeffrey R. Young, Google Adds First Scanned Library Books to Search Index, and Says Copyrighted Works Will Follow, Chronicle of Higher Education, November 18, 2005, A34. See also http://books.google.com/ for the beta version of Google Books.
12 Carlson and Young, Google Will Digitize.
13 Mary Sue Coleman, Creating a Global Online Library Will Spread Knowledge in the Quickest Way to the Most People, The Oregonian, November 13, 2005, C1.
14 Andrea L. Foster, U. of Michigan President Defends Library's Role in Book-Scanning Project, Chronicle of Higher Education, February 17, 2006, A40.
15 Carlson and Young, Google Will Digitize.
16 Scott Carlson, Publishers Sue Google to Prevent Scanning of Copyrighted Works, Chronicle of Higher Education, October 28, 2005, A43.
17 Fred Lerner, The Story of Libraries, Continuum: New York, 1998, p. 211.
18 Michael H. Harris, History of Libraries in the Western World (4th ed.), Scarecrow Press: Metuchen, N.J., 1995.
19 Ibid., p. 250.
22 Ibid., p. 253.
23 Matthew Battles, Library, W.W. Norton: New York, 2003, p. 140.
24 Ibid.; Don Heinrich Tolzmann, Alfred Hessel, and Reuben Peiss, The Memory of Mankind, Oak Knoll Press: New Castle, DE, 2001.
25 Battles, Library.
26 Harris, History of Libraries in the Western World (4th ed.).
32 Tolzmann et al., The Memory of Mankind. See also Jeffrey R. Young, Google's New Deals Promise to Realize a 60-Year-Old Vision, Chronicle of Higher Education, January 7, 2005, A38.
33 OCLC, History of OCLC, http://www.oclc.org/about/history/default.htm (last visited October 28, 2006).
35 Gutenberg Project, About Us, http://www.gutenberg.org/wiki/Gutenberg:About (last visited October 28, 2006).
37 Although the first, Project Gutenberg was not the only initiative at the time pursuing the goal of making public domain texts available to end users. LexisNexis, the popular subscription database that combines access and storage to law cases and news articles, began operations in 1973 (for law cases) and 1980 (for news articles). Of course, unlike Project Gutenberg and Google's Library Project, users must pay for LexisNexis's services. LexisNexis Press Center, http://www.lexisnexis.com/presscenter/ (last visited November 19, 2006).
38 Gutenberg Project, About Us, http://www.gutenberg.org/wiki/Gutenberg:About (last visited October 28, 2006). In the early 1990s, other public projects developed similar to Project Gutenberg in their aim of combining access and storage, such as NetLibrary, Questia, and Project Xanadu. However, all of these failed. See Mark Y. Herring, Don't Get Goggle-Eyed Over Google's Plan to Digitize, The Chronicle Review, March 11, 2005, B20.
39 Internet Archive, About the Internet Archive, http://www.archive.org/about/ (last visited October 28, 2006).
41 JSTOR, The History of JSTOR, http://www.jstor.org/about/background.html (last visited October 28, 2006).
43 Carlson, U. of California.
44 Michael Jensen, Presses Have Little to Fear From Google, The Chronicle Review, July 8, 2005, B16.