Wednesday, December 31, 2008

Google Book Settlement for Librarians

Permission is granted to circulate this article, and to print and copy for non-commercial purposes.

Adam Corson-Finnerty is a senior library administrator at the University of Pennsylvania. He is the author of three obscure books that will be affected by the Google Book Settlement. The sentiments expressed in this article are purely his own and do not represent the views of his library or his university.

Comments are invited: corsonf@pobox.upenn.edu 215-573-1376


Google Book Settlement—For Librarians
By Adam Corson-Finnerty


Introduction

I think that the Google Book Settlement is good for publishers, good for authors, good for libraries, and good for the people.

The reading public will have millions of books available to enjoy, and copyright holders will get new revenue. It will bring us much closer to the Great Library in the Sky.

Once approved by the presiding judge, the privately negotiated settlement gives Google a green light to digitize virtually every English-language book in the world, not to mention a few million books in other languages. Whatever money Google makes from this enterprise will be split 37-63, with the lion’s share (63%) going to authors and publishers.

What’s Going to Happen

If the Settlement is approved by the presiding judge, as is likely, then here is what will result:

• We will be able to search for and find every book, see information about its availability at the nearest library, purchase online reading access, and order a print copy at a fair price, delivered to our office or home;

• We will be able to walk into any public library or college library, and read the full text online of every book in Google’s vast database, for free;

• Colleges and universities will be able to license community access to Google’s entire book database--allowing students and faculty to read and print every book.

This will happen because Google--at its own expense--will scan, process, and save a digital copy of every book it can get its hands upon. We readers will thereby benefit from the enlightened self-interest of our rich "Uncle Google."

Uncle Google will include all books in the “public domain,” that is, books that are out of copyright. There are a few million of those.

Google will include every in-copyright book that has gone out of print. In the Settlement, these are referred to as “commercially unavailable.” Out-of-print books are those titles that the publisher has decided to cease printing, stocking, and selling. These titles dwarf the number of books that are currently “commercially available”: there are an estimated 20-30 million such titles in the US copyright arena.

Finally, we can predict that the Google database will also include virtually every in-print book, because publishers would be crazy not to make their stuff available through what will quickly become “the largest bookstore on earth.”

Google’s Folly?

My biggest concern about the Google Book Settlement (GBS) is that Uncle Google will decide it is a bad investment and back out of it. The Settlement envisions that possibility, and provides a mechanism for transferring the deal to another entity or entities, should Google withdraw.

The settlement establishes a non-profit organization to administer the deal on behalf of the rights-holders. It is called the Book Rights Registry. This Registry will be controlled by eight Directors—four selected by the Association of American Publishers, and four by the Authors Guild.

Why them? Because they were the ones who brought a class action suit against Google, on behalf of all US authors and all US publishers.

The economics work like this: Google collects its licensing fees, sales income, and whatever other revenue it manages to wrest from the book database, and sends 63% to the BRR. The BRR divides the lucre between various rights-holders. In this, the BRR will be very much like ASCAP, which collects royalties for music performance, and divides it among its "Composers, Authors and Publishers."
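The split described above is simple enough to sketch out. The sketch below uses an invented revenue figure (the settlement does not project revenues); only the 37/63 percentages come from the text.

```python
# Hedged sketch of the settlement's revenue split: Google keeps 37% and
# remits 63% of book-database revenue to the Book Rights Registry (BRR),
# which then distributes it among rights-holders. The gross revenue figure
# below is purely hypothetical.
revenue = 100_000.00                  # hypothetical gross revenue from the book database
brr_share = round(revenue * 0.63, 2)  # sent to the BRR for distribution
google_share = round(revenue * 0.37, 2)
assert brr_share + google_share == revenue
print(brr_share, google_share)        # 63000.0 37000.0
```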

Under most book contracts, the rights to the work revert to the author once the publisher decides to stop selling the book. Under the terms of the Settlement, Google will be free to scan any book that is deemed not to be “commercially available.” A rights-holder can opt out of this arrangement, but there is very little reason to do so. Allowing your book to be brought back to life by Google will make you a little bit of money, make you “searchable,” and share your wit and wisdom with generations to come. And the deal is non-exclusive, so it doesn’t prevent you from making other deals.

Rights-holders do need to let the BRR know that they are out there, and give an address so the check can be sent. The agreement envisions that each rights-holder will receive a $200 per title "inclusion fee" for allowing Google to scan the book and make sections of it viewable on their site. Once Google launches its Subscription Service, rights-holders will get a small cut of the receipts. Should anyone pay for the privilege of viewing and “owning” an entire book, the author or publisher gets a piece. And should someone want to order a physical copy of a book, produced through a print-on-demand service like Lightning Source or Booksurge, then the rights-holder gets a cut of that too.

There are many ways that Uncle Google can make money from his book database. First, he will sell annual licenses to the full database, and to subsets of the base (Poetry, perhaps, or Self-Improvement). These licenses can be sold to school, college, and university libraries, as well as to printshops, corporations, and other commercial entities.

Public and college libraries get one free license for one machine in each library branch (or one for every 4,000-10,000 students). The long lines at that one machine may push them to purchase additional licenses—at a discount, one would hope, but a price will be paid.

Google will allow individuals to "purchase" digital access to its books, with the price being set either by the rights-holder or by Google. Google will also allow individuals to print out sections or an entire work, at a per-page fee. Advertisements will be placed alongside books— a microscope supply company next to Germ Hunter, for instance. A nanny service next to Jane Eyre.

The draft settlement indicates that Google does not intend immediately to sell books through print-on-demand, but that it may decide to do so in the future. Similarly, the company may undertake sales of e-books for the Kindle, the Sony Reader, and other handheld devices.

In the strictest economic terms, Google is one of the few companies that has figured out how to monetize eyeballs. To Google, books are just more "content," another chunk of stuff that can be spidered and indexed and given an algorithmic massage, so that more people, spending ever more time in front of their screens, will keep on googling.

Disruptive Technology

New technologies are almost always disruptive. Once they are adopted, things begin to change. Sometimes whole industries change. Sometimes whole societies change. In this case, the combination of the Internet, e-commerce, and print-on-demand (POD) technology is upending the book publishing business. It will also upend the library business.

While most librarians may be aware of POD, it is somewhat less likely that they will know about the Espresso Book Machine. Put this machine together with a book database, and you have the makings of yet another revolution.

The EBM does something very cool. It can print out a 300 page book, with color cover, in four minutes. The end product looks just like a “real” book—because it is a real book. Perfect bound, good paper, clear type. It’s a book.

The soon-to-be-released 2.0 version has a modest footprint, something like 6 ft. by 3 ft. It will fit nicely in a bookstore, the lobby of a library, or a coffeeshop. The EBM has been called “an ATM for books.” It is not quite that yet, but the analogy is apt.

The materials cost for a 300-page book is just under $3.00. Amortize the cost of the machine itself, and you have a per-book cost of at most $6.00. Hook it up to the one-million-title Internet Archive and you can publish a lot of interesting and valuable titles—all free of copyright charges, because the books are in the public domain.
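The amortization arithmetic above can be made explicit. In the sketch below, only the $3.00 materials cost and the $6.00 ceiling come from the text; the machine price, print volume, and lifetime are illustrative assumptions, not figures from the article or from the EBM's maker.

```python
# Back-of-envelope EBM cost per book. Materials cost is from the text;
# machine_price, books_per_year, and amortization_years are hypothetical.
materials_per_book = 3.00    # just under $3.00 for a 300-page book (from the text)
machine_price = 75_000.00    # assumed purchase price of the machine (hypothetical)
books_per_year = 5_000       # assumed annual print volume (hypothetical)
amortization_years = 5       # assumed machine lifetime (hypothetical)

amortized_machine_cost = machine_price / (books_per_year * amortization_years)
per_book_cost = materials_per_book + amortized_machine_cost
print(f"${per_book_cost:.2f} per book")  # $6.00 per book with these assumptions
```

Under these (invented) assumptions the machine adds $3.00 per book, landing exactly at the article's "at most $6.00" figure; a busier machine would drive the per-book cost toward the bare materials cost.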

Now imagine hooking the EBM up to the Google database. If Google keeps on trucking, then what you will have is the ability to print pretty much every book ever written. Right there in your library.

From a business point of view, it makes perfect sense. The old model of “print, then distribute,” is completely flipped to “distribute, then print.” Very much like what has happened in the Music business—except that in the near term, the physical object (the book) will be the preferred outcome.

No doubt some librarians will worry about what these new books will cost. Indeed, what will it cost to buy a permanent “view” of a book on Google’s database? And what will libraries have to pay to have seven, then ten, then thirty million books available as a subscription?

The answer is, most likely, less than you might think. Remember that music didn’t really start to sell until Apple’s iTunes priced everything at 99 cents a song.

Google has proposed that initially the purchase of permanent e-access to a book will range from $1.99 to $21.99. Nothing higher, and a staggering 65% of the titles will cost less than $7.99. Google will seek permission from the BRR to adjust these rates after three years, and to price titles according to computer-driven algorithms that produce maximum revenues.

As for printed books from the Espresso Book Machine, one should expect that they would cost no more, and probably significantly less, than a book one buys at a Barnes & Noble store, or through Amazon.com.

Library subscriptions to the entire Google Book database are harder to predict. One of the most expensive databases in the academic library world is ScienceDirect. Each of America’s top research universities pays more than $1,000,000 per year to license its contents for their students, researchers, and faculty. They pay this astronomical sum because ScienceDirect provides access to more than 4,000 top journals in Science, Technology, and Medicine.

Google’s strategy will probably be quite different from that of Elsevier, the owner of ScienceDirect. Rather than “sell high” to a few institutions, Google will want to “sell low” so that its base becomes ubiquitous. After all, it wants customers for life, and college students will lose access to “their” books when they graduate—unless they purchase the individual books in print, or purchase permanent viewing rights.

An Alternate Universe

The Academic Universe is not the same as the Business World. The denizens think different. Differently. Different-ur.

Thus the academic “take” on the Google Book Settlement may be quite distinct from the Publishers’ take, or the Wall Street take, or the view at the Department of Justice. I know this because I live and work in the catbird seat of the Academy: the Library.

The Academic Library has its nose in everything that every scholar has done, is doing, or hopes to do. To us, Google is just another information feed, but it’s one hell of a game-changer.

Libraries keep things. And Research Libraries keep almost everything. Harvard keeps twelve million volumes. Stanford, Yale, UNC, and Penn all have between six and ten million books. At the Penn Libraries, we keep almost two million volumes in a high-density storage facility in the old printing floor of the Philadelphia Bulletin Building. Princeton, Columbia, and the New York Public Library have a shared storage facility in a large tract of land in central New Jersey. Books from these units can be retrieved within 48 hours.

But once Google comes fully online and virtually every book in any of these facilities is discoverable, readable, printable, even print-on-demandable at your local library or bookstore or coffeeshop—then do hundreds of libraries really need to keep copies?

The answer will certainly be that they do not, and the trend toward regionalization and consolidation of holdings will accelerate dramatically. One can see the day when only a few copies will be preserved, with one or two designated as “master” copies, to be held forever.

They will be held forever for at least two reasons. The first reason is that the best preservation method for the text that is contained in a book is still a book. Ink on paper outlasts every other text storage methodology that we have devised—certainly every digital technology.

Paper and parchment have proven to be very durable long-term storage devices. A book of Shakespeare’s sonnets, printed in the 17th century, can still be read by today’s reader without the aid of any device, save perhaps glasses.

The second reason that physical copies of books will be maintained is that the book itself is an object of scholarly interest. Dozens of centers for the “History of the Book” have developed around the world. To such scholars, the physical object itself—the book as artifact—is essential.

It is reasonable to expect that some vast library collections will become, in effect if not in name, Museums of the Book. And that people will travel to these museums to look at books, read books, and study books. Many of the great “special collections” libraries already play this role. One thinks of the Newberry Library in Chicago, the Morgan Library in New York, the Huntington Library in California, the Houghton Library at Harvard. And, of course, the Library of Congress.

But most research libraries will have the option of getting out of the massive book storage business.

The Digital Preservation Blues

Some of the “participating libraries” in the Google Books program have described it as having “preservation” as a major outcome. This is a dubious claim.

Oya Rieger is a senior librarian at Cornell. Her responsibilities include electronic scanning of books and manuscripts, the maintenance of a digital repository for the articles and papers that are produced by Cornell faculty, "e-scholarship" programs, "e-publishing," and digital preservation.

Long before Google decided to hoover up the world's books, Cornell, Michigan, Penn, and a handful of other institutions had begun the slow, careful process of scanning print materials, putting digital "facsimiles" up on line for all the world to read, and worrying about how these digital records would be maintained for future generations of scholars.

These efforts look rather puny in comparison to the Google operation. Cornell was digitizing about 1.5 million pages a year. That sounds like a lot, until you realize that—at roughly 300 pages per book—this represents only about 5,000 books. In contrast, the Google-University of Michigan initiative is scanning 30,000 books per week.

Rieger was asked to undertake a study of such large-scale digitization initiatives, and to ask whether they served the need for digital "preservation." Her conclusion:

[T]here is no evidence to suggest that the corporate and non-profit partners have any long-term business plans for maintaining access to digitized collections or for migrating delivery platforms through future technology cycles.

In other words, No.

Google has not undertaken its enormous scanning project with preservation in mind. The goal is current online access. Therefore, the initial scans were considered "good enough" if they could be read easily on a screen, and if 95-99% of the words could be "recognized" by OCR software (which converts pictures of words into machine-readable—searchable—text).

This is a perfectly understandable decision, from a business point of view. And the cooperating libraries—Michigan, Stanford, Harvard, Oxford—are to be commended for allowing Google to create a very good thing, even if it is not the ideal thing.

Rieger's study, "Preservation in the Age of Large-Scale Digitization," sets out what "the ideal thing" might require. A true preservation program requires very high quality-control standards. It may sound downright unappetizing, but a preservationist must deal with such things as ingest workflow, file format migration, and bit corruption. Suffice it to say that a complete re-scanning of every volume—under strict quality-control standards—may be just the start of a true digital book preservation program.

GBS Questions for Librarians


Broadly speaking, the Google Book Settlement is a good thing. However, I have some very specific questions that I have not seen addressed by the Library community.

1. The Book Rights Registry is a non-profit entity that plays a critical role in administering the agreement. The BRR is controlled by four author representatives and four publisher representatives, with five votes needed for decisions. Why aren’t there any voting library representatives on this board? Or “public” representatives?

2. This question is made more important by my reading of what happens if Google decides that the book-scanning business is a money sinkhole. We saw Microsoft bail out of its Live Search Books program, so this is not a hypothetical concern. If Google bails, then the BRR takes over the business (with some library participation) and must seek new commercial or non-commercial partners. All the more reason to have some “public” directors.

3. Here is something cool to think about. In-print books and public domain books appear to be the tip of the iceberg. The greatest number of titles are out-of-print but still in copyright. I have seen estimates of 20-30 million titles. In most cases, the rights to such works may have reverted to the author. Google is going to have a green light to scan these titles for inclusion in its database, and for selective display, and commercial use, unless the author formally objects. The author will get a cut of any revenue, through the distributive mechanism of the BRR. All well and good. But this also opens an interesting opportunity. Allowing your book to be in the GBS is non-exclusive. Therefore, authors could also give publication rights to a non-profit entity, perhaps their university library, perhaps to a coalition of libraries.

4. I have heard that Google’s scans are not preservation quality, and perhaps not even print-on-demand quality—that mass scanning and machine-only OCR cause many quality problems. These include missed pages, blurry pages, pages that are cut off, foldouts that are skipped or distorted, meaningless word translations, and so on. Google itself is at pains to say something about this in the draft settlement:

17:10 Scan Quality. Google will strive to detect and eliminate errors in the Digitization quality or Metadata. Google makes no guarantees, however, regarding the Digitization quality or Metadata quality of any Book or Insert….


5. A related question: the Google Agreement is between the company and authors and publishers. Artists, photographers, and illustrators are not included. I have heard that this will mean the images in an in-copyright book will be blanked out. This would be a terrible loss to general readers as well as scholarly readers. One hopes that Google is pursuing a comparable “deal” with these groups.


6. A different sort of question is this: If Google is successful, then virtually every book ever printed in English, and millions of titles in other languages, will be available to read, print out, and purchase through print-on-demand. So most academic research libraries can get out of the book storage business, right?

You can see what I mean: save a few preservation master copies, and a dozen circulating copies for those who want to study the book as artifact, and dump the rest. For most of our patrons, if they want to read the book on paper, a printed facsimile should do just fine. Are we ready to crawl out on that limb?


7. I am really puzzled by what is said about “mining” the GBS database. Only “non-consumptive” research will be permitted, and scholars must apply for permission to use the database, stating their intent.

What the heck is "non-consumptive" research? Here is how the draft settlement describes the term:

"Non-Consumptive Research" means research in which computational analysis is performed on one or more Books, but not research in which a researcher reads or displays substantial portions of a Book to understand the intellectual content presented within the Book.

Got that?

This appears to mean that you can count words and analyze patterns, but you cannot see the words or phrases in context—if seeing is indeed "consuming." Take this bit of possible research: suppose you wanted to study how the term “fulsome praise” has shifted from a negative connotation to a positive one. You would have to see the phrase in context, which means you would have to "consume" some additional words, maybe even a paragraph or two.

I have been assured by a representative of one of the chief library partners that such research will be allowed. And, indeed, the settlement indicates "Linguistic Analysis" will be allowed. This is defined as "Research that performs linguistic analysis over the Research Corpus to understand language, linguistic use, semantics and syntax as they evolve over time and across different genres or other classifications of Books."

OK. So we can eat a few words, but not a "substantial" amount of words. One hopes—and assumes—that "consumptive" research which allows reading to "understand the intellectual content" of a work will be provided for under the "subscription" service. However, the subscription database would have to be organized to aid massive analysis by computer, and the ability to jump out to the text, and back in to the data. There is no indication in the settlement that the subscription database will be optimized for scholarly inquiry. Indeed, it appears quite the opposite—that there will be, in effect, two databases—one for substantial reading, and one for "non-consumptive" research.

Google will allow the establishment of two outside research bases, both of which are restricted to “non-consumptive” research. It is likely that a coalition of libraries led by the University of Michigan will manage one such database. And I wouldn't be surprised to learn that Stanford will have first dibs on managing the other (but see my blog on the shakeup at the Stanford Library: http://musingsofcorsonf.blogspot.com/2008/12/shakeup-at-sul-stanford-university.html )

If my conjecture is correct, then the settlement represents a very significant loss to the academic community--the loss of true "consumptive" research.





The full text of the draft Google Book Settlement can be downloaded from
http://books.google.com/booksrightsholders/agreement-contents.html
See Also:
1. ALA/ARL Overview of Settlement:
http://www.arl.org/bm~doc/google-settlement-13nov08.pdf

2. Principles and Recommendations for the Google Book Search Settlement, by James Grimmelmann
http://laboratorium.net/archive/2008/11/08/principles_and_recommendations_for_the_google_book

3. "Preservation in the Age of Large-Scale Digitization," by Oya Y. Rieger, A report to the Council on Library and Information Resources, February 2008. http://www.clir.org/pubs/abstract/pub141abst.html



Sunday, December 14, 2008

Shakeup at SUL (Stanford University Library)

[Note that this article represents the opinion of the author only; that I have drawn my information from documents themselves and not from any contact with Stanford faculty or Library staff; and that I really do love books.]



No, I’m not talking about personnel changes. This is much more serious than that.


The Stanford University Libraries just got what many of us in academic libraryland have prayed for: Massive Attention from its faculty. The results may prove that librarians should be careful what they wish for.


On November 13, 2008, the Stanford Faculty Senate was presented with a major report from its Committee on Libraries (C-Lib). The report was accepted, thus resulting in what appears to be a stinging whack on the head to both the Provost and the Library Director.


The Provost is currently reeling from a far more serious blow: a reduction of perhaps $100M in the General Funds budget over the next two years. This shortfall has caused Provost Etchemendy to require his administrators to present three plans that cut their budgets by 5%, and 7%, and 10% for the next year. (See: “President, Provost, Deans Ax Own Salaries” at http://www.stanforddaily.com/cgi-bin/?p=1801 )


As painful as these cuts may be, they may not affect the Stanford Library as profoundly as will the Faculty resolution. This unanimous decision sets a two-year process in motion—one that will steer away from a “virtual library” course toward what the C-LIB calls a “hybrid library.” And the hands on the tiller will be faculty hands, student hands, librarian hands, and old hands. Old as in “old-fashioned.”


OK, so maybe I am being overly dramatic. Turns out that a “Hybrid” library is what we are all aiming for: a library system that delivers the best digital resources *and* the best paper resources. Not a big deal—as an ideal—until you get down to such things as money, personnel, shelf space, remote storage, browseability, money, Stanford’s Google Books partnership, physical space allocation, money, and the positing of a model that’s supposed to be just dandy for the next 50 years.


The Back Story


Our tale ostensibly begins in 2007, when it was announced that SUL’s Meyer Library was going to be torn down and replaced with an academic computing center. Meyer Library happens to house the main corpus of the East Asian Library—about 350,000 volumes. The plan was that these books would be moved to the larger Green Library, and a large chunk of Green’s holdings would be moved to an off-campus high-density storage site. As C-LIB reported, “Faculty in many of the affected areas were alarmed by the specter of a good part of their research material leaving campus…”


So a Town Meeting was called. Provost Etchemendy presided. As faculty aired their objections and posed their questions, “the outline became clear of an already existing high-level decision not to build any expansion of bookshelf space on the central campus. This direction took most of those in attendance by surprise, for the faculty had not been polled widely nor publicly on the issue.”


The plot thickens. High-level decisions made in closed-door meetings; Faculty ignored; Bookshelves in stasis. Big mistake.


Mistakes don’t usually make for an interesting story. One needs an evil villain. Fortunately our tale has just what the story-doctor ordered: Google.


Yes, Google. Do-no-evil Google. Rich Uncle Google, who only wants to gather up every little bit of stray information, like little lost sheep, and place them in a nice, neat corral—so they can be easily found.


Stanford, we now recall, is one of the two keystone libraries in the Google Book Scanning program. The libraries at Stanford and U. Michigan agreed to open their stacks for Google’s massive digitization program, including volumes still in copyright. The Authors Guild and the Association of American Publishers didn’t like this idea. They claimed Copyright was being violated. So they sued.


By the time a tentative settlement was reached, in October 2008, over 7,000,000 books had been digitized. Google agreed to split revenues with authors and publishers, and received a green light to continue. Stanford and Michigan were vindicated in taking the risk, and everyone lived happily ever after. Not.


Stanford’s C-Lib committee seems to have become unhappier and unhappier, the closer they looked into the implications of the Stanford-Google deal. “Our committee has concluded that a headlong leap into Google Books does not hold out equal promise to all disciplines within the University and threatens, in fact, to stifle research in some of them. We have tried to ask: what type of planning would we be doing today if Google Books had not come along?”


Indeed, the committee looked again at the Library’s formal, strategic plan, and discovered a wolf in sheep’s clothing: “Our effort will be to maximize desktop access to content.” Little had they realized what this phrase might portend.


With Google clocking in at 7M books and heading for at least 15M, maybe 30M, the real possibility exists that the Stanford Library—and every other library—could dump their books and point customers to their screens.

Of course, somebody should keep a few master copies of every book, both as a preservation mechanism and so that the book-as-artifact can be studied. But it doesn’t have to be Stanford, does it?


Why can’t Stanford eschew the book storage business and Go Completely Digital? Isn’t this the inevitable future for most and/or all information-providing institutions?


Perhaps, but not in the lifetimes of the current Stanford faculty, or the next generation of scholars, or the one after that:

The sub-committee believes it will require at least two generations of faculty renewal—something like 50 years—before electronic media take precedence over paper support in some fields of inquiry. Even then, serious research libraries will need to be hybrid institutions, able to fulfill seamlessly, and at the highest level, the needs of scholars working on both sides of the electronic/paper divide.


The electronic/paper divide. Who knew?


Having a Good Browse


If there is anything that characterizes the tone of the subcommittee’s report, it is an appreciation of browsing.


Browsing is a spatial practice within a physical domain described by an immediate research question. It is a process of discovery intimately shaped by the structures of a vertically-integrated library: at once human in scale (a reader’s body moves physically through a library), and psychologically satisfying for its moments of insight. Libraries of the future, whatever technologies they embody, should be mindful of this tradition and be designed to enhance the benefits of browsing, not render it obsolete.


Let me say right here that I adore browsing. What could be more satisfying than wandering through the stacks of a great library, seeing what surrounds the book you have gone to fetch? Since my office is in the library, I often find myself returning with ten books when I thought I needed only one. The phrase “happier than a pig in mud” comes to mind.


But I am 64 years old. In the world of my youth, a library was a building. Now it’s a screen.


To read the C-Lib report is to engage in a wander down memory lane. To get us in the proper nostalgic mood, the authors begin with the 2008 Oxford Dictionary Online definition of a library: LIBRARY. A building, room, or set of rooms, containing a collection of books for the use of the public or of some particular portion of it….


The subcommittee notes that the majority of Stanford’s faculty (presumably meaning tenured faculty) “were trained using libraries of the sort just described.” And, further, that the art of browsing was highly important to their intellectual development, and should be important to undergraduate students, graduate students, and junior faculty: “Browsing is not a search through a vast panorama of knowledge. Indeed, the qualitative differences between browsing and searching are non-trivial.”


Non-trivial. A good term to keep in mind, since the subcommittee’s recommendations, adopted by the Faculty Senate, are certainly non-trivial.


But back to browsing. Much as I adore browsing, I have to note that in my 16 years at an academic library, I don’t see a lot of students keeping me company. The aisles are not crowded. And I certainly do not see a lot of faculty. If you want to see a lot of browsing, go to a Barnes & Noble. Go to a supermarket. Go to a hardware store. There, you will see some browsing. You may even bump into people, even find yourself reaching for the same head of lettuce. Seriously, go someplace where active browsing takes place, and then go to the open stacks of a large academic library. I promise that you will not feel crowded.


Sorry, C-Lib, the campus answer to “how do I get to the Library?” is increasingly “Log On.”


Our mental picture of the college library has to be expanded. If we think of an ivy-covered building, or a building of any sort, we are being very twentieth century. The twenty-first century image of an academic library might better be an LCD screen. This could be a computer screen, a PDA screen, a phone screen, even an image on the inside of our glasses. Try this mental exercise: What is the first image that pops to mind when I write: CNN. It might be the concept, “news.” It might be a picture of a face on a screen. But I bet it’s not a mental image of the CNN headquarters building in Atlanta.


The academic library need no longer be thought of as a building or a set of buildings where service is provided and academic work gets done. This radical shift has caused some deep soul-searching about “the library as place.” Librarians and campus planners ask: Just how much physical space does an academic library need, and what should the space be used for?


This is not an easy matter. The physical library has a strong hold on our imaginations, representing more than the sum of the activities that may be conducted within its walls. We can trace this back to ancient Greek and Roman libraries, which not only stored scrolls and later, codices, but also included galleries, reading rooms, and gardens. Thus ancient libraries functioned as social spaces, as places for contemplation but also as places for conversation. Today’s public libraries continue this tradition.


In the United States, the main campus library building is often referred to as the “heart” of the university. It is described as a “crossroads,” an “oasis,” as a “center for intellectual exchange,” and even as “a great place to find a date.” It has been observed that ever since Thomas Jefferson placed the library at the center of his plan for the University of Virginia, campus planners have followed this model.


But the 21st century academic library is “anywhere and everywhere.” It is 24/7/365. At least in theory, it will sustain a scholar who never sets foot in a library building and never borrows a (physical) book. This does not mean that colleges and universities should start closing their library buildings. It does not mean that new library buildings should not be built. But it does mean that the size and the co-located services in a library building are increasingly a matter of choice, and not a matter of necessity.


The “old” library was centered on the storage and accessibility of physical objects containing information: books, magazines, scholarly journals, newspapers, printed theses, manuscripts, VHS tapes, CDs, DVDs. Patrons of the old library had to physically show up in order to achieve their goals. Therefore it made sense to locate reading rooms, study lounges, study carrels, seminar rooms, reference services, research consultation, information skills training, group study rooms, group viewing rooms, classrooms, and even “learning cafes” on its premises.


The “new” library will increasingly be liberated from physical objects. It will still have and make available physical objects but increasingly its services will be digital, distributed, and largely self-managed by its patrons.


This makes possible what I think of as the “minimalist” library. That is, a library with a very small footprint, and perhaps with no “public” spaces. Just computers and offices, and the staff who feed them, and the staff who go out to the classroom, the study center, the lab, the student center, the dorm common room, the faculty offices—to provide coaching and training and to learn about new information needs.


On the other hand, there is nothing that requires a minimalist approach. Campus administrators can decide that they want a crossroads library, one that plays a broad social function for the institution. They can fold in a café, a bookstore, they can even design a campus mall of which the library is part. Academic computing can be relocated to the Library, as can courseware support and instruction. An “Information Commons” can be created that brings together various student advisors and high-tech instructors, along with cutting-edge equipment and collaborative spaces. Lectures, exhibitions, musical programs, academic symposia, lunch talks, dinners with the President, cocktail parties, film festivals—all this and more can be offered in the library-as-campus-crossroads.

The decision is a choice, and the choice can be different for different units and for different campuses.


The Bookless Library?


At the heart of the Stanford flap is the question of the book. Are printed books essential to the 21st century academic library? Do we have to keep them on campus? Is it important to have open stacks, so that collections can be browsed?


Libraries keep things. And Research Libraries keep almost everything. Harvard keeps twelve million volumes. Stanford, Yale, UNC, and Penn all have between six and ten million books. At the Penn Libraries, we keep almost two million volumes in a high-density storage facility in the old printing floor of the Philadelphia Bulletin Building. Princeton, Columbia, and the New York Public Library have a shared storage facility in a large tract of land in central New Jersey. Stanford has similar high-density facilities. Books from these units can be retrieved within 24-48 hours.


The C-Lib subcommittee clearly finds such facilities distasteful:


High-density storage facilities, like Stanford’s at Livermore, are truly remarkable machines designed to ensure that no materials are lost. But they have nothing in common with libraries. Rather, they employ procedures used by companies like Toyota to store spare parts: it makes no difference to the filing system what a book is about, because the only parameters that matter are a book’s physical dimensions and coordinates in a grid of shelves and aisles.


This is a far cry from the sylvan groves of learning. Too far for the scholars who agreed to serve on the subcommittee, and whose report was adopted unanimously by the Faculty Senate.


Here are the key recommendations:


  • We recommend that SUL modify its emphasis when explaining its primary mission. We believe it is not “to maximize desktop access to content,” but to provide the most supple and flexible support in a hybrid environment of print and electronic materials.

  • We believe that 5.5 million volumes is a useful and practical figure for the size of a core collection based on center campus.

  • To house this collection, we recommend building a structure in close proximity to Green, possibly underground, featuring state-of-the-art compact shelving with books arranged by call number and accessible for browsing by users. It should hold about 4 million volumes….

The subcommittee recognizes that Stanford will not be able to keep all of its volumes on campus. Stanford Librarian Mike Keller notes that the University purchases between 100,000 and 150,000 new books every year, and Stanford, like most Research Libraries, is already full. If 100,000 books come in, then 100,000 books need to go out. No one at Stanford is suggesting that these books simply be thrown away—although that would be an option (see argument below).


Since the subcommittee wants to keep all of its cake, keep as much of it nearby as possible, and be able to get the rest of its cake quickly and efficiently, it also recommends improvements in finding every slice. That is, improvements to the catalog.


  • No book is to be transferred to SAL3 [high-density storage] until its cataloging has been updated and deepened when necessary, nor before its title page, table of contents, and index are scanned and fully searchable.

There is nothing wrong with this recommendation, and indeed it represents a nice expansion of the catalog as a research tool. It is lucky that Stanford is already in the mass scanning business, since for most research libraries, having to scan and mount this information for each book being transferred would create a severe case of constipation.


Non-Trivial Matters


The C-Lib subcommittee is not retrograde—despite my occasional poke at them. They recognize that digital resources are increasingly important, and in some fields are the predominant form of scholarly exchange. When they ask their readers to imagine a world without Google Book Search, they are not inviting a sojourn in the pre-Internet past, but looking forward to the many changes that will transpire in the years ahead. Will Google always be around, they ask? That question seems unthinkable to the average Netizen, but then, who would have thought that General Motors would sink like a stone?


What the subcommittee wants for Stanford is what we all want for our universities: an excellent digital library combined with an excellent paper-and-artifact library.


Not a problem. Except for one non-trivial item. It’s called money.


The estimated cost to fulfill the subcommittee’s recommendations is $200,000,000. That seems like a huge figure, but the committee notes that Stanford is prepared to raise $200M for new dorms, and $400M for a new business school. It also notes that the Provost was happy to include the latter two projects in his Capital Campaign priority list, but that “libraries were not on the list.” The cover memo for their report goes on to say that, “if Stanford is to imagine itself the top, or a top, university in the world, that ambition will remain no more than a fancy until library infrastructure matches that available in other areas of the university on the Provost’s list.”


And there’s the rub. Money. And in Stanford’s case, space—because the local municipality has placed severe restrictions on how much the University can construct on its campus. But, as the subcommittee notes, the University has managed to build other new facilities. It is all a matter of institutional will and “the determination of Stanford’s leaders.” Clearly this determination has been focused elsewhere, since “in our view, the libraries at Stanford have not been funded with the largesse and vision showered on the research laboratories of the scientists and engineers in our midst.” (My emphasis)


So the President and Provost at Stanford don’t care about the library. If so, they are not alone. Over the past 15 years, Research-1 universities have steadily given a smaller piece of their budget to their libraries. Year after year. Drip, drip, drip—like a Chinese water torture.


Meanwhile, the cost of academic materials has been going up each year; something like 6% per year. Further, a small group of publishers has gotten control of the core academic journals in Science, Technology, and Medicine. This semi-monopoly has allowed them to increase their rates by an average 10% a year.


So let’s look at the academic library in this light. Let’s anthropomorphize and imagine that the library is Mike Keller, or another one of the Directors of Research Libraries around the country. There he is, his forehead getting the drip, drip, drip of reduced resources, while his body is being squeezed in the vise by the big STM publishers, and the book-lovers among the faculty are pulling both arms, while the tech-lovers are pulling both legs as they ask for more licensed databases.


And is Keller supposed to also be taking a 10% salary cut while all this is happening? Geeessshhh!


I am going to stop here. The faculty-in-revolt have done us all a great favor. They have pointed out that the academic library is under-appreciated, under-funded, and overwhelmed with demands—including theirs.


Now what?


**************************************************************************


The report:

http://facultysenate.stanford.edu/2008_2009/reports/SenD6136_c_lib_dig_info.pdf

The Subcommittee’s Cover memo, which is quite pithy:

http://facultysenate.stanford.edu/2008_2009/reports/SenD6152_C_Lib_Cvr_Memo.pdf

The Faculty Senate Resolution:

http://facultysenate.stanford.edu/2008_2009/reports/SenD6160_c_lib_digital_it.pdf

Monday, December 08, 2008

Seven Questions About the GBS Deal

GBS Questions for Librarians

By Adam Corson-Finnerty
December 8, 2008


I believe that, broadly speaking, the Google Book Settlement is a good thing.
However, I have some very specific questions that I have not seen addressed by the Library community.

1. The Book Rights Registry is a non-profit entity that plays a critical role in administering the agreement. The BRR is controlled by four author representatives and four publisher representatives, with five votes needed for decisions. Why aren’t there any library representatives on this board? Or “public” representatives?

2. This question is made more important by my reading of what happens if Google decides that the book-scanning business is a money sinkhole. We saw Microsoft bail out of the LiveSearch business, so this is not moot. If Google bails, then the BRR takes over the whole operation. All the more reason to have some “public” directors.

3. Here is something cool to think about. In-print books and public domain books appear to be the tip of the iceberg. The greatest number of titles are out-of-print but still in copyright. I have seen estimates of 20-30 million titles. In most cases, the rights to such works will have reverted to the author. Google is going to have a green light to scan these titles for inclusion in its database, and for selective display, and commercial use, unless the author formally objects. The author will get a cut of any revenue, through the distributive mechanism of the BRR. All well and good. But this also opens an interesting opportunity. Allowing your book to be in the GBS is non-exclusive. Therefore, authors could also give publication rights to a non-profit entity, perhaps their university library, perhaps to a coalition of libraries. Authors could sign a “creative commons” license for their out-of-print titles, thus adding immeasurably to the Open Access corpus. Shouldn’t we get organized and go after this opportunity?

4. I am really freaked by what is said about “mining” the GBS database. Only “non-consumptive” research will be permitted. That appears to mean that you can count words and analyze patterns, but you cannot see the words or phrases in context. This seems so outrageous that I hope that I am misinterpreting. A simple example will suffice: Suppose you wanted to study how the phrase “fulsome praise” has shifted from having a negative connotation to having a positive one. You would have to see the phrase in context. Google will allow the establishment of three research bases, all of which are restricted to “non-consumptive” research. OK, but will the “institutional subscription” then allow datamining *with* context? If not, this is a scandal and academic librarians should be shouting from the rooftops.

5. A different sort of question is this: If Google is successful, then virtually every book ever printed in English, and millions of titles in other languages, will be available to read, print out, and purchase through print-on-demand. So can most academic research libraries get out of the book storage business? You can see what I mean: save a few preservation master copies, and a dozen circulating copies for those who want to study the book as artifact, and dump the rest. For most of our patrons, if they want to read the book on paper, a printed facsimile should do just fine.

6. And yet, I have heard that Google’s scans are certainly not preservation quality, and perhaps not even print-on-demand quality. Does anyone know?

7. Finally, the Google Agreement is between the company and authors and publishers. Artists, photographers, and illustrators are not included. I have heard that this will mean the images in an in-copyright book will be blanked out. Is this true? Has anyone heard of Google pursuing a “deal” with these groups?
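
A footnote on question 4, for readers who want to see the distinction concretely. The sketch below is my own toy illustration, not anything from the Settlement documents: the first function does what “non-consumptive” research apparently permits (counting words, with no passage ever displayed), while the second does what a connotation study actually requires—a keyword-in-context (KWIC) view of each occurrence. The sample sentences are invented.

```python
# Toy illustration of "non-consumptive" counting vs. keyword-in-context (KWIC).
# The corpus below is invented; a real study would run against millions of books.

from collections import Counter

corpus = [
    "The review offered fulsome praise, and the author blushed at the excess.",
    "Critics dismissed the speech as fulsome praise of a failed policy.",
]

def nonconsumptive_counts(texts):
    """Count word frequencies -- no phrase is ever shown in context."""
    counts = Counter()
    for text in texts:
        counts.update(word.strip(".,").lower() for word in text.split())
    return counts

def kwic(texts, phrase, window=3):
    """Show each occurrence of a phrase with `window` words of context on each side."""
    hits = []
    target = phrase.lower().split()
    for text in texts:
        words = [w.strip(".,") for w in text.split()]
        lowered = [w.lower() for w in words]
        for i in range(len(lowered) - len(target) + 1):
            if lowered[i:i + len(target)] == target:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + len(target):i + len(target) + window])
                hits.append(f"{left} [{phrase}] {right}")
    return hits

# Counting tells you the phrase occurs twice; only the KWIC lines let you
# judge whether each use is admiring or disparaging.
print(nonconsumptive_counts(corpus)["fulsome"])
for line in kwic(corpus, "fulsome praise"):
    print(line)
```

The point of the exercise: the frequency count alone cannot distinguish “the author blushed” (approving) from “critics dismissed” (pejorative). That judgment requires the contextual view, which is exactly what a “non-consumptive” restriction would deny.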