Monday, April 06, 2009

Let's Just Do It Ourselves

Presented for discussion purposes in “Proposal” form.


That academic libraries and university presses undertake a comprehensive program to scan books of academic value that are out-of-print. The goals of this program would be preservation, access, rationalization of collection storage, electronic repurposing of content, and the creation of a research database.

Linked with a print-on-demand and e-publishing program, this project could be self-sustaining.

This concept could be combined with the idea of creating a shared publishing entity for the production of new work, including the creation and maintenance of Online Academic Resources (Scholarly Hubs) which by themselves are currently not economically sustainable.


The impending Google Book Settlement shines a spotlight on the universe of out-of-print (OP) books. In the Settlement, these titles are described as “not commercially available.” There are an estimated 20-30 million titles in the OP category, and the vast majority of them are in publishing limbo. The OP figure dwarfs both the number of books in the public domain (2-3M) and the number of books considered “in-print” (2-3M).

Once the Settlement is finalized, Google will have a green light to scan OP titles for inclusion in its database, for selective display, and for commercial use, unless the rights holder formally objects. This is one of the key concessions by the Authors Guild and the Association of American Publishers: allowing the "default" to be an assumed permission to scan.

Google's right to scan and sell OP titles is non-exclusive. This means that any other entity, commercial or non-profit, could undertake a similar program.

The Settlement will create a new entity, called the Book Rights Registry (BRR). The BRR will oversee the implementation of the program, and will have the power to enter into additional non-exclusive publishing arrangements on behalf of the authors and publishers who have registered with the BRR.

This would enable a consortium of academic libraries and university presses to undertake a major initiative to digitize OP books of interest to the scholarly community and make them available through print-on-demand (POD) sales, e-reader sales, and in-library sales through a device like the Espresso Book Machine. The income generated could go to the partners and/or to a central entity that manages the program.

Scanned books can also be processed with optical character recognition (OCR) to allow word-level search across the entire database of books. This would open up new research opportunities.
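Word search across a corpus of OCR'd books is commonly built on an inverted index, which maps each word to the books (and positions) where it occurs. The sketch below is a minimal illustration of that general idea under stated assumptions; it is not a description of any system Google or a future coalition actually uses. The book IDs, sample text, and function names are hypothetical, and real OCR output would need far more careful cleanup and normalization.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(books: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each word to {book_id: [word positions]} across all books."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for book_id, text in books.items():
        for position, word in enumerate(tokenize(text)):
            index[word][book_id].append(position)
    return index


def search(index: dict[str, dict[str, list[int]]], query: str) -> set[str]:
    """Return the IDs of books that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() avoids creating empty entries in the defaultdict for unknown words.
    results = set(index.get(words[0], {}))
    for word in words[1:]:
        results &= set(index.get(word, {}))
    return results


# Hypothetical OCR'd text for two books.
books = {
    "book-001": "A history of printing in colonial Philadelphia.",
    "book-002": "Printing, binding, and the economics of print on demand.",
}
index = build_index(books)
print(sorted(search(index, "printing")))         # ['book-001', 'book-002']
print(sorted(search(index, "printing demand")))  # ['book-002']
```

Storing word positions as well as book IDs would also make phrase search possible, though this sketch only checks that all query words occur somewhere in a book.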

This initiative could also include public domain titles, as well as in-print titles. For in-print titles, all that would be required is the permission of the publisher or rights holder and a mechanism for paying royalties from sales, presumably provided by the Book Rights Registry.

Beyond the Google Books Program:

At first glance, it may seem that the proposed initiative would duplicate the Google Books Program. In some respects, it would. However, it is important to realize that biases in the Google program have created significant, and perhaps insurmountable, deficiencies in its scan corpus.

1. The primary purpose of the Google Books Program was to capture the text for inclusion in Google's massive search database.

2. The program was not undertaken as a preservation program.

3. The program was not developed with the requirements of print-on-demand in mind.

4. The program did not have academic research optimization as a goal. In fact, Google sought through its agreements with partner libraries to restrict broad research access to its scans. In the Settlement, Google continues to restrict access to the research corpus.

5. The program emphasized speed and throughput, often at the expense of quality.

6. While the Settlement addresses the author "class" and the publisher "class," it does not address the rights of photographers, illustrators, and artists whose work may appear in the books being scanned.

As a result, Google appears to have created a database of over 7 million books that will often yield low-quality images, poor files for print-on-demand, and empty pages where illustrations, charts, and other important material have been blanked out. The files themselves are not of "preservation quality," nor is it clear that other preservation needs (metadata, etc.) have been addressed.

It is important to stress that Google does not have the exclusive right to digitize and re-purpose public domain and out-of-print books. Any enterprise, whether for-profit or not-for-profit, can gain the same rights that Google has, through agreement with the Book Rights Registry.

Critics have argued that Google has a true monopoly or a de facto monopoly in this arena. This is not true. In fact, Google has not negotiated any agreement for undertaking print-on-demand, but has only flagged its interest in possibly undertaking such a program. This would appear to leave the field quite open.

I suggest that academic libraries and university presses have the opportunity to focus on a subset of the book universe: books that are of scholarly interest. This does not mean only those works issued by university presses, but whatever individual works are deemed to be of scholarly value.

I further suggest that the economics of this situation would allow us to undertake a program that pays for itself: a sustainable program to provide access, preserve books, and share knowledge.

Disruptive Innovation:

The book publishing industry is going through a period of "disruptive innovation," a term popularized by Clayton Christensen in his book The Innovator's Dilemma. A set of new technologies has rocked the foundations of this mature, $25 billion industry. These technologies also allow new enterprises to enter the field by taking advantage of new pathways to production and profitability. Academic libraries and university presses are among these potential entrants.

Some of the most significant disruptive technologies are:

  • Internet & Web
  • Digital Download & Upload
  • iPod & iTunes
  • E-Readers & iPhone
  • Google Word Search
  • Mass Scanning Technologies
  • Print On Demand
  • Espresso Book Machine (miniaturized printer/binder)

When these new technologies are linked together, they create dramatic new economies and efficiencies. They also allow for the printing of a single copy of a book, perhaps a completely unique copy, at a reasonable price and at a profit. One online retailer, for example, is able to sell printed books, e-books, book downloads for the Kindle and iPhone, audio downloads, and so on. Another company has created a do-it-yourself publishing platform that has enabled an explosion of user-generated content in book form (180,000 books, 25,000 e-books).

The Google Book Settlement, through its creation of the Book Rights Registry, clears a major intellectual property (IP) hurdle to scanning and re-purposing book content. The Settlement is expected to be finalized in September 2009. However, Google has already provided $12M to begin setting up the rights registry.

We can imagine a group of academic libraries, university presses, and cultural institutions, working with one or more major foundations, as well as with commercial vendors and suppliers, to create a non-profit entity that would take advantage of these new capabilities. Preservation and access would be first among its goals.

Using mass-scanning and manual scanning technologies, combined with post-processing software, storage, and quality control, this coalition could reissue every book that is registered with the BRR. (Most authors and publishers are likely to register, since registration is necessary to receive their proper share of Google revenues.)

The coalition could also re-purpose the scanned content for multiple formats and uses. One can imagine that book sales and license income would underwrite the entire cost of the operation and generate a surplus that could be applied to new academic publishing, including experimental e-publishing programs.

Interested? Contact me at:


Adam Corson-Finnerty has worked in the research library and non-profit world for over 30 years. He is currently undertaking a two-year study of income-producing opportunities for the Penn Library system. His title is Director of Special Initiatives.

This paper represents his own views, and not necessarily those of his library or his university.
