There is a lot of fuss about this Google/Libraries project announced the other day. More fuss than there needs to be. A new generation of the usual suspects is claiming this is the end of libraries. Been there, done that. Leadership within the library professions seems to view this project as a threat. This is luddite and/or Philistine blah blah about exposing the unwashed to too much information, etc., that I just don't want to get into here.
Quite frankly, I read some articles about this deal and shrugged. Big deal. Libraries do these projects all the time. This one just differs in scale. It reads like a big, well-intentioned project, but the devil is in the details. Perhaps a few basic facts would do before people start pronouncing (once again) the end of libraries:
Michigan and four more of the world's top libraries -- Harvard, Stanford, New York Public Library and the Bodleian in Oxford -- announced this week a deal with Google to digitise millions of their books and make them freely available online.
Big deal. A good research library like the graduate library at the University of Maryland contains at least 5 million volumes; usually the number is near 10 million. If you take the entire University of Maryland system it has to be near 30 million. A project that scans millions of books from a large research collection may only be scanning 20 or 30 percent. If it's not the best part of the collection, it's worthless.
"This just changes the landscape so completely," [John Wilkin, associate librarian at Michigan University] said.
"The research library, which was not very accessible before, will be available to everybody. The focus will start to shift to electronic space for all of our scholarly communications," he said.
This is bullshit. Scholarly research is by nature an exclusive exercise. The extensive study of tree worms in Borneo, or whatever, is always going to be the work of a few brave souls.
The project says it is going to make old and rare books available by digitizing them. I have no problems with this, but I was once trained in paleography and analytical bibliography. The study of old books is by its nature the study of physical objects. When you study a specific book or manuscript you pick it up (usually while wearing gloves), you turn its pages, and you may even shine a special light through it if you're studying a palimpsest. Someone on the internet will get to look at pretty pictures of an old book. This is fine, but if you're really studying the book, you will have to go to the book.
Michigan and Stanford are planning to digitise their entire library collections -- totalling some 15 million books -- while the Bodleian is offering around one million books published before 1900.
Nothing in this article says that Michigan and Stanford are making the full text of their libraries available for free on the internet. And it clearly says that the Bodleian is only offering books published before 1900, and thus choosing books safely within stringent international guidelines that define public domain.
The Harvard and New York Public Library contributions are smaller, but the entire project is still expected to take up to 10 years, with cost estimates ranging from 150 million to 200 million dollars.
Ten years? Not likely. It'll be longer. The libraries still have to get permission from publishers, and then they have to coordinate collection access in order to avoid leaving large chunks of subject areas out of use for their patrons. Quite frankly, a library isn't going to close access to large parts of its collection because the books are being scanned. Instead, they're going to be scanning books people aren't using, i.e. the dust collectors and crap of the collection.
"This is a great leap forward," said Michael Keller, librarian at Stanford University which has been digitising texts on a far smaller scale for several years.
"This new arrangement catapults our effective digital output from the boutique scale to the truly industrial," Keller said.
I wonder who's paying for it? Google? If so, then the libraries made out. Digitization projects are long, slow, labor-intensive and very expensive endeavors. All the technological advances of the last decade have not made the implementation of these projects any easier. They're worthy, but very difficult endeavors.
The project will grant global access to landmark publications and other rare out-of-print titles that previously were only available to specialised researchers on an appointment-only basis.
Read here: access to manuscripts and old, out-of-date books.
The access issue is as much about scope as price, and the Google project may ruffle some feathers in countries like China which still have lengthy lists of banned books.
Fuck China.
"Once you have the research library available to anyone with an internet connection, it's going to be very hard to influence what people can see and what they can't see," said Michigan's Wilkin.
Books which have passed out of copyright and into the public domain will be available in their entirety, while the reproduction of newer titles will require the publishers' permission.
Wow. So those 1930s essays speculating on the life-span of Martians may be available full-text! Or, wait, are they saying they're going to make 20s-era medical studies available? I'm going to swoon.
For Google, the move allows the company to get a jump on its competitors in what can only be an expanding field, and observers say the company will boost advertising revenue through increased user volume.
The only thing this project is doing is adding structured information to Google's capabilities. It's a good business move, but this is not enough to inspire the purchase of Google stock. The full text of Kansas Territory corn crop records is not going to add much value to Google, folks. Calm down.
Publishers should also benefit, as excerpts of books still under copyright will be accompanied by purchase links.
Aha! Here's the heart of it. The best parts of research libraries are their comprehensive collections of current, contemporary thought on a broad spectrum of issues. ALL of this information is copyrighted, and unless publishers decide they don't need to make money off of book sales, none of this information is going to make it full-text into this project. There's too much at stake for the publishers.
In general, libraries don't own the content of the books on their shelves. They simply provide access to the information. And so the only guaranteed content for this project is books that are clearly in the public domain, most of which are at least 75 years old. Google will have full text of books no one wants to read, and will allow you to purchase recent and current books like...amazon.com.
Where's the innovation in this?
This project is a boondoggle for Google. Five years from now, the project will either be rolling along nicely, or it will be a disaster.