Google Hacks Together a Shakespeare Site

June 15, 2006

The eWeek headline was actually Google Launches Shakespeare Site, but like so many of Google’s efforts, this one is thrown together. I recently heard a presentation about the flaws in Google’s scanning processes. It was given by Lotfi Belkhir, whose company, Kirtas Technologies, makes amazing book scanning equipment that Google does not use. (Watch the video here if you have never seen this kind of technology at work. It is very cool.)

Belkhir showed some woefully bad examples of scanned pages at Google Books. I have written about this before, but Belkhir’s arguments were really good and his examples were hilarious, especially the visible thumbs on scanned pages. So I decided to take a quick look at the Shakespeare titles on the Google site, and the work is very poor. See the following examples, found in only a few minutes of browsing:

— Check out the smeared type at the bottom of this page, where the book was clearly not placed on the scanner properly.
— Look at the faint type at several points on this page. You can find hundreds of pages like this; they clearly have no method of ensuring consistent quality in the scanning. Note the smeared type at the bottom of this page as well.
— In fact, just keep advancing through that book, and pretty much all the pages have the same problems.
— Then you get about ten more pages into it and you have this page, which is much more grey than black and white, as if they made a one-time adjustment in the darkness setting and then went back to the setting where the type is barely legible in places.
— Check out this page, also with the darkness set too high, where you can also see the outline of the text from the other side of the page.
— Flip through Othello starting about here and notice the switch back and forth on brightness controls.
— What is at the bottom of this page? Fingers?
— I like this page. What kind of QA process allows that to slip through?
— Look at the right-hand margin of this page, and, yes, I think that is a finger at the bottom.
Ouch. Keep browsing forward; it’s bad.

Want a better collection of Shakespeare? Just go here. Or here. Or here. Or here. Or here.

Lots of people do far better work than Google at this kind of thing.

Posted by Bill Trippe at June 15, 2006 1:50 PM

Comments

I did a post basically stating the same thing you are. I have copied and pasted it below for your viewing.

The problems confronting Google as it attempts to digitize Shakespeare's works (as well as other books) are best summed up by the blog "Reflective Librarian" and his posting at http://reflectivelibrarian.blogspot.com/2006/06/google-botches-shakespeare-ebooks.html

His posting explains the many problems with Google Books and its shortcomings, sloppiness being the biggest one.

I should know, because I have been involved in producing ebooks on the web for a number of years (see http://www.bookyards.com). When Google announced their digital project, I felt that there was no longer any purpose in doing our own. That perception was wrong. Google's project is too ambitious, resulting in the confusion and mess it currently finds itself in.

If you want Shakespeare (for free), just go to
http://www.bookyards.com/search_results.html?type=books&author_id=596&author_name=Shakespeare%2C%20William

We have also compiled a good collection of other digital libraries with books available for downloading. Just go to Bookyards “Library Collections - E Books” at http://www.bookyards.com/links.html?type=links&category_id=1780
There are approximately 350 digital libraries, organized alphabetically and by category, with over 200,000 ebooks: ebooks that are vastly superior to, and better organized than, what Google Books is offering.

Posted by victor at June 21, 2006 2:52 PM

I know someone who was working at the Internet Archive, and I understand that a lot of the book scanning there and at Google is done pretty quickly. Quantity vs. quality seems to be an issue.

Having been the grad student photocopying an entire book at the copy machine, I can attest to the fact that it is really NOT DIFFICULT to properly line up a book on a plate of glass and to hold it still long enough to be properly scanned (or copied).

Posted by jenn at July 18, 2006 2:26 PM
