inside the man

Friday, May 27, 2005

A Google hack for libraries

In this age of hacking Google Maps and other Google services, why shouldn't libraries get into the game? Kenton Good posted a great idea: deep-link the University of Alberta's OPAC to Google Print, allowing searchers to view images of a few actual pages of the book. However, this may not be simple to pull off.

"I am a little annoyed with the lack of hooks into their interface. At first glance at least they seem not to have indexed any ISBNs so linking in via ISBN seems to be a no go. They also don’t seem to have a predictable way to link to the 'main record' for an individual item. Take this example of a book called Da Vinci Deception. The URL syntax ends up looking like this: http://print.google.com/print?id=Ht8cIwrFEgkC . Anybody have any idea how Da Vinci Deception = Ht8cIwrFEgkC?"

It is hard to say whether the value of this id field is a feature or an oversight. It does look suspiciously like an intentionally unpredictable (and quite possibly cryptographic) 12-character unique identifier, which means we may have to do this the hard way.
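
Since the id apparently cannot be computed from the title or ISBN, about the only thing a script can do with it is pull it out of a Google Print URL once one has been found. Here is a minimal Python sketch. It assumes the id always travels in the `id` query parameter and always has the shape of the single example above: 12 letters and digits, with `-` and `_` allowed purely as a guess. `extract_print_id` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse, parse_qs

# Shape inferred from ONE sample id (Ht8cIwrFEgkC); allowing '-' and '_'
# is an assumption, not something the example shows.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{12}$")

def extract_print_id(url):
    """Return the 'id' query parameter of a Google Print URL, or None."""
    values = parse_qs(urlparse(url).query).get("id")
    if not values:
        return None  # no id parameter at all
    candidate = values[0]
    # Reject anything that doesn't look like the observed 12-character id.
    return candidate if ID_PATTERN.match(candidate) else None

print(extract_print_id("http://print.google.com/print?id=Ht8cIwrFEgkC"))
```

The shape check is deliberately strict so that a scraper (like the one described next) fails loudly rather than storing a malformed link.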

One idea would be to create an HTTP agent that queries Google Print one title at a time (yes, I know you have millions), and screen scrapes the results page to harvest the right URL, possibly after performing a match on the author. You would, of course, want to keep the agent's request rate "polite", or Google might shut you down. Then, with data in hand, you could load the Google Print URLs into a junk MARC field and make the necessary OPAC tweaks to display the link. Going forward, whenever new titles are loaded, you could scrape in smaller batches.
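
Here is a rough Python sketch of that harvesting agent. Everything about Google's side is hypothetical: the search URL format, and the assumption that each hit on the results page is a link containing `print?id=` followed by text that mentions the author. The real pages would have to be inspected first. The names `find_print_id` and `harvest` are made up for illustration. The HTTP client is passed in as `fetch`, so the sketch can be tested offline and the client swapped out. The result is a plain mapping from local record id to Google Print URL. Writing those URLs into the MARC field is left to whatever batch loader you already use.

```python
import re
import time
from urllib.parse import quote_plus

# HYPOTHETICAL search URL and result-link markup; verify against real pages.
SEARCH_URL = "http://print.google.com/print?q={query}"
RESULT_LINK = re.compile(r'href="[^"]*print\?id=([A-Za-z0-9_-]{12})[^"]*"')

def find_print_id(html, author_surname):
    """Return the first result id whose snippet mentions the author, else None."""
    matches = list(RESULT_LINK.finditer(html))
    for i, m in enumerate(matches):
        # Treat the text between this result link and the next one as its snippet.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        snippet = html[m.end():end]
        if author_surname.lower() in snippet.lower():
            return m.group(1)
    return None

def harvest(records, fetch, delay_seconds=5.0):
    """Query one title at a time, pausing between requests to stay polite.

    records: iterable of (record_id, title, author_surname)
    fetch:   callable taking a URL and returning the results page HTML
    Returns {record_id: google_print_url} for records with an author match.
    """
    found = {}
    for n, (record_id, title, surname) in enumerate(records):
        if n:
            time.sleep(delay_seconds)  # fixed gap between consecutive requests
        html = fetch(SEARCH_URL.format(query=quote_plus(title)))
        print_id = find_print_id(html, surname)
        if print_id:
            found[record_id] = "http://print.google.com/print?id=" + print_id
    return found
```

Matching on the author's surname is a crude guard against picking the wrong edition or a same-titled book. Records with no match are simply left out, so they can be reviewed by hand. For the smaller ongoing batches, pass in just the newly loaded records.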

Having said that, hopefully there is an easier way. If you find one, please share. Have you thought of asking Google for help?

About Me

Edmonton, Alberta, Canada
Returned to working as a Management Consultant, specializing in risk, security, and regulatory compliance, with Fujitsu Canada after running the IT shop in the largest library in the South Pacific.

This work is licensed under a Creative Commons Developing Nations license.