A Google hack for libraries
In this age of hacking Google Maps and other Google services, why shouldn't libraries get into the game? Kenton Good posted a great idea to deep link the University of Alberta's OPAC to Google Print, allowing searchers to view a few images of actual pages of the actual book. However, this may not be simple to pull off.
"I am a little annoyed with the lack of hooks into their interface. At first glance at least they seem not to have indexed any ISBNs so linking in via ISBN seems to be a no go. They also don't seem to have a predictable way to link to the main record for an individual item. Take this example of a book called Da Vinci Deception. The URL syntax ends up looking like this: http://print.google.com/print?id=Ht8cIwrFEgkC . Anybody have any idea how Da Vinci Deception = Ht8cIwrFEgkC?"
It is hard to say whether the value of this id field is a feature or an oversight. It does look suspiciously like an intentionally unpredictable (and quite possibly cryptographic) 12-character unique identifier. That means that we may have to do this the hard way.
One idea would be to create an HTTP agent that queries Google Print one title at a time (yes, I know you have millions) and screen scrapes the results page to harvest the right URL, possibly after performing a match on the author. You would, of course, need to keep the agent's request rate "polite", or Google might shut you down. Then, with data in hand, you could load the Google Print URLs into a junk MARC field and make the necessary OPAC tweaks to display the link. Going forward, whenever new titles are loaded, you could scrape them in smaller batches.
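To make the idea concrete, here is a rough Python sketch of such an agent. Treat it as an illustration only: the "q" search parameter, the shape of the result links (assumed to match the print?id=... URL quoted above), and the crude author check are all my guesses, not anything Google documents. The function names are made up for this example.

```python
import re
import time
import urllib.request
from typing import Callable, Dict, Iterable, Optional, Tuple
from urllib.parse import urlencode

BASE = "http://print.google.com/print"
# Assumption: result pages link to records the way the quoted URL does,
# i.e. print?id= followed by a 12-character identifier.
ID_LINK = re.compile(r"print\?id=([A-Za-z0-9_-]{12})")


def search_url(title: str) -> str:
    # Assumption: "q" is a hypothetical name for the search parameter.
    return BASE + "?" + urlencode({"q": title})


def fetch(url: str) -> str:
    # Plain HTTP GET; swap in whatever client your site prefers.
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def first_book_id(html: str, author: Optional[str] = None) -> Optional[str]:
    # Crude author match: reject the page if the author's name is absent.
    if author and author.lower() not in html.lower():
        return None
    match = ID_LINK.search(html)
    return match.group(1) if match else None


def harvest(
    records: Iterable[Tuple[str, Optional[str]]],
    fetch_page: Callable[[str], str] = fetch,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    # Query one title at a time, pausing between requests to stay "polite".
    found: Dict[str, str] = {}
    for index, (title, author) in enumerate(records):
        if index > 0:
            sleep(delay_seconds)
        book_id = first_book_id(fetch_page(search_url(title)), author)
        if book_id is not None:
            found[title] = BASE + "?id=" + book_id
    return found
```

You would feed harvest() the (title, author) pairs exported from the catalogue and write the resulting URLs into whatever MARC field you choose. The fetch_page and sleep arguments exist so the scraping logic can be tried against canned HTML before it ever touches Google.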
Having said that, hopefully there is an easier way. If you find one, please share. Have you thought of asking Google for help?
About Me
- thrashor
- Edmonton, Alberta, Canada
- Returned to working as a Management Consultant, specializing in risk, security, and regulatory compliance, with Fujitsu Canada after running the IT shop in the largest library in the South Pacific.