Wednesday, February 20, 2008

Google Scholar and commercial publishers

We're currently reviewing our options for federated searching and link resolution services, and we've opted to identify the scenarios that are possible within our resource constraints. One such scenario is to adopt Google Scholar as the federated search tool (some universities, e.g. the University of Pretoria, have gone down this path).

Arguably, if we knew which full-text providers Google Scholar crawled, we could use it as a federated search tool and let our institutional subscriptions provide access to the content (via IP address restriction).
There's the rub, though. Google are remarkably tight-lipped about what and whom they are indexing. It's not clear whether that reflects anything more than apathy.

As background for our review, I asked the web4lib list if anyone had seen or built a canonical list. This generated some discussion about Google's recalcitrance. Bill Drew wondered if anyone had actually asked Google for a list of what was indexed. Roy Tennant confirmed that he had put that question directly to Anurag Acharya (Google Scholar's lead engineer) and 'got nowhere'. Corey Murata confirmed that and provided a link to the Google Librarian Central transcript of Tracey Hughes's interview with Acharya:
TH: Why don't you provide a list of journals and/or publishers included in Google Scholar? Without such information, it's hard for librarians to provide guidance to users about how or when to use Google Scholar.
AA: Since we automatically extract citations from articles, we cover a wide range of journals and publishers, including even articles that are not yet online. While this approach allows us to include popular articles from all sources, it makes it difficult to create a succinct description of coverage. For example, while we include Einstein's articles from 1905 (the “miracle year” in which he published seminal articles on special relativity, matter and energy equivalence, Brownian motion and the photoelectric effect), we don't yet include all articles published in that year.

That said, I’m not quite sure that a coverage description, if available, would help provide guidance about how or when to use Google Scholar. In general, this is hard to do when considering large search indices with broad coverage. For example, the notes and comparisons I have seen about other large scholarly search indices (for which detailed coverage information is already available) provide little guidance about when to use each of them, and instead recommend searching all of them.
Will Kurt suggested that we could create our own wiki list of publishers - if someone could set it up ... and then realised he could, through his site:
