
Googling DSpace.

These days, every gesture by the scholarly publishing camps can easily become a reason for applause, anguish, or angst (or at least a position paper). But one of the biggest surprises of all came recently when Google, the current ruling house of Internet-dom, came calling on the youthful but relatively arcane DSpace project (https://

These two are now involved in an intriguing but vague courtship in which neither party seems to know exactly where this is all going. But what does a pilot project between Google and DSpace mean for scholarly communication? Is this a tipping point or merely a blip on the radar?

Although this is an arguable point, I think DSpace is possibly the digital repository movement's best chance to gain a serious foothold, at least in the U.S. DSpace is a digital repository designed to capture, store, index, preserve, and redistribute the research output of a university's faculty. Launched in 2002, the $1.8 million project was jointly developed by MIT Libraries and Hewlett-Packard. This open-source system has a flexible storage and retrieval architecture that can handle multiple data formats and the needs of different disciplines. OCLC's OAICat is used to make the records available for harvesting. Currently, 125 institutions are running DSpace repositories.

News of the Google/DSpace joint project emerged from a DSpace Federation User Group Meeting held in mid-March. Initially, 17 institutions, including MIT, will take part in the pilot project. MIT currently has full-text holdings of 3,565 records, the largest of the group. Most have much smaller holdings, typically fewer than 1,000, and a handful have 100 records or fewer.

Although no specific date has been set, one projected goal is to have a resource at the Google Web site that will search across these super-archives, either as an advanced search feature or perhaps in a designated "intellectual zone."

A Friendly Harvest

OCLC Research also plays a role in the DSpace/Google pilot project. The organization intends to periodically harvest OAI-compliant metadata from the institutional repositories of interested DSpace users and then convert it into a format that's suitable for reharvesting by non-OAI services.

According to an OCLC press release issued April 9: "Much of the scholarly material on the Web is missed by harvesters. This includes metadata in OAI-PMH repositories, which DSpace uses. Google has several problems harvesting OAI repositories, which are different from standard Web pages."

For Google, the problem with DSpace is that it uses the Handle system. This is defined as "a comprehensive system for assigning, managing, and resolving persistent identifiers, known as 'handles,' for digital objects and other resources on the Internet. Handles can be used as Uniform Resource Names (URNs)."

OAI uses nonpersistent URLs to link metadata. Both approaches can interfere with standard harvesting methods. OCLC is going to periodically harvest DSpace metadata and transform it into a "harvest-friendly" format that will be usable by Google and other search services. Google anticipates that it will also search the documents' full text.
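To make the harvest-and-convert idea more concrete, here is a minimal sketch, not OCLC's actual tooling (which the press release does not describe). It assumes the repository answers an OAI-PMH ListRecords request with simple Dublin Core (oai_dc) records. The sketch pulls out each title and any identifier that is a resolvable URL, such as a Handle proxy link, and writes them as an ordinary HTML page of links that a Web crawler can follow. The function names, the sample record, and the example Handle (123456789/1) are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Standard OAI-PMH 2.0 and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def records_from_list_records(xml_text):
    """Yield (title, url) pairs from a ListRecords response in oai_dc format."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        title = dc.findtext("dc:title", default="(untitled)", namespaces=NS)
        # Keep only identifiers that are resolvable URLs (e.g., Handle proxy links).
        urls = [i.text for i in dc.findall("dc:identifier", NS)
                if i.text and i.text.startswith("http")]
        if urls:
            yield title, urls[0]

def to_crawlable_html(pairs):
    """Render harvested records as a plain HTML list of links for Web crawlers."""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in pairs
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>"

# Hypothetical two-record response: one live record, one deleted record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.edu:123456789/1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>A Sample Thesis</dc:title>
     <dc:identifier>http://hdl.handle.net/123456789/1</dc:identifier>
    </oai_dc:dc>
   </metadata>
  </record>
  <record>
   <header status="deleted"><identifier>oai:example.edu:123456789/2</identifier></header>
  </record>
 </ListRecords>
</OAI-PMH>"""

pairs = list(records_from_list_records(SAMPLE))
print(to_crawlable_html(pairs))
```

The point of the static page is that a conventional crawler needs no knowledge of OAI-PMH. It simply follows ordinary links, and a persistent Handle URL gives each link a stable target.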

The Unexpected Craftsman

Until this project announcement was made, few people probably would have considered that a company like Google would have much, if any, interest in helping to steer the course of scholarly communication. If you visit the Google Web site and click on "more," you'll see the many pies the company has its googly little fingers in, with even more pies to come (click on "Google Labs").

Even so, it's hard to know where Google is headed. Indeed, many media and financial analysts have observed that Google is advancing into its initial public offering without a strategic plan. But few companies enter their IPO basking in what can best be described as a "Google glow" that would allow them to pull this off. You can't help but smile when hearing that the company's filing with the U.S. Securities and Exchange Commission has a section labeled with its in-house motto, "Don't Be Evil." Coincidentally, Google announced on May 4 that it intends to establish a foundation that will take on "the largest problems of the world."

Google presents us with some unique challenges. For instance, we can only speculate why an organization that's best known for commercial advertising would embrace the cause of institutional repositories. Given the company's lack of a strategic plan, it's not surprising that, at the time of this writing, Google officials have offered only a "no comment" on their involvement in the pilot project or on how it might fit into the overall company plan.

For an organization that prides itself on having a social conscience, Google's participation could merely be an act of social kindness. Or perhaps the project is just a convenient source of content on which to experiment. Or it could be a stepping stone to territory that has not yet been explored. Since the commercial scholarly publishers have clearly indicated that there's profit to be had in scholarly communication, why shouldn't Google test the waters?

For its part, Google has given institutional repositories a potentially important attention-getting device. Researchers who otherwise might not have bothered putting their work into a repository could find that the Google/DSpace relationship lends institutional repositories enough cachet to warrant a look. And although it's not solving one of "the largest problems of the world," the project may help advance the cause of open-access scholarship. Google's popularity could overshadow better specialized search tools, but such tools have yet to be developed.

On the other hand, Google's commercial nature could also be seen as a drawback. Let's say that Microsoft, not Google, offered the hand of kindness to DSpace. Would we react the same way? In the heat of the current Google-mania, the company has a shiny glow. That's the nice thing about young companies: They don't have much of a track record. But could our collective love affair with Google end? Or worse, could Google become evil?

Robin Peek is an associate professor at Simmons College's Graduate School of Library and Information Science. Her e-mail address is
COPYRIGHT 2004 Information Today, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:Focus on Publishing; pilot project between Google and DSpace, digital repository of academic research
Author:Peek, Robin
Publication:Information Today
Geographic Code:1USA
Date:Jun 1, 2004
