IEEE TCDL Bulletin

Volume 3, Issue 2
Summer 2007


Pathways Core

A Data Model for Cross-Repository Services

Jeroen Bekaert
Ghent University
Faculty of Engineering
Jozef Plateaustraat 22
9000 Gent, Belgium
+32 9 264 3911
Xiaoming Liu, Herbert Van de Sompel
Digital Library Research & Prototyping
Research Library
Los Alamos National Laboratory
+1 505 667 1267
Carl Lagoze, Sandy Payette, Simeon Warner
Cornell Information Science
301 College Ave
Ithaca, NY 14850
USA
+1 607 255 9555

As part of the NSF-funded Pathways project, we have created an interoperable data model to facilitate object re-use and a broad spectrum of cross-repository services. The resulting Pathways Core data model is designed to be lightweight to implement and widely applicable, either as a shared profile or as an overlay on the data models currently used in repository systems and applications.

At the heart of the Pathways Core data model (Figure 1) are the entity and datastream elements. Entity elements model the abstract aspects of digital objects and align with works and expressions in FRBR. An entity can model a single digital object, a collection of digital objects (that is, of other entities), or a node created merely to express abstract properties. The core properties of an entity are hasIdentifier, hasProviderInfo, hasLineage, and hasProviderPersistence. If a repository attaches providerInfo to an entity, it provides a handle through which the entity can be accessed from that repository, supporting its use and re-use; the persistence of this handle may be indicated with hasProviderPersistence. The hasLineage property identifies the entity (or entities) from which the entity carrying the property was derived. Other properties can be added, such as hasSemantic, which conveys the intellectual genre of the entity (e.g., journal article).

Datastream elements model the concrete aspects of a digital object; they align with items in FRBR and can be thought of as aspects at the level of bitstreams. An entity may have any number of datastreams. Two datastream properties are defined as part of the Pathways Core: hasLocation conveys a URI that can be resolved to yield a bitstream, and hasFormat conveys the digital format of that bitstream. If a datastream has multiple hasLocation properties, resolving the conveyed URIs yields bit-equivalent bitstreams.
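
To make the entity/datastream split concrete, here is a minimal Python sketch of the model as in-memory objects. The class and attribute names (Entity, Datastream, provider_info, and so on) are our own illustrative choices rather than part of the Pathways Core specification; each attribute's comment names the property it stands for. The `entities` member list and the `derive` helper are assumptions added to show collections and hasLineage, and all identifiers and URLs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Datastream:
    """Concrete aspect of a digital object, aligned with the FRBR item level."""
    # hasLocation: every URI listed here resolves to a bit-equivalent bitstream.
    locations: List[str]
    # hasFormat: digital format of that bitstream (a MIME type is assumed here).
    format: str


@dataclass
class Entity:
    """Abstract aspect of a digital object, aligned with FRBR works and expressions."""
    identifier: str                                     # hasIdentifier
    provider_info: Optional[str] = None                 # hasProviderInfo: handle to access the entity
    provider_persistence: Optional[str] = None          # hasProviderPersistence: persistence of that handle
    lineage: List[str] = field(default_factory=list)    # hasLineage: identifiers of source entities
    semantic: Optional[str] = None                      # hasSemantic: intellectual genre (optional)
    datastreams: List[Datastream] = field(default_factory=list)
    entities: List[Entity] = field(default_factory=list)  # assumed: member entities of a collection

    def derive(self, new_identifier: str) -> Entity:
        """Hypothetical helper: return a new entity derived from this one.

        The new entity lists this entity in its lineage and has no providerInfo,
        since a repository holding it would attach its own handle.
        """
        return Entity(
            identifier=new_identifier,
            lineage=[self.identifier],
            semantic=self.semantic,
            datastreams=list(self.datastreams),  # new list, same Datastream objects
        )


# Usage with hypothetical identifiers and URLs.
paper = Entity(
    identifier="urn:example:paper:42",
    provider_info="http://repo.example.org/obtain?id=42",
    semantic="journal article",
    datastreams=[Datastream(
        locations=["http://repo.example.org/42.pdf", "http://mirror.example.org/42.pdf"],
        format="application/pdf")],
)
copy = paper.derive("urn:example:overlay:42")
print(copy.lineage)  # ['urn:example:paper:42']
```

Keeping hasLocation as a list mirrors the rule that several locations on one datastream must yield bit-equivalent bitstreams, so a client may pick any of them.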

Figure 1 - UML structure diagram of the Pathways Core data model
Figure 2 - An example of the Pathways Core data model

The Pathways Core data model can be serialized in a variety of ways; an RDF serialization has been created as a reference implementation. We also conducted the following experiment to illustrate how the Pathways Core supports cross-repository services. A number of heterogeneous repositories implemented an OpenURL-based obtain interface from which, given the providerInfo of an entity, an RDF serialization of that entity compliant with the Pathways Core could be retrieved. Using this interface, an overlay journal collected serializations of a set of entities (scholarly papers) from the collaborating repositories and assembled them into a new issue of the journal (Figure 3). The overlay journal then itself implemented the same obtain interface, so that an RDF serialization of the entire journal, of an issue, or of an article could be retrieved. A preservation repository could, for example, use this interface to collect content from the overlay journal for ingest and mirroring. This experiment illustrates how cross-repository services and workflows can be facilitated by support for an interoperable data model (the Pathways Core) and an interoperable service interface (the OpenURL-based obtain interface).

Figure 3 - Overlay journal experiment

© Copyright 2007 Jeroen Bekaert, Xiaoming Liu, Herbert Van de Sompel,
Carl Lagoze, Sandy Payette, Simeon Warner
Some or all of these materials were previously published in the Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, ACM 1-59593-354-9.
