IEEE TCDL Bulletin
 

Volume 3, Issue 1 (2006)

 

A Metadata Schema Registry as a Tool to Enhance Metadata Interoperability

Mitsuharu Nagamori and Shigeo Sugimoto
Graduate School of Library, Information and Media Studies
University of Tsukuba, Japan
{nagamori, sugimoto}@slis.tsukuba.ac.jp

 

Abstract

Interoperability is one of the most crucial issues for the metadata and digital library communities. Metadata registries are formal systems that can disclose authoritative information about semantics and data elements in order to realize semantic interoperability of metadata across domains and cultures. Registries typically store the semantics of metadata elements, maintain information about any local extensions, and provide mappings to other metadata schemas. This article describes the basic requirements and functions of a metadata schema registry. The primary function of a metadata schema registry is to provide reference descriptions of metadata terms for both human users and machines. From our experience developing software tools that use metadata schema registries, such as subject gateways and metadata databases, we have learned that a metadata schema registry has the potential to provide a wider range of services based on metadata schemas. This article also describes some functional extensions to our metadata schema registry in Tsukuba, Japan.

 

Introduction

Since the emergence of the World Wide Web in the mid-1990s, metadata has been recognized as a key technology for digital libraries. Metadata is typically defined as "data about other data" – a simple definition that embraces a broad range of resources from library catalogs and indexes to thesauri, ratings, reviews, and terms and conditions for use. On the Internet, metadata is designed for tasks ranging from resource description and discovery to archiving, trading, content filtering, resource syndication and information management. This diversity of purpose reflects the variety of information resources available on the Internet, ranging from personal web pages to huge portals for government information, digital libraries, and shopping catalogs, as well as the variety of users ranging from young children to businesses and professionals.

Various communities use metadata on the Internet, such as digital libraries, museums and e-governments. Each community has defined its own metadata standard according to its purpose, e.g., the Dublin Core Metadata Element Set (DCMES) [4], the Metadata Encoding and Transmission Standard (METS) [12], the Metadata Object Description Schema (MODS) [13], and IEEE Learning Object Metadata (LOM) [6]. A domain-specific metadata standard meets a community's needs, but it may hinder the discovery and reuse of metadata across communities. Metadata standards therefore need to satisfy both interoperability and domain specificity, which makes interoperability one of the important issues for the metadata and digital library communities.

A metadata schema registry (or simply, metadata registry) is a formal system that provides services over metadata vocabularies to users and machines. It is widely recognized as an important tool not only for sharing information about metadata vocabularies but also for enhancing their reusability. A metadata registry also plays an important role in semantic metadata interoperability among communities that speak different languages, and over time [20]. Achieving metadata interoperability is fundamental to making information resources shareable and discoverable. This article describes the basic requirements and functions of metadata schema registries. The authors have been developing a metadata schema registry since 1998, along with software tools that use metadata schema registries. From this experience, we have found that metadata schema registries have the potential to provide various services based on metadata schemas. This article also describes a functional extension of a metadata schema registry for metadata applications.

Background

Metadata Schemas

A metadata schema defines a framework for representing metadata. In general, a metadata schema includes semantic definitions of terms used in the schema, structural constraints and data structure definitions, and bindings to physical description syntax (such as XML).

A metadata schema consists of the following components:

  1. a set of terms defined to express properties of a resource, e.g., Dublin Core elements such as Title, Creator, and Alternative;
  2. a set of terms that express types of property values and/or terms used as property values, e.g., for DCMES, ISO 8601, the DCMI Type Vocabulary, LCSH and DDC;
  3. a set of rules that defines structural constraints and syntactic features neutral to any implementation-specific description scheme, e.g., mandatory levels, repeatability/cardinality, order, and so on (the set of rules is called an Application Profile, which is described in the next section); and
  4. a set of binding rules for a specific description language, such as XML.
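The four components listed above can be pictured as a simple data structure. The following Python sketch is illustrative only: the class and field names (MetadataSchema, property_terms, value_vocabularies, structural_rules, bindings) are our own assumptions rather than part of any registry's data model, and the binding rules are simplified to a term-to-XML-element mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MetadataSchema:
    """Illustrative (hypothetical) container for the four schema components."""
    # 1. Terms that express properties of a resource (e.g. Title, Creator).
    property_terms: Set[str] = field(default_factory=set)
    # 2. Value types and value vocabularies, keyed by the property they apply to.
    value_vocabularies: Dict[str, List[str]] = field(default_factory=dict)
    # 3. Implementation-neutral structural rules (the application profile).
    structural_rules: Dict[str, Dict[str, bool]] = field(default_factory=dict)
    # 4. Binding rules for one description language, here term -> XML element name.
    bindings: Dict[str, str] = field(default_factory=dict)

    def unbound_terms(self) -> Set[str]:
        """Property terms that still lack a binding to the description language."""
        return self.property_terms - set(self.bindings)


schema = MetadataSchema(
    property_terms={"Title", "Creator", "Subject"},
    value_vocabularies={"Subject": ["LCSH", "DDC"]},
    structural_rules={"Title": {"mandatory": True, "repeatable": False}},
    bindings={"Title": "dc:title", "Creator": "dc:creator"},
)
print(schema.unbound_terms())  # {'Subject'}
```

Keeping the components separate mirrors the point made in the text: the vocabulary (components 1 and 2) can be reused while the structural rules and bindings vary by application.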

Library catalogs are one type of metadata. Cataloging rules, in general, include guidelines for catalogers to extract values from resources to create catalogs in addition to the semantic and syntactic components listed above. The definition of metadata schema in this article does not include such guidelines.

Metadata schema descriptions are generally given in the RDF (Resource Description Framework) Schema language. RDF is a language for representing information about resources, and RDF Schema is a language for defining the RDF properties that express metadata terms [17] [18]. In RDF Schema, every metadata term is identified by a URI, which works as its primary name.
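As a minimal sketch of what such a description looks like, the Python function below emits N-Triples (one RDF serialization) declaring a term as an rdf:Property with language-tagged rdfs:label values and an rdfs:comment definition. The URI and English definition are those of the DCMES Title element; the Japanese label is an illustrative example, not a quotation from any registry, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Serialize an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def describe_term(uri: str, labels: Dict[str, str], comment: str) -> List[str]:
    """Declare `uri` as an rdf:Property with per-language labels and a definition."""
    triples = [f"<{uri}> <{RDF}type> <{RDF}Property> ."]
    for lang, label in sorted(labels.items()):
        triples.append(f"<{uri}> <{RDFS}label> {literal(label, lang)} .")
    triples.append(f"<{uri}> <{RDFS}comment> {literal(comment, 'en')} .")
    return triples


triples = describe_term(
    "http://purl.org/dc/elements/1.1/title",
    {"en": "Title", "ja": "タイトル"},
    "A name given to the resource.",
)
print("\n".join(triples))
```

The URI, not the label, is the term's primary name: labels in any number of languages can be attached to the same URI without changing its identity.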

Application Profiles

Application Profiles are defined as schemas consisting of data elements drawn from one or more namespaces, combined by implementers, and optimized for a particular local application [5]. In this article, a set of rules that defines structural constraints and syntactic features of a metadata schema is called an Application Profile. An Application Profile provides a framework for adopting one or more element sets in accordance with application requirements. The Dublin Core Metadata Element Set (DCMES) defines the vocabulary of metadata, i.e., terms and their meanings, but in general DCMES specifies neither encoding nor syntactic characteristics. One exception is the rule in Simple Dublin Core that states "Any of the 15 elements is optional and repeatable". In addition, local applications may have requirements specific to a given domain or application, for example:

  • Title, Creator and Description might be mandatory, while the other elements are optional;
  • only the Title, Creator, Description, Date and Language elements might be used; or
  • the 15 DCMES elements might be combined with elements from other metadata sets, such as IEEE LOM.

These requirements can be defined independently of the vocabulary definitions. Any application can have its own application profile, which specifies the set of metadata vocabulary terms used in the application as well as the syntactic and structural features of that application. The vocabulary terms may be borrowed from one or more source schemas. More importantly, an application profile can be used to define a mapping from a local application's schema to one or more global metadata schemas, which is crucial for metadata interoperability.
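Because an application profile is defined separately from the vocabulary, it can be checked separately too. The sketch below encodes the example requirements above (Title, Creator and Description mandatory; only five DCMES elements allowed) as a hypothetical PROFILE dictionary and validates a record given as (element, value) pairs. The profile format and the validate function are our own assumptions, not a standard application profile syntax.

```python
from collections import Counter

# Hypothetical application profile: allowed elements with their constraints.
# It reuses DCMES element names but adds its own, syntax-neutral rules.
PROFILE = {
    "title": {"mandatory": True, "repeatable": False},
    "creator": {"mandatory": True, "repeatable": True},
    "description": {"mandatory": True, "repeatable": False},
    "date": {"mandatory": False, "repeatable": False},
    "language": {"mandatory": False, "repeatable": True},
}


def validate(record, profile):
    """Check (element, value) pairs against a profile; return error messages."""
    errors = []
    counts = Counter(element for element, _ in record)
    for element in counts:
        if element not in profile:
            errors.append(f"element not allowed: {element}")
    for element, rule in profile.items():
        n = counts.get(element, 0)
        if rule["mandatory"] and n == 0:
            errors.append(f"missing mandatory element: {element}")
        if not rule["repeatable"] and n > 1:
            errors.append(f"element not repeatable: {element}")
    return errors


record = [("title", "Metadata Registries"), ("creator", "Nagamori, M."),
          ("creator", "Sugimoto, S."), ("subject", "metadata")]
print(validate(record, PROFILE))
# ['element not allowed: subject', 'missing mandatory element: description']
```

Swapping in a different PROFILE changes what counts as a valid record without touching the element definitions, which is the separation the text describes.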

Metadata Interoperability

In the digital library and Semantic Web communities, achieving metadata interoperability is fundamental for making information resources shareable and discoverable. The following paragraphs describe the requirements for enhancing metadata interoperability.

(1) Interoperability among different metadata standards
Resource discovery across communities is an important issue. A simple way to realize cross-community resource discovery is to adopt a core metadata schema shared by different communities. DCMES is the most widely known metadata schema for resource discovery on the Internet. However, many other metadata schemas have been developed by different communities, each in accordance with its own requirements, so it is not realistic to expect all communities to use a common core metadata schema. On the other hand, a community-specific metadata schema may lose interoperability with other schemas. In order to discover information resources across communities, it is therefore necessary to develop crosswalks that define relationships between terms in different metadata schemas.
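A crosswalk can be as simple as a table from source terms to target terms. The sketch below assumes an invented local schema (the "local:" terms are hypothetical) mapped to simple Dublin Core, and translates a record while reporting terms that have no equivalent.

```python
# Hypothetical crosswalk from an invented local schema to simple Dublin Core.
# Several local terms may map to the same target term.
CROSSWALK = {
    "local:mainTitle": "dc:title",
    "local:subtitle": "dc:title",
    "local:author": "dc:creator",
    "local:keyword": "dc:subject",
}


def apply_crosswalk(record, crosswalk):
    """Translate (term, value) pairs; report terms with no equivalent."""
    mapped, unmapped = [], []
    for term, value in record:
        target = crosswalk.get(term)
        if target is None:
            unmapped.append(term)  # no equivalent term in the target schema
        else:
            mapped.append((target, value))
    return mapped, unmapped


record = [
    ("local:mainTitle", "A Metadata Schema Registry"),
    ("local:author", "Nagamori, M."),
    ("local:shelfMark", "X-123"),
]
mapped, unmapped = apply_crosswalk(record, CROSSWALK)
print(mapped)    # [('dc:title', 'A Metadata Schema Registry'), ('dc:creator', 'Nagamori, M.')]
print(unmapped)  # ['local:shelfMark']
```

Crosswalks are often many-to-one and therefore lossy (here a subtitle would become an undifferentiated dc:title), which is why the sketch reports unmapped terms instead of dropping them silently.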

(2) Versioning
It is important to manage the revision history of metadata schemas to enhance metadata longevity. From the viewpoint of long-term preservation and access, applications should be able to handle legacy metadata, which may include properties, qualifiers, or value vocabularies whose meaning or approval status has changed over time. For example, the Creator, Publisher and Contributor elements have been merged under the Agent element, and a vocabulary for the Subject element has changed according to communities' demands. Thus, versioning of entire metadata schemas, as well as of individual metadata terms and value vocabularies, is required in order to understand metadata over time.
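To interpret legacy metadata, an application needs the version of a term that was in force when the metadata was created. The sketch below keeps a revision history per term and looks up the applicable version for a given date. The term "ex:agent", its dates and its definitions are hypothetical placeholders, not real DCMI history.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TermVersion:
    term: str
    issued: date
    definition: str
    status: str  # e.g. "recommended" or "deprecated"


def version_at(history: List[TermVersion], term: str, when: date) -> Optional[TermVersion]:
    """Return the version of `term` in force on `when`, or None if not yet issued."""
    candidates = [v for v in history if v.term == term and v.issued <= when]
    return max(candidates, key=lambda v: v.issued, default=None)


# Hypothetical revision history for an invented term.
history = [
    TermVersion("ex:agent", date(2000, 1, 1), "Original definition.", "recommended"),
    TermVersion("ex:agent", date(2004, 1, 1), "Revised definition.", "recommended"),
    TermVersion("ex:agent", date(2005, 1, 1), "Revised definition.", "deprecated"),
]
legacy = version_at(history, "ex:agent", date(2001, 6, 1))
print(legacy.definition)  # Original definition.
```

Recording status alongside the definition lets the same lookup answer both questions raised above: what a term meant at a given time, and whether it was still approved.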

(3) Multiple Languages
Describing metadata schemas in multiple languages is recognized as an important issue for broader usage of metadata within the global communities on the Internet. A large proportion of the information resources on the Internet is described in languages other than English, such as Japanese, Chinese and Thai. For example, DCMES, which was originally defined in English, has been translated into 24 languages in order to make information resources understandable, shareable, and discoverable across languages. These translations help non-English-speaking communities understand metadata schemas.

Related Work

A goal of metadata schema registries is to make metadata schemas understandable by both humans and machines, and shareable among user communities. Metadata schema registries have captured the interest of a broad range of metadata communities because of the strong requirements for interoperability and longevity of metadata schemas. ISO/IEC 11179 addresses the semantics of data, the representation of data, and the registration of the descriptions of data [7], and allows the creation of a shared data environment. Universal Description, Discovery and Integration (UDDI) registries act as reference points for Web Services, allowing those services to be described and discovered in a common way [16]. UDDI is based on XML standards and is platform-independent. ISO/IEC JTC1 SC32 WG2 has been organizing a series of workshops on metadata registries [8].

The white paper published by the DELOS Working Group on Registries [1] describes basic concepts of metadata schemas, e.g., metadata vocabularies, layers for metadata interoperability, and data models. The layered model discussed in the white paper gives a framework for metadata vocabularies.

In January 2004, the JISC IE Metadata Schema Registry (IEMSR) project began developing a metadata schema registry as a pilot shared service within the JISC Information Environment [9]. The IEMSR will act as the primary source of authoritative information about metadata schemas recommended by the JISC IE Standards framework. Metadata within the JISC IE is based on the Dublin Core and IEEE LOM standards.

SchemaWeb is a repository for schemas expressed in the RDF Schema, OWL and DAML+OIL schema languages [19]. It provides a simple directory in which both human users and machines can search and browse metadata schemas.

Functional Requirements for a Metadata Schema Registry

A metadata schema defines a framework for representing metadata. In order to make metadata shareable and discoverable across user communities and languages, it is necessary to improve the interoperability of metadata schemas. A metadata schema registry stores metadata schemas and serves information about them (e.g., definitions of the metadata schemas and relationships between metadata terms) to users. It provides its services not only to human users but also to machines. Human users and machines may have the following requirements:

For human users
A metadata schema registry provides human users with functions to find and browse reference descriptions of metadata schemas, e.g., reference descriptions of a metadata element and of an application profile. A person engaged in designing a new metadata schema can use a metadata schema registry as a dictionary to find existing metadata schemas. A person who writes or edits metadata, e.g., a cataloger, can use a metadata schema registry as a reference source for understanding metadata schemas.

For machines
Software tools should be able to access metadata schemas stored in a metadata schema registry through application program interfaces over global or local networks, so that they can use metadata registries as software components. For example, a metadata schema registry can serve as a data source for metadata schemas, provide relationships between metadata terms for crosswalking, and support metadata transformation. A metadata schema registry thus has the potential to provide a wider range of services over metadata schemas. It should be able to return a result set encoded in a machine-understandable and application-independent format such as RDF. The authors have been working on a metadata schema registry and have been developing a framework to extend its functionality, which is discussed later.
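On the machine side, a client receives such a result set and extracts what it needs. The sketch below parses a hand-written RDF/XML response (an example, not actual output of any registry) with Python's standard xml.etree.ElementTree and maps each property URI to its labels by language. It handles only the typed-node form shown; RDF/XML allows many equivalent forms (e.g., rdf:Description with rdf:type), so a production client should use a full RDF parser instead.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Example response body in RDF/XML (typed-node form only).
RESPONSE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
    <rdfs:label xml:lang="en">Title</rdfs:label>
    <rdfs:comment xml:lang="en">A name given to the resource.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>
"""


def property_labels(rdf_xml):
    """Map each rdf:Property URI to its rdfs:label values, keyed by language tag."""
    root = ET.fromstring(rdf_xml.strip())
    result = {}
    for prop in root.findall(f"{{{RDF}}}Property"):
        uri = prop.get(f"{{{RDF}}}about")
        result[uri] = {
            label.get(XML_LANG, ""): (label.text or "").strip()
            for label in prop.findall(f"{{{RDFS}}}label")
        }
    return result


print(property_labels(RESPONSE))
# {'http://purl.org/dc/elements/1.1/title': {'en': 'Title'}}
```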

The extended functional requirements of a metadata registry are summarized below.

(1) Searching
Human users should be able to search the information stored in a metadata schema registry, e.g., the name of a metadata schema, the definition of a metadata term, related terms and so forth, in various ways. Search results are sets of metadata schemas, metadata terms, value vocabularies, etc.
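A minimal form of such searching is a case-insensitive keyword match over term labels and definitions. In the sketch below, TERMS is a tiny illustrative store whose entries follow DCMI wording (the authoritative text should be checked in the registry itself), and search is a hypothetical helper that returns matching term URIs.

```python
# Tiny illustrative term store; wording follows DCMI but is not authoritative.
TERMS = [
    {"uri": "http://purl.org/dc/elements/1.1/title",
     "labels": {"en": "Title"},
     "definition": "A name given to the resource."},
    {"uri": "http://purl.org/dc/terms/alternative",
     "labels": {"en": "Alternative Title"},
     "definition": "An alternative name for the resource."},
]


def search(terms, query):
    """Return URIs of terms whose labels or definition contain `query` (any case)."""
    q = query.casefold()
    return [
        t["uri"] for t in terms
        if any(q in text.casefold() for text in [t["definition"], *t["labels"].values()])
    ]


print(search(TERMS, "name"))         # matches both terms
print(search(TERMS, "alternative"))  # ['http://purl.org/dc/terms/alternative']
```

A real registry would add other access paths (by schema, by namespace, by relationship), but the result is still a set of terms, as the text describes.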

(2) Browsing
A metadata schema registry has to provide user interfaces for human users to browse information stored in the metadata schema registry via a Web browser. Users should be able to browse metadata schemas in different ways.

(3) Schema Mapping
A metadata schema registry should provide information about the relationships between metadata schemas, terms, and vocabularies. This information is also required by the searching and browsing functions.
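One common kind of relationship is rdfs:subPropertyOf. The sketch below follows such links transitively, so that, for instance, a search for a general property can be widened to its refinements. The edge list is illustrative, modeled on how DCMI relates "alternative" to "title"; the authoritative relationships should be taken from the registry.

```python
from collections import deque

# Illustrative rdfs:subPropertyOf edges (child -> parents).
SUBPROPERTY_OF = {
    "dcterms:alternative": ["dcterms:title"],
    "dcterms:title": ["dc:title"],
}


def super_properties(term, edges=SUBPROPERTY_OF):
    """All properties that `term` is a sub-property of, transitively (breadth-first)."""
    seen, queue = set(), deque([term])
    while queue:
        for parent in edges.get(queue.popleft(), []):
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def sub_properties(term, edges=SUBPROPERTY_OF):
    """All properties that are, transitively, sub-properties of `term`."""
    return {child for child in edges if term in super_properties(child, edges)}


# A query for dc:title can be widened to cover its refinements.
print(sorted({"dc:title"} | sub_properties("dc:title")))
# ['dc:title', 'dcterms:alternative', 'dcterms:title']
```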

(4) Version Management
Managing the revision history of metadata schemas is crucial to enhancing metadata interoperability over time. The revision history supports migration and transformation of metadata, and thereby enhances metadata usability over time.

(5) Multilanguage User Interfaces
A metadata schema registry stores metadata schemas in multiple languages in order to support broader usage of metadata in the global communities on the Internet. Because users of a metadata schema registry speak many different languages, multilingual user interfaces are useful for those who speak languages other than English. This function enhances the usability of a metadata schema registry for the international community.
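A multilingual interface must decide which label to show when a translation is missing. The sketch below assumes a simple policy: use the first available language from the user's preferences, then English, and finally the term URI itself. The LABELS table is illustrative; the Japanese string is an example rather than a quotation from any registry.

```python
# Illustrative labels keyed by term URI, then by language tag.
LABELS = {
    "http://purl.org/dc/elements/1.1/title": {"en": "Title", "ja": "タイトル"},
    "http://purl.org/dc/elements/1.1/creator": {"en": "Creator"},
}


def label_for(term_uri, preferred_langs, labels=LABELS, fallback="en"):
    """Return the label in the first available preferred language,
    then the fallback language, and finally the term URI itself."""
    available = labels.get(term_uri, {})
    for lang in [*preferred_langs, fallback]:
        if lang in available:
            return available[lang]
    return term_uri


print(label_for("http://purl.org/dc/elements/1.1/title", ["ja"]))    # タイトル
print(label_for("http://purl.org/dc/elements/1.1/creator", ["ja"]))  # Creator
```

Falling back to the URI guarantees that every term remains identifiable even when no label exists in any requested language.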

(6) API for software tools
Metadata schemas should be provided not only to human users but also to software tools. The application program interface should be based on Web Services protocols, e.g., SOAP or REST.
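A REST-style interface can be very small. The sketch below is a WSGI application using only the Python standard library that answers GET /terms/NAME?lang=CODE. The URL scheme, the REGISTRY content and the JSON response format are all hypothetical: JSON keeps the sketch short, whereas a registry following the requirements above would return RDF.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory content; a real registry would query its schema store.
REGISTRY = {
    "title": {"uri": "http://purl.org/dc/elements/1.1/title",
              "labels": {"en": "Title", "ja": "タイトル"}},
}


def app(environ, start_response):
    """WSGI app answering GET /terms/<name>?lang=<code> (hypothetical URL scheme)."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    lang = parse_qs(environ.get("QUERY_STRING", "")).get("lang", ["en"])[0]
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        status, body = "405 Method Not Allowed", {"error": "only GET is supported"}
    elif len(parts) == 2 and parts[0] == "terms" and parts[1] in REGISTRY:
        term = REGISTRY[parts[1]]
        if lang not in term["labels"]:
            lang = "en"  # fall back to English when no translation exists
        status, body = "200 OK", {"uri": term["uri"], "lang": lang,
                                  "label": term["labels"][lang]}
    else:
        status, body = "404 Not Found", {"error": "unknown term"}
    payload = json.dumps(body, ensure_ascii=False).encode("utf-8")
    start_response(status, [("Content-Type", "application/json; charset=utf-8"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

To try it locally, wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever() serves the application; a request for /terms/title?lang=ja then returns the Japanese label.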

Implementations

The ULIS metadata schema registry developed by the authors provides reference descriptions of metadata terms in multiple languages, encoded in RDF Schema [15]. It stores DCMES in 22 languages, including English, Japanese, Chinese and Korean. We have also experimentally stored the metadata elements of the Internet Public Library Asia (IPL-Asia) [11] and those of the Nippon Cataloging Rules (NCR) in the ULIS metadata schema registry.

The DCMI Registry Working Group, which was established in December 1999, has been discussing and developing a metadata schema registry [2]. The authors have been involved in the working group since 1998. The DCMI registry, which is in operation, provides authoritative reference descriptions of metadata schemas (Figure 1). The reference descriptions are internally encoded in RDF Schema and translated into 24 different languages. They are presented in a user-friendly form for human users and in RDF Schema for machines. The application program interface is based on Web Services protocols, namely SOAP and REST [3]. The description of each metadata term includes a unique name for the term, language-dependent labels, a definition of the term, date(s) of issue, the type of the term, etc. The DCMI registry is provided as open source software for use by broader communities. As of summer 2005, the DCMI registry had been made available in Germany, China, New Zealand and Tsukuba, Japan, in addition to OCLC.

Fig. 1 The DCMI Metadata Schema Registry

Functional Extension of the Registry

The primary function of a metadata schema registry is to provide reference descriptions of metadata terms for both human users and machines. From our experience in developing software tools, e.g., subject gateways and metadata databases, we have learned that a metadata schema registry has the potential to provide a wider range of services based on metadata schemas [14] [21]. We have experimentally developed a few functions to evaluate the feasibility of extending the functionality of the metadata schema registry. The functions presented below are to be integrated with the basic functions of the metadata schema registry. They take the form of software tools: a tool that supports information access across metadata schemas, a software generator based on metadata schemas, and a tool that supports the development and maintenance of metadata vocabularies.

Experimental Study: A Metadata Schema-Driven Software Tool Generator

From our experience in developing software tools for metadata applications, we have learned that basic software tools such as a metadata editor and a search tool can be (semi-)automatically derived from metadata schemas. Based on this idea, we have been developing an experimental software tool generator for metadata application systems, which uses schema descriptions of metadata vocabularies and application profiles [10] [14]. This experimental system has a set of built-in primitive functions, e.g., functions to load text from and store text to a database, to search text in a database, and so on.

The system produces a software tool from a set of XML documents that specifies the functions and user interfaces of the tool. This set of XML documents is called an Application System Description (ASD). An ASD is composed of the following four components:

  • Element Syntax Definition (ESD): definition of the syntactic features of the metadata schema required by the application software tool.
  • User Interface Definition (UID): definition of logical structures of user interfaces of the application software tool.
  • System Interface Definition (SID): definition of relationships between user interfaces and built-in functions of the application. SID defines a flow of data to built-in functions prepared for the application.
  • Association Definition: association description of ESD, UID and SID for the application.

Figure 2 shows an overview of the generation process. The generator reads an ASD and the definitions of metadata vocabularies. A set of XML texts is created from the UID and SID using the syntactic constraints defined in the ESD. The XML texts created from the ASD include interfaces for calling the built-in functions.

Fig. 2 An overview of the generation process
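The core idea of the generation step, deriving a user interface from declarative descriptions, can be illustrated in a few lines. The sketch below uses heavily simplified, hypothetical stand-ins for the ESD, UID and SID (plain dictionaries rather than the actual XML formats of the ASD, and without an Association Definition) and generates an XHTML form fragment: mandatory elements become required inputs, repeatable ones get an add-value button, and the form submits to a built-in function named in the SID.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-ins for the ASD components.
ESD = {  # syntactic constraints per element
    "title": {"mandatory": True, "repeatable": False},
    "creator": {"mandatory": True, "repeatable": True},
    "date": {"mandatory": False, "repeatable": False},
}
UID = {"form": "editor",
       "fields": [("title", "Title"), ("creator", "Creator"), ("date", "Date")]}
SID = {"editor": "store_record"}  # form -> built-in function (hypothetical name)


def generate_form(esd, uid, sid):
    """Generate an XHTML form fragment for the UID, constrained by the ESD."""
    form = ET.Element("form", {"action": sid[uid["form"]], "method": "post"})
    for element, caption in uid["fields"]:
        rule = esd[element]
        row = ET.SubElement(form, "label")
        row.text = caption + (" *" if rule["mandatory"] else "")
        attrs = {"name": element, "type": "text"}
        if rule["mandatory"]:
            attrs["required"] = "required"
        ET.SubElement(row, "input", attrs)
        if rule["repeatable"]:
            # A button that client-side script could use to add another value.
            button = ET.SubElement(row, "button", {"type": "button", "data-repeat": element})
            button.text = "+"
    return ET.tostring(form, encoding="unicode")


print(generate_form(ESD, UID, SID))
```

Changing the ESD (for example, making date mandatory) changes the generated form without any change to the generator code.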

Figure 3 shows how the software tool generator produces metadata-driven software, such as a subject gateway, based on an application profile. The metadata instances created and used in the generated software tools conform to the syntactic definition. The function repository shown in Figure 3 stores primitive functions used in metadata-driven software, e.g., for editing, browsing and searching metadata; these functions are shared by metadata-driven software tools. Because user interfaces are derived from a metadata schema that includes class definitions for the domain and range of each metadata element, we can choose user interface widgets and built-in functions for an element in accordance with those class definitions.

Fig. 3 A diagram of software tool generator
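Choosing widgets from class definitions can be reduced to a lookup keyed by an element's range. In the sketch below, the range classes, widget names and built-in function names in WIDGETS and RANGES are hypothetical examples; an element without a declared range falls back to plain text input.

```python
# Hypothetical mapping from an element's range class to a UI widget and a
# built-in function; class, widget and function names are examples only.
WIDGETS = {
    "xsd:date": ("date-input", "validate_date"),
    "dcterms:LinguisticSystem": ("select-list", "lookup_vocabulary"),
    "rdfs:Literal": ("text-input", "store_text"),
}
DEFAULT = WIDGETS["rdfs:Literal"]


def choose_widget(element, ranges):
    """Pick a (widget, built-in function) pair for `element` from its range."""
    range_class = ranges.get(element, "rdfs:Literal")  # undeclared range: plain text
    return WIDGETS.get(range_class, DEFAULT)


# Hypothetical range declarations taken from a schema description.
RANGES = {"date": "xsd:date", "language": "dcterms:LinguisticSystem"}
for element in ["title", "date", "language"]:
    print(element, choose_widget(element, RANGES))
```

This uses exact matches for brevity; a fuller generator would also consult the class hierarchy (e.g., rdfs:subClassOf) so that subclasses inherit a suitable widget.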

Summary and Future Work

The experimental system shown above is a rather straightforward extension of the metadata schema registry. We have found that separating syntactic and semantic features is useful for understanding the functionality of the extended functions.

From this study and other related studies, we have learned the following lessons:

  • A metadata schema registry can serve not only as an authoritative information source of metadata schemas but also as a center that provides software tools defined in association with the schemas.
  • It is crucial to organize a network of collaborating metadata schema registries in order to share metadata schemas.
  • We need to establish a process model for long-term maintenance of metadata schemas that would allow us to manage the life cycle of metadata schemas across languages.
  • We need to develop a process model for enhancing reusability of metadata schemas across communities. XML-based ontology technologies seem to be useful for developing the process model.

References

1. Baker, T., et al. Principles of Metadata Registries. Available at <http://delos-noe.iei.pi.cnr.it/activities/standardizationforum/Registries.pdf>.

2. DCMI Registry Working Group. Available at <http://dublincore.org/groups/registry/>.

3. DCMI Registry. Available at <http://dublincore.org/dcregistry/>.

4. Dublin Core Metadata Initiative. Available at <http://dublincore.org/>.

5. Heery, R. and Patel, M. "Application profiles: mixing and matching metadata schemas." Ariadne 25, September 2000. Available at <http://www.ariadne.ac.uk/issue25/app-profiles/intro.html>.

6. IEEE Learning Object Metadata (LOM). Available at <http://ltsc.ieee.org/wg12/>.

7. ISO/IEC 11179, Part 1: Framework. Available at <http://metadata-standards.org/11179/#11179-1>.

8. ISO/IEC JTC1 SC32 WG2. Available at <http://metadata-standards.org/>.

9. Johnston, P. JISC IE Metadata Schema Registry. Available at <http://www.ukoln.ac.uk/projects/iemsr/>.

10. Lee, M. Development of a Software Tool Generator based on Declarative Descriptions of Metadata Schemas and Applications. Master's thesis, Graduate School of Library, Information and Media Studies, University of Tsukuba, Japan, 2005.

11. Lee, W., et al. "A Subject gateway in Multiple Languages: a Prototype Development and Lesson Learned." Proceedings of DC-2003, pp.59-66, Seattle, 2004.

12. Metadata Encoding and Transmission Standard (METS). Available at <http://www.loc.gov/standards/mets/>.

13. Metadata Object Description Schema (MODS). Available at <http://www.loc.gov/standards/mods/>.

14. Nagamori, M. and Sugimoto, S. "A Metadata Schema Framework for Functional Extension of the Metadata Schema Registry." Proceedings of DC-2004, pp.3-11, Shanghai, 2004.

15. Nagamori, M., et al. "A Multilingual Metadata Schema Registry Based on RDF Schema." Proceedings of DC-2001, pp.209-212, Tokyo, 2001.

16. OASIS UDDI. Available at <http://www.uddi.org/>.

17. RDF Vocabulary Description Language 1.0: RDF Schema. Available at <http://www.w3.org/TR/rdf-schema/>.

18. RDF/XML Syntax Specification (Revised). Available at <http://www.w3.org/TR/rdf-syntax-grammar/>.

19. SchemaWeb. Available at <http://www.schemaweb.info/>.

20. Sugimoto, S., et al. "Versioning the Dublin Core Across Multiple Languages and Over Time." Proceedings of SAINT 2001 Workshop, pp.151-156, San Diego, 2001.

21. Sugimoto, S. "Metadata Schemas, Models and Tools - Metadata-Oriented Projects at Tsukuba and Lessons Learned for Interoperability." Proceedings of ICDL 2004, pp.690-699, India, 2004.

 

© Copyright 2006 Mitsuharu Nagamori and Shigeo Sugimoto
