Volume 4 Issue 1
Spring 2008
ISSN 1937-7266

Dryad: A Data Repository for Evolutionary Biology

Jed Dube, Sarah Carrier, Jane Greenberg, and Hollie White

School of Information and Library Science
University of North Carolina at Chapel Hill
216 Lenoir Drive, CB#3360, 100 Manning Hall
Chapel Hill, NC 27599-3360
{jdube, scarrier, janeg, hcwhite1}@email.unc.edu

The Dryad repository project, formerly called DRIADE, supports the preservation, discovery, use, reuse, and manipulation of scientific data objects underlying published research in the field of evolutionary biology.  Dryad development involves a collaboration between the National Evolutionary Synthesis Center (NESCent) and the Metadata Research Center (MRC) at the School of Information and Library Science, University of North Carolina at Chapel Hill.  This poster provides a succinct definition of evolutionary biology and presents the goals underlying the Dryad initiative.  Dryad’s functional requirements are outlined, and key aspects of project development and implementation are described.

The Dryad repository is addressing challenges facing researchers in the field of evolutionary biology regarding data preservation, discovery, and use.  Evolutionary biology is an interdisciplinary field, so collaboration, access to and use of heterogeneous datasets, and interoperability among datasets are important for advancing research.  Dryad’s system development has included defining the repository’s functional requirements.  As part of this work, the Dryad development team conducted a survey of selected leading digital data and resource repository initiatives.  The systems examined included the Global Biodiversity Information Facility (GBIF), Knowledge Network for Biocomplexity (KNB), Science Environment for Ecological Knowledge (SEEK), National Science Digital Library (NSDL), Inter-university Consortium for Political and Social Research (ICPSR), and Marine Metadata Interoperability (MMI).  The scope, goals, and functions of each project were examined with a specific focus on parameters essential for Dryad.  Dryad’s key parameters are: heterogeneous digital datasets, long-term data stewardship, tools and incentives for researchers, minimized technical expertise and time requirements for data deposition and use, open intellectual property rights, and published datasets.  In addition to the survey of similar initiatives, the Dryad team met with stakeholders via a focus group meeting and a workshop.  As a result of this work, the Dryad team was able to better characterize data and metadata associated with evolutionary biology research.  We also employed a multi-method approach to develop a metadata application profile.

These efforts established Dryad’s functional requirements.  The Dryad repository is being constructed to support the following functions:

  • Computer-aided metadata generation and augmentation
  • Specialized modules linking data submission and manuscript review
  • Data and metadata quality control by integrating human and automatic techniques
  • Support for identity, authority and data security
  • Support for basic metadata repository functions, such as resource discovery, sharing, and interoperability
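
As a rough illustration of how automatic and human techniques might be combined for metadata generation and quality control, the sketch below fills in one field automatically and flags incomplete fields for a curator.  It is a minimal sketch under stated assumptions, not Dryad's implementation: the field names, the required-field list, and the function names are all hypothetical and are not taken from the Dryad application profile.

```python
from dataclasses import dataclass, field

# Hypothetical required fields, chosen only for illustration.
# Dryad's actual application profile is not reproduced here.
REQUIRED_FIELDS = ("title", "creator", "identifier")


@dataclass
class MetadataRecord:
    fields: dict
    needs_review: list = field(default_factory=list)


def augment(record, filename):
    """Automatic step: derive a 'format' field from the file extension if absent."""
    if "format" not in record.fields and "." in filename:
        record.fields["format"] = filename.rsplit(".", 1)[1].lower()
    return record


def quality_check(record):
    """Automatic step: list missing or empty required fields for a human curator."""
    record.needs_review = [
        name for name in REQUIRED_FIELDS
        if not record.fields.get(name, "").strip()
    ]
    return record


# A depositor supplies a partial record alongside a data file.
record = MetadataRecord({"title": "Example alignment", "creator": ""})
record = quality_check(augment(record, "alignment.NEX"))

print(record.fields["format"])   # derived automatically from the filename
print(record.needs_review)       # fields a curator would need to complete
```

In this sketch the automatic steps never overwrite values the depositor supplied; anything they cannot resolve is passed to a person rather than guessed.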

Dryad’s functional requirements, in addition to details about the application profile, have been published on the Dryad project Wiki as part of the system documentation.  The Dryad development team has also designed a functional model based on the Reference Model for an Open Archival Information System (OAIS) that incorporates Dryad’s functional requirements.

Current research efforts include a survey on scientists’ behaviors and attitudes toward data sharing and a use-case study gathering data on evolutionary biologists’ experiences with and perceptions about open data repositories.  This work will further inform the development of Dryad’s architecture.
