Volume 4 Issue 1
Spring 2008
ISSN 1937-7266

Cooperative Collection Building in NSDL MatDL Pathway
through iVia Data Fountains

Laura M. Bartolo, Cathy S. Lowe

Materials Informatics Lab
Kent State University
Kent, OH 44242-0001
+1 330 672 {1691, 0021}
{lbartolo, clowe}@kent.edu

Johannes Ruscheinski, Diane Bisom

The iVia Project
Science Library, University of California
Riverside, CA 92517-5900
+1 951 827 {2080, 2279}
{ruschein@ivia., dbisom@}ucr.edu

The National Science Foundation established the National Science Digital Library (NSDL) as an online library of resources for STEM education and research to catalyze and support continual improvements across STEM at all levels. Two NSDL projects, the Materials Digital Library Pathway (MatDL) and the iVia Data Fountains Project, are working together on a pilot involving an undergraduate materials science research program and automatic metadata generation.

The NSDL MatDL Pathway is implementing an information infrastructure for stewardship of significant content and services to support the integration of education and research in materials science (MS). The iVia Project is developing a suite of tools that provide automated and/or semi-automated Internet resource discovery (collection development), metadata generation, and rich, full-text extraction. This service offers a means of building or augmenting collections that can keep pace with the growing number of significant resources on the Internet and can help to mitigate the high cost of expert, manually created metadata. MatDL is providing feedback to help refine the iVia tools while streamlining its own metadata assignment process by testing iVia with undergraduate research papers from the NSF-funded Cornell Center for Materials Research (CCMR), a Materials Research Science and Engineering Center (MRSEC). This work is resulting in improvements to the iVia tools, suggested templates for CCMR to adopt for its student research papers, refinements to MatDL’s submission guidelines, and improved quality of automatic metadata assignment in MatDL, all in support of broad dissemination of high-quality materials resources.

The poster presents the results of the pilot study, in which MatDL tested iVia metadata generation on a set of 83 undergraduate research papers in Portable Document Format (PDF) from the Research Experience for Undergraduates (REU) program of CCMR. MatDL and iVia conducted a series of tests to refine the metadata generation capabilities of the iVia tools with PDF documents, improving the results over time. The most recent version of iVia was able to extract description metadata from multi-column text. Good accuracy was obtained for title generation, with content-word precision of 91.09% and recall of 89.30%. The exact title was assigned in 84% of cases, and an additional 7% of titles were partially correct. Automatic keyphrase generation was also evaluated. For each document, each of five automatically assigned keyphrases was manually rated on how well it described the document's contents. Of the automatically assigned keyphrases, 39% were found to be highly descriptive, 41% acceptable, and 20% unacceptable. iVia continues to improve existing results and to add functionality such as automatic extraction of author metadata from text.
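
As an illustration of how a content-word precision and recall figure for a generated title might be computed, here is a minimal Python sketch. It is not the evaluation code used in the pilot. The poster does not define the measure in detail, so the stop-word list, the tokenization, and the function names (`content_words`, `title_precision_recall`) are assumptions made for this example.

```python
import re
from collections import Counter

# Assumed stop-word list; the pilot's actual list of excluded words is not given.
STOP_WORDS = frozenset({
    "a", "an", "and", "as", "at", "by", "for", "from",
    "in", "of", "on", "the", "through", "to", "with",
})


def content_words(title):
    """Lowercase the title, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def title_precision_recall(generated, reference):
    """Return (precision, recall) over content words for one title pair.

    precision = shared content words / content words in the generated title
    recall    = shared content words / content words in the reference title
    Repeated words are counted with multiplicity.
    """
    gen = Counter(content_words(generated))
    ref = Counter(content_words(reference))
    overlap = sum((gen & ref).values())  # multiset intersection
    precision = overlap / sum(gen.values()) if gen else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall
```

For a hypothetical pair in which the generated title "Growth of Carbon Nanotubes" is compared with the reference title "Growth of Carbon Nanotubes on Silicon Substrates", all three generated content words appear in the reference (precision 1.0), while three of the five reference content words are recovered (recall 0.6). Averaging such per-title scores over a document set is one plausible way to arrive at collection-level figures like those reported above.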