IEEE TCDL Bulletin
 
Volume 2, Issue 1, 2005

 

Real-Time Genre Classification for Music Digital Libraries

J. Stephen Downie
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
jdownie@uiuc.edu
Andreas F. Ehmann
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
aehmann@uiuc.edu
David Tcheng
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
dtcheng@ncsa.uiuc.edu
 
 

Our system was developed within the Music-to-Knowledge (M2K) visual programming environment. M2K uses a modular approach to rapidly develop and evaluate prototype music digital library (MDL) subsystems (Figure 1). M2K is being developed as part of the International Music Information Retrieval Systems Evaluation (IMIRSEL) project.

In this work, a set of 40 audio-based features is used, derived from 20 time-varying signal and spectral measurements: spectral centroid, spectral roll-off, spectral flux, zero-crossing rate, and a 16-dimensional spectral envelope representation. The features are extracted from 25 ms frames with 50% overlap. The mean and variance of each of the 20 measurements are then computed over a 10-second segment, producing a 40-dimensional representation. This compact representation allows the system to continuously update and display classifications in real time (Figure 2).
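A minimal sketch of this feature pipeline is shown below, assuming a mono signal x sampled at sr Hz. The paper does not specify the window function, the roll-off threshold, or how the 16 envelope bands are spaced, so the choices here (Hann window, 85% roll-off, linearly spaced bands) are illustrative assumptions, not the authors' exact implementation.

    import numpy as np

    def frame_features(x, sr, frame_ms=25, hop_frac=0.5, n_env=16, rolloff_pct=0.85):
        """Per-frame measurements: centroid, roll-off, flux, ZCR, 16-band envelope (20 total)."""
        n = int(sr * frame_ms / 1000)      # 25 ms frame length in samples
        hop = int(n * hop_frac)            # 50% overlap
        win = np.hanning(n)                # assumed window; not specified in the paper
        freqs = np.fft.rfftfreq(n, 1.0 / sr)
        bands = np.array_split(np.arange(len(freqs)), n_env)  # assumed linear bands
        prev_mag = None
        rows = []
        for start in range(0, len(x) - n + 1, hop):
            frame = x[start:start + n]
            mag = np.abs(np.fft.rfft(frame * win))
            power = mag ** 2
            total = power.sum() + 1e-12
            centroid = (freqs * power).sum() / total          # spectral centroid
            cum = np.cumsum(power)
            rolloff = freqs[np.searchsorted(cum, rolloff_pct * cum[-1])]
            flux = 0.0 if prev_mag is None else np.sum((mag - prev_mag) ** 2)
            prev_mag = mag
            zcr = np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))
            env = [power[b].sum() / total for b in bands]     # normalized envelope
            rows.append([centroid, rolloff, flux, zcr, *env])
        return np.array(rows)              # shape: (n_frames, 20)

    def segment_vector(frames):
        """Mean and variance of each measurement over a 10 s segment -> 40 dims."""
        return np.concatenate([frames.mean(axis=0), frames.var(axis=0)])

For real-time operation, one would keep a rolling buffer of the most recent 10 seconds of frame measurements and recompute segment_vector as each new frame arrives, feeding the result to the classifier.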


Figure 1. M2K itinerary used to generate the classifications.



Figure 2. Sample genre histogram output.


A continuous decision tree with bagging (10 models in the ensemble, each trained on a 90% sample) was chosen for its computational performance. The system was trained and tested on 388 hours of music audio spanning 14 musical genres, each represented by approximately 28 hours of examples. With 14 equally represented genres, the baseline success rate for random guessing is 7.1%. An ensemble of 10 depth-13 trees correctly classified 72.9% of the test data into the proper genre. This compares favorably with the roughly 70% success rate reported for humans classifying audio into 10 genres.
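The sketch below shows an equivalent bagged ensemble using scikit-learn (version 1.2 or later) as a stand-in for M2K's continuous decision tree, which is not publicly specified. The synthetic data, and the reading of "90% sampling" as max_samples=0.9, are assumptions for illustration only.

    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical stand-in data: 40-dim segment vectors with 14 genre labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 40))
    y = rng.integers(0, 14, size=1000)

    # 10 depth-13 trees, each fit on a 90% sample, majority-voted at prediction time.
    clf = BaggingClassifier(
        estimator=DecisionTreeClassifier(max_depth=13),
        n_estimators=10,
        max_samples=0.9,
    )
    clf.fit(X, y)
    print(clf.score(X, y))                 # classification accuracy on held-in data

Bagging trades a modest increase in training cost for lower variance than a single tree, while shallow depth-limited trees keep per-segment prediction fast enough for the real-time display described above.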

Acknowledgements

Project funders: Andrew W. Mellon Foundation and the National Science Foundation (IIS-0340597 and IIS-0327371).
 

© Copyright 2005 J. Stephen Downie, Andreas F. Ehmann, and David Tcheng
Some or all of these materials were previously published in the Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2005), ACM 1-58113-876-8/05/0006.
