Real-Time Genre Classification for Music Digital Libraries
Our system was developed within the Music-to-Knowledge (M2K) visual programming environment. M2K uses a modular approach to rapidly develop and evaluate prototype music digital library (MDL) subsystems (Figure 1). M2K is being developed as part of the International Music Information Retrieval Systems Evaluation (IMIRSEL) project.
In this work, a set of 40 audio-based features is used, based on 20 time-varying signal and spectral measurements. These measurements consist of: spectral centroid, spectral roll-off, spectral flux, zero crossing rate, and a 16-dimension spectral envelope representation. The features are extracted using 25 ms frames with 50% overlap. From each of the 20 signal measurements, means and variances are computed over a 10-second segment, producing a 40-dimension representation. This allows the system to continuously update and display classifications in real time (Figure 2).
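The feature pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the M2K implementation: the roll-off threshold (85%), the use of uniform FFT-bin bands for the 16-dimension spectral envelope, and all function names are assumptions for the sketch.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def segment_features(x, sr, n_envelope=16):
    """Sketch: 20 per-frame measurements summarized by their means and
    variances over a segment, yielding a 40-dimension vector."""
    frame_len = int(0.025 * sr)          # 25 ms frames
    hop = frame_len // 2                 # 50% overlap
    frames = frame_signal(x, frame_len, hop)

    # Magnitude spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    total = spec.sum(axis=1) + 1e-12

    # Spectral centroid: magnitude-weighted mean frequency
    centroid = (spec * freqs).sum(axis=1) / total
    # Spectral roll-off: frequency below which 85% of magnitude lies
    # (85% is an assumed threshold, not stated in the text)
    cum = np.cumsum(spec, axis=1)
    rolloff = freqs[np.argmax(cum >= 0.85 * cum[:, -1:], axis=1)]
    # Spectral flux: squared frame-to-frame spectral change
    flux = np.r_[0.0, np.sum(np.diff(spec, axis=0) ** 2, axis=1)]
    # Zero crossing rate: fraction of sign changes per frame
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Crude 16-band spectral envelope: energy in equal-width bin bands
    bands = np.array_split(spec, n_envelope, axis=1)
    envelope = np.stack([b.sum(axis=1) for b in bands], axis=1)

    # 4 + 16 = 20 time-varying measurements per frame
    feats = np.column_stack([centroid, rolloff, flux, zcr, envelope])
    # Means and variances over the segment -> 40-dimension representation
    return np.r_[feats.mean(axis=0), feats.var(axis=0)]
```

For real-time display, this function would be re-run over a sliding 10-second buffer as new audio arrives, updating the classification continuously.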
A continuous decision tree with bagging (90% sampling with 10 models in ensemble) was chosen for computational performance. The system was trained and tested on 388 hours of music audio representing 14 musical genres. The genres were equally represented with approximately 28 hours of examples for each. With 14 genres the baseline classification success rate for this experiment was 7.1% (random guessing). An ensemble of 10 depth-13 trees was able to correctly classify 72.9% of the test data into the proper music genre. This compares favorably with the 70% success rate of humans attempting to classify audio into 10 genres.
Acknowledgements
Project funders: Andrew W. Mellon Foundation and the National Science Foundation (IIS-0340597 and IIS-0327371).
© Copyright 2005 J. Stephen Downie, Andreas F. Ehmann, and David Tcheng