The AcousticBrainz Genre Dataset
Different communities talk differently about music, even in terms of genres. Is this track dub, reggae or maybe world fusion?
One of the challenging problems in music informatics is how to map genre taxonomies across different music databases.
For example, in the context of music information retrieval, classification tasks typically rely on an agreed answer for ground truth. What should we do if our ground-truth sources don’t agree? What if different sources use the same label, but each source defines it differently?
We’ve been mining metadata for AcousticBrainz for some years, and the result is a new research dataset: genre annotations from different online sources for up to 2 million tracks, along with their extracted music audio features.
Read our ISMIR 2019 paper for more details:
The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale. Bogdanov, D., Porter, A., Schreiber, H., Urbano, J., & Oramas, S. In 20th International Society for Music Information Retrieval Conference (ISMIR 2019), 2019.
We present the AcousticBrainz Genre Dataset, a large-scale collection of hierarchical multi-label genre annotations from different metadata sources. It allows researchers to explore how the same music pieces are annotated differently by different communities following their own genre taxonomies, and how this could be addressed by genre recognition systems. Genre labels for the dataset are sourced from both expert annotations and crowds, permitting comparisons between strict hierarchies and folksonomies. Music features are available via the AcousticBrainz database. To guide research, we suggest a concrete research task and provide a baseline as well as an evaluation method. This task may serve as an example of the development and validation of automatic annotation algorithms on complementary datasets with different taxonomies and coverage. With this dataset, we hope to contribute to developments in content-based music genre recognition as well as cross-disciplinary studies on genre metadata analysis.