ChoraleBricks: A Modular Multitrack Dataset for Wind Music Research
Description
ChoraleBricks is a dataset designed to support research in Music Information Retrieval (MIR) with a focus on wind and brass instruments. It features multitrack recordings of ten different chorales, each arranged in four musical parts: soprano, alto, tenor, and bass.
A key feature of ChoraleBricks is its modular structure: the dataset provides isolated recordings of individual parts performed by a diverse selection of wind instruments, including flute, oboe, clarinet, trumpet, saxophone, baritone, trombone, and tuba. These isolated recordings act as building blocks or "bricks" that can be flexibly combined to create full ensemble mixes with varying instrumentations.
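The modular idea can be pictured as choosing one recorded instrument per part and collecting every such choice. The sketch below only illustrates that combinatorial structure and is not the dataset's own tooling: the table `available` is an invented assignment of instruments to parts (the real one is given in the dataset's metadata), and `enumerate_ensembles` is a hypothetical helper name.

```python
from itertools import product

# Hypothetical availability table: which instruments recorded which part.
# The actual assignment in ChoraleBricks differs and comes from its metadata.
available = {
    "soprano": ["flute", "oboe", "trumpet"],
    "alto": ["clarinet", "saxophone"],
    "tenor": ["baritone", "trombone"],
    "bass": ["tuba"],
}


def enumerate_ensembles(available):
    """Return every ensemble that picks exactly one instrument per part."""
    parts = list(available)
    choices = (available[part] for part in parts)
    return [dict(zip(parts, combo)) for combo in product(*choices)]


ensembles = enumerate_ensembles(available)
print(len(ensembles))  # 3 * 2 * 2 * 1 = 12 for this invented table
print(ensembles[0])
```

Each resulting dictionary names one "brick" per part; a full mix would then be obtained by summing the corresponding isolated recordings.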
Audio demonstrations are available on the accompanying website: https://audiolabs-erlangen.de/resources/MIR/2025-ChoraleBricks
Included Components:
- Isolated recordings for individual wind instruments (2 hours and 10 minutes)
- Full ensemble recordings with different instrumentations (all ensemble permutations result in 52 hours and 18 minutes)
- Sheet music for all chorales in various formats
- Time-aligned symbolic music representations
- Conducting videos
- Reference annotations (fundamental frequencies, note events)
- Python software tools for parsing, mixing, annotation, and modular combination (available via GitHub: https://github.com/stefan-balke/choralebricks)
Applications:
ChoraleBricks is an open-source dataset that facilitates systematic experimentation and evaluation in various research areas, including:
- Multi-pitch estimation
- Note transcription
- Audio alignment
- Source separation
- Music education and interactive applications
The dataset is provided under an open-access license to support the MIR and wind music research communities.
If you use ChoraleBricks in your research, please cite the following paper:
Stefan Balke, Axel Berndt, and Meinard Müller
ChoraleBricks: A Modular Multitrack Dataset for Wind Music Research
Transactions of the International Society for Music Information Retrieval, 2025.
Further information and additional resources are available on the accompanying website: https://audiolabs-erlangen.de/resources/MIR/2025-ChoraleBricks
Technical info (English)
01_AudioAndAnnotations:
- Each chorale has its own folder. Taking Crueger_AufAufMeinHerzMitFreuden as an example, the structure is:
  - alignments/: (for each track)
    - Custom CSV export for audio-score alignments; the most important columns are:
      - t_start: Start of the note in seconds
      - t_dur: Note duration in seconds
      - start_meas: Start measure position, encoded as a real number with a fixed precision of three digits. The integer component indicates the measure, while the fractional component represents the relative position within the measure. An example can be found in the corresponding TISMIR paper.
      - end_meas: End position of the note, same encoding as start_meas.
  - annotations/: (for each track)
    - 01_as.sv: Original Sonic Visualiser (SV) file
    - 01_as_f0.csv: Raw SV export of the F0 annotations
    - 01_as_notes.csv: Raw SV export of the note annotations
    - 01_as_f0_filled.csv: Processed F0 annotations where the unvoiced frames are represented by zeros (recommended F0 annotation file)
  - tracks/: Raw tracks as exported from the Digital Audio Workstation (Apple Logic) *)
  - tracks_normalized/: Tracks after normalization with Reaper (for details, see https://audiolabs-erlangen.de/resources/MIR/2025-ChoraleBricks/dataset-extension)
  - Crueger_AufAufMeinHerzMitFreuden.mei: Original score engraving
  - Crueger_AufAufMeinHerzMitFreuden.musicxml: Export from MEI to MusicXML
  - Crueger_AufAufMeinHerzMitFreuden.mid: MIDI representation of the score
  - Crueger_AufAufMeinHerzMitFreuden.csv: Custom CSV export, same contents as the MIDI but as CSV
  - Crueger_AufAufMeinHerzMitFreuden.RPP: Reaper project used for track normalization
- metadata_performers.csv: Additional information about the performers (e.g., birth year)
- metadata_songs.csv: Additional information about the songs (e.g., composer)
- metadata_tracks.csv: Additional information about the tracks (e.g., performer, instrument). This is the main metadata table.
- metadata_video_offsets.csv: Offsets to combine audio and conducting videos (preparatory beats are not in the audio, but in the videos)
*) The Apple Logic project files can be made available upon request.
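As a rough illustration of how the alignment and F0 files described above might be read, the following sketch decodes the start_meas/end_meas encoding and keeps only the voiced frames of a zero-filled F0 track. It rests on several assumptions that are not confirmed by this description: a comma delimiter, a header row in the alignment CSV, a headerless two-column (time, frequency) layout for the F0 CSV, and three fractional digits in the measure encoding. All data values and the helper names `split_measure_position`, `read_notes`, and `read_voiced_f0` are invented for the example.

```python
import csv
import io

# Invented alignment rows; real files may use another delimiter and more columns.
EXAMPLE_ALIGNMENT = """t_start,t_dur,start_meas,end_meas
0.500,0.480,1.000,1.250
0.980,0.510,1.250,1.500
1.490,1.020,1.500,2.000
"""

# Invented F0 frames (time in seconds, F0 in Hz); zeros mark unvoiced frames.
EXAMPLE_F0 = """0.000,0.0
0.010,0.0
0.020,440.0
0.030,441.2
"""


def split_measure_position(value):
    """Split an encoded position such as '1.250' into (measure, relative position).

    The string is split at the decimal point instead of using float arithmetic,
    so the fractional part is not affected by rounding.
    """
    measure_str, _, frac_str = value.strip().partition(".")
    relative = int(frac_str.ljust(3, "0")[:3]) / 1000 if frac_str else 0.0
    return int(measure_str), relative


def read_notes(text):
    """Parse alignment rows into dictionaries with physical and musical times."""
    notes = []
    for row in csv.DictReader(io.StringIO(text)):
        t_start = float(row["t_start"])
        notes.append({
            "t_start": t_start,
            "t_end": t_start + float(row["t_dur"]),  # end time = start + duration
            "start": split_measure_position(row["start_meas"]),
            "end": split_measure_position(row["end_meas"]),
        })
    return notes


def read_voiced_f0(text):
    """Return (time, f0) pairs for voiced frames, i.e. frames with f0 > 0."""
    frames = [(float(t), float(f)) for t, f in csv.reader(io.StringIO(text))]
    return [(t, f) for t, f in frames if f > 0]


notes = read_notes(EXAMPLE_ALIGNMENT)
voiced = read_voiced_f0(EXAMPLE_F0)
print(notes[0]["start"], notes[0]["end"])
print(len(voiced), "voiced frames")
```

For actual files, the same functions could be fed the file contents read from disk, after checking the real delimiter and column layout.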
02_ConductingVideos:
- Conducting video for each chorale
- Encoded with H.264
- No audio track attached
- The offsets (in seconds) to the audio recordings are available in 01_AudioAndAnnotations/metadata_video_offsets.csv.
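Applying such an offset amounts to a constant time shift. The sketch below assumes a sign convention that is not stated here, namely that the offset is the number of seconds by which the video starts before the audio (because the video includes the preparatory beats). The function name `audio_to_video_time` and the numbers are hypothetical; the convention should be checked against the offsets file before use.

```python
def audio_to_video_time(t_audio, offset):
    """Map a time in the audio recording to the matching time in the video.

    Assumption: `offset` is given in seconds and is positive when the video
    starts earlier than the audio (preparatory beats are only in the video).
    """
    return t_audio + offset


# Invented example: with an offset of 2.5 s, audio time 10.0 s maps to 12.5 s.
print(audio_to_video_time(10.0, 2.5))
```

If the opposite convention turns out to be used, the offset would simply be subtracted instead.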
Files
01_AudioAndAnnotations.zip
Additional details
Dates
- Available: 2025-03-25
Software
- Repository URL: https://github.com/stefan-balke/choralebricks
- Programming language: Python
- Development status: Active