BPSD: A Coherent Multi-Version Dataset for Analyzing the First Movements of Beethoven's Piano Sonatas
Description
This repository contains the Beethoven Piano Sonata Dataset (BPSD), a multi-version dataset focusing on the first movements of Beethoven's 32 piano sonatas. These sonatas are pivotal works that have profoundly shaped Western classical music and hold a significant place in cultural history.
The BPSD includes sheet music in several machine-readable formats and audio recordings of eleven performances, four of which are in the public domain and freely accessible for research purposes. A key feature of BPSD is its coherence: all versions are aligned on a unified musical timeline, and consistent structures are enforced through careful editing of both score and audio representations.
From a technical perspective, BPSD facilitates the assessment of algorithmic approaches in tasks like harmony analysis, structure analysis, music transcription, beat and downbeat estimation, and score following. The dataset's coherence makes it an ideal platform for systematically training and evaluating deep learning methods, shedding light on their robustness and uncovering data biases across different data splits using cross-version strategies for evaluation.
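As a rough illustration of what cross-version evaluation can look like, the following sketch partitions (sonata, version) pairs into training and test sets. The version IDs are those of the eleven performances in this dataset; the sonata indices 1 to 32, the function names, and the three split types shown here are illustrative assumptions, not a split protocol prescribed by the dataset.

```python
# Version IDs of the eleven performances in BPSD.
VERSIONS = ["AS35", "FG58", "FJ62", "WK64", "FG67", "VA81",
            "DB84", "JJ90", "AB96", "MB97", "MC22"]
# Hypothetical sonata identifiers (1..32); actual file naming may differ.
SONATAS = list(range(1, 33))


def version_split(test_versions):
    """Hold out whole performances; every sonata appears in both sets."""
    train = [(s, v) for s in SONATAS for v in VERSIONS if v not in test_versions]
    test = [(s, v) for s in SONATAS for v in VERSIONS if v in test_versions]
    return train, test


def song_split(test_sonatas):
    """Hold out whole sonatas; every performer appears in both sets."""
    train = [(s, v) for s in SONATAS for v in VERSIONS if s not in test_sonatas]
    test = [(s, v) for s in SONATAS for v in VERSIONS if s in test_sonatas]
    return train, test


def neither_split(test_versions, test_sonatas):
    """Test only on unseen sonatas in unseen versions; mixed pairs are dropped."""
    train = [(s, v) for s in SONATAS for v in VERSIONS
             if s not in test_sonatas and v not in test_versions]
    test = [(s, v) for s in SONATAS for v in VERSIONS
            if s in test_sonatas and v in test_versions]
    return train, test
```

Comparing results across such splits can indicate whether a model generalizes to new pieces, to new recording conditions, or to both.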
On a musicological level, BPSD enables the systematic analysis and exploration of Beethoven's piano sonatas, providing insights into their influence on the development of harmony and structure in Western classical music. Beyond research applications, the dataset also holds educational potential, aiding in the preparation and presentation of Beethoven's work to a broader audience through interactive multimedia experiences.
Table of contents
- 0_RawData | Raw audio and symbolic data |
| - audio_ripped | Audio files as ripped from the CD |
| - AS35 | Recordings by Artur Schnabel |
| - FG58 | Recordings by Friedrich Gulda |
| - FJ62 | Recordings by Fritz Jank |
| - WK64 | Recordings by Wilhelm Kempff |
| - score_pdf_scan | Scanned score from IMSLP |
| - score_pdf_repetitions | Symbolic score in PDF format with repeat signs |
| - score_pdf_unfolded | Symbolic score in PDF format with unfolded repetitions |
| - score_sibelius_repetitions | Symbolic score in Sibelius format with repeat signs |
| - score_sibelius_unfolded | Symbolic score in Sibelius format with unfolded repetitions |
| - score_xml_repetitions | Symbolic score in MusicXML format with repeat signs |
| - score_xml_unfolded | Symbolic score in MusicXML format with unfolded repetitions |
| - score_midi | MIDI export of the symbolic score |
- 1_Audio | Audio files with coherent structure |
- 2_Annotations | Annotations with musical and physical timelines |
| - ann_score_note | Note events with start and end given in musical time |
| - ann_score_chord | Harmony annotations given in musical time |
| - ann_score_localkey | Local key annotations given in musical time |
| - ann_score_globalkey | Global key annotations |
| - ann_score_structureFine | Fine structure annotations given in musical time |
| - ann_score_structureCoarse | Coarse structure annotations given in musical time |
| - ann_audio_note | Note events with start and end given in physical time |
| - ann_audio_midi | Note events in physical time in MIDI format |
| - ann_audio_beat | Beat annotations given in physical time |
| - ann_audio_measure | Measure annotations given in physical time |
| - ann_audio_startEnd | Start and end of audio recordings (for removing silence/applause) given in physical time |
| - ann_audio_syncInfo | Alignment tuples for converting between musical and physical timeline |
| - ann_audio_modifications | Annotations for structural modifications of recordings |
| - ann_audio_chord | Harmony annotations given in physical time |
| - ann_audio_localkey | Local key annotations given in physical time |
| - ann_audio_structureFine | Fine structure annotations given in physical time |
| - ann_audio_structureCoarse | Coarse structure annotations given in physical time |
- 3_Scripts | Python scripts to convert raw data into the structured format. Maintained code is available on GitHub |
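The alignment tuples in ann_audio_syncInfo relate the musical and physical timelines. A minimal sketch of how such tuples could be used is piecewise-linear interpolation, shown below. The example values, the column order, the unit of musical time, and the function names are all assumptions made for illustration; the actual file layout in the dataset may differ.

```python
import numpy as np

# Hypothetical alignment tuples: (musical position, physical time in seconds).
# Both columns are assumed to be strictly increasing.
sync = np.array([
    [0.0, 0.50],
    [1.0, 2.50],
    [2.0, 4.10],
    [3.0, 6.10],
])


def musical_to_physical(m, sync):
    """Map musical positions to seconds by linear interpolation."""
    return np.interp(m, sync[:, 0], sync[:, 1])


def physical_to_musical(t, sync):
    """Map seconds to musical positions by linear interpolation."""
    return np.interp(t, sync[:, 1], sync[:, 0])
```

Note that `np.interp` clamps values outside the annotated range to the first or last tuple, so positions before the first or after the last alignment point are not extrapolated.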
Audio Versions
ID | Performer | Recording Year | Label | Release Year | EAN Code | MusicBrainz Release ID |
---|---|---|---|---|---|---|
AS35 | Artur Schnabel | 1935 | Warner Classics | 2016 | 0190295975050 | 7bd7338c-2acc-49f4-b262-122085a3e694 |
FG58 | Friedrich Gulda | 1958 | Decca | 1958 | 028948514519 | n.a. |
FJ62 | Fritz Jank | 1962 | Instituto Piano Brasileiro | 2021 | n.a. | available at IMSLP |
WK64 | Wilhelm Kempff | 1964 | Deutsche Grammophon | 1995 | 028944796629 | 38864449-d1e9-4b4f-b5a6-e73acc954e27 |
FG67 | Friedrich Gulda | 1967 | Amadeo/Decca | 1968 | 028947687610 | 83f869ea-fc64-4fe9-b424-52d4282f706f |
VA81 | Vladimir Ashkenazy | 1981 | London Records | 1995 | 028944370621 | 36fcb34f-59ab-3e4d-a066-3067ed82ed33 |
DB84 | Daniel Barenboim | 1984 | Deutsche Grammophon | 1984 | 028941375926, 028941376626 | 261a38ba-9c56-458e-9a4d-c7b6b4acb3a3, b4b49c3d-f86a-4701-b967-d4e726ab8ef0 |
JJ90 | Jeno Jando | 1990 | NAXOS | 1990 | 730099150224 | 2f94e0a3-be66-4894-9c9d-83d5890081da |
AB96 | Alfred Brendel | 1996 | Philips | 1996 | 028941257529 | 6f419224-c6fb-4c38-871e-5799b755a387 |
MB97 | Malcolm Bilson et al. | 1997 | Claves | 1997 | 7619931970721 | 718bac94-7c2c-48ea-8f30-dc230aab019d |
MC22 | Muriel Chemin | 2022 | Odradek | 2022 | 855317003615 | n.a. |
Files
Name | Size | MD5 |
---|---|---|
Beethoven_Piano_Sonata_Dataset_v1.zip | 2.1 GB | 646bd41186de6fc0b47466b830c32556 |
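After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the helper name `md5_of_file` and the assumption that the archive sits in the current working directory are illustrative.

```python
import hashlib

# Checksum as listed for Beethoven_Piano_Sonata_Dataset_v1.zip.
EXPECTED_MD5 = "646bd41186de6fc0b47466b830c32556"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the archive was downloaded to the current directory):
# ok = md5_of_file("Beethoven_Piano_Sonata_Dataset_v1.zip") == EXPECTED_MD5
```

Reading in chunks keeps memory use low for the 2.1 GB archive.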
Additional details
References
- Johannes Zeitler, Vlora Arifi-Müller, Christof Weiß, and Meinard Müller, "BPSD: A coherent multi-version dataset for analyzing the first movements of Beethoven's piano sonatas." Transactions of the International Society for Music Information Retrieval (TISMIR), submitted 2024.