TLDR: The paper introduces a novel approach to Optical Music Recognition (OMR) for handwritten jazz lead sheets. It presents a new dataset of 293 handwritten jazz lead sheets, along with synthetic versions, and develops an OMR model based on an encoder-decoder architecture. The research highlights the challenges of recognizing chords and handwritten variability, demonstrating the benefits of pretraining, synthetic data, and a specific “medium-level” tokenization strategy for improved accuracy in converting these unique musical scores into a digital format.
Optical Music Recognition (OMR) aims to convert musical scores into a digital format that computers can understand and process. While OMR has made significant strides, one area has remained particularly challenging: handwritten jazz lead sheets. These scores, which primarily encode melody and chords, are difficult because of the inherent variability of handwriting, quality issues, and the presence of chord symbols, a component that existing OMR systems often overlook.
A recent research paper, “Optical Music Recognition of Jazz Lead Sheets,” by Juan C. Martinez-Sevilla, Francesco Foscarin, Patricia Garcia-Iasci, David Rizo, Jorge Calvo-Zaragoza, and Gerhard Widmer, tackles this challenge head-on. The authors make a two-fold contribution: a new dataset built specifically for jazz lead sheets and a dedicated OMR model that transcribes them.
The Unique World of Jazz Lead Sheets
Unlike classical music, where composers meticulously detail every note, jazz lead sheets offer musicians a framework for improvisation. They typically contain the melody, chord symbols, and sometimes lyrics. Historically, many unofficial collections, known as Fake Books, were created by manually transcribing performances, becoming essential tools for jazz musicians. However, these paper-based scores have limited utility compared to their digital counterparts, which allow for easy editing, transposition, sonification, and integration into analytical systems.
Digitizing these handwritten scores is where OMR becomes invaluable, yet the task is fraught with difficulties. Handwriting styles vary widely, and scores often contain “dirty” notation such as cross-outs and corrections. A further hurdle is recognizing chord symbols and aligning them with the notes they belong to: in handwritten scores the two are often vertically misaligned, and current OMR systems struggle with this.
A New Dataset for Jazz OMR
To address the lack of suitable training data, the researchers compiled a new dataset of 293 handwritten jazz lead sheets, covering 163 unique pieces. These scores were collected from jazz school students and professional musicians in Spanish institutions. Participants were instructed to copy existing digital scores, maintaining the layout, and then either scan or photograph their work, simulating real-world usage conditions. In total, the dataset contains 2,021 staves aligned with ground truth scores in Humdrum **kern and MusicXML.
Recognizing that real-world data can be scarce, the team also generated 326 synthetic score images from the ground truth using MuseScore 4, with fonts including “MuseJazz”, which is designed to mimic handwritten styles. This blend of real and synthetic data is crucial for robust model training.
The dataset also accounts for quality issues common in handwritten scores, such as strike-throughs, hard-to-read calligraphy, and note-chord misalignments. Where the handwritten layout differed from the digital original, the authors adapted the digital ground truth scores to match the handwritten versions. A notable challenge was the use of equivalent chord symbols (e.g., “maj7” vs. “∆7”), which jazz musicians read interchangeably but which are a problem for precise transcription. The paper formalizes a restricted Harte syntax for chords to ensure consistency.
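The chord-equivalence problem can be illustrated with a small normalization step. The following is a minimal sketch, not the paper's restricted syntax: the alias table, the `root:quality` output spelling (borrowed from Harte-style shorthands such as `maj7`, `min7`, `hdim7`), and the function name `normalize_chord` are all assumptions made for illustration, and slash chords and extensions are deliberately left out.

```python
import re

# Hypothetical alias table (illustrative subset, not taken from the paper):
# keys are handwritten spellings, values are Harte-style shorthand labels.
QUALITY_ALIASES = {
    "": "maj",
    "maj7": "maj7",
    "M7": "maj7",
    "∆7": "maj7",
    "m": "min",
    "-": "min",
    "m7": "min7",
    "-7": "min7",
    "min7": "min7",
    "7": "7",
    "m7b5": "hdim7",
    "dim7": "dim7",
}

# Root = note letter plus an optional sharp or flat; the rest is the quality.
CHORD_RE = re.compile(r"^([A-G][#b]?)(.*)$")


def normalize_chord(symbol: str) -> str:
    """Map a lead-sheet chord spelling to one canonical 'root:quality' label."""
    match = CHORD_RE.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a chord symbol: {symbol!r}")
    root, quality = match.groups()
    if quality not in QUALITY_ALIASES:
        raise ValueError(f"unknown chord quality: {quality!r}")
    return f"{root}:{QUALITY_ALIASES[quality]}"


# Two spellings a musician reads as the same chord collapse to one label.
print(normalize_chord("C∆7"), normalize_chord("Cmaj7"))  # C:maj7 C:maj7
```

With a single canonical label per chord, two transcriptions that differ only in spelling no longer count as different answers during training or evaluation.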
Developing the OMR Model
The OMR model is based on an encoder-decoder architecture adapted from the Sheet Music Transformer, a state-of-the-art system for polyphonic music. The encoder, a ConvNeXt network, processes the image into a hidden representation; the decoder, a transformer, then generates a sequence of musical symbols in Humdrum **kern format.
A key aspect of the model development involved exploring different tokenization strategies: word-level, character-level, and a novel medium-level approach. The medium-level tokenizer aims for a bijective mapping between graphical symbols and tokens, separating pitches, chord roots, types, extensions, and bass notes, while treating other elements at a word level. This strategy resulted in a balanced vocabulary size and proved to be the most effective.
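To make the three granularities concrete, here is a rough sketch applied to a toy token sequence. The chord spelling (`root:type(extensions)/bass`), the regular expression, and the function names are assumptions for illustration only; the paper's actual token boundaries may differ, and this sketch splits chord symbols only, leaving every other element at word level.

```python
import re

# Assumed chord spelling: root ":" type, optional "(ext,ext)", optional "/bass".
CHORD_RE = re.compile(
    r"^(?P<root>[A-G][#b]?):(?P<type>[a-z0-9]+)"
    r"(?:\((?P<ext>[^)]*)\))?(?:/(?P<bass>[^/]+))?$"
)


def word_level(symbols):
    """One token per whitespace-separated symbol (large vocabulary)."""
    return list(symbols)


def char_level(symbols):
    """One token per character (tiny vocabulary, long sequences)."""
    return [ch for sym in symbols for ch in sym]


def medium_level(symbols):
    """Split chords into root, type, extensions and bass; keep the rest whole."""
    tokens = []
    for sym in symbols:
        m = CHORD_RE.match(sym)
        if m is None:
            tokens.append(sym)  # non-chord symbols stay at word level
            continue
        tokens.append(m["root"])
        tokens.append(m["type"])
        if m["ext"]:
            tokens.extend(m["ext"].split(","))
        if m["bass"]:
            tokens.append("/" + m["bass"])
    return tokens


line = ["4c", "8d", "Bb:min7(9)/F"]
print(word_level(line))    # ['4c', '8d', 'Bb:min7(9)/F']
print(len(char_level(line)))  # 16
print(medium_level(line))  # ['4c', '8d', 'Bb', 'min7', '9', '/F']
```

The trade-off shows up even in this toy case: word level needs a separate vocabulary entry for every distinct chord spelling, character level turns one chord into twelve tokens, and the medium level reuses a small set of roots, types, and extensions across all chords.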
Key Findings and Future Directions
The experiments revealed several important insights:
- Pretraining is fundamental: Initializing the model with weights from a pretrained checkpoint on polyphonic piano music significantly improved performance across all metrics and tokenizers. Without pretraining, the model struggled to learn effectively from handwritten data alone.
- Synthetic data helps: Including synthetic data during training was beneficial, especially for non-pretrained models, and consistently improved results for pretrained models.
- Medium-level tokenization excels: For the current model and dataset size, the medium-level tokenizer yielded the best results, demonstrating the advantage of a music-informed tokenization procedure.
While the model shows promising results, particularly in chord-note alignment, a qualitative analysis highlighted that most errors were related to chord symbols. This suggests that the model, potentially biased by pretraining on melody-focused data, has fewer opportunities to learn correct chord behavior. Semantic errors, such as missing implicit accidentals based on the key signature, were also observed.
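The implicit-accidental error is easier to see with a small worked example. The sketch below is an illustration of the music-theory rule, not part of the authors' model: the function names are hypothetical, the key is given as a signed count of sharps or flats (the convention used by the MusicXML `fifths` element), and accidentals carried over from earlier in the same measure are ignored for brevity.

```python
# Order in which sharps and flats are added to a key signature.
SHARP_ORDER = "FCGDAEB"
FLAT_ORDER = "BEADGCF"


def key_signature_alterations(fifths: int) -> dict[str, str]:
    """Map note letters to the accidental implied by the key signature.

    fifths > 0 counts sharps, fifths < 0 counts flats.
    """
    if not -7 <= fifths <= 7:
        raise ValueError("fifths must be between -7 and 7")
    if fifths >= 0:
        return {letter: "#" for letter in SHARP_ORDER[:fifths]}
    return {letter: "b" for letter in FLAT_ORDER[:-fifths]}


def sounding_pitch(letter: str, explicit: str, fifths: int) -> str:
    """Resolve a written note head to the pitch it actually denotes.

    explicit is '#', 'b', 'n' (natural sign), or '' when nothing is written.
    """
    if explicit == "n":
        return letter  # a natural sign cancels the key signature
    if explicit:
        return letter + explicit  # a written accidental wins
    return letter + key_signature_alterations(fifths).get(letter, "")


# In D major (two sharps) a bare F on the page denotes F sharp.
print(sounding_pitch("F", "", 2))   # F#
print(sounding_pitch("F", "n", 2))  # F
```

A model that transcribes only what is drawn next to each note head would output a plain F in the first case, which is the kind of semantic error described above.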
This research marks a significant step towards practical OMR systems for handwritten jazz lead sheets. The authors have publicly released all code, data, and models to foster further research and development. Future work includes expanding the dataset, exploring data augmentation techniques, and incorporating more advanced transformer components and broader OMR tasks in the pretraining phase to develop a more general and robust OMR model.


