TLDR: HNote is a music notation system that extends YNote, using hexadecimal encoding and a fixed 32-unit measure structure to represent pitch and duration. The resulting format is consistent and aligned, which makes it well suited to training large language models (LLMs) for music generation. Researchers fine-tuned LLaMA-3.1 on a dataset of Jiangnan-style songs encoded in HNote and report an 82.5% syntactic correctness rate, along with strong stylistic and structural similarity between generated and reference music.
Symbolic music generation is changing quickly, driven largely by large language models (LLMs). Pairing these models with existing music formats such as MIDI, MusicXML, and ABC Notation has proven difficult, however. These formats tend to be complex or structurally inconsistent, or they lack the precise alignment that token-based learning depends on.
To address these problems, researchers have introduced HNote, a hexadecimal-based notation system. HNote builds on its predecessor, YNote, by adding a fixed 32-unit measure framework. Pitch and duration are both encoded with a single hexadecimal vocabulary, which gives every measure the same length and a regular structure. That regularity suits LLM architectures: the model can learn rhythmic structure more easily, and there is less ambiguity in what a generated token sequence means.
Each traditional music format comes with its own set of limitations. MIDI, while widely adopted, often generates lengthy and ‘noisy’ sequences, making it difficult for LLMs to grasp the overarching structure of a musical piece. MusicXML, though comprehensive, is excessively verbose, leading to token sequences that can easily exceed an LLM’s context length. ABC Notation, simple and human-readable for monophonic melodies, lacks strict formatting standards and the expressive power required for more complex compositions. Even YNote, which aimed for simplification, did not provide a precise measure-level alignment mechanism, potentially leading to rhythmic inconsistencies in generated music.
HNote addresses these issues with a fixed 32-unit measure structure. Every measure has the same length, so each note duration can be written as an integer number of units. For instance, a whole note is 32 units, a dotted half note is 24 units, and a half note is 16 units, which makes one unit a thirty-second of a whole note. This fixed alignment helps an LLM learn and hold stable rhythmic patterns, which in turn improves the quality of the generated music. Pitches are encoded as two-digit hexadecimal values from “00” to “7F”, while durations use the range “80” to “FF”; keeping the two ranges separate distinguishes a note’s onset from its continuation.
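The unit arithmetic and the two token ranges described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's encoder: the function names are hypothetical, the code assumes a whole note fills exactly one measure (as the 32-unit example implies), and it only classifies a token by its value range without decoding how a duration value maps to a unit count, since the article does not spell that out.

```python
import string
from fractions import Fraction

UNITS_PER_MEASURE = 32  # fixed measure length stated in the article


def duration_units(fraction_of_whole: Fraction) -> int:
    """Convert a duration, given as a fraction of a whole note, to units.

    Assumes a whole note spans one full measure (32 units).
    """
    units = fraction_of_whole * UNITS_PER_MEASURE
    if units.denominator != 1:
        raise ValueError(f"{fraction_of_whole} is not a whole number of units")
    return int(units)


def token_kind(token: str) -> str:
    """Classify a two-digit hex token by the ranges given in the article.

    "00".."7F" falls in the pitch range, "80".."FF" in the duration range.
    """
    if len(token) != 2 or any(c not in string.hexdigits for c in token):
        raise ValueError(f"not a two-digit hex token: {token!r}")
    return "pitch" if int(token, 16) <= 0x7F else "duration"


def measure_is_full(durations_in_units: list[int]) -> bool:
    """Check that the durations in one measure add up to exactly 32 units."""
    return sum(durations_in_units) == UNITS_PER_MEASURE


if __name__ == "__main__":
    print(duration_units(Fraction(1, 1)))  # whole note
    print(duration_units(Fraction(3, 4)))  # dotted half note
    print(token_kind("3C"), token_kind("90"))
    print(measure_is_full([16, 8, 8]))
```

Using `Fraction` rather than floats keeps the conversion exact, so a duration that does not land on the 32-unit grid is rejected instead of being silently rounded.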
To test HNote, the research team converted a dataset of 12,300 Jiangnan-style songs from YNote into HNote. They then fine-tuned LLaMA-3.1 (8B) using LoRA (low-rank adaptation), a parameter-efficient fine-tuning technique. During training, the model was given the first and last notes of each line as guidance, which helped keep the generated compositions stylistically and structurally coherent.
The reported results are encouraging. The fine-tuned model reached a syntactic correctness rate of 82.5%, meaning that most generated pieces followed HNote’s structural rules. BLEU and ROUGE, metrics that measure the overlap between generated and reference sequences, also yielded strong scores. According to the authors, the generated music preserved local symbolic detail and global structural continuity, and stayed close to the Jiangnan style. This held both for songs from the training dataset and for unseen Jiangnan pieces.
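To make the idea of a syntactic correctness rate concrete, here is a minimal sketch of how such a rate could be computed. It assumes, purely for illustration, that a piece counts as well formed when it has at least one measure and every measure sums to 32 units; the paper's actual criteria are not given in this article and may include other rules. The function names and the sample data are hypothetical.

```python
UNITS_PER_MEASURE = 32  # fixed measure length stated in the article


def piece_is_well_formed(measures: list[list[int]]) -> bool:
    """Assumed rule: a non-empty piece whose measures each total 32 units."""
    return bool(measures) and all(sum(m) == UNITS_PER_MEASURE for m in measures)


def correctness_rate(pieces: list[list[list[int]]]) -> float:
    """Fraction of pieces that pass the assumed well-formedness check."""
    if not pieces:
        raise ValueError("need at least one piece")
    valid = sum(1 for piece in pieces if piece_is_well_formed(piece))
    return valid / len(pieces)


if __name__ == "__main__":
    # Illustrative data only: each piece is a list of measures,
    # each measure a list of note durations in units.
    sample = [
        [[32], [16, 16]],        # well formed
        [[8, 8, 8, 8], [24, 8]],  # well formed
        [[16, 8]],                # measure too short
        [[32], [32]],            # well formed
    ]
    print(correctness_rate(sample))
```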
The study presents HNote as a sound framework for combining LLMs with the modeling of culturally specific music. It gives music generation a stable, consistent symbolic foundation, which leads to outputs that are structurally reliable and stylistically coherent. Future work aims to extend HNote’s expressive range with richer musical annotations such as chords, dynamics, and tempo variations, bringing it closer to the complexity of human-composed music.
For more details, you can read the full research paper available at this link.


