homr

homr is Optical Music Recognition (OMR) software that converts camera pictures of sheet music into machine-readable MusicXML. The resulting MusicXML files can be processed further with tools such as MuseScore.

Prerequisites

Python 3.11
Poetry
Optional: NVIDIA GPU with CUDA 12.1

Getting started

1. Clone the repository.
2. Install dependencies for:
   - GPU (requires CUDA): poetry install --only main,gpu
   - CPU: poetry install --only main
   - Development: poetry install
3. Run the program using poetry run homr <image>.
4. The resulting MusicXML file will be saved in the same directory as the input image.
5. To combine the MusicXML results from multiple images, you can use relieur.
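
For a folder of photos, the run command above can be scripted. The following is a minimal sketch, not part of homr: `build_command`, `find_images`, and `run_on_folder` are hypothetical helper names, and the set of image extensions is an assumption made for illustration rather than a list taken from homr.

```python
import subprocess
from pathlib import Path

# Assumption: these extensions are illustrative, not a list published by homr.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_command(image: Path) -> list[str]:
    """Return the documented invocation `poetry run homr <image>` as an argv list."""
    return ["poetry", "run", "homr", str(image)]


def find_images(folder: Path) -> list[Path]:
    """Collect image files directly inside `folder`, sorted for a stable order."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def run_on_folder(folder: Path) -> None:
    """Run homr once per image; call this from inside the cloned repository."""
    for image in find_images(folder):
        # check=True stops the batch at the first image that fails.
        subprocess.run(build_command(image), check=True)
```

Calling `run_on_folder(Path("scans"))` would process each image in turn. As described above, each MusicXML file is written next to its input image.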

Example

The example below provides an overview of the current performance of the implementation. While some errors are present in the output, the overall structure remains accurate.

[Images: original image and homr result, side by side]

The homr result shown here is obtained by processing the homr output and rendering it with MuseScore.

Limitations

The current implementation focuses on pitch and rhythm information on the bass or treble clef, neglecting dynamics, articulation, double sharps/flats, and other musical symbols.

Technical Details

homr processes an image in four stages: segmentation for structural analysis, staff detection and merging, semantic symbol recognition via a transformer model, and MusicXML output.

Stage 1: Image Segmentation and Structural Analysis

homr employs UNet-based segmentation models (adapted from oemer) to extract structural components from the sheet music image:

The trained segmentation networks detect staff lines and symbols, identifying:
- Staff line fragments
- Note heads
- Stems and rests
- Bar lines
- Clefs and key signatures

The segmentation process generates bounding boxes for each detected element. These predictions serve as inputs for the staff detection algorithm.
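
One generic way to turn a binary segmentation mask into bounding boxes is connected-component labeling. The sketch below illustrates that idea only; it is not homr's or oemer's code, and the source does not say how the boxes are actually derived. The function name `bounding_boxes` and the list-of-lists mask format are assumptions.

```python
from collections import deque

Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def bounding_boxes(mask: list[list[int]]) -> list[Box]:
    """Return one bounding box per 4-connected region of non-zero pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: list[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over the region containing (x0, y0).
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

Boxes are returned in scan order (top to bottom, left to right by the first pixel found), which keeps the output deterministic for a given mask.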

Stage 2: Staff Detection and Merging

Using the segmentation outputs, homr constructs staffs through the following steps:

1. Staff Anchor Detection: The algorithm identifies "staff anchors" (clefs and bar lines) that serve as reference points for accurate staff localization, even when symbols partially obscure staff lines.
2. Unit Size Estimation: For each staff, the algorithm calculates the "unit size" (the distance between adjacent staff lines). Estimating it per staff accommodates camera perspective variations and non-uniform staff spacing.
3. Staff Reconstruction: Around each anchor, five staff lines are located and the remaining staff structure is reconstructed using the estimated unit size.
4. Grand Staff Merging: Braces and brackets are identified to merge related staffs, supporting:
   - Grand staffs (piano, organ)
   - Multiple voices on a single staff
   - Mixed instrument groups
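
The unit size and reconstruction steps can be illustrated with a small sketch. It assumes the vertical positions of the detected staff lines are already known at one horizontal position. The names `estimate_unit_size` and `reconstruct_staff` are hypothetical, and using the median gap is an illustrative choice rather than homr's documented method.

```python
from statistics import median


def estimate_unit_size(line_ys: list[float]) -> float:
    """Return the median gap between adjacent staff lines."""
    ys = sorted(line_ys)
    if len(ys) < 2:
        raise ValueError("need at least two staff lines")
    gaps = [lower - upper for upper, lower in zip(ys, ys[1:])]
    return median(gaps)


def reconstruct_staff(center_y: float, unit_size: float) -> list[float]:
    """Place five lines one unit apart, centered on the middle line."""
    return [center_y + k * unit_size for k in (-2, -1, 0, 1, 2)]
```

For example, lines detected at y = 100, 110, 121, 130, 140 have gaps of 10, 11, 9, and 10, so the estimated unit size is 10. Reconstructing around a middle line at y = 120 gives lines at 100, 110, 120, 130, and 140.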

Stage 3: Semantic Symbol Recognition via Transformer

Each staff is dewarped (perspective-corrected) and passed through a transformer-based model (based on Polyphonic-TrOMR) that performs end-to-end symbol sequence recognition. The model outputs:

Rhythm symbols: Note durations, rests, and tuplet information
Pitch information: Absolute pitch values with accidentals (sharps, flats, naturals)
Articulation marks: Accents, staccato, tenuto, and slur markers
Performance annotations: Dynamic expressions and other musical notation

The transformer model generates these predictions in sequence, processing the dewarped staff image to understand the spatial and temporal relationships between musical symbols.

Note: The transformer output provides the sequence of symbols but does not include explicit positional information (horizontal or vertical coordinates). However, the model computes the center of attention as a byproduct of the attention mechanism, which can be used to estimate the focus point on the staff image.
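
A center of attention can be pictured as the weighted centroid of an attention map. The sketch below assumes the attention weights for one output symbol have been arranged as a 2D grid over the staff image; this layout and the function name `attention_center` are assumptions, since the source does not describe how homr computes the value.

```python
def attention_center(weights: list[list[float]]) -> tuple[float, float]:
    """Return the weighted centroid (x, y) of a 2D attention map, in grid cells."""
    total = sum(sum(row) for row in weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    # Each weight pulls the centroid toward its own column (x) and row (y).
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y
```

Multiplying the result by the cell size in pixels would map the centroid back to an approximate position on the staff image.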

Stage 4: MusicXML Output

The symbol sequence is converted into MusicXML format and saved to disk. The resulting file can be processed with tools like MuseScore, or with relieur to combine the results from multiple images.
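
To show what such a file contains, the sketch below builds a minimal MusicXML score with one quarter note using Python's standard library. It is not homr's writer: the function name `single_note_score` and the fixed treble clef are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def single_note_score(step: str, octave: int) -> ET.Element:
    """Build a one-measure MusicXML score holding a single quarter note."""
    score = ET.Element("score-partwise", version="4.0")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"  # one division per quarter note
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"  # treble clef (assumed for this sketch)
    ET.SubElement(clef, "line").text = "2"

    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = "1"  # in divisions, so one quarter note
    ET.SubElement(note, "type").text = "quarter"
    return score


xml_text = ET.tostring(single_note_score("C", 4), encoding="unicode")
```

To save the score, `ET.ElementTree(score).write(path, encoding="UTF-8", xml_declaration=True)` writes it to disk with an XML declaration.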

Citation

If you use this code in your research work, please cite oemer and Polyphonic-TrOMR.

Name

The name "homr" stands for Homer's Optical Music Recognition (OMR), leaving the interpretation of "Homer" to the user's discretion, whether referring to the ancient poet Homer or the iconic character from The Simpsons.

Thanks

This project builds upon previous work, including:

The segmentation models of oemer
The transformer model of Polyphonic-TrOMR
The starter template provided by Benjamin Roland
