AI-powered music transcription system that converts audio to lead sheets (melody + chord symbols) using deep learning models.
SheetSage-Infer is an inference-only version of SheetSage for music transcription, optimized for easy deployment with vendored Jukebox modules.
- Vendored Jukebox Modules - No external Jukebox dependency needed
- CPU & GPU Support - Handcrafted features (CPU) or Jukebox embeddings (GPU)
- Multiple Export Formats - LilyPond notation, MIDI files, PDF generation
- Audio from URLs - Support for YouTube, Bandcamp, and other sources
- Simple API - High-level sheetsage() function
From PyPI:
# Using pip
pip install openmirlab-sheetsage-infer
# Using uv (recommended - faster)
uv pip install openmirlab-sheetsage-infer
# Or add to your project with uv
uv add openmirlab-sheetsage-infer
For Development:
git clone https://github.com/openmirlab/sheetsage-infer.git
cd sheetsage-infer
pip install -e ".[dev]"
- Python: ≥3.10 (tested on 3.10, 3.11, 3.12)
- LilyPond (optional, for PDF generation)
  - Linux: sudo apt-get install lilypond
  - macOS: brew install lilypond
  - Windows: Download from lilypond.org
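Because LilyPond is optional, a script may want to check for it before attempting PDF export. The following is a minimal sketch using only the standard library. It assumes the executable is named lilypond and is discoverable on PATH; the helper name lilypond_available is hypothetical and not part of sheetsage.

```python
import shutil


def lilypond_available(executable: str = "lilypond") -> bool:
    """Return True if an executable with this name can be found on PATH.

    Assumption: PDF engraving relies on a binary called "lilypond".
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if lilypond_available():
        print("LilyPond found: PDF export should be possible.")
    else:
        print("LilyPond not found: fall back to LilyPond source or MIDI output.")
```

When the check fails, the LilyPond source and MIDI exports remain usable, since only PDF generation depends on the external binary.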
from sheetsage.infer import sheetsage
from sheetsage.utils import engrave
from sheetsage.align import create_beat_to_time_fn
# Transcribe audio URL
lead_sheet, segment_beats, segment_beats_times = sheetsage(
    'https://example.com/audio.mp3',
    use_jukebox=False,  # Use fast CPU-based features
    segment_start_hint=30,  # Start at 30 seconds
    segment_end_hint=60,  # End at 60 seconds
    beats_per_minute_hint=120  # Hint for BPM (improves accuracy)
)
# Export to LilyPond
lily_code = lead_sheet.as_lily()
print(lily_code)
# Export to MIDI
beat_to_time_fn = create_beat_to_time_fn(segment_beats, segment_beats_times)
midi_bytes = lead_sheet.as_midi(beat_to_time_fn)
# Save MIDI file
with open('output.mid', 'wb') as f:
    f.write(midi_bytes)
# Generate PDF (requires LilyPond)
pdf_bytes = engrave(lily_code, out_format='pdf')
with open('leadsheet.pdf', 'wb') as f:
    f.write(pdf_bytes)

from sheetsage.infer import sheetsage
# Requires GPU with >=12GB VRAM
lead_sheet, beats, beat_times = sheetsage(
    'audio.mp3',
    use_jukebox=True,  # Use Jukebox embeddings (vendored)
    segment_start_hint=0,
    segment_end_hint=30,
    beats_per_minute_hint=100
)
Note: Jukebox features require a GPU with ≥12GB VRAM. The vendored modules work without any external installation.
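Given the 12GB VRAM requirement, it can help to choose the feature extractor at runtime instead of hard-coding use_jukebox. The sketch below is one possible approach and not something sheetsage provides. The helper names are hypothetical, the threshold reads "12GB" as 12 × 10^9 bytes (an assumption), and only CUDA device 0 is inspected.

```python
# Assumption: "12GB" is read as decimal gigabytes; adjust if that proves too lenient.
MIN_JUKEBOX_VRAM_BYTES = 12 * 10**9


def detect_vram_bytes() -> int:
    """Return total memory of CUDA device 0 in bytes, or 0 if none is usable."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return int(torch.cuda.get_device_properties(0).total_memory)


def should_use_jukebox(vram_bytes: int) -> bool:
    """Return True when the reported VRAM meets the Jukebox threshold."""
    return vram_bytes >= MIN_JUKEBOX_VRAM_BYTES


if __name__ == "__main__":
    use_jukebox = should_use_jukebox(detect_vram_bytes())
    print(f"use_jukebox={use_jukebox}")
```

The resulting boolean can be passed as the use_jukebox argument, so the same script falls back to handcrafted CPU features on machines without a large enough GPU.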
# Basic transcription
python -m sheetsage.infer audio.mp3
# With options
python -m sheetsage.infer audio.mp3 \
    --segment_start_hint 30 \
    --segment_end_hint 60 \
    --beats_per_minute_hint 120 \
    --output_dir ./output
# See all options
python -m sheetsage.infer --help
- Python: ≥3.10
- PyTorch: ≥2.0.0
- GPU: Optional, but recommended for Jukebox features (12GB+ VRAM)
- OS: Linux, macOS, Windows
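For batch jobs, the command-line interface can also be driven from Python. The sketch below only assembles the argument list, using the flags that appear in the command-line examples in this README. The function name build_cli_command is hypothetical, and the call that would actually launch the process is left commented out.

```python
import subprocess
import sys
from typing import Optional


def build_cli_command(
    audio: str,
    segment_start_hint: Optional[float] = None,
    segment_end_hint: Optional[float] = None,
    beats_per_minute_hint: Optional[float] = None,
    output_dir: Optional[str] = None,
) -> list[str]:
    """Build the argument list for running the sheetsage.infer module."""
    cmd = [sys.executable, "-m", "sheetsage.infer", audio]
    options = {
        "--segment_start_hint": segment_start_hint,
        "--segment_end_hint": segment_end_hint,
        "--beats_per_minute_hint": beats_per_minute_hint,
        "--output_dir": output_dir,
    }
    # Only pass the flags the caller actually set.
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_cli_command("audio.mp3", 30, 60, 120, "./output")
    print(" ".join(cmd))
    # Uncomment to run the transcription (requires the package to be installed):
    # subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with file names that contain spaces.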
Transcription speed depends on audio length and feature extraction method:
- Handcrafted features (CPU): ~1-5 seconds per minute of audio
- Jukebox features (GPU): ~30-60 seconds per minute of audio (requires GPU with ≥12GB VRAM)
Note: These figures also vary with hardware. Jukebox features provide higher quality but are slower.
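As a worked example of the figures above, the sketch below turns the per-minute ranges into a rough time budget for a clip of a given length. The function name and the dictionary are hypothetical. The rates are copied from the list above and are not measured here.

```python
# Seconds of processing per minute of audio, copied from the figures above.
SPEED_RANGES = {
    "handcrafted": (1.0, 5.0),  # CPU
    "jukebox": (30.0, 60.0),    # GPU
}


def estimate_processing_seconds(audio_seconds: float, method: str) -> tuple[float, float]:
    """Return a (low, high) estimate of processing time in seconds."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    low_rate, high_rate = SPEED_RANGES[method]
    minutes = audio_seconds / 60.0
    return (minutes * low_rate, minutes * high_rate)


if __name__ == "__main__":
    for method in SPEED_RANGES:
        low, high = estimate_processing_seconds(180, method)
        print(f"{method}: {low:.0f}-{high:.0f} s for a 3-minute clip")
```

For a 3-minute clip this gives roughly 3-15 seconds with handcrafted features and 90-180 seconds with Jukebox features.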
See examples/ directory for usage examples:
- basic_transcription.py - Basic usage
- jukebox_transcription.py - GPU-based transcription
- hooktheory_example.py - Working with Hooktheory data
sheetsage-infer/
├── sheetsage/                        # Main package
│   ├── infer.py                      # Main transcription pipeline
│   ├── align.py                      # Beat-to-time alignment
│   ├── beat_track.py                 # Beat detection
│   ├── utils.py                      # LilyPond engraving, audio I/O
│   ├── assets.py                     # Asset management
│   ├── assets/                       # Asset JSON files
│   │   ├── hooktheory.json
│   │   ├── jukebox.json
│   │   ├── rwc.json
│   │   ├── sheetsage.json
│   │   └── test.json
│   ├── modules/                      # Neural network models
│   │   └── modules.py                # Transformer architectures
│   ├── representations/              # Feature extractors
│   │   ├── handcrafted.py            # CPU-based mel-spectrograms
│   │   ├── jukebox.py                # Jukebox embedding interface
│   │   └── jukebox_modules/          # Vendored Jukebox code
│   └── theory/                       # Music theory classes
│       ├── lead_sheet.py             # LeadSheet class with export methods
│       ├── basic.py                  # Basic music theory primitives
│       ├── internal.py               # Internal theory classes
│       ├── theorytab.py              # TheoryTab integration
│       └── utils.py                  # Theory utilities
├── examples/                         # Example scripts
│   ├── basic_transcription.py        # Basic usage
│   ├── jukebox_transcription.py      # GPU-based transcription
│   ├── hooktheory_example.py         # Hooktheory data examples
│   ├── hooktheory_simple.py          # Simple Hooktheory example
│   └── transcribe_hooktheory_segments.py  # Hooktheory segment transcription
├── hooktheory_data/                  # Test data
│   ├── Hooktheory_Test_MIDI.tar.gz
│   └── Hooktheory_Test_Segments.json
├── docs/                             # Documentation
│   └── generated/                    # Generated documentation
├── .github/                          # GitHub configuration
│   └── workflows/
│       └── publish.yml               # PyPI publishing workflow
├── pyproject.toml                    # Project configuration
├── requirements.txt                  # Python dependencies
├── uv.lock                           # UV lock file
├── LICENSE                           # MIT License
└── README.md                         # This file
SheetSage-Infer has been modified from the original SheetSage to make it more suitable for library use and easier to maintain.
| Feature | Original | This Version |
|---|---|---|
| Jukebox Dependency | External, complex install | Vendored, works out of box |
| Test Coverage | Limited | Test suite included |
| Python Support | 3.12+ only | 3.10, 3.11, 3.12 |
| Build System | Hatch | Setuptools (standard) |
| Dependency Pins | Loose | Explicit versions |
- All core transcription functionality
- Same neural network models
- Same output formats (LeadSheet, LilyPond, MIDI)
- Same API interface for the sheetsage() function
- Same theory classes (Note, Chord, Melody, Harmony, etc.)
- Vendored Jukebox Modules: Eliminates complex external dependency
- Library-First Design: Optimized for pip install and programmatic use
- Better Dependency Management: Explicit version pins and compatibility
SheetSage-Infer is built upon the excellent work of SheetSage by Chris Donahue. The original SheetSage represents a major advancement in music transcription, achieving state-of-the-art results through hierarchical transformer architectures.
SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription
This work introduced hierarchical music transcription with melody and harmony extraction, enabling high-quality lead sheet generation from audio.
- Chris Donahue - Original SheetSage creator
This package was created to continue the excellent work by providing easier deployment and vendored Jukebox modules, while preserving 100% of the original model quality and algorithms.
What we maintain:
- PyTorch 2.0+ compatibility
- Modern dependency management
- Inference-only packaging
What remains unchanged:
- All model architectures (100% original)
- All transcription algorithms (100% original)
- All model weights (100% original)
- All output formats (100% original)
Please cite using the following bibtex entry:
@inproceedings{donahue2024sheetsage,
  title={SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription},
  author={Donahue, Chris},
  booktitle={ISMIR},
  year={2024}
}
If you use SheetSage-Infer in your research, please cite the original SheetSage paper above. This package is a maintenance fork intended to ease deployment and preserve compatibility; all credit for the models, algorithms, and research belongs to the original author.
MIT License (same as original SheetSage)
Copyright (c) 2024 Chris Donahue (Original SheetSage)
Copyright (c) 2025 (SheetSage-Infer modifications)
See LICENSE for details.
This project includes code adapted from SheetSage (MIT License, Copyright 2024 Chris Donahue).
- Inference only - No training capabilities
- Jukebox features require GPU - 12GB+ VRAM recommended for Jukebox embeddings
- LilyPond required for PDF - Optional dependency for PDF generation
- Time signatures - Currently supports 4/4 and 3/4 only
- Audio length - Best results with segments 30-300 seconds
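Since results are best on segments of 30-300 seconds, a longer recording can be processed as several windows. The sketch below shows one way to compute such windows. The helper name split_into_segments is hypothetical. Treating each window as an independent transcription is an assumption; the library does not document any support for stitching results together.

```python
import math


def split_into_segments(total_seconds: float, max_len: float = 300.0) -> list[tuple[float, float]]:
    """Split [0, total_seconds] into equal consecutive (start, end) windows.

    The number of windows is the smallest that keeps each one at or below
    max_len, so with the default every window of a multi-window split is
    longer than 150 seconds.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    n = math.ceil(total_seconds / max_len)
    length = total_seconds / n
    return [(i * length, (i + 1) * length) for i in range(n)]


if __name__ == "__main__":
    for start, end in split_into_segments(700):
        print(f"segment_start_hint={start:.1f}, segment_end_hint={end:.1f}")
```

Each (start, end) pair could then be supplied as segment_start_hint and segment_end_hint. As the parameter names suggest, these are hints, so the actual segment boundaries may differ slightly.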
We welcome contributions! Please:
- Follow the code style (ruff/black)
- Add tests for new features
- Submit PRs with clear descriptions
# Install dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Format and lint code
ruff format . && ruff check .
For issues and questions:
- GitHub Issues: github.com/openmirlab/sheetsage-infer/issues
- Examples: examples/ directory
- Original SheetSage: https://github.com/chrisdonahue/sheetsage
- This Repository: https://github.com/openmirlab/sheetsage-infer
- PyPI Package: https://pypi.org/project/openmirlab-sheetsage-infer/
Made with ❤️ for the ML community
Based on the excellent work by Chris Donahue and the SheetSage project.