# SheetSage-Infer

AI-powered music transcription that converts audio to lead sheets (melody + chord symbols) using deep learning models. SheetSage-Infer is an inference-only version of SheetSage, optimized for easy deployment with vendored Jukebox modules.

## Features
- ✅ Vendored Jukebox Modules - No external Jukebox dependency needed
- ✅ CPU & GPU Support - Handcrafted features (CPU) or Jukebox embeddings (GPU)
- ✅ Multiple Export Formats - LilyPond notation, MIDI files, PDF generation
- ✅ Audio from URLs - Support for YouTube, Bandcamp, and other sources
- ✅ Simple API - High-level `sheetsage()` function
## Installation

From PyPI:

```bash
# Using pip
pip install openmirlab-sheetsage-infer

# Using uv (recommended - faster)
uv pip install openmirlab-sheetsage-infer

# Or add to your project with uv
uv add openmirlab-sheetsage-infer
```

For development:
```bash
git clone https://github.com/openmirlab/sheetsage-infer.git
cd sheetsage-infer
pip install -e ".[dev]"
```

## Requirements

- Python: ≥3.10 (tested on 3.10, 3.11, 3.12)
- LilyPond (optional, for PDF generation)
  - Linux: `sudo apt-get install lilypond`
  - macOS: `brew install lilypond`
  - Windows: Download from lilypond.org
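Since LilyPond is an optional dependency, it can help to verify it is on `PATH` before attempting PDF export. A minimal sketch (the helper name `lilypond_available` is ours, not part of the package):

```python
import shutil

def lilypond_available() -> bool:
    """Return True if the `lilypond` binary is on PATH (needed for PDF export)."""
    return shutil.which("lilypond") is not None

# Guard PDF generation accordingly:
if lilypond_available():
    print("LilyPond found: PDF export enabled")
else:
    print("LilyPond not found: skipping PDF export")
```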
## Quick Start

### CPU transcription (handcrafted features)

```python
from sheetsage.infer import sheetsage
from sheetsage.utils import engrave
from sheetsage.align import create_beat_to_time_fn

# Transcribe audio from a URL
lead_sheet, segment_beats, segment_beats_times = sheetsage(
    'https://example.com/audio.mp3',
    use_jukebox=False,          # Use fast CPU-based features
    segment_start_hint=30,      # Start at 30 seconds
    segment_end_hint=60,        # End at 60 seconds
    beats_per_minute_hint=120,  # BPM hint (improves accuracy)
)

# Export to LilyPond
lily_code = lead_sheet.as_lily()
print(lily_code)

# Export to MIDI
beat_to_time_fn = create_beat_to_time_fn(segment_beats, segment_beats_times)
midi_bytes = lead_sheet.as_midi(beat_to_time_fn)

# Save MIDI file
with open('output.mid', 'wb') as f:
    f.write(midi_bytes)

# Generate PDF (requires LilyPond)
pdf_bytes = engrave(lily_code, out_format='pdf')
with open('leadsheet.pdf', 'wb') as f:
    f.write(pdf_bytes)
```
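The function returned by `create_beat_to_time_fn` maps (possibly fractional) beat positions to timestamps in seconds. Conceptually this is a piecewise-linear interpolation over the detected beats; the sketch below is our own standalone illustration, not the library's actual implementation:

```python
import bisect

def make_beat_to_time_fn(beats, beat_times):
    """Piecewise-linear map from beat position to seconds.

    beats      -- sorted beat indices, e.g. [0, 1, 2, 3]
    beat_times -- corresponding timestamps in seconds, e.g. [0.0, 0.5, 1.0, 1.5]
    """
    def beat_to_time(b):
        # Find the segment containing b, clamping so edge tempo extrapolates.
        i = bisect.bisect_right(beats, b) - 1
        i = max(0, min(i, len(beats) - 2))
        # Linear interpolation between neighbouring beats.
        frac = (b - beats[i]) / (beats[i + 1] - beats[i])
        return beat_times[i] + frac * (beat_times[i + 1] - beat_times[i])
    return beat_to_time

# Four beats at 120 BPM (0.5 s apart):
fn = make_beat_to_time_fn([0, 1, 2, 3], [0.0, 0.5, 1.0, 1.5])
print(fn(1.5))  # midpoint between beats 1 and 2 -> 0.75
```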
### GPU transcription (Jukebox features)

```python
from sheetsage.infer import sheetsage

# Requires a GPU with >=12 GB VRAM
lead_sheet, beats, beat_times = sheetsage(
    'audio.mp3',
    use_jukebox=True,  # Use Jukebox embeddings (vendored)
    segment_start_hint=0,
    segment_end_hint=30,
    beats_per_minute_hint=100,
)
```

Note: Jukebox features require a GPU with ≥12 GB VRAM. The vendored modules work without any external Jukebox installation.
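Because Jukebox features need a capable GPU, it can be convenient to choose `use_jukebox` at runtime. A minimal sketch (the helper name and the check are our own; it degrades gracefully when PyTorch is not installed):

```python
MIN_VRAM_BYTES = 12 * 1024**3  # >=12 GB, per the recommendation above

def can_use_jukebox() -> bool:
    """Return True if a CUDA GPU with enough VRAM is available."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(0)
    return props.total_memory >= MIN_VRAM_BYTES

# e.g. sheetsage('audio.mp3', use_jukebox=can_use_jukebox(), ...)
print("Jukebox features:", "enabled" if can_use_jukebox() else "disabled (CPU fallback)")
```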
## Command-Line Usage

```bash
# Basic transcription
python -m sheetsage.infer audio.mp3

# With options
python -m sheetsage.infer audio.mp3 \
    --segment_start_hint 30 \
    --segment_end_hint 60 \
    --beats_per_minute_hint 120 \
    --output_dir ./output

# See all options
python -m sheetsage.infer --help
```

## System Requirements

- Python: ≥3.10
- PyTorch: ≥2.0.0
- GPU: Optional, but recommended for Jukebox features (12GB+ VRAM)
- OS: Linux, macOS, Windows
## Performance

Transcription speed depends on audio length, hardware, and the feature extraction method:

- Handcrafted features (CPU): ~1-5 seconds per minute of audio
- Jukebox features (GPU): ~30-60 seconds per minute of audio (requires a GPU with ≥12 GB VRAM)

Jukebox features produce higher-quality transcriptions but are slower.
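As a back-of-the-envelope check before transcribing long audio, the throughput ranges above translate into a simple runtime estimate (a hypothetical helper using the figures quoted here, not measured values):

```python
def estimate_runtime_seconds(audio_minutes: float, use_jukebox: bool) -> tuple:
    """Rough (low, high) runtime estimate in seconds, per the ranges above."""
    low, high = (30, 60) if use_jukebox else (1, 5)
    return (audio_minutes * low, audio_minutes * high)

# A 3-minute song:
print(estimate_runtime_seconds(3, use_jukebox=False))  # -> (3, 15)
print(estimate_runtime_seconds(3, use_jukebox=True))   # -> (90, 180)
```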
## Examples

See the `examples/` directory for usage examples:

- `basic_transcription.py` - Basic usage
- `jukebox_transcription.py` - GPU-based transcription
- `hooktheory_example.py` - Working with Hooktheory data
## Project Structure

```
sheetsage-infer/
├── sheetsage/                   # Main package
│   ├── infer.py                 # Main transcription pipeline
│   ├── align.py                 # Beat-to-time alignment
│   ├── beat_track.py            # Beat detection
│   ├── utils.py                 # LilyPond engraving, audio I/O
│   ├── assets.py                # Asset management
│   ├── assets/                  # Asset JSON files
│   │   ├── hooktheory.json
│   │   ├── jukebox.json
│   │   ├── rwc.json
│   │   ├── sheetsage.json
│   │   └── test.json
│   ├── modules/                 # Neural network models
│   │   └── modules.py           # Transformer architectures
│   ├── representations/         # Feature extractors
│   │   ├── handcrafted.py       # CPU-based mel-spectrograms
│   │   ├── jukebox.py           # Jukebox embedding interface
│   │   └── jukebox_modules/     # Vendored Jukebox code
│   └── theory/                  # Music theory classes
│       ├── lead_sheet.py        # LeadSheet class with export methods
│       ├── basic.py             # Basic music theory primitives
│       ├── internal.py          # Internal theory classes
│       ├── theorytab.py         # TheoryTab integration
│       └── utils.py             # Theory utilities
├── examples/                    # Example scripts
│   ├── basic_transcription.py   # Basic usage
│   ├── jukebox_transcription.py # GPU-based transcription
│   ├── hooktheory_example.py    # Hooktheory data examples
│   ├── hooktheory_simple.py     # Simple Hooktheory example
│   └── transcribe_hooktheory_segments.py  # Hooktheory segment transcription
├── hooktheory_data/             # Test data
│   ├── Hooktheory_Test_MIDI.tar.gz
│   └── Hooktheory_Test_Segments.json
├── docs/                        # Documentation
│   └── generated/               # Generated documentation
├── .github/                     # GitHub configuration
│   └── workflows/
│       └── publish.yml          # PyPI publishing workflow
├── pyproject.toml               # Project configuration
├── requirements.txt             # Python dependencies
├── uv.lock                      # UV lock file
├── LICENSE                      # MIT License
└── README.md                    # This file
```
## Differences from Original SheetSage

SheetSage-Infer has been modified from the original SheetSage to make it more suitable for library use and easier to maintain.
| Feature | Original | This Version |
|---|---|---|
| Jukebox Dependency | External, complex install | Vendored, works out of box |
| Test Coverage | Limited | Test suite included |
| Python Support | 3.12+ only | 3.10, 3.11, 3.12 |
| Build System | Hatch | Setuptools (standard) |
| Dependency Pins | Loose | Explicit versions |
- ✅ All core transcription functionality
- ✅ Same neural network models
- ✅ Same output formats (LeadSheet, LilyPond, MIDI)
- ✅ Same API interface for the `sheetsage()` function
- ✅ Same theory classes (Note, Chord, Melody, Harmony, etc.)
- Vendored Jukebox Modules: Eliminates complex external dependency
- Library-First Design: Optimized for `pip install` and programmatic use
- Better Dependency Management: Explicit version pins and compatibility
## Acknowledgments

SheetSage-Infer is built upon the excellent work of SheetSage by Chris Donahue. The original SheetSage represents a major advancement in music transcription, achieving state-of-the-art results through hierarchical transformer architectures.
SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription
This work introduced hierarchical music transcription with melody and harmony extraction, enabling high-quality lead sheet generation from audio.
- Chris Donahue - Original SheetSage creator
This package was created to continue the excellent work by providing easier deployment and vendored Jukebox modules, while preserving 100% of the original model quality and algorithms.
What we maintain:
- PyTorch 2.0+ compatibility
- Modern dependency management
- Inference-only packaging
What remains unchanged:
- All model architectures (100% original)
- All transcription algorithms (100% original)
- All model weights (100% original)
- All output formats (100% original)
## Citation

Please cite using the following BibTeX entry:

```bibtex
@inproceedings{donahue2024sheetsage,
  title={SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription},
  author={Donahue, Chris},
  booktitle={ISMIR},
  year={2024}
}
```

If you use SheetSage-Infer in your research, please cite the original SheetSage paper above. This package is a maintenance fork for easier deployment and continued compatibility; all credit for the models, algorithms, and research belongs to the original author.
## License

MIT License (same as the original SheetSage).

Copyright (c) 2024 Chris Donahue (original SheetSage)
Copyright (c) 2025 (SheetSage-Infer modifications)
See LICENSE for details.
This project includes code adapted from SheetSage (MIT License, Copyright 2024 Chris Donahue).
## Limitations

- Inference only - No training capabilities
- Jukebox features require GPU - 12GB+ VRAM recommended for Jukebox embeddings
- LilyPond required for PDF - Optional dependency for PDF generation
- Time signatures - Currently supports 4/4 and 3/4 only
- Audio length - Best results with segments 30-300 seconds
## Contributing

We welcome contributions! Please:
- Follow the code style (ruff/black)
- Add tests for new features
- Submit PRs with clear descriptions
```bash
# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Format and lint code
ruff format . && ruff check .
```

## Support

For issues and questions:
- GitHub Issues: github.com/openmirlab/sheetsage-infer/issues
- Examples: `examples/` directory
## Links

- Original SheetSage: https://github.com/chrisdonahue/sheetsage
- This Repository: https://github.com/openmirlab/sheetsage-infer
- PyPI Package: https://pypi.org/project/openmirlab-sheetsage-infer/
Made with ❤️ for the ML community
Based on the excellent work by Chris Donahue and the SheetSage project.