SheetSage-Infer

Inference-only version of SheetSage for music transcription with vendored Jukebox modules.

PyPI | Python 3.10+ | License: MIT

AI-powered music transcription system that converts audio to lead sheets (melody + chord symbols) using deep learning models.


📌 Overview

SheetSage-Infer packages the inference path of SheetSage as an installable library. The Jukebox modules are vendored, so no separate Jukebox installation is needed.


✨ Features

  • ✅ Vendored Jukebox Modules - No external Jukebox dependency needed
  • ✅ CPU & GPU Support - Handcrafted features (CPU) or Jukebox embeddings (GPU)
  • ✅ Multiple Export Formats - LilyPond notation, MIDI files, PDF generation
  • ✅ Audio from URLs - Support for YouTube, Bandcamp, and other sources
  • ✅ Simple API - High-level sheetsage() function

🚀 Quick Start

Installation

From PyPI:

# Using pip
pip install openmirlab-sheetsage-infer

# Using uv (recommended - faster)
uv pip install openmirlab-sheetsage-infer

# Or add to your project with uv
uv add openmirlab-sheetsage-infer

For Development:

git clone https://github.com/openmirlab/sheetsage-infer.git
cd sheetsage-infer
pip install -e ".[dev]"

Prerequisites

  • Python: ≥3.10 (tested on 3.10, 3.11, 3.12)
  • LilyPond (optional, for PDF generation)
    • Linux: sudo apt-get install lilypond
    • macOS: brew install lilypond
    • Windows: Download from lilypond.org
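Because LilyPond is optional, it can help to check for it before requesting a PDF. The sketch below assumes the LilyPond executable is named `lilypond` and is discoverable on `PATH`; `lilypond_available` is a hypothetical helper for illustration, not part of this package.

```python
import shutil
from typing import Callable, Optional


def lilypond_available(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Return True if a `lilypond` executable is found on PATH.

    Assumption: the binary is called `lilypond`. The `which` argument is
    injectable so the check can be exercised without LilyPond installed.
    """
    return which("lilypond") is not None


if __name__ == "__main__":
    if lilypond_available():
        print("LilyPond found: PDF export should be possible.")
    else:
        print("LilyPond not found: only LilyPond source and MIDI export.")
```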

Simple API (Recommended for Python)

from sheetsage.infer import sheetsage
from sheetsage.utils import engrave
from sheetsage.align import create_beat_to_time_fn

# Transcribe audio URL
lead_sheet, segment_beats, segment_beats_times = sheetsage(
    'https://example.com/audio.mp3',
    use_jukebox=False,           # Use fast CPU-based features
    segment_start_hint=30,       # Start at 30 seconds
    segment_end_hint=60,         # End at 60 seconds
    beats_per_minute_hint=120    # Hint for BPM (improves accuracy)
)

# Export to LilyPond
lily_code = lead_sheet.as_lily()
print(lily_code)

# Export to MIDI
beat_to_time_fn = create_beat_to_time_fn(segment_beats, segment_beats_times)
midi_bytes = lead_sheet.as_midi(beat_to_time_fn)

# Save MIDI file
with open('output.mid', 'wb') as f:
    f.write(midi_bytes)

# Generate PDF (requires LilyPond)
pdf_bytes = engrave(lily_code, out_format='pdf')
with open('leadsheet.pdf', 'wb') as f:
    f.write(pdf_bytes)
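The export steps above can be wrapped in a small helper. This is a sketch that assumes only what the example shows: `as_lily()` returns LilyPond source as a string and `as_midi(beat_to_time_fn)` returns bytes. The name `save_lead_sheet` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def save_lead_sheet(
    lead_sheet: Any,
    beat_to_time_fn: Callable,
    out_dir: str,
    stem: str = "leadsheet",
) -> Dict[str, Path]:
    """Write <stem>.ly and <stem>.mid into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # LilyPond source is text, so write it with an explicit encoding.
    ly_path = out / f"{stem}.ly"
    ly_path.write_text(lead_sheet.as_lily(), encoding="utf-8")

    # MIDI data is binary.
    midi_path = out / f"{stem}.mid"
    midi_path.write_bytes(lead_sheet.as_midi(beat_to_time_fn))

    return {"lily": ly_path, "midi": midi_path}
```

With the `lead_sheet` and `beat_to_time_fn` objects from the example, a call would look like `save_lead_sheet(lead_sheet, beat_to_time_fn, "output")`.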

Using Jukebox Features (Higher Quality, GPU Required)

from sheetsage.infer import sheetsage

# Requires GPU with >=12GB VRAM
lead_sheet, beats, beat_times = sheetsage(
    'audio.mp3',
    use_jukebox=True,  # Use Jukebox embeddings (vendored)
    segment_start_hint=0,
    segment_end_hint=30,
    beats_per_minute_hint=100
)

Note: Jukebox features require a GPU with ≥12GB VRAM. The vendored modules work without any external Jukebox installation.
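One way to choose the `use_jukebox` value automatically is to check the available VRAM first. The sketch below is an assumption-laden illustration: it reads the total memory of CUDA device 0 through PyTorch, treats "12GB" as decimal gigabytes, and falls back to the handcrafted features when PyTorch or a GPU is missing. `detect_vram_gb` and `should_use_jukebox` are hypothetical helpers, not part of this package.

```python
from typing import Optional

MIN_JUKEBOX_VRAM_GB = 12.0  # threshold stated in this README


def detect_vram_gb() -> Optional[float]:
    """Return total memory of CUDA device 0 in decimal GB, or None."""
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


def should_use_jukebox(vram_gb: Optional[float]) -> bool:
    """True only when a GPU is present and meets the VRAM threshold."""
    return vram_gb is not None and vram_gb >= MIN_JUKEBOX_VRAM_GB


use_jukebox = should_use_jukebox(detect_vram_gb())
print(f"use_jukebox={use_jukebox}")
```

The resulting boolean can be passed as `use_jukebox=...` to `sheetsage()`.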

Command-Line Interface

# Basic transcription
python -m sheetsage.infer audio.mp3

# With options
python -m sheetsage.infer audio.mp3 \
    --segment_start_hint 30 \
    --segment_end_hint 60 \
    --beats_per_minute_hint 120 \
    --output_dir ./output

# See all options
python -m sheetsage.infer --help
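The same command can be driven from Python with `subprocess`. This sketch uses only the flags shown above; `build_cli_args` and `run_cli` are hypothetical helper names, and `sys.executable` is used so the CLI runs in the current interpreter's environment.

```python
import subprocess
import sys
from typing import List, Optional


def build_cli_args(
    audio: str,
    segment_start_hint: Optional[float] = None,
    segment_end_hint: Optional[float] = None,
    beats_per_minute_hint: Optional[float] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build argv for `python -m sheetsage.infer` from the flags shown above."""
    args = [sys.executable, "-m", "sheetsage.infer", audio]
    options = {
        "--segment_start_hint": segment_start_hint,
        "--segment_end_hint": segment_end_hint,
        "--beats_per_minute_hint": beats_per_minute_hint,
        "--output_dir": output_dir,
    }
    # Only pass the options the caller actually set.
    for flag, value in options.items():
        if value is not None:
            args += [flag, str(value)]
    return args


def run_cli(audio: str, **kwargs) -> None:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_cli_args(audio, **kwargs), check=True)
```

For example, `run_cli("audio.mp3", segment_start_hint=30, segment_end_hint=60)` mirrors the second command above without the BPM and output options.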

📋 Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.0.0
  • GPU: Optional, but required for Jukebox features (12GB+ VRAM)
  • OS: Linux, macOS, Windows

⚡ Performance

Transcription speed depends on audio length, hardware, and feature extraction method:

  • Handcrafted features (CPU): ~1-5 seconds per minute of audio
  • Jukebox features (GPU): ~30-60 seconds per minute of audio (requires a GPU with ≥12GB VRAM)

Note: These figures are approximate. Jukebox features provide higher quality but are slower.
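As a worked example of the figures above, the following sketch turns the per-minute rates into a rough processing-time range for a clip. The rates are copied from this section; `estimate_processing_seconds` is a hypothetical helper and the result is only a back-of-the-envelope estimate.

```python
from typing import Tuple

# Seconds of processing per minute of audio, as (low, high),
# taken from the approximate figures in this README.
RATES = {
    "handcrafted": (1.0, 5.0),
    "jukebox": (30.0, 60.0),
}


def estimate_processing_seconds(
    audio_seconds: float, use_jukebox: bool
) -> Tuple[float, float]:
    """Return a (low, high) estimate of processing time in seconds."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    low, high = RATES["jukebox" if use_jukebox else "handcrafted"]
    minutes = audio_seconds / 60.0
    return (low * minutes, high * minutes)


# A 3-minute clip:
print(estimate_processing_seconds(180, use_jukebox=False))  # (3.0, 15.0)
print(estimate_processing_seconds(180, use_jukebox=True))   # (90.0, 180.0)
```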


📚 Examples

See the examples/ directory for usage examples:

  • basic_transcription.py - Basic usage
  • jukebox_transcription.py - GPU-based transcription
  • hooktheory_example.py - Working with Hooktheory data

🏗️ Project Structure

sheetsage-infer/
├── sheetsage/                    # Main package
│   ├── infer.py                  # Main transcription pipeline
│   ├── align.py                  # Beat-to-time alignment
│   ├── beat_track.py             # Beat detection
│   ├── utils.py                  # LilyPond engraving, audio I/O
│   ├── assets.py                 # Asset management
│   ├── assets/                   # Asset JSON files
│   │   ├── hooktheory.json
│   │   ├── jukebox.json
│   │   ├── rwc.json
│   │   ├── sheetsage.json
│   │   └── test.json
│   ├── modules/                  # Neural network models
│   │   └── modules.py            # Transformer architectures
│   ├── representations/          # Feature extractors
│   │   ├── handcrafted.py        # CPU-based mel-spectrograms
│   │   ├── jukebox.py            # Jukebox embedding interface
│   │   └── jukebox_modules/      # Vendored Jukebox code
│   └── theory/                   # Music theory classes
│       ├── lead_sheet.py         # LeadSheet class with export methods
│       ├── basic.py              # Basic music theory primitives
│       ├── internal.py           # Internal theory classes
│       ├── theorytab.py          # TheoryTab integration
│       └── utils.py              # Theory utilities
├── examples/                     # Example scripts
│   ├── basic_transcription.py    # Basic usage
│   ├── jukebox_transcription.py  # GPU-based transcription
│   ├── hooktheory_example.py     # Hooktheory data examples
│   ├── hooktheory_simple.py      # Simple Hooktheory example
│   └── transcribe_hooktheory_segments.py  # Hooktheory segment transcription
├── hooktheory_data/              # Test data
│   ├── Hooktheory_Test_MIDI.tar.gz
│   └── Hooktheory_Test_Segments.json
├── docs/                         # Documentation
│   └── generated/                # Generated documentation
├── .github/                      # GitHub configuration
│   └── workflows/
│       └── publish.yml           # PyPI publishing workflow
├── pyproject.toml                # Project configuration
├── requirements.txt              # Python dependencies
├── uv.lock                       # UV lock file
├── LICENSE                       # MIT License
└── README.md                     # This file

🔄 Changes from Original SheetSage

SheetSage-Infer has been modified from the original SheetSage to make it more suitable for library use and easier to maintain.

Key Improvements

| Feature | Original | This Version |
| --- | --- | --- |
| Jukebox Dependency | External, complex install | Vendored, works out of the box |
| Test Coverage | Limited | Test suite included |
| Python Support | 3.12+ only | 3.10, 3.11, 3.12 |
| Build System | Hatch | Setuptools (standard) |
| Dependency Pins | Loose | Explicit versions |

What We Maintain

  • ✅ All core transcription functionality
  • ✅ Same neural network models
  • ✅ Same output formats (LeadSheet, LilyPond, MIDI)
  • ✅ Same API interface for sheetsage() function
  • ✅ Same theory classes (Note, Chord, Melody, Harmony, etc.)

What We Changed

  • Vendored Jukebox Modules: Eliminates complex external dependency
  • Library-First Design: Optimized for pip install and programmatic use
  • Better Dependency Management: Explicit version pins and compatibility

🙏 Acknowledgments

Original Research by Chris Donahue

SheetSage-Infer is built upon the excellent work of SheetSage by Chris Donahue. The original SheetSage represents a major advancement in music transcription, achieving state-of-the-art results through hierarchical transformer architectures.

Research Paper

SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription

This work introduced hierarchical music transcription with melody and harmony extraction, enabling high-quality lead sheet generation from audio.

Original Author

  • Chris Donahue - Original SheetSage creator

About This Implementation

This package was created to continue the excellent work by providing easier deployment and vendored Jukebox modules, while preserving 100% of the original model quality and algorithms.

What we maintain:

  • PyTorch 2.0+ compatibility
  • Modern dependency management
  • Inference-only packaging

What remains unchanged:

  • All model architectures (100% original)
  • All transcription algorithms (100% original)
  • All model weights (100% original)
  • All output formats (100% original)

📄 Citation

Please cite using the following bibtex entry:

@inproceedings{donahue2024sheetsage,
  title={SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription},
  author={Donahue, Chris},
  booktitle={ISMIR},
  year={2024}
}

If you use SheetSage-Infer in your research, please cite the original SheetSage paper above. This package is a maintenance fork intended to ease deployment and keep the code compatible with current tooling. All credit for the models, algorithms, and research belongs to the original author.


📄 License

MIT License (same as original SheetSage)

Copyright (c) 2024 Chris Donahue (Original SheetSage)
Copyright (c) 2025 (SheetSage-Infer modifications)

See LICENSE for details.

This project includes code adapted from SheetSage (MIT License, Copyright 2024 Chris Donahue).


⚠️ Limitations

  • Inference only - No training capabilities
  • Jukebox features require GPU - 12GB+ VRAM recommended for Jukebox embeddings
  • LilyPond required for PDF - Optional dependency for PDF generation
  • Time signatures - Currently supports 4/4 and 3/4 only
  • Audio length - Best results with segments 30-300 seconds
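For recordings longer than the recommended range, one possible workflow is to transcribe the audio in several windows. The sketch below only plans the windows: it splits a duration into equal parts of at most 300 seconds, which also keeps every part well above 30 seconds once a split is needed. `plan_segments` is a hypothetical helper, and whether splitting at arbitrary times gives musically sensible boundaries is an open assumption.

```python
import math
from typing import List, Tuple


def plan_segments(
    total_seconds: float, max_len: float = 300.0
) -> List[Tuple[float, float]]:
    """Split [0, total_seconds] into equal windows no longer than max_len."""
    total = float(total_seconds)
    if total <= 0:
        raise ValueError("total_seconds must be positive")
    n = max(1, math.ceil(total / max_len))
    length = total / n
    # Pin the final end to the exact total to avoid floating-point drift.
    return [
        (i * length, total if i == n - 1 else (i + 1) * length)
        for i in range(n)
    ]


print(plan_segments(600))  # [(0.0, 300.0), (300.0, 600.0)]
```

Each `(start, end)` pair could then be passed as `segment_start_hint` and `segment_end_hint` to `sheetsage()`. Since these are hints, the actual segment boundaries may differ.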

🀝 Contributing

We welcome contributions! Please:

  1. Follow the code style (ruff/black)
  2. Add tests for new features
  3. Submit PRs with clear descriptions

Development Setup

# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Format and lint code
ruff format . && ruff check .

📞 Support

For issues and questions:


🔗 Links


Made with ❤️ for the ML community

Based on the excellent work by Chris Donahue and the SheetSage project.
