HarmonyRL is a deep learning framework for generating symbolic music (MIDI) using Supervised Learning, Reinforcement Learning (RL), and Diffusion-based Postprocessing.
HarmonyRL combines LSTM and Transformer architectures with reinforcement learning (policy-gradient-style optimization) to train generative music models on the MAESTRO dataset. The goal is to generate high-quality, coherent, and musically pleasing MIDI outputs.
- Supervised Learning: Pretraining with cross-entropy loss on MAESTRO dataset.
- Reinforcement Learning: Fine-tuning with reward functions (harmony, rhythm, novelty, smoothness).
- Diffusion Postprocessing: Improving raw MIDI outputs by refining transitions and removing dissonance.
- Deep Learning: PyTorch (torch>=2.2, torchaudio>=2.2)
- Symbolic Music Processing: pretty_midi, mido, music21
- Audio Processing: librosa, soundfile
- Datasets & Tokenization: HuggingFace datasets>=2.20, transformers>=4.41
- Generative Refinement: diffusers>=0.30, accelerate>=0.33
- Utilities: numpy, scipy, tqdm, pyyaml
# Clone the repository
git clone https://github.com/SupratikB23/HarmonyRL.git
cd HarmonyRL
# Create virtual environment
python -m venv venv
source venv/bin/activate # (Linux/Mac)
venv\Scripts\activate # (Windows)
# Install dependencies
pip install -r requirements.txt

This project uses the MAESTRO Dataset (approx. 200 hours of virtuosic piano performances, ~1276 MIDI files). Download it from Google Magenta MAESTRO.
python scripts/train_supervised.py --config configs/supervised_config.yaml

python scripts/train_rl.py --config configs/rl_config.yaml

python scripts/infer.py --ckpt checkpoints/best_model.pt --output_dir outputs/

Cross-entropy loss is applied for sequence modeling of MIDI tokens:
L(θ) = − Σ_t y_t log P(y_t | x_<t; θ)
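The loss above can be sketched numerically. Assuming toy next-token distributions (not the real model's softmax outputs), the sequence loss is just the summed negative log-probability of each ground-truth token:

```python
import math

def cross_entropy(pred_probs, targets):
    """Negative log-likelihood of the target token at each step,
    summed over the sequence: L = -sum_t log P(y_t | x_<t)."""
    return -sum(math.log(step[y]) for step, y in zip(pred_probs, targets))

# Toy next-token distributions over a 3-token vocabulary.
probs = [
    [0.7, 0.2, 0.1],  # step 1: model favours token 0
    [0.1, 0.8, 0.1],  # step 2: model favours token 1
]
targets = [0, 1]      # ground-truth MIDI tokens

loss = cross_entropy(probs, targets)
print(round(loss, 4))  # -(log 0.7 + log 0.8)
```

In training this is what `torch.nn.CrossEntropyLoss` computes from the model's logits, batched over sequences.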
We fine-tune the pretrained model using Policy Gradient (REINFORCE) with a baseline to reduce variance:
∇_θ J(θ) = E[ (R − b) ∇_θ log π_θ(a|s) ]
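A minimal sketch of the variance-reduction idea, assuming the baseline b is an exponential moving average of recent returns (matching the `baseline_beta` field in the RL config; the update rule shown here is illustrative, not lifted from `training/rl.py`):

```python
def update_baseline(baseline, reward, beta=0.95):
    """Exponential moving average of episode returns, used as the
    variance-reducing baseline b in (R - b) * grad log pi(a|s)."""
    return beta * baseline + (1.0 - beta) * reward

baseline = 0.0
rewards = [1.0, 0.5, 0.8]  # toy episode returns
for r in rewards:
    advantage = r - baseline           # scales grad log pi for this rollout
    baseline = update_baseline(baseline, r)
print(round(baseline, 4))
```

The advantage (R − b) multiplies the policy's log-probability gradients before the optimizer step; subtracting b leaves the expected gradient unchanged while shrinking its variance.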
- Reward R is computed from multiple components:
- Harmony Consistency
- Rhythmic Structure
- Novelty & Diversity
- Smooth Transitions
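The components above combine into a single scalar R. A hedged sketch, assuming a weighted sum over per-component scores in [0, 1] (the component names mirror the list; the weights are illustrative, not the values used in `rewards.py`):

```python
def total_reward(scores, weights):
    """Scalar reward R as a weighted sum of component scores.
    Both dicts share the same keys; weights are assumed to sum to 1."""
    return sum(weights[k] * scores[k] for k in scores)

scores  = {"harmony": 0.9, "rhythm": 0.7, "novelty": 0.4, "smoothness": 0.8}
weights = {"harmony": 0.4, "rhythm": 0.3, "novelty": 0.1, "smoothness": 0.2}
print(round(total_reward(scores, weights), 2))
```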
Diffusion models (via diffusers) refine generated MIDI embeddings, denoising dissonance and smoothing temporal structure.
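As a toy stand-in for the "smooth transitions" goal (the real pass denoises learned embeddings with a diffusers pipeline, not raw pitches), one could clamp melodic leaps larger than an octave; the function below is purely illustrative:

```python
def smooth_transitions(pitches, max_leap=12):
    """Clamp any melodic leap larger than max_leap semitones toward
    the previous note, keeping the direction of the leap."""
    out = [pitches[0]]
    for p in pitches[1:]:
        leap = p - out[-1]
        if abs(leap) > max_leap:
            p = out[-1] + max_leap * (1 if leap > 0 else -1)
        out.append(p)
    return out

print(smooth_transitions([60, 75, 74, 50]))
```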
seed: 42
data:
  root: "data/maestro"
  max_seq_len: 2048
  train_ratio: 0.95
model:
  embed_dim: 512
  hidden: 768
  layers: 3
  dropout: 0.2
train:
  batch_size: 8
  lr: 3e-4
  epochs: 20
  clip_grad_norm: 1.0
  ckpt_dir: "checkpoints"

seed: 123
model:
  embed_dim: 512
  hidden: 768
  layers: 3
  dropout: 0.2
train:
  ckpt_dir: "checkpoints"
rl:
  episodes: 2000
  rollout_len: 512
  lr: 1e-5
  baseline_beta: 0.95
  entropy_coef: 0.005

├── .gitignore
├── README.md
├── config.yaml
├── configs
│   ├── rl_config.yaml
│   └── supervised_config.yaml
├── harmonyrl.egg-info
├── harmonyrl
│   ├── __init__.py
│   ├── datasets.py
│   ├── inference.py
│   ├── midi_utils.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── lstm.py
│   │   └── transformer.py
│   ├── postprocess_diffusers.py
│   ├── rewards.py
│   ├── training
│   │   ├── __init__.py
│   │   ├── rl.py
│   │   └── supervised.py
│   └── utils
│       ├── __init__.py
│       ├── evaluation.py
│       └── logging.py
├── requirements.txt
├── scripts
│   ├── infer.py
│   ├── train_rl.py
│   └── train_supervised.py
├── setup.py
├── NOTICE
└── LICENSE
- Experiment with larger Transformer backbones (e.g., Performer, Music Transformer).
- Introduce Curriculum RL with staged rewards for melody, harmony, and structure.
- Extend to multi-instrument compositions beyond piano (MAESTRO is piano-only).
- Integrate GAN-based critics for adversarial refinement of generated music.
- Better postprocessing via latent diffusion in symbolic space.
- The Magenta Project for the MAESTRO dataset.
- The PyTorch, HuggingFace, and Diffusers teams for their tools.
- Inspiration from reinforcement learning for sequence generation (REINFORCE, PPO).