
🎶 HarmonyRL: Music Generation using Supervised Reinforcement Learning 🎶

HarmonyRL is a deep learning framework for generating symbolic music (MIDI) using Supervised Learning, Reinforcement Learning (RL), and Diffusion-based Postprocessing.


📌 Project Overview

HarmonyRL combines LSTM, Transformer architectures, and Reinforcement Learning (policy gradient–style optimization) to train generative music models on the MAESTRO Dataset. The goal is to generate high-quality, coherent, and musically pleasing MIDI outputs.

  • Supervised Learning: pretraining with cross-entropy loss on the MAESTRO dataset.
  • Reinforcement Learning: fine-tuning with reward functions for harmony, rhythm, novelty, and smoothness.
  • Diffusion Postprocessing: refining raw MIDI outputs by smoothing transitions and removing dissonance.

⚙️ Tech Stack

  • Deep Learning: PyTorch (torch>=2.2, torchaudio>=2.2)
  • Symbolic Music Processing: pretty_midi, mido, music21
  • Audio Processing: librosa, soundfile
  • Datasets & Tokenization: HuggingFace datasets>=2.20, transformers>=4.41
  • Generative Refinement: diffusers>=0.30, accelerate>=0.33
  • Utilities: numpy, scipy, tqdm, pyyaml

📦 Installation

# Clone the repository
git clone https://github.com/SupratikB23/HarmonyRL.git
cd HarmonyRL

# Create virtual environment
python -m venv venv
source venv/bin/activate   # (Linux/Mac)
venv\Scripts\activate      # (Windows)

# Install dependencies
pip install -r requirements.txt

🎼 Dataset

This project uses the MAESTRO Dataset (approx. 200 hours of virtuosic piano performances, ~1276 MIDI files). Download it from the Google Magenta MAESTRO page.
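The repository's actual tokenization lives in harmonyrl/midi_utils.py and harmonyrl/datasets.py; as an illustration only, here is one common pitch/duration token scheme for turning note events into the integer sequences a model like this trains on (the function name and vocabulary layout are assumptions, not the project's exact scheme):

```python
def tokenize_notes(notes, time_step=0.125):
    """Map (pitch, start, end) note events to interleaved
    pitch and quantized-duration tokens.

    Pitch tokens use IDs 0-127 (the MIDI pitch range); duration
    tokens are offset by 128 so the two vocabularies stay disjoint.
    """
    tokens = []
    for pitch, start, end in sorted(notes, key=lambda n: n[1]):
        duration_steps = max(1, round((end - start) / time_step))
        tokens.append(pitch)                  # pitch token: 0..127
        tokens.append(128 + duration_steps)   # duration token: 128+
    return tokens

# Three notes of a C-major arpeggio: C4, E4 (0.5s each), G4 (1.0s)
notes = [(60, 0.0, 0.5), (64, 0.5, 1.0), (67, 1.0, 2.0)]
print(tokenize_notes(notes))  # [60, 132, 64, 132, 67, 136]
```

A real pipeline (e.g. via pretty_midi) would also encode velocity and rests, but the shape of the output, a flat integer sequence, is what the supervised stage below consumes.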


🚀 Training

1. Supervised Pretraining

python scripts/train_supervised.py --config configs/supervised_config.yaml

2. Reinforcement Learning Fine-tuning

python scripts/train_rl.py --config configs/rl_config.yaml

3. Inference (Generate MIDI)

python scripts/infer.py --ckpt checkpoints/best_model.pt --output_dir outputs/

🧠 Algorithms & Formulations

1. Supervised Learning

Cross-entropy loss is applied for next-token prediction over MIDI token sequences:

L(θ) = − Σ_t log P(y_t | y_<t; θ)
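To make the loss concrete, here is a minimal numpy sketch of per-token cross-entropy computed from raw logits (names and shapes are illustrative; the repository's trainer in harmonyrl/training/supervised.py presumably uses PyTorch's built-in loss):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target tokens.

    logits:  (T, V) unnormalized scores for T timesteps, vocab size V
    targets: (T,)   integer token IDs y_t
    """
    # Log-softmax with max-subtraction for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out log P(y_t | y_<t) at each timestep and average.
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.4]])
targets = np.array([0, 1])
loss = cross_entropy(logits, targets)
```

With uniform logits over a vocabulary of size V the loss is exactly log V, a useful sanity check when wiring up a new tokenizer.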

2. Reinforcement Learning

We fine-tune the pretrained model using Policy Gradient (REINFORCE) with a baseline to reduce variance:

∇_θ J(θ) = E[ (R − b) ∇_θ log π_θ(a_t | s_t) ]

  • Reward R is computed from multiple components:
    • Harmony Consistency
    • Rhythmic Structure
    • Novelty & Diversity
    • Smooth Transitions
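As a hedged sketch of the variance-reduction step, the snippet below computes the advantages (R − b) with an exponential-moving-average baseline, matching the baseline_beta field in configs/rl_config.yaml. The function name and the EMA update rule are illustrative assumptions; the repository's actual trainer lives in harmonyrl/training/rl.py:

```python
import numpy as np

def reinforce_advantages(rewards, baseline=0.0, beta=0.95):
    """Centered returns for REINFORCE with an EMA baseline.

    rewards:  per-episode scalar returns R
    baseline: running estimate b, updated as b <- beta*b + (1-beta)*R
    Returns the per-episode advantages (R - b) and the final baseline.
    """
    advantages = []
    for r in rewards:
        advantages.append(r - baseline)              # (R - b) scales ∇ log π
        baseline = beta * baseline + (1 - beta) * r  # EMA baseline update
    return np.array(advantages), baseline

# Constant rewards: advantages shrink as the baseline catches up.
adv, b = reinforce_advantages([1.0, 1.0, 1.0], baseline=0.0, beta=0.95)
```

Each advantage multiplies the policy's log-probability gradient for that episode's actions; subtracting b leaves the gradient unbiased while shrinking its variance.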

3. Diffusion Postprocessing

Diffusion models (via diffusers) refine the generated MIDI embeddings, attenuating dissonance and smoothing temporal structure.
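The real implementation in harmonyrl/postprocess_diffusers.py operates on learned embeddings with a trained diffusion model; as a loose analogy only, the toy below shows the "denoise toward smoother transitions" idea as iterative neighbor-averaging on a pitch contour. Everything here is a hypothetical illustration, not the project's diffusion pipeline:

```python
import numpy as np

def smooth_pitch_contour(pitches, steps=10, rate=0.3):
    """Toy iterative denoising: each step nudges every interior
    note toward the average of its neighbors, shrinking large
    melodic leaps while keeping the endpoints fixed.
    """
    x = np.asarray(pitches, dtype=float).copy()
    for _ in range(steps):
        neighbor_avg = 0.5 * (x[:-2] + x[2:])     # average of left/right notes
        x[1:-1] += rate * (neighbor_avg - x[1:-1])  # partial step toward it
    return x

# A jumpy melody: large leaps get damped, endpoints stay put.
refined = smooth_pitch_contour([60, 72, 61, 75, 62], steps=20)
```

A learned diffusion model differs in that the denoising direction is predicted by a network conditioned on musical context, rather than fixed local averaging.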


📊 Configuration

Supervised Config (configs/supervised_config.yaml)

seed: 42
data:
  root: "data/maestro"
  max_seq_len: 2048
  train_ratio: 0.95
model:
  embed_dim: 512
  hidden: 768
  layers: 3
  dropout: 0.2
train:
  batch_size: 8
  lr: 3e-4
  epochs: 20
  clip_grad_norm: 1.0
  ckpt_dir: "checkpoints"

Reinforcement Learning Config (configs/rl_config.yaml)

seed: 123
model:
  embed_dim: 512
  hidden: 768
  layers: 3
  dropout: 0.2
train:
  ckpt_dir: "checkpoints"
rl:
  episodes: 2000
  rollout_len: 512
  lr: 1e-5
  baseline_beta: 0.95
  entropy_coef: 0.005

📂 Repository Structure

├── .gitignore
├── README.md
├── config.yaml
├── configs
│   ├── rl_config.yaml
│   └── supervised_config.yaml
├── harmonyrl.egg-info
├── harmonyrl
│   ├── __init__.py
│   ├── datasets.py
│   ├── inference.py
│   ├── midi_utils.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── lstm.py
│   │   └── transformer.py
│   ├── postprocess_diffusers.py
│   ├── rewards.py
│   ├── training
│   │   ├── __init__.py
│   │   ├── rl.py
│   │   └── supervised.py
│   └── utils
│       ├── __init__.py
│       ├── evaluation.py
│       └── logging.py
├── requirements.txt
├── scripts
│   ├── infer.py
│   ├── train_rl.py
│   └── train_supervised.py
├── setup.py
├── NOTICE 
└── LICENSE

📈 Future Improvements

  • Experiment with larger Transformer backbones (e.g., Performer, Music Transformer).
  • Introduce Curriculum RL with staged rewards for melody, harmony, and structure.
  • Extend to multi-instrument compositions beyond piano (MAESTRO is piano-only).
  • Integrate GAN-based critics for adversarial refinement of generated music.
  • Better postprocessing via latent diffusion in symbolic space.

🙌 Acknowledgements

- The Magenta Project for the MAESTRO dataset.
- The PyTorch, Hugging Face, and Diffusers teams for their tooling.
- Inspiration from reinforcement learning for sequence generation (REINFORCE, PPO).
