
MusicMon 🎵

A personal exploration into training music generation models from scratch! This project documents my journey through different approaches to AI music generation, from diffusion models to transformers.

What I Built

🎯 Project Goals

  • Train my own music generation model from scratch (first time!)
  • Experiment with different architectures and approaches
  • Generate Pokemon-style music using MIDI data

🔢 Approaches Tried

1. Diffusion Model (Raw Waveforms)

  • Started with MNIST to understand diffusion
  • Attempted raw waveform diffusion for audio
  • Result: Pure static noise output 😅
  • Challenge: Raw audio diffusion did not cooperate; I'm still not sure why it produced pure static. I considered trying spectrograms instead, but decided a transformer would likely be better suited to sequential data.
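For context, the forward (noising) process the diffusion model is trained against can be sketched in a few lines. This is the standard DDPM closed-form noising step; the schedule values and names here are illustrative, not the notebook's actual code:

```python
import torch

# Standard DDPM linear noise schedule (illustrative hyperparameters).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def noise_waveform(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) in closed form: sqrt(a)*x0 + sqrt(1-a)*eps."""
    a = alphas_cumprod[t]
    eps = torch.randn_like(x0)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps

x0 = torch.randn(1, 16000)        # stand-in for 1 second of 16 kHz audio
xt = noise_waveform(x0, t=T - 1)  # near t=T the signal is almost pure noise
```

By the last timestep almost none of the original signal survives, which is also what a model that never learns the reverse process will emit: static.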

2. Transformer Model

  • Implemented transformer decoder (with sinusoidal positional encodings & multi-headed attention) from scratch
  • Used REMI tokenizer (from MidiTok) to tokenize MIDI songs
  • Trained on a subset of Pokemon MIDI songs
  • Result: Some musical patterns emerging, but quality varies
  • Challenge: Extremely slow training with low GPU utilization (10% memory, 30% duty cycle)
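The sinusoidal positional encodings mentioned above can be sketched as follows. This is the standard Vaswani et al. formulation; the notebook's implementation may differ in its details:

```python
import torch

def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Build the (seq_len, d_model) table of sin/cos positional encodings."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)      # (seq_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-torch.log(torch.tensor(10000.0)) / d_model))   # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even dimensions get sine
    pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions get cosine
    return pe

pe = sinusoidal_positions(seq_len=512, d_model=256)  # added to token embeddings
```

Because the encodings are fixed functions of position, they add no parameters and extrapolate to any sequence length up to `seq_len`.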

Both models were trained on a single RTX 4070, so top-tier output quality was never a realistic expectation; the point was learning the techniques end to end.
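On the low-GPU-utilization challenge: the usual first suspect is the input pipeline. Here's a minimal sketch of a DataLoader configured to keep the GPU fed; the fake token dataset and all numbers are placeholders, not this repo's actual settings:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 1024 sequences of 256 token ids from a 512-token vocab.
ds = TensorDataset(torch.randint(0, 512, (1024, 256)))

loader = DataLoader(
    ds,
    batch_size=64,
    shuffle=True,
    num_workers=4,            # overlap data prep with GPU compute
    pin_memory=True,          # faster host-to-device copies
    persistent_workers=True,  # avoid re-spawning workers every epoch
)
```

If utilization is still low after this, the bottleneck is usually batch size (too small to saturate the GPU) or per-step Python overhead in the training loop.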

🎼 Audio Samples

📚 Resources

🎮 Data Sources

The MIDI songs were sourced from here

🚧 Future Improvements

I'm flying out to Boston tomorrow to join Suno and work on music generation, so I'm not sure how much more I'll do in this repo. That said, here's what I'd like to tackle given time:

  • Improve training pipeline for better GPU utilization
  • Experiment with different model architectures
  • Implement better sampling strategies
  • Add conditioning for style control
  • Try hybrid approaches combining diffusion and transformers
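On better sampling strategies: a common baseline is temperature plus top-k sampling, which trades diversity against coherence when decoding music tokens. A hedged sketch (the function name and defaults are my own, not from this repo):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_k: int = 50) -> torch.Tensor:
    """Sample one token id from (batch, vocab) logits with temperature + top-k."""
    logits = logits / temperature                       # sharpen or flatten
    kth = torch.topk(logits, top_k).values[..., -1, None]
    logits = logits.masked_fill(logits < kth, float("-inf"))  # drop the tail
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)      # (batch, 1)

# Toy example: with top_k=2 only the two highest-logit tokens can be drawn.
logits = torch.tensor([[10.0, 0.0, -10.0, 5.0]])
token = sample_next_token(logits, temperature=0.8, top_k=2)
```

Lower temperatures make outputs more repetitive but safer; higher top-k admits rarer tokens at the cost of occasional off-key choices.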

📁 Project Structure

music-gen/
├── midi_transformer.ipynb      # Main transformer implementation
├── exploration/                # Early experiments
│   ├── music_generator.ipynb   # Diffusion model attempts
│   └── audio_utils.py          # Audio processing utilities
├── midi_to_wav.ipynb           # Convert MIDI samples to wav
├── midis/                      # Pokemon MIDI dataset
├── outputs/                    # Generated audio samples
└── midi_tokenizer.json         # Trained REMI tokenizer

This project represents my first attempt at training music generation models from scratch, but I had fun because I love music :} 🎵✨