
MusicMon 🎵

A personal exploration into training music generation models from scratch! This project documents my journey through different approaches to AI music generation, from diffusion models to transformers.

What I Built

🎯 Project Goals

  • Train my own music generation model from scratch (first time!)
  • Experiment with different architectures and approaches
  • Generate Pokemon-style music using MIDI data

🔢 Approaches Tried

1. Diffusion Model (Raw Waveforms)

  • Started with MNIST to understand diffusion
  • Attempted raw waveform diffusion for audio
  • Result: Pure static noise output 😅
  • Challenge: Raw audio diffusion does not like me. I'm still not sure why it only produced static. I considered trying spectrograms next, but decided a transformer would likely handle sequential data better.
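For reference, the forward (noising) half of a diffusion model is just a progressive blend of the clean signal with Gaussian noise. A minimal numpy sketch of that step, assuming a simplified cosine schedule (not necessarily the exact schedule used in the notebook):

```python
import numpy as np

def cosine_alpha_bar(t, T):
    """Cumulative signal-retention factor; a simplified cosine noise schedule."""
    return np.cos((t / T) * np.pi / 2) ** 2

def forward_diffuse(x0, t, T, rng):
    """Sample x_t ~ q(x_t | x_0): blend the clean waveform with Gaussian noise."""
    alpha_bar = cosine_alpha_bar(t, T)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
x0 = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # 1 s of a 440 Hz tone
xt = forward_diffuse(x0, t=900, T=1000, rng=rng)          # late step: mostly noise
```

By t near T the waveform is essentially indistinguishable from noise, which is also what a failed denoiser's output sounds like.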

2. Transformer Model

  • Implemented transformer decoder (with sinusoidal positional encodings & multi-headed attention) from scratch
  • Used REMI tokenizer (from MidiTok) to tokenize MIDI songs
  • Trained on a subset of Pokemon MIDI songs
  • Result: Some musical patterns emerging, but quality varies
  • Challenge: Extremely slow training with low GPU utilization (10% memory, 30% duty cycle)
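As a reference for the sinusoidal positional encodings mentioned above, here's a minimal numpy sketch of the standard closed-form construction (even `d_model` assumed; the actual implementation lives in the notebook):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed positional encodings: even dims get sin, odd dims get cos,
    with geometrically spaced frequencies across the embedding dimension."""
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=256, d_model=64)
```

These get added to the token embeddings before the first decoder layer so the model can tell positions apart.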

Both models were trained on my single RTX 4070, so it's a given that the quality was never going to be state of the art.

🎼 Audio Samples

📚 Resources

🎮 Data Sources

The MIDI songs were sourced from here

🚧 Future Improvements

I'm flying out to Boston tomorrow to join Suno and work on music generation, so I'm not sure whether I'll keep working on these here, but given time, here's what I'd like to tackle:

  • Improve training pipeline for better GPU utilization
  • Experiment with different model architectures
  • Implement better sampling strategies
  • Add conditioning for style control
  • Try hybrid approaches combining diffusion and transformers
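For the sampling-strategies item, a common baseline is top-k sampling with temperature. A minimal numpy sketch (the vocabulary size of 512 is made up for illustration):

```python
import numpy as np

def sample_top_k(logits, k=8, temperature=1.0, rng=None):
    """Sample one token id: keep the k highest logits, apply temperature,
    softmax over the survivors, then draw from that distribution."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    top = np.argsort(logits)[-k:]                   # indices of the k best tokens
    probs = np.exp(logits[top] - logits[top].max()) # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

logits = np.random.default_rng(0).standard_normal(512)  # fake next-token logits
token = sample_top_k(logits, k=8, temperature=0.9)
```

Lower temperature concentrates probability on the best tokens (safer but repetitive music); smaller k cuts off the long tail of unlikely REMI tokens.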

📁 Project Structure

```
music-gen/
├── midi_transformer.ipynb      # Main transformer implementation
├── exploration/                # Early experiments
│   ├── music_generator.ipynb   # Diffusion model attempts
│   └── audio_utils.py          # Audio processing utilities
├── midi_to_wav.ipynb           # Convert MIDI samples to wav
├── midis/                      # Pokemon MIDI dataset
├── outputs/                    # Generated audio samples
└── midi_tokenizer.json         # Trained REMI tokenizer
```

This project represents my first attempt at training music generation models from scratch, but I had fun because I love music :} 🎵✨
