A comprehensive music agent framework built on smolagents, integrating state-of-the-art music AI models for understanding, generation, and interaction.
- ChatMusician Integration: Natural language music analysis and understanding
- Music Theory Analysis: Automatic analysis of musical structures, harmony, and form
- Audio Understanding: Content analysis of audio files
- Symbolic Music Generation: ABC notation generation using NotaGen
- Audio Generation: High-quality audio synthesis using Stable Audio Open
- Conditional Generation: Generate music based on text prompts, styles, and constraints
- smolagents Integration: Intelligent agent system that decides which tools to use
- Multi-modal Support: Text, audio, and symbolic music inputs
- Tool Orchestration: Seamless integration between different music AI models
- Gradio Web Interface: User-friendly web interface for interaction
- Score Visualization Integration: Beautiful sheet music rendering in PNG, PDF, and MusicXML.
- Audio Playback: Integrated audio player for generated content
- Flexible Local Ready: Scalable deployment options based on system capacity.
- Remote Deployment: Partially remote deployment through HF Inference Clients.
WeaveMuse provides an intuitive web interface for seamless music AI interaction:
This project is GPU-optimized and requires:
- Python 3.10 or later
- NVIDIA GPU with CUDA 12.1+ (recommended)
- At least 8GB VRAM for local models (recommended > 40GB)
uv is a fast, modern Python package manager that provides reliable dependency resolution and faster installs.
# Install uv (once)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or on Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"# Clone the repository
git clone https://github.com/manoskary/weavemuse.git
cd weavemuse
# Install with GPU support (CUDA 12.1)
uv sync --extra-index-url https://download.pytorch.org/whl/cu121
# For development with all extras:
uv sync --extra dev --extra gpu --extra remote --extra audio --extra music --extra-index-url https://download.pytorch.org/whl/cu121
# Lock dependencies for reproducible installs
uv lock# Activate the uv environment
source .venv/bin/activate
# Run WeaveMuse
weavemuse serve# Clone the repository
git clone https://github.com/manoskary/weavemuse.git
cd weavemuse
# Install with GPU support
pip install -e ".[gpu]" --extra-index-url https://download.pytorch.org/whl/cu121
# For development
pip install -e ".[dev,gpu,remote,audio,music]" --extra-index-url https://download.pytorch.org/whl/cu121# Create conda environment
conda create -n weavemuse python=3.10
conda activate weavemuse
# Install PyTorch with CUDA
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# Install WeaveMuse
pip install -e .WeaveMuse uses models from HuggingFace Hub and supports remote inference. Set up your environment:
# Already included, but if needed separately:
pip install huggingface_hub# Login with your HuggingFace token
huggingface-cli login
# Or set environment variable
export HF_TOKEN="your_huggingface_token_here"For lightweight usage without local GPU requirements, you can use HuggingFace's Inference API. Please provide your HF token to use it.
# Copy example environment file
cp .env.example .envEdit .env with your configuration (optional):
# HuggingFace Configuration
HF_TOKEN=your_huggingface_token
HF_CACHE_DIR=./models/cache
# Model configurations
CHATMUSICIAN_MODEL_ID=m-a-p/ChatMusician
NOTAGEN_MODEL_PATH=./models/notagen
STABLE_AUDIO_MODEL_ID=stabilityai/stable-audio-open-1.0
# GPU configuration
DEVICE=cuda
TORCH_DTYPE=float16
CUDA_VISIBLE_DEVICES=0
# Server configuration
HOST=0.0.0.0
PORT=7860
DEBUG=false
# Remote API Keys (optional)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_keyAfter installation, verify everything is working:
# Check GPU availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Start the web interface
weavemuse guiWeaveMuse provides flexible command-line options to launch different interfaces:
# Launch web interface (default)
weavemuse gui
# Launch terminal interface
weavemuse terminal
# For backwards compatibility, this also works:
weavemuse
# Show version
weavemuse --version- User-friendly Gradio web interface
- File upload capabilities for audio analysis
- Interactive chat with music agents
- Visual display of generated scores and audio playback
- Accessible (usually) at
http://localhost:7860
- Command-line interaction for advanced users
- Fast startup with on-demand loading
- Direct text-based communication with agents
When launching WeaveMuse, you'll be prompted to choose your model configuration:
π€ Choose your AI model:
1. Only Local Models (Requires more resources and loading time)
2. HuggingFace cloud-based agent (some local tools - faster startup)
3. All Remote (All models and Tools are remote - no resources needed)
Important: The backbone language model drives the intelligence of all WeaveMuse agents. When using smaller models due to computational constraints, expect the overall intelligence and reasoning capabilities of the system to be affected accordingly.
WeaveMuse operates as a multi-agent system with specialized agents for different music tasks:
- Purpose: Orchestrates all music-related tasks
- Capabilities: Task routing, file handling, workflow management
- Tools: Base smolagents tools + specialized music agents
- Think-act-repair: The Manager agent follows Think-act-repair protocol.
- Intelligence: Driven by the backbone model (local or remote)
1. Symbolic Music Agent
- Tools: NotaGenTool
- Function: Generates symbolic music in ABC notation
- Output: PDF scores, MusicXML, MIDI files, MP3 audio
- Use Cases: Composition based on musical periods, composers, instrumentation
2. Audio Analysis Agent
- Tools: AudioFlamingoTool, AudioAnalysisTool (optional)
- Function: Advanced audio content analysis using NVIDIA Audio Flamingo
- Capabilities: Musical element identification, acoustic analysis, content description
- Input: Audio files (any format supported)
3. Audio Generation Agent
- Tools: StableAudioTool
- Function: High-quality audio synthesis from text descriptions
- Technology: Stable Audio Open model
- Output: 44.1kHz stereo audio files
4. Web Search Agent
- Tools: WebSearchTool
- Function: Music-related information retrieval
- Capabilities: Research, fact-checking, music knowledge expansion
WeaveMuse local tools support Lazy loading and GPU offloading, to prevent overheads and long waiting times when tools are not used.
ChatMusicianTool
- Natural language music analysis and understanding
- Music theory explanations and composition guidance
- Chord progression analysis and recommendations
NotaGenTool
- Symbolic music generation in ABC notation format
- Supports various musical styles and instrumentation
- Automatic conversion to multiple formats (PDF, MIDI, MusicXML, MP3)
StableAudioTool
- Text-to-audio generation using Stable Audio Open
- Good-quality stereo audio synthesis
- Conditional generation based on prompts
AudioFlamingoTool
- Remote audio analysis via NVIDIA's Audio Flamingo model
- Advanced acoustic analysis and content understanding
- Zero-setup remote processing
AudioAnalysisTool (Optional)
- Local audio analysis using Qwen2-Audio model
- Requires local GPU resources
- Detailed musical content analysis
The backbone language model is the core intelligence driving all WeaveMuse agents. This model determines:
- Task Understanding: How well the system interprets your requests
- Tool Selection: Which specialized tools to use for specific tasks
- Workflow Orchestration: How effectively multiple tools are combined
- Response Quality: The coherence and helpfulness of outputs
- Local Models: Better reasoning but require more resources (8GB+ VRAM recommended)
- Remote Models: Good balance of intelligence and resource usage
- Smaller Models: Limited reasoning but faster and lower resource requirements
weavemuse gui
# User: "Create a baroque-style piece for string quartet and analyze its harmonic structure"
# System: Uses NotaGenTool β ChatMusicianTool β Returns score + analysisweavemuse terminal
# User uploads audio file: "What instruments are in this recording? Generate something similar."
# System: Uses AudioFlamingoTool β StableAudioTool β Returns analysis + new audioweavemuse gui
# User: "Research Beethoven's late string quartets and compose something inspired by Op. 131"
# System: Uses WebSearchTool β ChatMusicianTool β NotaGenTool β Returns research + compositionCUDA/GPU Issues:
# Check GPU availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Install CUDA packages if needed
uv sync --extra gpu --extra-index-url https://download.pytorch.org/whl/cu121HuggingFace Authentication:
# Login with HuggingFace token
huggingface-cli login
# Or set environment variable
export HF_TOKEN="your_token_here"Dependency Conflicts:
# Reset and reinstall
rm -rf .venv uv.lock
uv sync --extra-index-url https://download.pytorch.org/whl/cu121 --index-strategy unsafe-best-matchAudio Generation Issues:
# Install audio extras
uv sync --extra audio --extra-index-url https://download.pytorch.org/whl/cu121WeaveMuse is designed to be extensible. To add custom music tools:
from smolagents.tools import Tool
from weavemuse.tools.base_tools import ManagedTransformersTool
class CustomMusicTool(ManagedTransformersTool):
name = "custom_music"
description = "Your custom music tool description"
inputs = {"prompt": {"type": "string", "description": "Input prompt"}}
output_type = "string"
def _load_model(self):
# Implement your model loading logic
pass
def _call_model(self, model, **kwargs):
# Implement your tool logic
pass-
ChatMusician: Music understanding and analysis
- Model:
m-a-p/ChatMusician - Capabilities: Music theory, harmony analysis, composition guidance
- Model:
-
NotaGen: Symbolic music generation
- Model: Custom implementation with pre-trained weights
- Output: ABC notation format
-
Stable Audio Open: Audio generation
- Model:
stabilityai/stable-audio-open-1.0 - Output: High-quality 44.1kHz stereo audio
- Model:
-
Audio Analysis: Content understanding
- Multiple models for different analysis tasks
- Capabilities: Genre classification, mood detection, structure analysis
- Music Composition: Generate complete musical pieces
- Harmony Analysis: Analyze chord progressions and harmonic structure
- Style Transfer: Convert music between different styles
- Audio Synthesis: Convert symbolic music to audio
- Format Conversion: Between ABC, MIDI, MusicXML, and audio formats
- Music Education: Explain music theory concepts and analysis
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
- smolagents - Agent framework
- ChatMusician - Music understanding symbolic
- NotaGen - Symbolic music generation
- Stable Audio - Audio generation
- Audio Flamingo - Music Understanding on Audio by NVIDIA
- Qwen-Audio - Music Understanding on Audio by QWEN
If you use this framework in your research, please cite:
@article{music_agent_framework,
title={WeaveMuse: An Open Agentic System for Multimodal Music Understanding and Generation},
author={Karystinaios, Emmanouil},
year={2025},
article={1st Workshop on Large Language Models for Music and Audio (LLM4MA)}
}- Integrate Spotify MCP for song playback and id - Add Spotify Model Context Protocol integration for enhanced music discovery, playback capabilities, and song identification features
- Dreambooth stable-audio-open on music for better outputs - Fine-tune Stable Audio Open with DreamBooth on music datasets to improve generation quality and musical coherence
- Improve control on NotaGen - Enhance NotaGen tool with better parameter control, style conditioning, and compositional constraints for more precise music generation
- Integrate Neural Codec fun tool - Add neural audio codec tools for advanced audio compression, reconstruction, and manipulation capabilities
- Integrate Engraving Model - Add EngravingGNN as a pipeline after NotaGen ouput to improve the layout of the generated MusicXML and PDF scores.
- Integrate more tools and MCPs - Expand the toolkit with additional Model Context Protocols and specialized music AI tools for broader functionality
These improvements aim to enhance WeaveMuse's capabilities in music discovery, generation quality, user control, and ecosystem integration. Contributions and suggestions for these features are welcome!

