
MusicGen Agent Composer

  • Generates evolving MusicGen prompts with an LLM agent, then renders audio segments and stitches them with crossfades.
  • Backed by FastAPI for generation/rendering, with a Streamlit client and an optional React + Vite UI under frontend/MusicGen.

Architecture

  • Agent: agent.py uses agno with Groq Llama to produce 5s segment prompts from a description.
  • Renderer: main.py loads Hugging Face MusicGen (transformers) and renders WAV from prompts.
  • API: api.py exposes endpoints to get instructions and render audio; caches the model.
  • Clients: app.py (Streamlit) talks to the API. React app lives in frontend/MusicGen.
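The renderer's crossfade stitching can be sketched roughly as follows. This is a minimal illustration of the technique, not the actual code in main.py; the function name, the default sample rate, and the `fade_ms` semantics (overlap length in milliseconds) are assumptions.

```python
import numpy as np

def crossfade_stitch(segments, sample_rate=32000, fade_ms=250):
    """Stitch 1-D float audio arrays with a linear crossfade.

    Illustrative sketch: the real renderer lives in main.py and may
    differ in names, defaults, and fade shape.
    """
    fade = int(sample_rate * fade_ms / 1000)
    ramp = np.linspace(0.0, 1.0, fade, dtype=np.float32)
    out = segments[0].astype(np.float32)
    for seg in segments[1:]:
        seg = seg.astype(np.float32)
        # Blend the tail of the running output with the head of the
        # next segment, then append the remainder of the segment.
        out[-fade:] = out[-fade:] * (1.0 - ramp) + seg[:fade] * ramp
        out = np.concatenate([out, seg[fade:]])
    return out
```

With a linear ramp, each stitched boundary overlaps by `fade` samples, so N segments of length L yield roughly `N*L - (N-1)*fade` samples total.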

Prerequisites

  • Python 3.10+
  • Internet access on first run to download the MusicGen model
  • Groq API key for the agent

Quick Start

  • Create and activate a virtual environment, then install Python deps:
    • Windows (PowerShell): python -m venv .venv; .\.venv\Scripts\Activate.ps1; pip install -r requirements.txt
    • macOS/Linux: python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
  • Set your Groq key in .env (an example .env file is included in the repo):
    • GROQ_API_KEY="<your_key>"
  • Start the API server: uvicorn api:app --reload --port 8080
  • Open the Streamlit client in another terminal: streamlit run app.py

Env Vars

  • GROQ_API_KEY (required for agent.py). The API and Streamlit load it via python-dotenv.
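Since a missing key only surfaces once the agent makes its first Groq call, it can help to fail fast at startup. A minimal sketch (the helper name is illustrative; the project itself loads .env via python-dotenv):

```python
import os

def require_groq_key():
    """Return GROQ_API_KEY or raise a clear error if it is unset.

    Illustrative helper -- not part of the project's actual code.
    """
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to .env or export it"
        )
    return key
```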

API Endpoints

  • GET /health — Reports transformers/torch versions and CUDA availability.
  • POST /v1/instructions — Body: { "description": str } → plan + list of prompts.
  • POST /v1/render — Body: { "prompts": [str], "guidance_scale": float, "max_new_tokens": int, "fade_ms": int, "output_dir": str, "return_audio_b64": bool } → stitched audio; optional base64.
  • POST /v1/generate-and-render — One shot: description → final audio.
  • POST /v1/render-segment — Render a single prompt; saves segment_XX.wav.
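The endpoints above can be driven from any HTTP client. Below is a hedged stdlib-only sketch; the default values in `build_render_payload` and the `plan["prompts"]` response field are assumptions, not the API's documented defaults, so check api.py before relying on them.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8080"  # default shown in the UI

def build_render_payload(prompts, guidance_scale=3.0, max_new_tokens=256,
                         fade_ms=250, output_dir="segments",
                         return_audio_b64=True):
    """Assemble a /v1/render body. Defaults here are illustrative."""
    return {
        "prompts": prompts,
        "guidance_scale": guidance_scale,
        "max_new_tokens": max_new_tokens,
        "fade_ms": fade_ms,
        "output_dir": output_dir,
        "return_audio_b64": return_audio_b64,
    }

def post(path, body):
    """POST a JSON body to the API and return the decoded JSON response."""
    req = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def demo():
    """Example round trip (requires the API server to be running)."""
    plan = post("/v1/instructions",
                {"description": "calm lo-fi with vinyl crackle"})
    # Assumes the instructions response carries a "prompts" list.
    return post("/v1/render", build_render_payload(plan["prompts"]))
```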

Common Workflows

  • Generate only (agent): use POST /v1/instructions from the UI (app.py) or cURL.
  • Render timeline: send edited prompts to POST /v1/render to get a final WAV with crossfades.
  • Per‑segment tweak: call POST /v1/render-segment to overwrite segments/segment_XX.wav.
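When tweaking per segment, the output file name must line up with the segment index so the stitched timeline picks up the replacement. A small helper, assuming the `segment_XX.wav` pattern means a zero-padded two-digit index:

```python
def segment_filename(index):
    """Name a segment file to match the segment_XX.wav pattern.

    Zero-padded two-digit indexing is an assumption inferred from
    the XX placeholder; confirm against api.py.
    """
    return f"segment_{index:02d}.wav"
```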

React Frontend (Optional)

  • Location: frontend/MusicGen
  • Node 18+ recommended.
  • Install and run:
    • cd frontend/MusicGen
    • npm install
    • npm run dev
  • The UI has an input for API Base (default http://127.0.0.1:8080). Ensure the FastAPI server is running.

Files

  • agent.py — LLM agent to expand descriptions into segment prompts.
  • main.py — MusicGen load/generate utilities (transformers/torch).
  • api.py — FastAPI service exposing generation/render routes.
  • app.py — Streamlit client for the API.
  • requirements.txt — Python dependencies.
  • frontend/MusicGen — React + Vite frontend (optional).

Troubleshooting

  • Torch or transformers install issues:
    • Try CPU‑only first: pip install torch --index-url https://download.pytorch.org/whl/cpu
    • Ensure transformers and accelerate are installed; update with pip install -U transformers accelerate.
  • Model download errors:
    • Verify internet access; rerun the API so transformers can fetch facebook/musicgen-small.
  • Agent errors about Groq:
    • Confirm .env is loaded and GROQ_API_KEY is valid.

Notes

  • First runs will download model weights; this may take a while.
  • The DuckDuckGo tool import in agent.py is optional and safely skipped if unavailable.