Commit e3b3e1e

committed
Update AGENTS.md for LM planner and daemon changes
1 parent 2d82f56 commit e3b3e1e

1 file changed (+9, -4 lines)

AGENTS.md

Lines changed: 9 additions & 4 deletions
````diff
@@ -117,6 +117,7 @@ src/
 │ │ ├── fsq.rs — ResidualFsq (stub, cover mode only)
 │ │ ├── pooler.rs — AttentionPooler (25Hz→5Hz)
 │ │ └── detokenizer.rs — AudioTokenDetokenizer (5Hz→25Hz)
+│ ├── lm_planner.rs — LmPlanner (5Hz LM CoT-only), PlannerOutput
 │ └── generation.rs — AceStepConditionGenerationModel (ODE loop)
 ├── vae.rs — OobleckDecoder (Snake1d α+β, weight_norm)
 └── pipeline.rs — end-to-end inference (stub)
````
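The new `lm_planner.rs` entry pairs with the **LM planner flag** note in AGENTS.md: the LM runs in the same process before the DiT, rewrites the caption, fills in metadata, and may suggest a duration, but a user-supplied `duration_s` always wins. Below is a minimal sketch of that precedence rule. The `Planner` trait and the `PlanSketch`, `RequestSketch`, and `DitInput` types are hypothetical; the real `LmPlanner` and `PlannerOutput` APIs are not shown in this diff.

```rust
/// Hypothetical subset of what the planner returns; the real `PlannerOutput` may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PlanSketch {
    pub caption: String,
    pub bpm: Option<u32>,
    pub key_scale: Option<String>,
    pub time_signature: Option<String>,
    pub language: Option<String>,
    pub duration_s: Option<f64>,
}

/// Hypothetical request fields, mirroring the socket protocol's `caption` and `duration_s`.
#[derive(Debug, Clone)]
pub struct RequestSketch {
    pub caption: String,
    pub duration_s: Option<f64>,
}

/// Hypothetical stand-in for the in-process 5Hz LM.
pub trait Planner {
    fn plan(&self, caption: &str) -> PlanSketch;
}

/// What the DiT stage receives after the optional planning step.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub struct DitInput {
    pub caption: String,
    pub duration_s: Option<f64>,
    pub plan: Option<PlanSketch>,
}

/// Runs the planner (if loaded) before generation. A user-supplied duration
/// always takes priority; the planner's duration fills in only when it is absent.
pub fn prepare(req: &RequestSketch, planner: Option<&dyn Planner>) -> DitInput {
    match planner {
        // No `--use-lm`: pass the request through untouched.
        None => DitInput { caption: req.caption.clone(), duration_s: req.duration_s, plan: None },
        Some(p) => {
            let plan = p.plan(&req.caption);
            DitInput {
                caption: plan.caption.clone(), // LM-rewritten caption
                duration_s: req.duration_s.or(plan.duration_s),
                plan: Some(plan),
            }
        }
    }
}

/// Fixed planner used only for this demo.
struct FixedPlanner;

impl Planner for FixedPlanner {
    fn plan(&self, caption: &str) -> PlanSketch {
        PlanSketch {
            caption: format!("{}, warm analog synths", caption),
            bpm: Some(92),
            language: Some("en".into()),
            duration_s: Some(95.0),
            ..Default::default()
        }
    }
}

fn main() {
    let planner: &dyn Planner = &FixedPlanner;
    let with_duration = RequestSketch { caption: "synthwave".into(), duration_s: Some(30.0) };
    let without = RequestSketch { caption: "synthwave".into(), duration_s: None };

    // User asked for 30 s, so the LM's 95 s suggestion is ignored.
    assert_eq!(prepare(&with_duration, Some(planner)).duration_s, Some(30.0));
    // Duration omitted, so the LM suggestion is used.
    assert_eq!(prepare(&without, Some(planner)).duration_s, Some(95.0));
    // Planner not loaded: nothing is filled in; the caller must pick a default.
    assert_eq!(prepare(&without, None).duration_s, None);
    assert_eq!(prepare(&without, Some(planner)).caption, "synthwave, warm analog synths");
}
```

Returning `None` when neither side supplies a duration leaves the fallback to the caller, since this diff does not state what default the daemon uses.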
````diff
@@ -155,24 +156,28 @@ Module roots (e.g., `src/audio.rs`) contain `mod` declarations and re-exports. N
 **Socket:** `/tmp/ace-step-gen.sock` (default). Override with `--socket`.
 
 **Protocol:**
-- Request: `{"caption":"...", "output":"/tmp/out.ogg", "duration_s":30, ...}` + newline
+- Request: `{"caption":"...", "output":"/tmp/out.ogg", "duration_s":30, "lyrics":"...", "metas":"...", "language":"en", "shift":3.0, "seed":null}` + newline
 - Response success: `{"ok":true, "path":"...", "duration_s":30, "sample_rate":48000, "channels":2}` + newline
 - Response error: `{"ok":false, "error":"..."}` + newline
 
+**LM planner flag:** Pass `--use-lm` to load the 5Hz LM planner (1.7B, ~3.5GB VRAM). When active, the LM rewrites the caption and fills in BPM, key/scale, time signature, and language before the DiT runs. User-specified `duration_s` always takes priority; the LM suggestion is only used when duration is omitted. Recommended for best quality.
+
+**Socket bind:** The socket is bound before model loading completes, so the socket file appears within ~1s of startup. Connections that arrive during loading queue in the channel and are served once the pipeline is ready.
+
 **Build:**
 ```bash
 LIBRARY_PATH=/usr/lib64:$LIBRARY_PATH PATH="/usr/local/cuda-12.4/bin:$PATH" \
 CUDA_HOME=/usr/local/cuda-12.4 NVCC_CCBIN=/usr/bin/g++-13 CPLUS_INCLUDE_PATH="/tmp/cuda-shim" \
-cargo build --release --example generation_daemon --features audio-ogg
+cargo build --release --features audio-all
 ```
 
-**Binary location:** `target/release/examples/generation_daemon`
+**Binary location:** `target/release/generation-daemon`
 
 **Systemd unit:** `~/.config/systemd/user/ace-step-gen.service` (enabled, auto-starts)
 
 **Skill:** `~/.spacebot/skills/generate_music/SKILL.md` — instructs the worker to talk to the daemon socket via `socat`, with CLI binary fallback.
 
-**Design rationale:** `stream_daemon` (also in this repo) is for live/continuous playback to speakers. `generation_daemon` is for one-shot file generation: user asks for a track, gets a file. The `GenerationManager` (`src/manager.rs`) handles the resident pipeline with OOM retry; `generation_daemon` just wraps it with a socket interface.
+**Design rationale:** `generation_daemon` is for one-shot file generation: user asks for a track, gets a file. The `GenerationManager` (`src/manager.rs`) handles the resident pipeline with OOM retry; `generation_daemon` wraps it with a socket interface. The optional LM planner runs before each request (no separate socket — same process).
 
 ## Reference Repositories
 
````
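The socket protocol in AGENTS.md is one JSON object per line in each direction. A minimal client sketch using only Rust's standard library (`std::os::unix::net::UnixStream`) follows. `json_escape`, `build_request`, and `send_request` are hypothetical helpers, not part of this repo, and the hand-rolled escaping stands in for a real JSON library such as `serde_json`. `main` runs a throwaway mock server on a temporary socket so the example works without the daemon.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;

/// Escapes a string for a JSON string literal (quotes, backslashes, control chars).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one newline-terminated request. Omitting `duration_s` lets the
/// LM planner (when the daemon runs with `--use-lm`) suggest a length.
fn build_request(caption: &str, output: &str, duration_s: Option<f64>) -> String {
    let mut line = format!(
        "{{\"caption\":\"{}\",\"output\":\"{}\"",
        json_escape(caption),
        json_escape(output)
    );
    if let Some(d) = duration_s {
        line.push_str(&format!(",\"duration_s\":{}", d));
    }
    line.push_str("}\n"); // the protocol is newline-delimited
    line
}

/// Sends one request and returns the JSON response line without its newline.
fn send_request(socket: &Path, request_line: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(request_line.as_bytes())?;
    let mut response = String::new();
    BufReader::new(stream).read_line(&mut response)?;
    Ok(response.trim_end().to_string())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join(format!("ace-step-client-demo-{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path);
    let listener = UnixListener::bind(&path)?;

    // Mock daemon: read one request line, reply with a canned success object.
    let server = thread::spawn(move || -> std::io::Result<String> {
        let (stream, _) = listener.accept()?;
        let mut reader = BufReader::new(stream);
        let mut request = String::new();
        reader.read_line(&mut request)?;
        reader.into_inner().write_all(
            b"{\"ok\":true,\"path\":\"/tmp/out.ogg\",\"duration_s\":30,\"sample_rate\":48000,\"channels\":2}\n",
        )?;
        Ok(request)
    });

    let request = build_request("lo-fi \"rainy\" piano", "/tmp/out.ogg", Some(30.0));
    let response = send_request(&path, &request)?;
    let received = server.join().expect("mock server panicked")?;
    assert_eq!(received, request);
    assert!(response.starts_with("{\"ok\":true"));
    println!("{response}");
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Against the real daemon, `send_request` would point at `/tmp/ace-step-gen.sock` and the response should be checked for `"ok":false` before using `path`.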
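The **Socket bind** note describes an ordering: bind the socket first, load the model second, and let early connections wait in a channel. A minimal std-only sketch of that ordering follows. It assumes an acceptor thread feeding a `std::sync::mpsc` channel; the daemon's actual concurrency model and channel type are not shown in this diff. `load_pipeline`, `serve_one`, and `run_daemon` are hypothetical, the 300 ms sleep stands in for model loading, and the reply is a placeholder rather than the protocol's real payload.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the slow model load.
fn load_pipeline() -> &'static str {
    thread::sleep(Duration::from_millis(300));
    "pipeline-ready"
}

/// Reads one request line and writes one placeholder response line.
fn serve_one(stream: UnixStream, pipeline: &str) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream);
    let mut request = String::new();
    reader.read_line(&mut request)?;
    let mut stream = reader.into_inner();
    // Placeholder reply; a real daemon would generate audio and send the ok/error object.
    writeln!(stream, "{{\"ok\":true,\"served_by\":\"{}\"}}", pipeline)
}

/// Binds first, then loads; `max_requests` only exists so the demo terminates.
fn run_daemon(path: PathBuf, max_requests: usize) -> std::io::Result<()> {
    let _ = std::fs::remove_file(&path); // clear a stale socket from a previous run
    let listener = UnixListener::bind(&path)?; // socket file appears immediately
    let (tx, rx) = mpsc::channel::<UnixStream>();
    // Acceptor thread: queues connections even while the pipeline is loading.
    thread::spawn(move || {
        for stream in listener.incoming().flatten() {
            if tx.send(stream).is_err() {
                break; // serving loop has exited
            }
        }
    });
    let pipeline = load_pipeline(); // the slow step happens after bind
    // Drain queued connections in arrival order now that the pipeline is ready.
    for stream in rx.iter().take(max_requests) {
        if let Err(e) = serve_one(stream, pipeline) {
            eprintln!("request failed: {e}");
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join(format!("bind-first-demo-{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path);
    let daemon_path = path.clone();
    let start = Instant::now();
    let daemon = thread::spawn(move || run_daemon(daemon_path, 1));

    // The socket file shows up well before loading finishes.
    while !path.exists() {
        thread::sleep(Duration::from_millis(5));
    }
    println!("socket ready after {:?}", start.elapsed());

    let mut client = UnixStream::connect(&path)?; // model is still "loading" here
    writeln!(client, "{{\"caption\":\"test\",\"output\":\"/tmp/x.ogg\"}}")?;
    let mut response = String::new();
    BufReader::new(client).read_line(&mut response)?; // blocks until the drain loop reaches us
    println!("{} after {:?}", response.trim_end(), start.elapsed());

    daemon.join().expect("daemon thread panicked")?;
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Because the acceptor thread accepts connections immediately, an early client's `connect` and request write both succeed right away; only its `read_line` waits until loading completes and the drain loop reaches that connection.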