117 | 117 | │ │ ├── fsq.rs — ResidualFsq (stub, cover mode only) |
118 | 118 | │ │ ├── pooler.rs — AttentionPooler (25Hz→5Hz) |
119 | 119 | │ │ └── detokenizer.rs — AudioTokenDetokenizer (5Hz→25Hz) |
| 120 | +│ ├── lm_planner.rs — LmPlanner (5Hz LM CoT-only), PlannerOutput |
120 | 121 | │ └── generation.rs — AceStepConditionGenerationModel (ODE loop) |
121 | 122 | ├── vae.rs — OobleckDecoder (Snake1d α+β, weight_norm) |
122 | 123 | └── pipeline.rs — end-to-end inference (stub) |
@@ -155,24 +156,28 @@ Module roots (e.g., `src/audio.rs`) contain `mod` declarations and re-exports. N |
155 | 156 | **Socket:** `/tmp/ace-step-gen.sock` (default). Override with `--socket`. |
156 | 157 |
157 | 158 | **Protocol:** |
158 | | -- Request: `{"caption":"...", "output":"/tmp/out.ogg", "duration_s":30, ...}` + newline |
| 159 | +- Request: `{"caption":"...", "output":"/tmp/out.ogg", "duration_s":30, "lyrics":"...", "metas":"...", "language":"en", "shift":3.0, "seed":null}` + newline |
159 | 160 | - Response success: `{"ok":true, "path":"...", "duration_s":30, "sample_rate":48000, "channels":2}` + newline |
160 | 161 | - Response error: `{"ok":false, "error":"..."}` + newline |
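
The newline-delimited JSON exchange above can be sketched from the client side as follows. This is a minimal illustration using only the standard library, not the repo's own client code: the helper names `json_escape`, `build_request`, and `send_request` are hypothetical, and it assumes the daemon accepts a request carrying only `caption`, `output`, and `duration_s` (the three fields present in both versions of the request line).

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Escape a string so it can sit inside a JSON string literal.
/// (Hypothetical helper; a real client would more likely use a JSON crate.)
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one newline-terminated request line.
/// Assumption: the remaining fields (lyrics, metas, language, shift, seed) may be omitted.
fn build_request(caption: &str, output: &str, duration_s: u32) -> String {
    format!(
        "{{\"caption\":\"{}\",\"output\":\"{}\",\"duration_s\":{}}}\n",
        json_escape(caption),
        json_escape(output),
        duration_s
    )
}

/// Send one request line and read back one newline-terminated response line.
fn send_request(socket_path: &str, request_line: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(request_line.as_bytes())?;
    stream.flush()?;
    let mut response = String::new();
    BufReader::new(stream).read_line(&mut response)?;
    Ok(response.trim_end().to_string())
}
```

A caller would pass the default socket path, for example `send_request("/tmp/ace-step-gen.sock", &build_request("lo-fi piano", "/tmp/out.ogg", 30))`, and then check whether the returned line carries `"ok":true` or an `error` field.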
161 | 162 |
| 163 | +**LM planner flag:** Pass `--use-lm` to load the 5Hz LM planner (1.7B, ~3.5GB VRAM). When active, the LM rewrites the caption and fills in BPM, key/scale, time signature, and language before the DiT runs. A user-specified `duration_s` always takes priority; the LM's suggested duration is used only when `duration_s` is omitted from the request. Recommended for best quality. |
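
The duration priority rule can be written down as a small function. This is a sketch of the rule as stated, not the daemon's actual code: `resolve_duration` and its parameter names are hypothetical, and the `fallback_s` used when neither source supplies a duration is an assumption, since the text does not say what happens in that case.

```rust
/// Pick the duration for one request.
/// The user's value wins; the LM planner's suggestion is consulted only when
/// the user omitted `duration_s`. `fallback_s` is an assumed default for the
/// case where neither is available.
fn resolve_duration(
    user_duration_s: Option<f64>,
    lm_suggested_s: Option<f64>,
    fallback_s: f64,
) -> f64 {
    user_duration_s.or(lm_suggested_s).unwrap_or(fallback_s)
}
```

Using `Option::or` keeps the precedence explicit in one expression: the LM value is never read when the user supplied one.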
| 164 | + |
| 165 | +**Socket bind:** The socket is bound before model loading completes, so the socket file appears within ~1s of startup. Connections that arrive during loading queue in the channel and are served once the pipeline is ready. |
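
The bind-before-load ordering can be illustrated with a standard-library sketch. It assumes a design where an accept thread forwards connections into an `mpsc` channel while loading is in progress; the names `serve_after_load`, `fake_pipeline`, and `demo` are hypothetical, and the 300 ms sleep only stands in for model loading. The real daemon's structure may differ.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Bind first, load second. Returns the number of connections served.
fn serve_after_load<L, P>(socket_path: &Path, load: L, max_requests: usize) -> io::Result<usize>
where
    L: FnOnce() -> P,
    P: Fn(&str) -> String,
{
    let _ = std::fs::remove_file(socket_path); // clear a stale socket file, if any
    let listener = UnixListener::bind(socket_path)?; // socket file exists from here on

    // Accept thread: forwards connections into a channel while loading runs.
    // In this sketch it is left running when the function returns.
    let (tx, rx) = mpsc::channel::<UnixStream>();
    thread::spawn(move || {
        for conn in listener.incoming() {
            match conn {
                Ok(stream) => {
                    if tx.send(stream).is_err() {
                        break; // receiver dropped, stop accepting
                    }
                }
                Err(_) => break,
            }
        }
    });

    let pipeline = load(); // slow step; early connections wait in `rx`

    let mut served = 0;
    for mut stream in rx.iter().take(max_requests) {
        let mut line = String::new();
        BufReader::new(stream.try_clone()?).read_line(&mut line)?;
        stream.write_all(pipeline(line.trim_end()).as_bytes())?;
        stream.write_all(b"\n")?;
        served += 1;
    }
    Ok(served)
}

/// Stand-in for the real pipeline: always answers with the error shape.
fn fake_pipeline(_request: &str) -> String {
    "{\"ok\":false,\"error\":\"demo pipeline: no model loaded\"}".to_string()
}

/// A client connects while the fake load is still sleeping and is answered afterwards.
fn demo(socket_path: &Path) -> io::Result<String> {
    let server_path = socket_path.to_path_buf();
    let server = thread::spawn(move || {
        serve_after_load(
            &server_path,
            || {
                thread::sleep(Duration::from_millis(300)); // pretend to load a model
                fake_pipeline
            },
            1,
        )
    });

    // Retry until the socket has been bound (up to about one second).
    let mut attempts = 0;
    let mut client = loop {
        match UnixStream::connect(socket_path) {
            Ok(stream) => break stream,
            Err(e) if attempts >= 100 => return Err(e),
            Err(_) => {
                attempts += 1;
                thread::sleep(Duration::from_millis(10));
            }
        }
    };
    client.write_all(b"{\"caption\":\"demo\"}\n")?;
    let mut response = String::new();
    BufReader::new(client).read_line(&mut response)?;
    server.join().expect("server thread panicked")?;
    Ok(response.trim_end().to_string())
}
```

The point of the ordering is that a client never sees a missing socket file just because the model is still loading; it only waits longer for its first response.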
| 166 | + |
162 | 167 | **Build:** |
163 | 168 | ```bash |
164 | 169 | LIBRARY_PATH=/usr/lib64:$LIBRARY_PATH PATH="/usr/local/cuda-12.4/bin:$PATH" \ |
165 | 170 | CUDA_HOME=/usr/local/cuda-12.4 NVCC_CCBIN=/usr/bin/g++-13 CPLUS_INCLUDE_PATH="/tmp/cuda-shim" \ |
166 | | -cargo build --release --example generation_daemon --features audio-ogg |
| 171 | +cargo build --release --features audio-all |
167 | 172 | ``` |
168 | 173 |
169 | | -**Binary location:** `target/release/examples/generation_daemon` |
| 174 | +**Binary location:** `target/release/generation-daemon` |
170 | 175 |
171 | 176 | **Systemd unit:** `~/.config/systemd/user/ace-step-gen.service` (enabled, auto-starts) |
172 | 177 |
173 | 178 | **Skill:** `~/.spacebot/skills/generate_music/SKILL.md` — instructs the worker to talk to the daemon socket via `socat`, with CLI binary fallback. |
174 | 179 |
175 | | -**Design rationale:** `stream_daemon` (also in this repo) is for live/continuous playback to speakers. `generation_daemon` is for one-shot file generation: user asks for a track, gets a file. The `GenerationManager` (`src/manager.rs`) handles the resident pipeline with OOM retry; `generation_daemon` just wraps it with a socket interface. |
| 180 | +**Design rationale:** `generation_daemon` is for one-shot file generation: the user asks for a track and gets a file. The `GenerationManager` (`src/manager.rs`) handles the resident pipeline with OOM retry; `generation_daemon` wraps it with a socket interface. The optional LM planner runs in the same process at the start of each request, so it needs no separate socket. |
176 | 181 |
177 | 182 | ## Reference Repositories |
178 | 183 |