Commit b38c6cf

Update README for current state; remove unused streaming module
1 parent 5d281a7 commit b38c6cf

3 files changed

+66
-1379
lines changed

README.md

Lines changed: 66 additions & 5 deletions
@@ -4,7 +4,36 @@ Pure Rust implementation of [ACE-Step v1.5](https://github.com/ACE-Step/ACE-Step
 
 Generates up to 10 minutes of stereo 48kHz audio from text captions and lyrics.
 
-## Quick Start
+## Binaries
+
+### `ace-step` — CLI
+
+One-shot generation from the command line. Prints a JSON summary to stdout on success.
+
+```bash
+ace-step \
+  --caption "upbeat jazz with piano and drums, bpm: 120, key: C major" \
+  --lyrics "[verse]\nWalking down the street on a sunny day" \
+  --duration 30 \
+  --output output.ogg
+```
+
+### `generation-daemon` — Unix socket daemon
+
+Keeps the pipeline resident in VRAM across requests. Each client sends one JSON request line and receives one JSON response line.
+
+Socket: `/tmp/ace-step-gen.sock` (override with `--socket`).
+
+```sh
+echo '{"caption":"ambient piano","duration_s":20,"output":"/tmp/piano.ogg"}' \
+  | socat - UNIX-CONNECT:/tmp/ace-step-gen.sock
+```
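The one-line-in, one-line-out exchange described above can also be driven from Rust with only the standard library. The following is a minimal sketch, not part of this commit: `request_line` and `send_to_daemon` are hypothetical helper names, and the shape of the daemon's response JSON is not specified here, so the response is returned as an opaque string.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical helper: writes one newline-terminated JSON request and
/// reads back a single response line. Generic over the stream so it can
/// be exercised without a running daemon.
fn request_line<S: Read + Write>(stream: &mut S, request_json: &str) -> io::Result<String> {
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read up to and including the first newline of the reply.
    let mut response = String::new();
    BufReader::new(stream).read_line(&mut response)?;
    Ok(response.trim_end().to_string())
}

/// Hypothetical wrapper: connects to the daemon's Unix socket
/// (the README's default is /tmp/ace-step-gen.sock) and performs one exchange.
fn send_to_daemon(socket_path: &str, request_json: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    request_line(&mut stream, request_json)
}
```

Calling `send_to_daemon("/tmp/ace-step-gen.sock", r#"{"caption":"ambient piano","duration_s":20,"output":"/tmp/piano.ogg"}"#)` would mirror the `socat` example, assuming the daemon is running on the default socket.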
+
+Uses the `GenerationManager` internally — monitors VRAM, proactively offloads to CPU on low memory, retries on CUDA OOM. Exits on unrecoverable failure so systemd can restart.
+
+Requires the `audio-ogg` feature.
+
+## Library
 
 ```rust
 use ace_step_rs::pipeline::{AceStepPipeline, GenerationParams};
@@ -15,22 +44,34 @@ fn main() -> ace_step_rs::Result<()> {
 
     let params = GenerationParams {
         caption: "upbeat jazz with piano and drums".to_string(),
-        metas: "bpm: 120, key: C major, genre: jazz".to_string(),
         lyrics: "[verse]\nWalking down the street on a sunny day\n".to_string(),
-        language: "en".to_string(),
         duration_s: 30.0,
         ..Default::default()
     };
 
     let audio = pipeline.generate(&params)?;
-    ace_step_rs::audio::write_wav("output.wav", &audio.samples, audio.sample_rate, audio.channels)?;
+    ace_step_rs::audio::write_audio("output.wav", &audio.samples, audio.sample_rate, audio.channels)?;
 
     Ok(())
 }
 ```
 
 Model weights (~6GB) are downloaded automatically from [ACE-Step/Ace-Step1.5](https://huggingface.co/ACE-Step/Ace-Step1.5) on first run and cached in `~/.cache/huggingface/`.
 
+### Key modules
+
+| Module | Description |
+|--------|-------------|
+| `pipeline` | End-to-end inference: text encoding → diffusion → VAE decode |
+| `manager` | `GenerationManager` — keeps the pipeline resident, queues requests, VRAM monitoring + OOM retry |
+| `radio` | `RadioStation` — whole-song generation with request queue, auto-duration from lyrics |
+| `audio` | WAV/OGG/MP3 I/O |
+| `vae` | AutoencoderOobleck decoder (latent → 48kHz stereo waveform) |
+
+### Radio station
+
+`RadioStation` manages a song request queue and generates complete tracks sequentially. Duration is auto-estimated from lyrics (8s per line, clamped to 100–600s). The `radio_daemon` example wires this to cpal audio output with gapless double-buffered playback, Unix socket control, and skip/queue/history commands.
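The duration rule quoted above (8 s per lyric line, clamped to 100–600 s) can be sketched in a few lines. This is an illustration only: `estimate_duration_s` is a hypothetical name, not the crate's API, and counting every non-empty line (including section tags such as `[verse]`) is an assumption, since the README does not say which lines count.

```rust
/// Hypothetical helper illustrating the README's rule; the crate's real
/// implementation may count lines differently.
fn estimate_duration_s(lyrics: &str) -> f64 {
    const SECONDS_PER_LINE: f64 = 8.0;
    const MIN_S: f64 = 100.0;
    const MAX_S: f64 = 600.0;

    // Assumption: blank lines are ignored, every other line counts.
    let line_count = lyrics.lines().filter(|l| !l.trim().is_empty()).count();

    (line_count as f64 * SECONDS_PER_LINE).clamp(MIN_S, MAX_S)
}
```

Under these assumptions, a 2-line lyric (16 s raw) is raised to the 100 s floor, 20 lines give 160 s, and anything over 75 lines is capped at 600 s.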
+
 ## Architecture
 
 ```
@@ -65,6 +106,12 @@ Requires CUDA toolkit 12.x and a compatible NVIDIA GPU.
 cargo build --release --features cuda
 ```
 
+For cuDNN-accelerated ConvTranspose1d (faster VAE decode):
+
+```bash
+cargo build --release --features cudnn
+```
+
 Depending on your system, you may need additional environment variables for the CUDA build — see [AGENTS.md](AGENTS.md) for platform-specific notes.
 
 ### Metal (macOS)
@@ -75,12 +122,23 @@ cargo build --release --features metal
 
 Note: Metal support is provided by candle but has not been tested with this project.
 
+### Building the daemon
+
+```bash
+cargo build --release --bin generation-daemon --features cuda,audio-ogg
+```
+
 ## Features
 
 | Feature | Default | Description |
 |---------|---------|-------------|
-| `cuda` | yes | NVIDIA GPU acceleration via CUDA + cuDNN |
+| `cuda` | yes | NVIDIA GPU acceleration via CUDA |
+| `cudnn` | no | cuDNN-accelerated ConvTranspose1d (implies `cuda`) |
 | `metal` | no | Apple GPU acceleration via Metal |
+| `cli` | no | Audio playback + terminal input (cpal, rodio, crossterm) |
+| `audio-ogg` | no | OGG/Vorbis encoding (required by `generation-daemon`) |
+| `audio-mp3` | no | MP3 encoding |
+| `audio-all` | no | All audio encoders |
 
 ## `GenerationParams`
 
@@ -93,6 +151,9 @@ Note: Metal support is provided by candle but has not been tested with this proj
 | `duration_s` | `f64` | `30.0` | Output duration in seconds (max 600) |
 | `shift` | `f64` | `3.0` | Turbo schedule shift (1, 2, or 3) |
 | `seed` | `Option<u64>` | `None` | Random seed for reproducibility |
+| `src_latents` | `Option<Tensor>` | `None` | Source latents for repaint/inpainting |
+| `chunk_masks` | `Option<Tensor>` | `None` | Mask for repaint (0 = keep, 1 = generate) |
+| `refer_audio` | `Option<Tensor>` | `None` | Reference audio latents for timbre conditioning |
 
 ## Performance
 
src/lib.rs

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@ pub mod manager;
 pub mod model;
 pub mod pipeline;
 pub mod radio;
-pub mod streaming;
 pub mod vae;
 
 mod error;
