docs: rewrite README with complete model docs and real output examples
- Update all code examples to use new Model handle API
- Add per-model documentation with real inference output
- Add pipeline compatibility table (what can/cannot combine)
- Document SenseVoiceSmall + ct-punc tag corruption issue
- Document Fun-ASR-Nano VAD incompatibility (batch_size limitation)
- Add all 19 models in registry with params and descriptions
- Add input methods section (file path, bytes, text)
- Remove outdated name-string API references
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
> **Note:** SenseVoiceSmall can be combined with `vad_model="fsmn-vad"` to process long audio. Do NOT combine with `punc_model="ct-punc"` — the punctuation model will corrupt the special tags in the output.
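SenseVoiceSmall emits inline special tags (language, emotion, and event markers such as `<|zh|>` and `<|NEUTRAL|>`) in its text output, which is exactly what ct-punc mangles. If you need plain text, a minimal client-side sketch (the helper name and regex are ours, not part of this library):

```python
import re

def strip_sensevoice_tags(text: str) -> str:
    """Remove inline special tags of the form <|...|> from SenseVoice output."""
    return re.sub(r"<\|[^|]*\|>", "", text).strip()

clean = strip_sensevoice_tags("<|zh|><|NEUTRAL|><|Speech|><|woitn|>欢迎大家来体验")
```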

#### Fun-ASR-Nano

End-to-end ASR with built-in punctuation and timestamps. 800M params, supports zh (7 dialects, 26 accents) + en + ja.
```python
nano = asr.load_model("Fun-ASR-Nano")
result = nano(audio="audio.wav")
```
Output:

```python
[{
    "key": "audio",
    "text": "欢迎大家来体验达摩院推出的语音识别模型。",   # with punctuation
    "text_tn": "欢迎大家来体验达摩院推出的语音识别模型",  # without punctuation
}]
```
> **Note:** Fun-ASR-Nano is a standalone model. Do NOT combine with `vad_model` or `punc_model`. Fun-ASR-Nano uses autoregressive decoding (token-by-token generation, like GPT), which only supports `batch_size=1`. However, FunASR's VAD pipeline (`inference_with_vad`) automatically sets a large batch size (default 300s worth of audio per batch) to process multiple VAD segments in parallel — this triggers Fun-ASR-Nano's `batch decoding is not implemented` error. This is a FunASR framework limitation, not a fundamental model constraint. Fun-ASR-Nano handles long audio end-to-end internally and does not need external VAD.
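These pairing rules can be enforced client-side before calling `load_model()`. A hypothetical guard (the `INCOMPATIBLE` table and `check_pipeline` helper are illustrative, not part of this library):

```python
# Compatibility rules from the notes above, encoded as data.
INCOMPATIBLE = {
    "Fun-ASR-Nano": {"vad_model", "punc_model"},  # standalone: batch_size=1 decoding
    "SenseVoiceSmall": {"punc_model"},            # ct-punc corrupts special tags
}

def check_pipeline(model: str, **pipeline) -> None:
    """Raise if a requested pipeline component is known to break the given model."""
    requested = {k for k, v in pipeline.items() if v}
    bad = INCOMPATIBLE.get(model, set()) & requested
    if bad:
        raise ValueError(f"{model} cannot be combined with: {', '.join(sorted(bad))}")

check_pipeline("SenseVoiceSmall", vad_model="fsmn-vad")  # fine: VAD is allowed
```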

#### paraformer / paraformer-zh

Classic Paraformer ASR. 220M params. `paraformer` is for short audio (max 20 s); `paraformer-zh` supports arbitrary-length audio with SeACo.
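Because `paraformer` caps input at 20 seconds, long recordings must be split before inference (or use `paraformer-zh` instead). A sketch of computing chunk boundaries client-side (this helper is ours, not part of the library):

```python
def chunk_spans(duration_s: float, max_s: float = 20.0) -> list[tuple[float, float]]:
    """Split a total duration into consecutive (start, end) spans of at most max_s."""
    spans, t = [], 0.0
    while t < duration_s:
        spans.append((t, min(t + max_s, duration_s)))
        t += max_s
    return spans

# e.g. a 45 s file becomes three chunks: 0-20, 20-40, 40-45
spans = chunk_spans(45.0)
```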
### FunASR Methods

| Method | Returns | Description |
|--------|---------|-------------|
|`ensure_installed()`|`bool`| Install runtime (one-time). Returns `True` if already installed. |
|`start(timeout=60)`|`int`| Start server, returns port number. |
|`stop()`| - | Stop the server. |
|`load_model(model, ...)`|`Model`| Load a model, returns a `Model` handle. |
|`health()`|`dict`| Check server status. |
|`list_models()`|`dict`| List loaded models. |
|`execute(code)`|`dict`| Execute Python code on the server. |

### `load_model()` Parameters

```python
model = asr.load_model(
    model,                # Required: model name ("SenseVoiceSmall", "fsmn-vad", etc.)
    vad_model=None,       # VAD model for pipeline
    punc_model=None,      # Punctuation model for pipeline
    spk_model=None,       # Speaker model for pipeline
    device=None,          # "cuda" / "cpu" / None (auto)
    hub=None,             # "ms" / "hf" / None (auto)
    quantize=None,        # Enable quantization
    fp16=None,            # Enable half-precision
    batch_size=None,      # Batch size
    disable_update=None,  # Skip model update checks
)
```

### Model Methods

```python
model = asr.load_model("SenseVoiceSmall")

# Inference
result = model.infer(audio="file.wav")
result = model.infer(audio_bytes=raw_bytes)
result = model.infer(text="input text")

# Shorthand
result = model(audio="file.wav")

# Alias for ASR
result = model.transcribe(audio="file.wav")

# Unload from memory
model.unload()
```

**Inference parameters** (passed to `infer()` or `__call__()`):

| Parameter | Type | Description |
|-----------|------|-------------|
|`audio`|`str`| Path to audio file |
|`audio_bytes`|`bytes`| Raw audio bytes |
|`text`|`str`| Text input (for punctuation models) |
|`language`|`str`| Language hint (`"zh"`, `"en"`, `"ja"`, etc.) |
|`use_itn`|`bool`| Enable inverse text normalization |
|`batch_size`|`int`| Inference batch size |
|`hotword`|`str`| Hotword string for biased recognition |
|`merge_vad`|`bool`| Merge short VAD segments |
|`merge_length_s`|`float`| Max merge length in seconds (default: 15) |
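To make `merge_vad` / `merge_length_s` concrete, here is a simplified illustration of greedily merging short VAD segments (illustrative only; not FunASR's actual implementation):

```python
def merge_segments(segments, merge_length_s=15.0):
    """Greedily merge adjacent (start_s, end_s) segments while the merged
    span stays within merge_length_s. Simplified sketch of merge_vad."""
    merged = []
    for start, end in segments:
        if merged and end - merged[-1][0] <= merge_length_s:
            merged[-1] = (merged[-1][0], end)  # extend the previous span
        else:
            merged.append((start, end))
    return merged

# Two short utterances merge; a distant one starts a new span.
merged = merge_segments([(0.0, 4.0), (5.0, 9.0), (20.0, 24.0)])
```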

## Architecture

```
Your Application
      |
      |  HTTP (localhost)
      |  JSON-RPC 2.0
      v
FunASR Server (background process)
      |
      |-- Models loaded in memory
      |-- Isolated Python environment (uv)
      +-- Auto GPU/CPU detection
```
The server runs in a completely isolated Python environment managed by `uv`. Your application communicates with it over HTTP using the JSON-RPC 2.0 protocol.
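As an illustration of the wire format, a JSON-RPC 2.0 request body can be built like this (the `method` and `params` values are placeholders; the server's actual RPC surface is not documented here):

```python
import json

def make_request(method: str, params: dict, req_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request envelope."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}
    )

payload = make_request("infer", {"audio": "file.wav"})
```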