
Commit f54a439

Make it more reviewer-friendly

- Update README
- Follow the behavior in README, in particular, the "sliding window" part
- Rename variables to easier-to-review names and rewrite if-conditions
1 parent 6302794 commit f54a439

2 files changed (+144, -121 lines)

examples/stream/README.md

Lines changed: 12 additions & 1 deletion
````diff
@@ -12,7 +12,7 @@ https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a
 
 ## Sliding window mode with VAD
 
-Setting the `--step` argument to `0` enables the sliding window mode:
+Setting the `--step` argument to `0` or a negative value enables the sliding window mode:
 
 ```bash
 ./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 6 --step 0 --length 30000 -vth 0.6
````
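The rule changed in this hunk (a `--step` of `0` or below selects sliding window mode) can be restated as a small shell helper. This is a hypothetical sketch: `describe_step` is not part of whisper.cpp, the fixed-step description assumes the tool's usual behavior of transcribing every `--step` milliseconds, and treating the absolute value of a negative `--step` as the `--interim` period is inferred from the "every two seconds" example added later in this commit, not confirmed from the source.

```bash
# Hypothetical helper (not part of whisper.cpp) that restates how a --step
# value selects the streaming mode, per the README text.
describe_step() {
  local step_ms=$1
  if [ "$step_ms" -gt 0 ]; then
    # Assumed default behavior: transcribe on a fixed step.
    echo "fixed-step mode: transcribe every ${step_ms} ms"
  elif [ "$step_ms" -eq 0 ]; then
    echo "sliding window mode: transcribe the last --length ms when the VAD detects silence"
  else
    # Inferred, not confirmed: |--step| is the interim period used with --interim.
    echo "sliding window mode: with --interim, also transcribe every $(( -step_ms )) ms"
  fi
}

describe_step 500
describe_step 0
describe_step -2000
```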
````diff
@@ -25,6 +25,17 @@ It's best to tune it to the specific use case, but a value around `0.6` should b
 When silence is detected, it will transcribe the last `--length` milliseconds of audio and output
 a transcription block that is suitable for parsing.
 
+You can also set the `--interim` argument to force transcription before the VAD detects silence.
+
+```bash
+./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 6 --step -2000 --length 10000 -vth 0.6 --interim --keep 200
+```
+
+This will transcribe the audio every two seconds, keeping the last segment unconfirmed,
+even if the VAD says the speech is still ongoing. In this mode, if the sentence doesn't end
+within `--length` milliseconds, the time window will not slide. The audio will be cut there
+and transcribed anyway, keeping the last `--keep` milliseconds for the next inference.
+
 ## Building
 
 The `whisper-stream` tool depends on SDL2 library to capture audio from the microphone. You can build it like this:
````

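The `--interim` behavior described above can be traced on a timeline. The sketch below is a hypothetical simulation, not the whisper-stream implementation: it assumes the VAD never detects silence, treats the absolute value of `--step` as the interim period, and assumes `--length` is a multiple of that period (otherwise the simulated window overshoots `--length` before the cut). `simulate_interim` and its `speech_ms` parameter are made up for illustration.

```bash
# Hypothetical timeline sketch of --interim mode, assuming the VAD never
# detects silence while the simulated speech lasts. It restates the README
# rules and is not the actual logic in the whisper-stream source.
#   $1 step_ms   negative --step value; its absolute value is used as the
#                interim period (inferred from the "every two seconds" example)
#   $2 length_ms --length: longest window before a forced cut
#   $3 keep_ms   --keep: audio carried over into the next window after a cut
#   $4 speech_ms how long the simulated speech runs without a pause
simulate_interim() {
  local step_ms=$1 length_ms=$2 keep_ms=$3 speech_ms=$4
  local period=$(( -step_ms ))
  if [ "$period" -le 0 ]; then
    echo "expected a negative --step value" >&2
    return 1
  fi
  local window=0  # audio currently held in the window, in ms
  local t=0       # elapsed time, in ms
  while [ $(( t + period )) -le "$speech_ms" ]; do
    t=$(( t + period ))
    window=$(( window + period ))
    if [ "$window" -ge "$length_ms" ]; then
      # Sentence did not end within --length: cut, transcribe, keep the tail.
      echo "t=${t}ms cut: transcribe ${window}ms window, keep last ${keep_ms}ms"
      window=$keep_ms
    else
      # Window has not slid yet: interim result, last segment unconfirmed.
      echo "t=${t}ms interim: transcribe ${window}ms window (last segment unconfirmed)"
    fi
  done
}

simulate_interim -2000 10000 200 14000
```

With the values from the README example (`--step -2000 --length 10000 --keep 200`) and 14 seconds of continuous speech, the sketch reports interim results at 2, 4, 6 and 8 seconds, a cut of the 10000 ms window at 10 seconds, and then interim results over 2200 ms and 4200 ms windows, because each new window starts from the 200 ms kept after the cut.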