Commit 3402a73

add transcribe diarization option
1 parent 0cd816b commit 3402a73

1 file changed

+13
-3
lines changed

mkdocs/docs/HPC/transcribe.md

Lines changed: 13 additions & 3 deletions
@@ -73,9 +73,18 @@ Default model is `large-v3`, others can be choosen but should be careful to comp
 
 ### Flavour
 
-We currently support 2 flavours: `whisper` (the OpenAI reference implementation), and `whisper-ctranslate2`
-(a faster version with some extras). Benchmarks indicate that `whisper-ctranslate2` is about 4 times faster than `whisper`,
-but might have some lower quality.
+We currently support 3 flavours:
+- `whisper` the OpenAI reference implementation
+- `whisper-ctranslate2` a faster version with some extras
+- `WhisperX` a faster version with most features like voice activity detection and speaker diarization
+
+Benchmarks indicate that `whisper-ctranslate2` is about 4 times faster than `whisper`,
+but might have some lower quality. `WhisperX` should be on par with `whisper-ctranslate2`.
+
+### Speaker diarization
+
+Speaker diarization (associate words with speaker) is only available for the `WhisperX` flavour.
+You must both select the flavour and enable this feature to get the diarization working.
 
 ### Task
 
@@ -91,3 +100,4 @@ than it will take to complete the transcription on the default cluster.
 ## Resources
 
 Default settings of 4 cores with at least 10GB of RAM and 1 hour (wall)time should be enough for most transcriptions.
+But don't forget that translations and diarization add to the total runtime.
