
Commit ae530ce

prepping transcribe for prod
1 parent 7ff23ad commit ae530ce

File tree

1 file changed (+16 −7 lines)

mkdocs/docs/HPC/transcribe.md

@@ -2,9 +2,6 @@
 
 ## What is Transcribe
 
-!!! warning
-    The `Transcribe` application is currently only available to members of the `gpilot` user group.
-
 `Transcribe` is a non-interactive application that offers audio transcription based on `OpenAI` `Whisper` (and derivatives thereof).
 
 The main use case is sporadic transcription of audio or video files. There is intentionally no bulk mode (or API or library)
@@ -15,7 +12,7 @@ The supported flow is:
 - Upload an audio or video file using the `Files` interface of the web portal
 
 - Configure transcription via the `Interactive Apps` -> `Transcribe` application (currently under the `Testing` section at the bottom);
-  you can select `Whisper inputfile` and `Whisper languages`.
+  you can select `Whisper inputfile`.
 
 - Launch it and wait. Connecting to the running transcription is entirely optional; there is nothing interactive to do.
   You will also receive an email when the transcription has started.
@@ -32,7 +29,7 @@ This is intentionally kept simple. There is also no risk of losing previous res
 ## Performance and default settings
 
 The defaults should give the best balance between quality, performance and time to result.
-You can expect approx 10 minutes of transcription time per language and per hour of input.
+You can expect approx 10 minutes of transcription time per language and per hour of input using the default flavour.
 This, combined with an almost immediate start time, is the best combination for the intended use case.
 There should be enough resources available to get this result most of the time.
 
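As a rough worked example of that estimate (assuming the time scales linearly with input length and number of languages): a 90-minute recording transcribed in two languages with the default flavour would take on the order of 2 × 1.5 × 10 ≈ 30 minutes of transcription time, on top of queueing and startup.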
@@ -43,6 +40,17 @@ There are some advanced options one can choose from. They should not be needed f
 They are intended for corner cases, or to compare results between different `Whisper` models and/or different implementation flavours
 (`whisper` and `whisper-ctranslate2`) with respect to speed and quality.
 
+### Whisper language
+
+Using `Automatic detection` (the default), whisper determines the spoken language based on the first 30 seconds of audio.
+If, for some reason, the autodetection fails (e.g. the input file starts with silence or some music), you can force one of the languages.
+
+When selecting more than one language, the transcription will be done separately for each language
+(this will also increase your total running time, so you might also want to increase the `Time`).
+
+The selected language will also determine the output language. However, this is ***not*** meant as a translation feature,
+although the quality is not that bad if your languages are sufficiently similar.
+
 ### Cluster
 
 Changing the cluster from the interactive cluster will give you access to a much better GPU,
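The `Whisper language` section added above follows how Whisper itself handles languages: detection looks only at the first 30 seconds of audio, a forced language skips detection, and each additional language means another full transcription pass. The sketch below is a hypothetical illustration of that behaviour using the open-source `whisper` Python package; it is not the portal's actual code, and the file name and model size are placeholders.

```python
# Hypothetical illustration of Whisper's language handling (not the portal's code).
# The file name "interview.mp3" and the "medium" model are placeholders.
import whisper

model = whisper.load_model("medium")

# Language detection only sees the first 30 seconds of audio, which is why
# leading silence or music can throw it off.
audio = whisper.load_audio("interview.mp3")
clip = whisper.pad_or_trim(audio)                       # pad/trim to 30 seconds
mel = whisper.log_mel_spectrogram(clip).to(model.device)
_, probs = model.detect_language(mel)
print("detected language:", max(probs, key=probs.get))

# Automatic detection: leaving the language unset lets Whisper decide.
auto_result = model.transcribe("interview.mp3")

# Forcing languages: one full transcription pass per language, which is what
# increases the total running time when several languages are selected.
for lang in ["nl", "en"]:
    result = model.transcribe("interview.mp3", language=lang)
    print(lang, "->", result["text"][:80])
```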
@@ -51,12 +59,13 @@ than it will take to complete the transcription on the default cluster.
 
 ## Resources
 
-Default settings of 4 cores with at least 10GB of RAM and 1 hour walltime should be enough for most transcriptions.
+Default settings of 4 cores with at least 10GB of RAM and 1 hour (wall)time should be enough for most transcriptions.
 
 ### Flavour
 
 We currently support 2 flavours: `whisper` (the OpenAI reference implementation), and `whisper-ctranslate2`
-(a faster version with some extras).
+(a faster version with some extras). Benchmarks indicate that `whisper-ctranslate2` is about 4 times faster than `whisper`,
+but might have somewhat lower quality.
 
 ### Model
 