Commit d861972

update transcribe docs to new version
1 parent 0555367 commit d861972

1 file changed

+35
-20
lines changed


mkdocs/docs/HPC/transcribe.md

Lines changed: 35 additions & 20 deletions
@@ -12,25 +12,31 @@ The supported flow is:
1212
- Upload audio or video file using the `Files` interface of the web portal
1313

1414
- Configure transcription via the `Interactive Apps` -> `Transcribe` application (currently under `Testing` section at the bottom);
15-
you can select `Whisper inputfile`.
15+
you can select `Whisper inputfile` and `Whisper language`.
1616

1717
- Launch it and wait. Connecting to the running transcription is entirely optional; there is nothing interactive to do.
1818
You will also receive an email when the transcription has started.
1919

2020
- Upon completion, you will receive an email with a link to the result directory. This info will also be shown in the application session under
2121
`My interactive sessions` (but the session data is only available for a week).
2222

23-
The result directory has a subdirectory per language with the text files and some metadata in JSON format of the transcription itself and input file.
23+
The result directory has a subdirectory per language (transcription and optional translation)
24+
with the text files and some metadata in JSON format of the transcription itself and input file.
2425

2526

2627
This is intentionally kept simple. There is also no risk of losing previous results
2728
(although some previous result directories might get renamed when input file names are reused).
2829

30+
31+
If you have issues, please report them via the `Problems with this session? Submit support ticket` link.
32+
33+
2934
## Performance and default settings
3035

3136
The defaults should give the best balance between quality, performance and time to result.
32-
You can expect approx 10 minutes of transcription time per language and per hour of input using the default flavour.
33-
This combined with an almost immediate start time is the best combination for the intended use case.
37+
You can expect approximately 10 minutes of transcription time per hour of input and
38+
approximately 1 minute per translation language using the default flavour.
39+
This performance combined with an almost immediate start time is the best combination for the intended use case.
3440
There should be enough resources available to get this result most of the time.
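
As a rough illustration of the figures above, the expected running time can be sketched as follows. This is only a back-of-the-envelope estimate and not part of the application: the function name `estimate_minutes` is hypothetical, and it assumes that the 1 minute per translation language is a flat cost that does not scale with the input length.

```python
def estimate_minutes(input_hours: float, translation_languages: int = 0) -> float:
    """Rough running-time estimate for the default flavour.

    Assumptions (illustrative, taken from the approximate figures in the text):
    - about 10 minutes of transcription per hour of input;
    - about 1 minute per translation language, treated here as a flat cost.
    """
    transcription_minutes = 10.0 * input_hours
    translation_minutes = 1.0 * translation_languages
    return transcription_minutes + translation_minutes


# A 2-hour recording with one translation language.
print(estimate_minutes(2, translation_languages=1))
```

Comparing such an estimate with the default 1 hour of (wall)time gives a quick indication of whether the default `Time` is likely to be sufficient.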
3541

3642
## Advanced options
@@ -42,37 +48,46 @@ They are intended for corner cases, or to compare results between different `Whi
4248

4349
### Whisper language
4450

45-
Using the `Automatic detection` (the default), whisper determines the spoken language based on the first 30 seconds of audio.
46-
If, for some reason, the autodetection fails (e.g. the input file starts with silence or some music), you can force one of the languages.
47-
48-
When selecting more than one, the transcription will be done separately for each language
49-
(this will also increase your total running time, so you might also want to increase the `Time`).
50-
5151
The selected language will also determine the output language. However, this is ***not*** meant as a translation feature;
5252
although the quality is not that bad if your languages have enough similarity.
5353

54-
### Cluster
54+
### Translation target languages
5555

56-
Changing the cluster from the interactive cluster will give you access to a much better GPU,
57-
but at a penalty of having to wait in the queue of the other cluster typically for a much longer time
58-
than it will take to complete the transcription on the default cluster.
56+
Select one or more languages to translate to. Hold down the `Ctrl` key while clicking to select (or unselect) languages.
5957

60-
## Resources
58+
!!! warning
59+
Right-to-left languages might generate incorrect subtitles. If you have any examples, you can use the
60+
`Problems with this session? Submit support ticket` link to report this so we can investigate properly.
6161

62-
Default settings of 4 cores with at least 10GB of RAM and 1 hour (wall)time should be enough for most transcriptions.
62+
The translation is run after the transcription, and is separate from the Whisper-based process.
63+
If you select a language to translate to that is also the `Whisper language`, there will be no translation generated for it.
6364

64-
### Flavour
65+
The default languages are `Dutch` and `English`. For example, if you have a video with spoken Dutch (and select Dutch as the `Whisper language`),
66+
the end result will be a Dutch transcription and an English translation of the Dutch transcription. (And thus no `Dutch-to-Dutch` translation.)
6567

66-
We currently support 2 flavours: `whisper` (the OpenAI reference implementation), and `whisper-ctranslate2`
67-
(a faster version with some extras). Benchmarks indicate that `whisper-ctranslate2` is about 4 times faster than `whisper`,
68-
but might have some lower quality.
68+
There is thus no need to unselect languages each time the input language changes.
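
The rule described above (a target language equal to the `Whisper language` is skipped) can be sketched as a simple filter. This is a minimal illustration of the described behaviour, not the application's actual code; the function name `effective_translation_targets` and the plain-string language names are assumptions.

```python
def effective_translation_targets(whisper_language, targets=("Dutch", "English")):
    """Return the targets for which a translation would be generated.

    Hypothetical helper: a target that equals the (selected or detected)
    Whisper language is dropped, since no translation is generated for it.
    """
    return [target for target in targets if target != whisper_language]


# Dutch spoken input with the default targets: only an English translation remains.
print(effective_translation_targets("Dutch"))
```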
6969

7070
### Model
7171

7272
Default model is `large-v3`; others can be chosen, but be careful to compare the resulting speed and/or quality differences.
7373

74+
### Flavour
75+
76+
We currently support 2 flavours: `whisper` (the OpenAI reference implementation), and `whisper-ctranslate2`
77+
(a faster version with some extras). Benchmarks indicate that `whisper-ctranslate2` is about 4 times faster than `whisper`,
78+
but might have somewhat lower quality.
79+
7480
### Task
7581

7682
From the selected (or auto-detected) source speech language, you can choose to transcribe to the same language or to `English`.
7783
You can use this last option to translate to English (as opposed to forcing the detection of the source language as if it were spoken in English).
7884

85+
### Cluster
86+
87+
Changing the cluster from the interactive cluster will give you access to a much better GPU,
88+
but at a penalty of having to wait in the queue of the other cluster typically for a much longer time
89+
than it will take to complete the transcription on the default cluster.
90+
91+
## Resources
92+
93+
Default settings of 4 cores with at least 10GB of RAM and 1 hour (wall)time should be enough for most transcriptions.
