Commit e25301d

initial explanation of the transcribe app
1 parent 4f3d7ad commit e25301d

2 files changed: +59 -0 lines changed

mkdocs/docs/HPC/transcribe.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Transcribe

## What is Transcribe

`Transcribe` is a non-interactive application that offers audio transcription based on `OpenAI` `Whisper` (and derivatives thereof).

The main use case is sporadic transcription of audio or video files. There is intentionally no bulk mode (or API or library)
to help with large scale projects.

The supported flow is:

- Upload the audio or video file using the `Files` interface of the web portal.
- Configure the transcription via the `Interactive Apps` -> `Transcribe` application (currently under the `Testing` section at the bottom);
  you can select `Whisper inputfile` and `Whisper languages`.
- Launch it and wait. Connecting to the running transcription is entirely optional; there is nothing interactive to do.
  You will also receive an email when the transcription starts.
- Upon completion, you will receive an email with a link to the result directory. This info will also be shown in the application session under
  `My interactive sessions` (but the session data is only available for a week).

The result directory has a subdirectory per language, containing the text files and some metadata in JSON format about the transcription itself and the input file.

This is intentionally kept simple. There is also no risk of losing previous results
(although some previous result directories might get renamed when input file names are reused).

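The result directory layout described above (one subdirectory per language, holding text files and JSON metadata) can be inspected with a short script. The following is only a minimal sketch: the function name `summarize_results`, the `.txt` and `.json` file extensions, and the example path are assumptions for illustration, not names confirmed by the application.

```python
from pathlib import Path


def summarize_results(result_dir: str) -> dict[str, dict[str, list[str]]]:
    """Map each language subdirectory to the text and JSON files it contains.

    Assumption: transcripts end in .txt and metadata files end in .json.
    """
    summary: dict[str, dict[str, list[str]]] = {}
    for lang_dir in sorted(Path(result_dir).iterdir()):
        if not lang_dir.is_dir():
            continue  # skip loose files next to the language subdirectories
        files = sorted(p for p in lang_dir.iterdir() if p.is_file())
        summary[lang_dir.name] = {
            "text": [p.name for p in files if p.suffix == ".txt"],
            "metadata": [p.name for p in files if p.suffix == ".json"],
        }
    return summary


if __name__ == "__main__":
    # Hypothetical path; use the directory linked in the completion email.
    for language, found in summarize_results("transcribe_results").items():
        print(language, found)
```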
## Performance and default settings

The defaults should give the best balance between quality, performance and time to result.
You can expect approximately 10 minutes of transcription time per language and per hour of input.
Combined with an almost immediate start time, this is the best combination for the intended use case.
There should be enough resources available to get this result most of the time.

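As a rough planning aid, the rule of thumb above can be turned into a small calculation. This is a sketch under stated assumptions: it treats the estimate as scaling linearly with input length and number of languages, and it compares the result against the default 1 hour walltime listed under Resources. The function names are hypothetical and not part of the application.

```python
def estimate_transcription_minutes(input_hours: float, n_languages: int,
                                   minutes_per_hour: float = 10.0) -> float:
    """Estimate transcription time, assuming about 10 minutes per language
    per hour of input and linear scaling (a simplification)."""
    if input_hours < 0 or n_languages < 1:
        raise ValueError("input_hours must be >= 0 and n_languages >= 1")
    return input_hours * n_languages * minutes_per_hour


def fits_in_walltime(input_hours: float, n_languages: int,
                     walltime_minutes: float = 60.0) -> bool:
    """Check the estimate against a walltime (default: 1 hour)."""
    return estimate_transcription_minutes(input_hours, n_languages) <= walltime_minutes


if __name__ == "__main__":
    # A 90-minute recording transcribed in two languages.
    print(estimate_transcription_minutes(1.5, 2))
    print(fits_in_walltime(1.5, 2))
```

With these assumptions, a 90-minute recording in two languages comes out at about 30 minutes, well within the default walltime.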
## Advanced options

There are some advanced options to choose from. They should not be needed for normal usage.

They are intended for corner cases, or to compare results between different `Whisper` models and/or different implementation flavours
(`whisper` and `whisper-ctranslate2`) with respect to speed and quality.

### Cluster

Changing the cluster from the default interactive cluster will give you access to a much better GPU,
but at the penalty of having to wait in the queue of the other cluster, typically for much longer
than it would take to complete the transcription on the default cluster.

### Resources

The default settings of 4 cores with at least 10GB of RAM and 1 hour of walltime should be enough for most transcriptions.

### Flavour

We currently support 2 flavours: `whisper` (the OpenAI reference implementation) and `whisper-ctranslate2`
(a faster version with some extras).

### Model

The default model is `large-v3`. Other models can be chosen, but you should carefully compare the resulting speed and/or quality differences.

mkdocs/docs/HPC/web_portal.md

Lines changed: 4 additions & 0 deletions
@@ -267,6 +267,10 @@ It is also possible to relaunch a desktop session that has ended by clicking the
See [dedicated page on Jupyter notebooks](../jupyter)

#### Transcribe

See [dedicated page on audio transcription app Transcribe](../transcribe)

## Restarting your web server in case of problems

In case of problems with the web portal, it could help to restart the
