# Transcribe

## What is Transcribe

`Transcribe` is a non-interactive application that offers audio transcription based on `OpenAI` `Whisper` (and derivatives thereof).

The main use case is sporadic transcription of audio or video files. There is intentionally no bulk mode (or API or library)
to help with large-scale projects.

The supported flow is:
 - Upload the audio or video file using the `Files` interface of the web portal.
 - Configure the transcription via the `Interactive Apps` -> `Transcribe` application (currently listed under the `Testing` section at the bottom);
   select the `Whisper inputfile` and the `Whisper languages`.
 - Launch it and wait. Connecting to the running transcription is entirely optional; there is nothing interactive to do.
   You will also receive an email when the transcription starts.
 - Upon completion, you will receive an email with a link to the result directory. This information is also shown in the application session under
   `My interactive sessions` (but the session data is only available for a week).

The result directory contains one subdirectory per language, holding the transcription as text files plus some metadata in JSON format about the transcription itself and the input file.

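If you want to post-process the results, a small script can collect what each language subdirectory contains. This is a minimal sketch under stated assumptions. It treats every subdirectory as one language and picks up files only by their `.txt` and `.json` extensions. The function name `summarize_results` is hypothetical, and the exact file names Transcribe writes may differ.

```python
import json
import sys
from pathlib import Path


def summarize_results(result_dir):
    """Collect transcripts and JSON metadata for each language subdirectory.

    Assumption: every subdirectory of result_dir holds one language, with
    transcripts in *.txt files and metadata in *.json files. Files directly
    in result_dir are ignored.
    """
    summary = {}
    for lang_dir in sorted(p for p in Path(result_dir).iterdir() if p.is_dir()):
        # Read every transcript as UTF-8 text, keyed by file name.
        texts = {f.name: f.read_text(encoding="utf-8")
                 for f in sorted(lang_dir.glob("*.txt"))}
        # Parse every JSON metadata file, keyed by file name.
        metadata = {}
        for f in sorted(lang_dir.glob("*.json")):
            with f.open(encoding="utf-8") as fh:
                metadata[f.name] = json.load(fh)
        summary[lang_dir.name] = {"texts": texts, "metadata": metadata}
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_results.py <result_dir>
    for language, content in summarize_results(sys.argv[1]).items():
        words = sum(len(t.split()) for t in content["texts"].values())
        print(f"{language}: {len(content['texts'])} text file(s), "
              f"{len(content['metadata'])} JSON file(s), {words} words")
```

Run it on the result directory from the notification email. It only reads files, so the results themselves are left untouched.
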
This is intentionally kept simple. There is also no risk of losing previous results
(although some previous result directories might get renamed when input file names are reused).

## Performance and default settings

The defaults should give the best balance between quality, performance and time to result.
You can expect approximately 10 minutes of transcription time per language per hour of input.
Combined with an almost immediate start time, this is the best fit for the intended use case.
Enough resources should be available to achieve this most of the time.

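As a rough planning aid, the quoted rate can be scaled by input length and number of languages. This is a minimal sketch that assumes the scaling is linear and ignores model loading and queue time. `estimated_minutes` is a hypothetical helper, not part of Transcribe.

```python
# Approximate rate quoted for the default settings; an estimate, not a guarantee.
MINUTES_PER_LANGUAGE_PER_INPUT_HOUR = 10


def estimated_minutes(input_minutes: float, languages: int) -> float:
    """Rough transcription time in minutes on the default setup.

    Assumes the time scales linearly with input length and with the number
    of languages; model loading and queue waiting time are not included.
    """
    if input_minutes < 0 or languages < 1:
        raise ValueError("need a non-negative duration and at least one language")
    input_hours = input_minutes / 60
    return MINUTES_PER_LANGUAGE_PER_INPUT_HOUR * languages * input_hours
```

For example, `estimated_minutes(90, 2)` gives `30.0`. That means about half an hour for a 90-minute recording transcribed in two languages.
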
## Advanced options

Some advanced options are available. They should not be needed for normal usage.

They are intended for corner cases, or to compare results between different `Whisper` models and/or different implementation flavours
(`whisper` and `whisper-ctranslate2`) with respect to speed and quality.

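One common way to compare models or flavours on quality is the word error rate. It is the word-level edit distance between two transcripts, divided by the number of reference words. Below is a plain-Python sketch of that formula with simple normalisation (lowercasing and stripping ASCII punctuation). It is not a tool shipped with Transcribe, and `word_disagreement` is a hypothetical name. Unless one of the two transcripts has been checked by hand, the number only tells you how much two runs disagree, not which one is right.

```python
import string

# Removes ASCII punctuation so that "world!" and "world" count as the same word.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def _normalise(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation and split on whitespace."""
    return text.lower().translate(_STRIP_PUNCT).split()


def word_disagreement(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = _normalise(reference)
    hyp = _normalise(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # Before handling reference word i, prev[j] is the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra word inserted
                prev[j - 1] + (r != h),  # substitution, or free if equal
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sat on a mat" yields 1/6: one substituted word out of six.
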
### Cluster

Switching away from the default interactive cluster gives you access to much better GPUs,
but at the cost of waiting in the queue of the other cluster, typically for much longer
than the transcription itself would take on the default cluster.

### Resources

The default settings of 4 cores, at least 10 GB of RAM and 1 hour of walltime should be enough for most transcriptions.

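To check whether the default walltime is enough for a given input, you can invert the approximate rate of 10 minutes per language per hour of input. This is a rule-of-thumb sketch, not a guarantee. Both `max_input_hours` and its `safety` margin are hypothetical choices. The margin is the fraction of walltime kept in reserve for model loading and slower-than-usual runs.

```python
# Approximate rate for the default settings (10 minutes per language per hour of input).
MINUTES_PER_LANGUAGE_PER_INPUT_HOUR = 10


def max_input_hours(walltime_minutes: float, languages: int, safety: float = 0.5) -> float:
    """Longest input, in hours, that should fit in the requested walltime.

    safety is a hypothetical fraction of the walltime kept in reserve for
    model loading and slower-than-usual runs (0.5 keeps half in reserve).
    """
    if walltime_minutes <= 0 or languages < 1 or not 0 <= safety < 1:
        raise ValueError("invalid walltime, language count or safety margin")
    usable_minutes = walltime_minutes * (1 - safety)
    return usable_minutes / (MINUTES_PER_LANGUAGE_PER_INPUT_HOUR * languages)
```

Assume the default 1 hour walltime and the sketch's 50% margin. This gives 3.0 hours of input for one language and 1.5 hours for two. For longer inputs, request more walltime.
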
### Flavour

We currently support two flavours: `whisper` (the OpenAI reference implementation) and `whisper-ctranslate2`
(a faster implementation with some extras).

### Model

The default model is `large-v3`. Other models can be chosen, but compare the resulting speed and quality differences carefully.