diff --git a/mkdocs/docs/HPC/transcribe.md b/mkdocs/docs/HPC/transcribe.md
new file mode 100644
index 000000000000..3f7ad669b490
--- /dev/null
+++ b/mkdocs/docs/HPC/transcribe.md
@@ -0,0 +1,67 @@
+# Transcribe
+
+## What is Transcribe
+
+`Transcribe` is a non-interactive application that offers audio transcription based on `OpenAI` `Whisper` (and derivatives thereof).
+
+The main use case is sporadic transcription of audio or video files. There is intentionally no bulk mode (or API or library)
+to help with large-scale projects.
+
+The supported flow is:
+ - Upload an audio or video file using the `Files` interface of the web portal.
+ - Configure the transcription via the `Interactive Apps` -> `Transcribe` application (currently under the `Testing` section at the bottom);
+   you can select the `Whisper inputfile` and the `Whisper languages`.
+ - Launch it and wait. Connecting to the running transcription is entirely optional; there is nothing interactive to do.
+   You will also receive an email when the transcription has started.
+ - Upon completion, you will receive an email with a link to the result directory. This info will also be shown in the application session under
+   `My interactive sessions` (but the session data is only available for a week).
+
+The result directory has a subdirectory per language with the text files and some metadata in JSON format about the transcription itself and the input file.
+
+This is intentionally kept simple. There is also no risk of losing previous results
+(although some previous result directories might get renamed when input file names are reused).
+
+## Performance and default settings
+
+The defaults should give the best balance between quality, performance and time to result.
+You can expect approximately 10 minutes of transcription time per language and per hour of input.
+This, combined with an almost immediate start time, is the best combination for the intended use case.
+There should be enough resources available to get this result most of the time.
+
+## Advanced options
+
+There are some advanced options one can choose from. They should not be needed for normal usage.
+
+They are intended for corner cases, or to compare results between different `Whisper` models and/or different implementation flavours
+(`whisper` and `whisper-ctranslate2`) with respect to speed and quality.
+
+### Cluster
+
+Changing the cluster from the interactive cluster will give you access to a much better GPU,
+but at the cost of waiting in the queue of that other cluster, typically for much longer
+than it would take to complete the transcription on the default cluster.
+
+### Resources
+
+The default settings of 4 cores with at least 10 GB of RAM and 1 hour of walltime should be enough for most transcriptions.
+
+### Flavour
+
+We currently support 2 flavours: `whisper` (the OpenAI reference implementation) and `whisper-ctranslate2`
+(a faster version with some extras).
+
+### Model
+
+The default model is `large-v3`; other models can be chosen, but be careful to compare the resulting speed and/or quality differences.
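+
+For reference, a rough sketch of what a manual run of the two flavours could look like on the
+command line; the input file name, language code and output directory are purely illustrative,
+and the application takes care of this for you based on your selections.
+
+```bash
+# OpenAI reference implementation (illustrative file name and paths)
+whisper interview.mp3 --model large-v3 --language nl --output_dir results/nl
+
+# Faster CTranslate2-based flavour, which accepts the same basic options
+whisper-ctranslate2 interview.mp3 --model large-v3 --language nl --output_dir results/nl
+```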
diff --git a/mkdocs/docs/HPC/web_portal.md b/mkdocs/docs/HPC/web_portal.md
index 76b12410714f..c34e211625e2 100644
--- a/mkdocs/docs/HPC/web_portal.md
+++ b/mkdocs/docs/HPC/web_portal.md
@@ -267,6 +267,10 @@ It is also possible to relaunch a desktop session that has ended by clicking the
 
 See [dedicated page on Jupyter notebooks](../jupyter)
 
+#### Transcribe
+
+See [dedicated page on audio transcription app Transcribe](../transcribe)
+
 ## Restarting your web server in case of problems
 
 In case of problems with the web portal, it could help to restart the