initial explanation of the transcribe app

stdweird · stdweird · commit e25301daa88a · 2025-01-06T15:06:14.000+01:00
diff --git a/mkdocs/docs/HPC/transcribe.md b/mkdocs/docs/HPC/transcribe.md
@@ -0,0 +1,55 @@
+# Transcribe
+
+## What is Transcribe
+
+`Transcribe` is a non-interactive application that offers audio transcription based on `OpenAI` `Whisper` (and derivatives thereof).
+
+The main use case is sporadic transcription of audio or video files. There is intentionally no bulk mode (or API or library)
+to help with large-scale projects.
+
+The supported flow is:
+ - Upload an audio or video file using the `Files` interface of the web portal
+ - Configure transcription via the `Interactive Apps` -> `Transcribe` application (currently under the `Testing` section at the bottom);
+   you can select `Whisper inputfile` and `Whisper languages`.
+ - Launch it and wait. Connecting to the running transcription is entirely optional; there is nothing interactive to do.
+   You will also receive an email when the transcription has started.
+ - Upon completion, you will receive an email with a link to the result directory. This info will also be shown in the application session under
+   `My interactive sessions` (but the session data is only available for a week).
+
+   The result directory has a subdirectory per language, containing the text files and some JSON metadata about the transcription itself and the input file.
+
+This is intentionally kept simple. There is also no risk of losing previous results
+(although some previous result directories might get renamed when input file names are reused).
+
+## Performance and default settings
+
+The defaults should give the best balance between quality, performance and time to result.
+You can expect approximately 10 minutes of transcription time per language and per hour of input.
+This combined with an almost immediate start time is the best combination for the intended use case.
+There should be enough resources available to get this result most of the time.
+
+## Advanced options
+
+There are some advanced options one can choose from. They should not be needed for normal usage.
+
+They are intended for corner cases, or to compare results between different `Whisper` models and/or different implementation flavours
+(`whisper` and `whisper-ctranslate2`) with respect to speed and quality.
+
+### Cluster
+
+Changing the cluster from the interactive cluster will give you access to a much better GPU,
+but at the penalty of having to wait in the queue of the other cluster, typically for a much longer time
+than it will take to complete the transcription on the default cluster.
+
+### Resources
+
+The default settings of 4 cores with at least 10GB of RAM and 1 hour of walltime should be enough for most transcriptions.
+
+### Flavour
+
+We currently support 2 flavours: `whisper` (the OpenAI reference implementation), and `whisper-ctranslate2`
+(a faster version with some extras).
+
+### Model
+
+The default model is `large-v3`. Other models can be chosen, but be careful to compare the resulting speed and/or quality differences.
diff --git a/mkdocs/docs/HPC/web_portal.md b/mkdocs/docs/HPC/web_portal.md
@@ -267,6 +267,10 @@ It is also possible to relaunch a desktop session that has ended by clicking the
 
 See [dedicated page on Jupyter notebooks](../jupyter)
 
+#### Transcribe
+
+See [dedicated page on audio transcription app Transcribe](../transcribe)
+
 ## Restarting your web server in case of problems
 
 In case of problems with the web portal, it could help to restart the