Merged
55 changes: 55 additions & 0 deletions mkdocs/docs/HPC/transcribe.md
@@ -0,0 +1,55 @@
# Transcribe

## What is Transcribe

`Transcribe` is a non-interactive application that offers audio transcription based on `OpenAI` `Whisper` (and derivatives thereof).

The main use case is sporadic transcription of audio or video files. There is intentionally no bulk mode, API or library
to support large-scale projects.

The supported flow is:

- Upload an audio or video file using the `Files` interface of the web portal.
- Configure the transcription via the `Interactive Apps` -> `Transcribe` application (currently under the `Testing` section at the bottom);
    you can select `Whisper inputfile` and `Whisper languages`.
- Launch it and wait. Connecting to the running transcription is entirely optional; there is nothing interactive to do.
    You will also receive an email when the transcription starts.
- Upon completion, you will receive an email with a link to the result directory. This information is also shown in the application session under
    `My interactive sessions` (but the session data is only available for a week).

The result directory has a subdirectory per language, containing the text files and some metadata in JSON format about the transcription itself and the input file.
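
As a rough illustration of that layout, the sketch below walks a result directory and collects, per language subdirectory, the text files and the parsed JSON metadata. The function name `summarize_results`, the `.txt` and `.json` extensions, and the assumption that the JSON files sit inside the language subdirectories are all hypothetical; check them against an actual result directory before relying on this.

```python
import json
from pathlib import Path


def summarize_results(result_dir):
    """Map each language subdirectory to its text files and parsed JSON metadata.

    Assumption (not confirmed by the documentation): transcripts end in
    ``.txt`` and metadata files end in ``.json`` inside each subdirectory.
    """
    summary = {}
    for lang_dir in sorted(Path(result_dir).iterdir()):
        if not lang_dir.is_dir():
            continue  # skip loose files next to the language subdirectories
        texts = sorted(p.name for p in lang_dir.glob("*.txt"))
        metadata = {}
        for json_path in sorted(lang_dir.glob("*.json")):
            with json_path.open(encoding="utf-8") as fh:
                metadata[json_path.name] = json.load(fh)
        summary[lang_dir.name] = {"texts": texts, "metadata": metadata}
    return summary
```

Calling `summarize_results` on the path from the completion email would return a dictionary keyed by subdirectory name, one entry per transcribed language.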

This is intentionally kept simple. There is also no risk of losing previous results
(although some previous result directories might get renamed when input file names are reused).

## Performance and default settings

The defaults should give the best balance between quality, performance and time to result.
You can expect approximately 10 minutes of transcription time per language and per hour of input.
Combined with an almost immediate start time, this is the best fit for the intended use case.
There should be enough resources available to get this result most of the time.
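
The rule of thumb above (roughly 10 minutes per language and per hour of input) can be turned into a quick estimate. This is only a back-of-the-envelope sketch; the function name and its parameters are hypothetical, and real run times will vary with the input and the available resources.

```python
def estimate_transcription_minutes(input_hours, n_languages,
                                   minutes_per_language_hour=10.0):
    """Rough estimate: ~10 minutes per language and per hour of input."""
    if input_hours < 0 or n_languages < 0:
        raise ValueError("input_hours and n_languages must be non-negative")
    return input_hours * n_languages * minutes_per_language_hour


# A 90-minute recording transcribed in two languages.
print(estimate_transcription_minutes(1.5, 2))  # 30.0
```

An estimate like this can help judge whether a long recording in several languages still fits within the requested walltime.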

## Advanced options

There are some advanced options one can choose from. They should not be needed for normal usage.

They are intended for corner cases, or to compare results between different `Whisper` models and/or different implementation flavours
(`whisper` and `whisper-ctranslate2`) with respect to speed and quality.

### Cluster

Changing the cluster from the interactive cluster will give you access to a much better GPU,
but at the cost of having to wait in the queue of the other cluster, typically for much longer
than it would take to complete the transcription on the default cluster.

### Resources

The default settings of 4 cores with at least 10GB of RAM and 1 hour of walltime should be enough for most transcriptions.

### Flavour

We currently support 2 flavours: `whisper` (the OpenAI reference implementation), and `whisper-ctranslate2`
(a faster version with some extras).

### Model

The default model is `large-v3`; others can be chosen, but you should carefully compare the resulting speed and/or quality differences.
4 changes: 4 additions & 0 deletions mkdocs/docs/HPC/web_portal.md
@@ -267,6 +267,10 @@ It is also possible to relaunch a desktop session that has ended by clicking the

See [dedicated page on Jupyter notebooks](../jupyter)

#### Transcribe

See [dedicated page on audio transcription app Transcribe](../transcribe)

## Restarting your web server in case of problems

In case of problems with the web portal, it could help to restart the