The backend for our realtime translation project. It is expected to run alongside the frontend.
This project uses the wanjawischmeier/WhisperLiveKit fork of QuentinFuxa's Whisper wrapper to transcribe audio locally and in realtime. It translates the transcript into a list of dynamically requested languages using LibreTranslate and sends transcript chunks to the respective frontends over a websocket connection. The pipeline supports multiple streamers and viewers in a room system: once a streamer connects to and activates a room, they can send their microphone audio to the server for processing.
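To make the idea of "dynamically requested languages" concrete, here is a minimal sketch of how a room could track which target languages its viewers currently want. The `Room` class, its method names, and the per-language viewer counting are assumptions made for illustration; they are not taken from the backend's code. The sketch sticks to syntax available in Python 3.9.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Room:
    """Hypothetical model of one room; not the backend's actual class."""

    room_id: str
    source_lang: str
    # Number of viewers currently requesting each target language.
    _subscribers: Counter = field(default_factory=Counter)

    def subscribe(self, target_lang: str) -> None:
        # The source language needs no translation, so it is never tracked.
        if target_lang != self.source_lang:
            self._subscribers[target_lang] += 1

    def unsubscribe(self, target_lang: str) -> None:
        if self._subscribers[target_lang] > 0:
            self._subscribers[target_lang] -= 1
        if self._subscribers[target_lang] == 0:
            # Drop the entry so the language is no longer translated.
            # Counter ignores deletion of keys that are not present.
            del self._subscribers[target_lang]

    @property
    def target_langs(self) -> List[str]:
        # Each language appears once, no matter how many viewers want it.
        return sorted(self._subscribers)
```

With this shape, a translation worker would only need to read `room.target_langs` to know which languages are worth requesting from LibreTranslate at any moment.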
- Python 3.9.23 (pyenv)
- Poetry
- FFmpeg
sudo apt-get install ffmpeg
# If using pyenv
pyenv install 3.9.23 # if not installed already
pyenv local 3.9.23
poetry env use /home/username/.pyenv/versions/3.9.23/bin/python
# With predefined parameters
bash backend.sh
# Or manually
poetry run python src/whisper_server.py
-vac # Very important, should always be on
--buffer-trimming sentence # Waits for the sentence to finish before processing
--buffer-trimming segment # Processes after a certain amount of time without waiting for context
# Segment is more stable when people speak very fast without breaks
# Sentence is a bit more accurate, but may cause lag when people speak too fast
--confidence-validation # Makes it a lot faster but slightly less accurate
--punctuation-split # Adds periods between chunks, regardless of whether a chunk is a full sentence
--min-chunk-size 1 # Default is 1; slightly lower or higher values can tweak it a bit. Higher leads to cut sentences, lower to more accuracy but increases the workload for the GPU
--device cuda # e.g. cuda; selects whether to run on CPU or GPU
--compute-type float16/float32 # float32 is more precise but takes more computing power; the right choice depends on the GPU architecture
- http://localhost:3000: Umami frontend stats
- http://localhost:8090: Beszel backend performance stats
- http://localhost:5000: LibreTranslate instance
- http://localhost:8000: FastAPI backend for http traffic
  - GET /health: Health check, returns status
  - GET /room_list: Returns a room list
  - GET /vote: Get vote list
  - GET /vote/{id}/{action}: Action can be add or remove
  - POST /auth: Checks password, returns result
  - POST /transcript_list: Returns a list of transcript infos
  - POST /room/{room_id}/transcript/{target_lang}: Compiles and returns the entire transcript of a given room in the target_lang as a string. Joins all partial transcripts available for that room.
  - POST /room/{room_id}/close: Closes that room, can only be performed with admin password.
- ws://localhost:8000/room/{room_id}/{role}/{source_lang}/{target_lang}: FastAPI websocket for handling streaming
  - Bidirectional
    - Expects audio stream from host (audio/webm;codecs=opus)
    - Sends all available transcriptions to host and clients in chunks
  - Expects correct password in authenticated cookie, otherwise refuses connection
  - Parameters
    - room_id: unique room identifier
    - role: Can be host or client
    - source_lang/target_lang: The respective country codes, e.g. de, en
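The URL patterns listed above can be assembled with a couple of small helpers. This is only a sketch: the function names `room_ws_url` and `transcript_url` are made up for illustration, the default base URLs assume the local ports listed above, and percent-encoding the path segments is a defensive assumption rather than something the backend is known to require. Setting the authenticated cookie is not covered here.

```python
from urllib.parse import quote

# The two roles named in the parameter list above.
VALID_ROLES = ("host", "client")


def room_ws_url(room_id: str, role: str, source_lang: str, target_lang: str,
                base: str = "ws://localhost:8000") -> str:
    """Build the streaming websocket URL for one room (hypothetical helper)."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    # Percent-encode every segment so unusual room ids cannot break the path.
    parts = [room_id, role, source_lang, target_lang]
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base}/room/{path}"


def transcript_url(room_id: str, target_lang: str,
                   base: str = "http://localhost:8000") -> str:
    """Build the URL of the POST endpoint returning a room's full transcript."""
    room = quote(room_id, safe="")
    lang = quote(target_lang, safe="")
    return f"{base}/room/{room}/transcript/{lang}"
```

For example, `room_ws_url("room_1", "client", "de", "en")` yields `ws://localhost:8000/room/room_1/client/de/en`, which a websocket client library of your choice can then connect to.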
endpoints:
  - name: frontend
    upstream:
      url: 5173
  - name: backend
    url: https://dynamic-freely-chigger.ngrok-free.app
    upstream:
      url: 8000
Start using:
ngrok start --all
{
  # Languages available for transcription by the whisper engine
  "available_source_langs": [
    "de",
    "en",
    # ...
  ],
  # Languages that can be translated into by LibreTranslate
  "available_target_langs": [
    "ar",
    "az",
    # ...
  ],
  # The maximum number of rooms that can be handled by the hardware simultaneously
  "max_active_rooms": 2,
  # List of all rooms that are relevant at this point in time
  "rooms": [
    {
      # Information provided per room
      "id": "",
      "title": "",
      "description": "",
      "track": "",
      "location": "",
      "presenter": "",
      "host_connection_id": "",
      "source_lang": ""
    }
  ]
}
{
  "last_n_sents": [
    {
      "line_idx": 0,
      "beg": 0,
      "end": 13,
      "speaker": -1,
      "sentences": [
        {
          "sent_idx": 0,
          "content": {
            "en": "",
            "de": ""
          }
        },
        {
          "sent_idx": 1,
          "content": {
            "en": ""
            # NOTE: Not all sentences will be available in the same languages, as translation happens asynchronously
          }
        },
        {
          "sent_idx": 2,
          "content": {
            "en": "",
            "de": ""
          }
        }
      ]
    }
  ],
  "incomplete_sentence": "",
  "transcription_delay": 10.610000000000001,
  "translation_delay": 0
}
# If server is ready to accept requests
{"status": "ok"}
# If server is running, but not ready to accept requests
{"status": "not ready"}
# If password is valid
{"status": "ok"}
# If password is invalid
{"status": "fail"}
[
  {
    "id": "room_id_0",
    "firstChunkTimestamp": 0,
    "lastChunkTimestamp": 0
  },
  {
    "id": "room_id_1",
    "firstChunkTimestamp": 0,
    "lastChunkTimestamp": 0
  },
  # ...
]
Umami is used for tracking certain events and pageviews coming in from the frontend.
To run:
cd stats/umami
docker compose up -d
Beszel is used for tracking backend performance metrics (GPU utilization etc.).
To run:
# To start the beszel server
cd stats/beszel
docker compose up -d
# To start the agent instance for the current system
cd agent # in stats/beszel/agent
docker compose up -d
- Bind the Whisper engine to room instances
- Open/close rooms properly
- A room is opened when the host joins
- A room is closed once the host has left (plus a 5 min buffer, so the host can rejoin if they only got disconnected briefly)
- If the host language changes (requires a restart of the engine), the host should leave the room and rejoin with the new language
- If the host joins an already open room with changed parameters, the room is restarted by the room manager
- Send "ready" packet
- Implement a restart option for rooms in the frontend
- Handle websocket connects/disconnects and fix bugs
- Unique host id
- Fix: Client disconnects don't get recognized correctly
- Fix: Rooms get prematurely closed upon host reconnects
- Preserve source lang across host reconnects
- Everyone should get kicked out of room if it closes
- Fix host disconnect after long time
- Send room list to frontend (endpoint)
- Use auth cookie for authentication
- Check if room is "DO-NOT-RECORD" and prevent activating it
- Use AVAILABLE_WHISPER_LANGS & AVAILABLE_LT_LANGS to verify frontend requests
- Endpoint to fetch human readable transcript for room (join all partial transcripts, with date timestamp)
- Provide endpoint
- Join all partial transcripts
- Load from memory or from disk if that's not available
- Endpoint to provide list of all room IDs that have transcripts stored to disk
- Available as transcript info at /transcript_list
- Also store and provide room metadata alongside (@whoami)
- Respect user preferences on whether to store transcripts (@substatoo)
- Respect user preferences on whether clients can download transcripts (@substatoo)
- Respect whisper instance limit when activating rooms
- Whisper device, compute_type passthrough to CLI from custom WhisperLiveKit fork
- Support whisper model unloading (in custom fork)
- Probably fine, now handled by gc
- Performance monitoring
- https://beszel.dev/guide/gpu
- (Write stats to log file? Not strictly necessary) -> Is now in Umami
- Docker compose is set up in stats/beszel
- Umami stats
- Docker compose is set up in stats/umami
- Fix country coding in transcription chunks
- No longer provide default sentence, instead make a sentence's content field a dict of country codes
- Move whisper engine to separate process
- Proper target langs subscribe/unsubscribe
- Prevent doubling of target langs
- Ignore target langs that are equal to source lang (don't add to list)
- Send initial transcript chunk on client connection
- Move transcript and room system to separate files in dedicated dirs
- Pace translation worker (@substratoo)
- As of now will just work through all sentences in one loop if a new language gets subscribed to
- Add admin acc
- Ability to force close rooms as admin
- Help markdown file (@whoami)
- Translation worker should only try to fetch the most recent n sentences (in reverse order, so most recent first)
- Fix: Ending the process does not work properly; some threads seem to stay running
- Fix CTRL-C
- (Pause fetch loop when connected host is not streaming?)
- Fix: Multiple hosts not allowed error
- Very rare, have not been able to pin it down
- Is maybe fine for now as rooms can be restarted
- Convert pickle files for transcripts into conventional database implementation
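One item in the list above says the translation worker should only look at the most recent n sentences, most recent first. Below is a minimal sketch of one way to read that requirement, operating on data shaped like the last_n_sents example earlier in this document. The function name `pending_sentences` and its return format are assumptions for illustration and do not reflect the actual worker.

```python
from typing import Dict, List, Tuple


def pending_sentences(lines: List[Dict], target_lang: str, n: int) -> List[Tuple[int, int]]:
    """Return (line_idx, sent_idx) pairs that still lack `target_lang`.

    Only the most recent `n` sentences are considered, newest first.
    """
    # Flatten all sentences in transcript order (oldest first).
    flat = [
        (line["line_idx"], sent["sent_idx"], sent["content"])
        for line in lines
        for sent in line["sentences"]
    ]
    # Guard against n <= 0, because flat[-0:] would select everything.
    recent = flat[-n:] if n > 0 else []
    # Walk newest to oldest, skipping sentences that are already translated.
    return [
        (line_idx, sent_idx)
        for line_idx, sent_idx, content in reversed(recent)
        if target_lang not in content
    ]
```

Capping the window at n keeps a newly subscribed language from triggering a pass over the whole transcript, and the newest-first order means viewers see current sentences translated before older ones.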