wanjawischmeier/realtime-translation-backend

The backend for our realtime translation project. It is expected to be run alongside the frontend.

This project uses the wanjawischmeier/WhisperLiveKit fork of QuentinFuxa's Whisper wrapper to transcribe audio locally and in realtime. The transcript can be translated into a list of dynamically requested languages using LibreTranslate, and transcript chunks are sent out to the respective frontends over a websocket connection. The pipeline supports multiple streamers and viewers through a room system: when streamers connect to and activate a room, they can send their microphone audio to the server for processing.
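
The per-sentence translation fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: `translate` is a hypothetical callable standing in for a request to the local LibreTranslate instance.

```python
# Sketch of the translation fan-out: one transcribed sentence is translated
# into every subscribed target language. `translate(text, source, target)`
# is a stand-in for a call to LibreTranslate.

def fan_out(sentence: str, source_lang: str, target_langs: list, translate) -> dict:
    """Return a dict mapping language codes to translated versions of the sentence."""
    content = {source_lang: sentence}  # the original always ships as-is
    for lang in target_langs:
        if lang == source_lang:
            continue  # no need to translate into the source language
        content[lang] = translate(sentence, source_lang, lang)
    return content
```

A chunk sent to the frontends would then carry one such dict per sentence, which also explains why not every sentence is available in every language at all times.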

Getting started

Dependencies

  • Python 3.9.23 (pyenv)
  • Poetry
  • FFmpeg
sudo apt-get install ffmpeg

# If using pyenv
pyenv install 3.9.23 # if not installed already
pyenv local 3.9.23
poetry env use /home/username/.pyenv/versions/3.9.23/bin/python

Run using

# With predefined parameters
bash backend.sh

# Or manually
poetry run python src/whisper_server.py

Parameter explanation

-vac # Voice activity controller; very important, should always be on
--buffer-trimming sentence # waits for a sentence to finish before processing
--buffer-trimming segment # processes after a certain amount of time, without waiting for context
# Segment is more stable when people speak very fast without pauses
# Sentence is a bit more accurate, but may cause lag when people speak too fast
--confidence-validation # Makes it a lot faster, but slightly less accurate
--punctuation-split # Splits chunks at punctuation, whether or not a chunk is a full sentence
--min-chunk-size 1 # Default 1; higher values lead to cut sentences, lower values to more accuracy but a higher GPU workload
--device cuda # Run on CPU or GPU (e.g. cpu, cuda)
--compute-type float16 # float16 or float32; float32 is more precise but takes more computing power - depends on GPU architecture

Architecture

(Architecture diagram)

Endpoints

  • http://localhost:3000: Umami frontend stats
  • http://localhost:8090: Beszel backend performance stats
  • http://localhost:5000: LibreTranslate instance
  • http://localhost:8000: FastAPI backend for http traffic
    • GET /health: Health check, returns status
    • GET /room_list: Returns a room list
    • GET /vote: Get vote list
    • GET /vote/{id}/{action}: Action can be add or remove
    • POST /auth: Checks password, returns result
    • POST /transcript_list: Returns a list of transcript infos
    • POST /room/{room_id}/transcript/{target_lang}: Compiles and returns the entire transcript of a given room in the target_lang as a string. Joins all partial transcripts available for that room.
    • POST /room/{room_id}/close: Closes that room, can only be performed with admin password.
  • ws://localhost:8000/room/{room_id}/{role}/{source_lang}/{target_lang}
    • FastAPI websocket for handling streaming
    • Bidirectional
      • expects audio stream from host (audio/webm;codecs=opus)
      • sends all available transcriptions to host and clients in chunks
    • Expects correct password in authenticated cookie, otherwise refuses connection
    • Parameters
      • room_id: unique room identifier
      • role: Can be host or client
      • source_lang/target_lang: The respective language codes, e.g. de, en
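
A minimal sketch of assembling the websocket URL from these parameters; the host and port are assumptions matching the local setup above, and `room_ws_url` is a hypothetical helper, not part of the backend's API.

```python
# Build the streaming websocket URL for a given room, role, and language pair.
# Base URL is an assumption matching the local dev setup (ws://localhost:8000).

def room_ws_url(room_id: str, role: str, source_lang: str, target_lang: str,
                base: str = "ws://localhost:8000") -> str:
    if role not in ("host", "client"):
        raise ValueError("role must be 'host' or 'client'")
    return f"{base}/room/{room_id}/{role}/{source_lang}/{target_lang}"
```

A host streaming German audio to a room `talk1` would connect to `room_ws_url("talk1", "host", "de", "en")`; remember that the connection is refused without a valid auth cookie.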

Ngrok config:

endpoints:
  - name: frontend
    upstream:
      url: 5173
  - name: backend
    url: https://dynamic-freely-chigger.ngrok-free.app
    upstream:
      url: 8000

Start using

ngrok start --all

Data structures

Room list

{
  # Languages available for transcription by the whisper engine
  "available_source_langs": [
    "de",
    "en",
    # ...
  ],

  # Languages that can be translated into by LibreTranslate
  "available_target_langs": [
    "ar",
    "az",
    # ...
  ],

  # The maximum number of rooms that can be handled by the hardware simultaneously
  "max_active_rooms": 2,

  # List of all rooms that are relevant at this point in time
  "rooms": [
    {
      # Information provided per room
      "id": "",
      "title": "",
      "description": "",
      "track": "",
      "location": "",
      "presenter": "",
      "host_connection_id": "",
      "source_lang": ""
    }
  ]
}
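
As an illustration of how a frontend might use this structure, the sketch below counts free room slots. It assumes that a room with a non-empty `host_connection_id` counts as active; `free_room_slots` is a hypothetical helper, not part of the backend.

```python
# Given a room list as returned by GET /room_list, estimate how many more
# rooms the backend can handle. Assumption: a room is "active" when its
# host_connection_id is non-empty.

def free_room_slots(room_list: dict) -> int:
    active = [r for r in room_list["rooms"] if r.get("host_connection_id")]
    return max(room_list["max_active_rooms"] - len(active), 0)
```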

Transcript chunk

{
  "last_n_sents": [
    {
      "line_idx": 0,
      "beg": 0,
      "end": 13,
      "speaker": -1,
      "sentences": [
        {
          "sent_idx": 0,
          "content": {
            "en": "",
            "de": ""
          }
        },
        {
          "sent_idx": 1,
          "content": {
            "en": "",
            # NOTE: Not all sentences will be available in the same languages, as translation happens asynchronously
          }
        },
        {
          "sent_idx": 2,
          "content": {
            "en": "",
            "de": ""
          }
        }
      ]
    }
  ],
  "incomplete_sentence": "",
  "transcription_delay": 10.610000000000001,
  "translation_delay": 0
}
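
Since translation happens asynchronously, a client rendering a chunk has to cope with sentences that are not yet available in its target language. A minimal sketch of that, assuming the field names above and falling back to whatever language is available (`chunk_text` is a hypothetical helper):

```python
# Flatten a transcript chunk into display text for one target language.
# Sentences not yet translated into target_lang fall back to the first
# available language version.

def chunk_text(chunk: dict, target_lang: str) -> str:
    parts = []
    for line in chunk["last_n_sents"]:
        for sent in line["sentences"]:
            content = sent["content"]
            # Prefer the requested language, otherwise take any available one
            text = content.get(target_lang) or next(iter(content.values()), "")
            if text:
                parts.append(text)
    return " ".join(parts)
```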

Health check

# If server is ready to accept requests
{"status": "ok"}

# If server is running, but not ready to accept requests
{"status": "not ready"}

Auth check

# If password is valid
{"status": "ok"}

# If password is invalid
{"status": "fail"}

Transcript infos

[
  {
    "id": "room_id_0",
    "firstChunkTimestamp": 0,
    "lastChunkTimestamp": 0
  },
  {
    "id": "room_id_1",
    "firstChunkTimestamp": 0,
    "lastChunkTimestamp": 0
  },
  # ...
]
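
For example, a frontend could use the timestamps to pick the most recently updated transcript. A small sketch, assuming the field names above (`latest_transcript` is a hypothetical helper):

```python
from typing import Optional

# Pick the transcript info with the newest lastChunkTimestamp, or None
# if no transcripts are stored.

def latest_transcript(infos: list) -> Optional[dict]:
    return max(infos, key=lambda i: i["lastChunkTimestamp"], default=None)
```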

Umami

Used for tracking certain events and pageviews coming in from the frontend.

To run:

cd stats/umami
docker compose up -d

Beszel

Used for tracking backend performance metrics (GPU utilization, etc.)

To run:

# To start the beszel server
cd stats/beszel
docker compose up -d

# To start the agent instance for the current system
cd agent # in stats/beszel/agent
docker compose up -d

TODOs

Important

  • Bind the whisper engine to room instances
  • Open/close rooms correctly
    • A room is opened when the host joins
    • A room is closed when the host has left (plus a 5 min buffer so the host can rejoin after a brief dropout)
    • When the host language changes (requires an engine restart), the host should leave the room and rejoin with the new language
    • If the host joins an already open room with changed parameters, the room manager restarts the room
    • Send "ready" packet
  • Eine Restart-Option für Räume im Frontend implementieren
  • Websocket connects/disconnects handlen und Bugs fixen
    • Unique host id
    • Fix: Client disconnects dont get recognized correctly
    • Fix: Rooms get prematurely closed upon host reconnects
    • Preserve source lang across host reconnects
    • Everyone should get kicked out of room if it closes
    • Fix host disconnect after long time
  • Send the room list to the frontend (endpoint)
  • Use the auth cookie for authentication
  • Check if room is "DO-NOT-RECORD" and prevent activating it
  • Use AVAILABLE_WHISPER_LANGS & AVAILABLE_LT_LANGS to verify frontend requests
  • Endpoint to fetch human readable transcript for room (join all partial transcripts, with date timestamp)
    • Provide endpoint
    • Join all partial transcripts
    • Load from memory, or from disk if that's not available
    • Endpoint to provide a list of all room IDs that have transcripts stored to disk
      • Available as transcript info at /transcript_list
      • Also store and provide room metadata alongside (@whoami)
      • Respect user preferences on whether to store transcripts (@substatoo)
      • Respect user preferences on whether clients can download transcripts (@substatoo)
  • Respect whisper instance limit when activating rooms
  • Whisper device, compute_type passthrough to cli from custom WhisperLiveKit fork
  • Support whisper model unloading (in custom fork)
    • Probably fine; now handled by GC
  • Performance monitoring
    • https://beszel.dev/guide/gpu
    • (Write stats to log file? Not strictly necessary) -> Is now in umami
    • Docker compose is set up in stats/beszel
  • Umami stats
    • Docker compose is set up in stats/umami
  • Fix country coding in transcription chunks
    • No longer provide a default sentence; instead make a sentence's content field a dict of country codes
  • Move the whisper engine to a separate process
  • Proper target langs subscribe/unsubscribe
    • Prevent doubling of target langs
    • Ignore target langs that are equal to source lang (don't add to list)
  • Send initial transcript chunk on client connection
  • Move the transcript and room system to separate files in dedicated dirs
  • Pace translation worker (@substratoo)
    • As of now, it will just work through all sentences in one loop if a new language gets subscribed to
  • Add admin acc
    • Ability to force close rooms as admin
  • Help markdown file (@whoami)
  • Translation worker should only try to fetch the most recent n sentences (in reverse order, so most recent first)

For potential future updates

  • Fix: Ending the process does not work properly; some threads seem to stay running
    • Fix CTRL-C
  • (Pause fetch loop when connected host is not streaming?)
  • Fix: Multiple hosts not allowed error
    • Very rare, have not been able to pin it down
    • Is maybe fine for now as rooms can be restarted
  • Convert pickle files for transcripts into conventional database implementation
