GitHub - mattjamo/UniversalTranslator-Omnilingual-ASR: A universal translator application utilizing Omnilingual ASR for voice transcription and any LLM using Ollama

# Universal Translator with Omnilingual ASR

## Description

Universal Translator is an application built on Meta's **Omnilingual-ASR** model for multilingual speech recognition, paired with an LLM for translation.
It supports transcription in **1,600+ languages** and provides a web interface built with **NiceGUI**.
Translation is handled by any LLM served through **Ollama**.

**Key Features:**
*   **Massive Language Support:** Transcribe audio from over 1,600 languages.
*   **Web Interface:** User-friendly GUI for recording, uploading files, and managing models.
*   **Local Processing:** Runs entirely locally on your machine (GPU recommended).
*   **Translation:** Integrated AI translation capabilities using Ollama.
*   **Real-time Recording:** Record audio directly from your browser.
*   **Model Management:** Download and manage different sizes of the Omnilingual model (300M, 1B, 3B, 7B) directly from the UI.
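
The translation feature can be pictured as a single request to a locally running Ollama server. The sketch below is not taken from `app.py`; it assumes Ollama's default REST endpoint at `http://localhost:11434/api/generate`, and the model name `llama3`, the prompt wording, and the function names are placeholders chosen for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_translation_request(text, target_language, model="llama3"):
    """Build the JSON payload for a non-streaming Ollama generate call.

    The model name and the prompt wording are illustrative assumptions.
    """
    prompt = (
        f"Translate the following text into {target_language}. "
        f"Reply with the translation only.\n\n{text}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def translate(text, target_language, model="llama3", url=OLLAMA_URL):
    """Send the request to Ollama and return the generated translation."""
    payload = json.dumps(build_translation_request(text, target_language, model))
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object with a "response" field.
    return body["response"].strip()
```

With an Ollama server running and the chosen model already pulled, `translate("Bonjour tout le monde", "English")` would return the model's translation as a string.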

---

## Running Locally

To run the application directly, follow these steps.

### Prerequisites

*   **Python 3.10** or higher
*   **FFmpeg**: Required for audio processing.
    *   *Windows*: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to your system PATH.
    *   *Linux*: `sudo apt install ffmpeg`
    *   *macOS*: `brew install ffmpeg`
*   **CUDA (Optional but Recommended)**: For GPU acceleration. Without a GPU, inference will be slow.
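
The prerequisites above can be checked from Python before installing anything else. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and the CUDA check assumes PyTorch is the backend and only runs when `torch` is already installed.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict describing which prerequisites are satisfied."""
    results = {
        "python": sys.version_info[:2] >= min_python,
        "ffmpeg": shutil.which("ffmpeg") is not None,  # is ffmpeg on PATH?
        "cuda": False,  # optional: stays False when no GPU or no torch
    }
    # Assumption: PyTorch is the backend; only query CUDA if torch is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        results["cuda"] = bool(torch.cuda.is_available())
    return results


for name, ok in check_prerequisites().items():
    print(f"{name:8s} {'OK' if ok else 'not found'}")
```

A `cuda` result of `not found` is not fatal; it only means inference will run on the CPU.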

### Installation

1.  **Clone the Repository**
    ```bash
    git clone <your-repository-url>
    cd UniversalTranslator-Omnilingual-ASR
    ```

2.  **Create a Virtual Environment (Recommended)**
    ```bash
    # Windows
    python -m venv venv
    .\venv\Scripts\activate

    # Linux/macOS
    python3 -m venv venv
    source venv/bin/activate
    ```

3.  **Install Dependencies**
    ```bash
    pip install -r requirements.txt
    ```
    
    *Note: If you encounter issues installing `omnilingual-asr`, you may need to install it directly from the source:*
    ```bash
    pip install git+https://github.com/facebookresearch/omnilingual-asr.git
    ```

### Running the Application

1.  **Start the Server**
    ```bash
    python app.py
    ```

2.  **Access the GUI**
    Open your web browser and navigate to:
    ```
    http://localhost:8080
    ```
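
Startup can take a while, so it can help to poll the address until the server answers instead of refreshing the browser. The following is a small sketch under the assumption that the app listens on `http://localhost:8080` as stated above; `wait_for_server` is a hypothetical helper, not something provided by `app.py`.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8080", timeout=60.0, interval=1.0):
    """Poll url until it answers or timeout seconds elapse.

    Returns True as soon as any HTTP response arrives, otherwise False.
    """
    # Talk to the local server directly, ignoring any proxy environment settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status.
            return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            time.sleep(interval)
    return False
```

Calling `wait_for_server()` right after `python app.py` blocks for up to a minute and returns `True` once the GUI is reachable.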

---

## Docker Setup Guide

If you prefer using Docker to keep your environment clean, follow these instructions.

### Prerequisites (Docker)

1. **Windows 10/11** with WSL2 enabled
2. **Docker Desktop for Windows** (ensure WSL2 backend is selected in settings)
3. **Git for Windows** (or Git installed in your WSL2 distro)
4. At least **8GB of free disk space** (model downloads can be large)

### 1. Clone the Repository

Open your terminal (PowerShell or WSL2 Ubuntu) and clone the project:

```bash
git clone <your-repository-url>
cd UniversalTranslator-Omnilingual-ASR
```

### 2. Build the Docker Image

```bash
docker build -t omnilingual-asr .
```

**Note:** This step will take 10-20 minutes as it:
- Downloads the base Ubuntu image
- Installs system dependencies
- Downloads Python packages
- Downloads the omnilingual-asr model (~1-2GB)

### 3. Run the Docker Container

```bash
# With CPU support:
docker run -d -p 8080:8080 --name asr-app omnilingual-asr

# Or with GPU support:
docker run -d -p 8080:8080 --gpus all --name asr-app omnilingual-asr
```

**Options explained:**
- `-d`: Run in detached mode (background)
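- `-p 8080:8080`: Map host port 8080 to container port 8080
- `--gpus all`: Expose all NVIDIA GPUs to the container (requires the NVIDIA Container Toolkit)
- `--name asr-app`: Give the container a friendly name
- `omnilingual-asr`: The image name we built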

### 4. Access the Application

Open your web browser and navigate to:

```
http://localhost:8080
```

## Docker Management Commands

### Check if container is running:
```bash
docker ps
```

### View container logs:
```bash
docker logs asr-app
```

### View live logs (follow mode):
```bash
docker logs -f asr-app
```

### Stop the container:
```bash
docker stop asr-app
```

### Start the container again:
```bash
docker start asr-app
```

### Remove the container:
```bash
docker rm asr-app
```

### Remove the image:
```bash
docker rmi omnilingual-asr
```

### Restart the container:
```bash
docker restart asr-app
```
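
The same status check can be scripted. The sketch below wraps `docker ps -a` with a name filter and a tab-separated output format; the helper names are hypothetical, and it assumes the `docker` CLI is on your PATH and the container is called `asr-app` as in the commands above.

```python
import subprocess


def parse_docker_ps(output):
    """Parse tab-separated 'name<TAB>status' lines into a dict."""
    containers = {}
    for line in output.splitlines():
        if "\t" in line:
            name, status = line.split("\t", 1)
            containers[name.strip()] = status.strip()
    return containers


def container_status(name="asr-app"):
    """Return the status string for a container, or None if it does not exist."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--filter", f"name={name}",
         "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    # The name filter matches substrings, so look up the exact name afterwards.
    return parse_docker_ps(result.stdout).get(name)
```

`container_status()` returns Docker's own status text (for example a string starting with `Up` or `Exited`), or `None` when no container with that name exists.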

## Troubleshooting

### Container won't start:
```bash
# Check logs for errors
docker logs asr-app

# Try running in interactive mode to see errors
docker run -it -p 8080:8080 omnilingual-asr
```

### Port already in use:
```bash
# Map a different host port (here 8081) to the container's port 8080
docker run -d -p 8081:8080 --name asr-app omnilingual-asr
# Then access at http://localhost:8081
```
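
To find out which host port is actually available, a short probe can be used before starting the container. This is an illustrative sketch: `port_is_free` and `first_free_port` are hypothetical helpers, and the probe only checks listeners on `127.0.0.1`.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return probe.connect_ex((host, port)) != 0


def first_free_port(start=8080, attempts=20):
    """Return the first free port in the range [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port between {start} and {start + attempts - 1}")


host_port = first_free_port()
# Print a docker run command that maps the free host port to container port 8080.
print(f"docker run -d -p {host_port}:8080 --name asr-app omnilingual-asr")
```

The printed command can be pasted into the terminal; the application is then reachable on the host port that was chosen.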

### Model loading issues:
The first time you transcribe audio, the model will download. This can take several minutes and may appear to hang. Be patient!

### Out of memory:
If you encounter memory issues, increase Docker's memory allocation in Docker Desktop settings (recommended: 8GB+)

## GPU Support (Optional)

To enable GPU acceleration for faster transcription:

### 1. Install NVIDIA Container Toolkit in WSL2:
Follow the official guide to install the NVIDIA Container Toolkit:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Quick summary for Ubuntu (WSL2):
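```bash
# Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Then install the toolkit package and configure Docker as described in the guide above
```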

### 2. Run container with GPU:
```bash
docker run -d -p 8080:8080 --gpus all --name asr-app omnilingual-asr
```

## Usage Tips

1. **Live Recording**: Click "Start Recording", speak clearly, then click "Stop Recording". The transcription will appear after processing.

2. **File Upload**: Click "Choose Audio File", select an audio file (WAV, MP3, etc.), and the transcription will appear automatically.

3. **Supported Languages**: The omnilingual-asr model supports 1,600+ languages automatically!

4. **Audio Format**: Most common audio formats are supported (WAV, MP3, M4A, FLAC, OGG, etc.)
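
If a file in an unusual format is rejected, converting it to WAV with FFmpeg first is a common workaround. The sketch below builds and runs such a conversion. The 16 kHz mono target is an assumption based on what speech models typically expect; this README does not state what `app.py` requires, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_command(source, sample_rate=16000):
    """Build an FFmpeg command that converts source to a mono WAV file."""
    source = Path(source)
    target = source.with_suffix(".wav")
    if target == source:
        # Avoid overwriting the input when it is already a .wav file.
        target = source.with_name(source.stem + "_converted.wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(source),        # input file in any format FFmpeg can decode
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        str(target),
    ]


def convert_to_wav(source, sample_rate=16000):
    """Run the conversion and return the path of the new WAV file."""
    command = ffmpeg_to_wav_command(source, sample_rate)
    subprocess.run(command, check=True)
    return Path(command[-1])
```

For example, `convert_to_wav("interview.m4a")` writes `interview.wav` next to the original, which can then be uploaded through the GUI.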

## Performance Notes

- **First transcription** will be slower as the model loads into memory
- **Subsequent transcriptions** will be faster
- **CPU mode** is slower but works on any system
- **GPU mode** (if available) provides 5-10x faster transcription

## Updating the Application

To pick up code changes, whether pulled from the repository or made locally, rebuild the image and recreate the container:

```bash
# Pull latest changes
git pull

# Stop and remove old container
docker stop asr-app
docker rm asr-app

# Rebuild image
docker build -t omnilingual-asr .

# Run new container
docker run -d -p 8080:8080 --name asr-app omnilingual-asr
```