# Universal Translator with Omnilingual ASR
## Description
Universal Translator is a powerful application that leverages Meta's state-of-the-art **Omnilingual-ASR** model to provide high-accuracy speech recognition and translation.
It supports **1,600+ languages** for transcription and provides a modern web interface built with **NiceGUI**.
Translation is handled by any LLM served through **Ollama**.
**Key Features:**
* **Massive Language Support:** Transcribe audio from over 1,600 languages.
* **Web Interface:** User-friendly GUI for recording, uploading files, and managing models.
* **Local Processing:** Runs entirely locally on your machine (GPU recommended).
* **Translation:** Integrated AI translation capabilities using Ollama.
* **Real-time Recording:** Record audio directly from your browser.
* **Model Management:** Download and manage different sizes of the Omnilingual model (300M, 1B, 3B, 7B) directly from the UI.
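The Ollama translation step boils down to a single HTTP call against the local Ollama server. As an illustration only (the function names, prompt wording, and default model below are assumptions, not this app's actual code):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_prompt(text: str, target_lang: str) -> str:
    """Wrap transcribed text in a translation instruction for the LLM."""
    return (f"Translate the following text into {target_lang}. "
            f"Reply with the translation only:\n\n{text}")

def translate(text: str, target_lang: str, model: str = "llama3") -> str:
    """One-shot (non-streaming) generation request against a local Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(text, target_lang),
        "stream": False,  # request a single JSON response instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Any model you have pulled with `ollama pull` can be passed as `model`.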
---
## Running Locally
To run the application directly, follow these steps.
### Prerequisites
* **Python 3.10** or higher
* **FFmpeg**: Required for audio processing.
* *Windows*: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to your system PATH.
* *Linux*: `sudo apt install ffmpeg`
* *macOS*: `brew install ffmpeg`
* **CUDA (Optional but Recommended)**: For GPU acceleration. Without a GPU, inference will be slow.
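Before launching the app, you can confirm FFmpeg is actually discoverable on your PATH (a generic check, not part of the project):

```python
import shutil

def check_ffmpeg() -> bool:
    """Return True if the ffmpeg binary is discoverable on PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    print("ffmpeg found:", check_ffmpeg())
```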
### Installation
1. **Clone the Repository**
```bash
git clone <your-repository-url>
cd UniversalTranslator-Omnilingual-ASR
```
2. **Create a Virtual Environment (Recommended)**
```bash
# Windows
python -m venv venv
.\venv\Scripts\activate

# Linux/macOS
python3 -m venv venv
source venv/bin/activate
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
*Note: If you encounter issues installing `omnilingual-asr`, you may need to install it directly from source:*
```bash
pip install git+https://github.com/facebookresearch/omnilingual-asr.git
```
### Running the Application
1. **Start the Server**
```bash
python app.py
```
2. **Access the GUI**
Open your web browser and navigate to:
```
http://localhost:8080
```
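If the page does not load, you can check from a second terminal whether anything is actually listening on the port (a generic TCP probe, not project code):

```python
import socket

def server_is_up(host: str = "localhost", port: int = 8080,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A `False` result usually means the server has not finished starting, or it is bound to a different port.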
---
## Docker Setup Guide
If you prefer using Docker to keep your environment clean, follow these instructions.
### Prerequisites (Docker)
1. **Windows 10/11** with WSL2 enabled
2. **Docker Desktop for Windows** (ensure WSL2 backend is selected in settings)
3. **Git for Windows** (or Git installed in your WSL2 distro)
4. At least **8GB of free disk space** (model downloads can be large)
### 1. Clone the Repository
Open your terminal (PowerShell or WSL2 Ubuntu) and clone the project:
```bash
git clone <your-repository-url>
cd UniversalTranslator-Omnilingual-ASR
```
### 2. Build the Docker Image
```bash
docker build -t omnilingual-asr .
```
**Note:** This step will take 10-20 minutes as it:
- Downloads the base Ubuntu image
- Installs system dependencies
- Downloads Python packages
- Downloads the omnilingual-asr model (~1-2GB)
### 3. Run the Docker Container
```bash
# With CPU support:
docker run -d -p 8080:8080 --name asr-app omnilingual-asr

# Or with GPU support:
docker run -d -p 8080:8080 --gpus all --name asr-app omnilingual-asr
```
**Options explained:**
- `-d`: Run in detached mode (background)
- `-p 8080:8080`: Map port 8080 from container to host
- `--name asr-app`: Give the container a friendly name
- `omnilingual-asr`: The image name we built
### 4. Access the Application
Open your web browser and navigate to:
```
http://localhost:8080
```
## Docker Management Commands
### Check if container is running:
```bash
docker ps
```
### View container logs:
```bash
docker logs asr-app
```
### View live logs (follow mode):
```bash
docker logs -f asr-app
```
### Stop the container:
```bash
docker stop asr-app
```
### Start the container again:
```bash
docker start asr-app
```
### Remove the container:
```bash
docker rm asr-app
```
### Remove the image:
```bash
docker rmi omnilingual-asr
```
### Restart the container:
```bash
docker restart asr-app
```
## Troubleshooting
### Container won't start:
```bash
# Check logs for errors
docker logs asr-app
# Try running in interactive mode to see errors
docker run -it -p 8080:8080 omnilingual-asr
```
### Port already in use:
```bash
# Map a different host port (e.g. 8081) to the container's 8080
docker run -d -p 8081:8080 --name asr-app omnilingual-asr
# Then access at http://localhost:8081
```
### Model loading issues:
The first time you transcribe audio, the model will download. This can take several minutes and may appear to hang. Be patient!
### Out of memory:
If you encounter memory issues, increase Docker's memory allocation in Docker Desktop settings (8GB or more is recommended).
## GPU Support (Optional)
To enable GPU acceleration for faster transcription:
### 1. Install NVIDIA Container Toolkit in WSL2:
Follow the official guide to install the NVIDIA Container Toolkit:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Quick summary for Ubuntu (WSL2):
```bash
# Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit and register it with Docker, then restart Docker
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
```
### 2. Run container with GPU:
```bash
docker run -d -p 8080:8080 --gpus all --name asr-app omnilingual-asr
```
## Usage Tips
1. **Live Recording**: Click "Start Recording", speak clearly, then click "Stop Recording". The transcription will appear after processing.
2. **File Upload**: Click "Choose Audio File", select an audio file (WAV, MP3, etc.), and the transcription will appear automatically.
3. **Supported Languages**: The omnilingual-asr model supports 1,600+ languages automatically!
4. **Audio Format**: Most common audio formats are supported (WAV, MP3, M4A, FLAC, OGG, etc.)
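Under the hood, FFmpeg can normalise any of these formats to mono 16 kHz WAV, the input that speech models typically expect. If you want to pre-convert a file yourself (the 16 kHz/mono choice is a common ASR convention, assumed here rather than taken from this app's code):

```python
import subprocess

def ffmpeg_to_wav16k(src: str, dst: str) -> list[str]:
    """Build the ffmpeg command that converts `src` to 16 kHz mono WAV at `dst`."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

# Run it (requires ffmpeg on PATH):
# subprocess.run(ffmpeg_to_wav16k("input.mp3", "output.wav"), check=True)
```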
## Performance Notes
- **First transcription** will be slower as the model loads into memory
- **Subsequent transcriptions** will be faster
- **CPU mode** is slower but works on any system
- **GPU mode** (if available) provides 5-10x faster transcription
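To see the warm-up effect yourself, you can wrap any call in a simple timer (a generic helper; `transcribe` below is hypothetical, not a function from this app):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example: compare the first call (model load) against a second call.
# text, first_secs = timed(transcribe, "clip.wav")
# text, later_secs = timed(transcribe, "clip.wav")
```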
## Updating the Application
If you make changes to the code:
```bash
# Pull latest changes
git pull
# Stop and remove old container
docker stop asr-app
docker rm asr-app
# Rebuild image
docker build -t omnilingual-asr .
# Run new container
docker run -d -p 8080:8080 --name asr-app omnilingual-asr
```