ETNyx/docker-whisper-rocm

Docker Whisper ROCm

OpenAI Whisper running on AMD GPUs with ROCm support. Automatically detects AMD GPU architecture at runtime (RDNA 2/3/4 supported).

Features

  • ROCm 7.1.1 GPU acceleration
  • Web UI (Gradio) for easy transcription
  • CLI interface for batch processing
  • Persistent model caching
  • Multiple output formats (txt, srt, vtt)
  • All Whisper models supported (tiny to large)
  • VRAM safeguard - automatically filters models based on available GPU memory
  • API endpoint for querying available models programmatically

Prerequisites

  • Docker installed
  • AMD GPU with ROCm support
  • Base image docker-rocm:latest built

Quick Start

1. Build the Image

make build

2. Run Web UI

make run

Access the web interface at: http://localhost:7860

The UI allows you to:

  • Upload audio files
  • Select model (tiny, base, small, medium, large, turbo)
  • Choose task (transcribe or translate)
  • Select language or auto-detect
  • Pick output format (txt, srt, vtt)

3. CLI Usage

Transcribe a single file

# Place your audio file in ./data/input/
cp your-audio.mp3 ./data/input/

# Transcribe using the base model
make transcribe FILE=/data/input/your-audio.mp3 MODEL=base

# Output will be saved to ./data/output/

Interactive bash session

make bash

# Inside container:
whisper /data/input/audio.mp3 --model base --output_dir /data/output
whisper /data/input/audio.mp3 --model medium --language en --task translate

Models

| Model  | Size  | VRAM  | Speed     | Accuracy  |
|--------|-------|-------|-----------|-----------|
| tiny   | 39M   | ~1GB  | Fastest   | Lowest    |
| base   | 74M   | ~1GB  | Very Fast | Good      |
| small  | 244M  | ~2GB  | Fast      | Better    |
| medium | 769M  | ~5GB  | Medium    | Great     |
| large  | 1550M | ~10GB | Slow      | Best      |
| turbo  | 809M  | ~6GB  | Fast      | Very Good |

The web UI automatically filters models based on your GPU's available VRAM (keeping a 1 GB safety margin). Models that would exceed your VRAM are hidden from the dropdown.
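The filtering rule can be sketched in Python. This is an illustrative sketch based on the table and margin described above, not the code shipped in whisper_ui.py; `filter_models` is a hypothetical name.

```python
# Illustrative sketch of the VRAM-based model filter.
# MODEL_VRAM mirrors the table above; the exact comparison used by
# whisper_ui.py may differ.
MODEL_VRAM = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10, "turbo": 6}
SAFETY_MARGIN_GB = 1  # keep 1 GB of headroom free

def filter_models(gpu_vram_gb: float) -> list[str]:
    """Return the models whose VRAM requirement fits under the usable VRAM."""
    usable = gpu_vram_gb - SAFETY_MARGIN_GB
    return [name for name, need in MODEL_VRAM.items() if need <= usable]

print(filter_models(4.0))  # a 4 GB card keeps only the smaller models
```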

Directory Structure

docker-whisper-rocm/
├── Dockerfile              # Container definition
├── whisper_ui.py          # Gradio web interface
├── entrypoint.sh          # Startup script
├── Makefile               # Build and run commands
├── models/                # Model cache (auto-downloaded)
├── data/
│   ├── input/            # Place audio files here
│   └── output/           # Transcriptions saved here
└── README.md             # This file

Usage Examples

Transcribe with auto-detection

whisper audio.mp3 --model base --output_dir /data/output

Transcribe in Spanish

whisper audio.mp3 --model medium --language es --output_dir /data/output

Translate to English

whisper audio.mp3 --model medium --task translate --output_dir /data/output

Output formats

# Text only
whisper audio.mp3 --model base --output_format txt

# Subtitles (SRT)
whisper audio.mp3 --model base --output_format srt

# WebVTT
whisper audio.mp3 --model base --output_format vtt

# All formats
whisper audio.mp3 --model base --output_format all

Advanced Options

Custom model cache location

Models are automatically cached in ./models/ directory and persist between container restarts.

Batch processing

# Process all files in the input directory
for file in ./data/input/*.mp3; do
    docker run --rm \
        --device=/dev/kfd --device=/dev/dri \
        -v "$(pwd)/models:/models" \
        -v "$(pwd)/data:/data" \
        docker-whisper-rocm \
        whisper "$file" --model base --output_dir /data/output
done
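The same loop can be driven from Python with subprocess. `build_cmd` is a hypothetical helper that assembles the docker invocation shown above; it is a sketch, not part of this repository.

```python
import subprocess
from pathlib import Path

def build_cmd(audio: Path, model: str = "base") -> list[str]:
    """Assemble the docker run command used in the shell loop above."""
    cwd = Path.cwd()
    return [
        "docker", "run", "--rm",
        "--device=/dev/kfd", "--device=/dev/dri",
        "-v", f"{cwd}/models:/models",
        "-v", f"{cwd}/data:/data",
        "docker-whisper-rocm",
        "whisper", str(audio), "--model", model, "--output_dir", "/data/output",
    ]

# Transcribe every mp3 in the input directory, one container per file.
for audio in sorted(Path("data/input").glob("*.mp3")):
    subprocess.run(build_cmd(audio), check=True)
```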

GPU Verification

Check if GPU is detected:

docker run --rm --device=/dev/kfd --device=/dev/dri docker-whisper-rocm \
    python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"

Expected output:

GPU: AMD Radeon Graphics
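From Python inside the container, the same check can degrade gracefully when torch or the GPU is missing. `gpu_name` is an illustrative helper; as the one-liner above shows, PyTorch with ROCm reports AMD GPUs through the `torch.cuda` API surface.

```python
import importlib

def gpu_name() -> str:
    """Report the detected GPU, or a readable reason why none is visible."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no GPU visible"
    return torch.cuda.get_device_name(0)

print("GPU:", gpu_name())
```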

Troubleshooting

GPU not detected

  • Ensure /dev/kfd and /dev/dri are accessible
  • Check ROCm is properly installed on host
  • Verify base image docker-rocm:latest is built

Models downloading slowly

  • First run downloads models (can be large)
  • Subsequent runs use cached models from ./models/

Out of memory

  • Use a smaller model (tiny, base, small)
  • Check GPU VRAM usage: rocm-smi

Makefile Commands

  • make build - Build the Docker image
  • make rebuild - Rebuild without cache
  • make run - Start web UI on port 7860
  • make run-detached - Start web UI in background
  • make stop - Stop and remove container
  • make logs - View container logs
  • make bash - Interactive shell for debugging
  • make transcribe FILE=<path> MODEL=<name> - Transcribe single file

Technical Details

  • Base: Ubuntu 24.04
  • ROCm: 7.1.1
  • Python: 3.11
  • PyTorch: 2.9.1+rocm7.1.1
  • GPU: Auto-detected at runtime (RDNA 2/3/4 supported)

API Endpoints

The service exposes REST API endpoints for programmatic access:

Get Available Models

Returns models filtered by available VRAM.

curl http://localhost:7860/api/models

Response:

{
  "available_models": ["tiny", "base", "small", "medium"],
  "all_models": ["tiny", "base", "small", "medium", "large", "turbo"],
  "model_vram": {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10, "turbo": 6},
  "gpu_vram": 8.0,
  "device": "cuda"
}
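Once that JSON is fetched (for example with urllib), a small helper can pick the most capable model the GPU can hold. `largest_available` and the preference order are illustrative assumptions, not part of the service.

```python
import json
from urllib.request import urlopen

# Preference order from smallest to largest; "turbo" sits between medium and large.
ORDER = ["tiny", "base", "small", "medium", "turbo", "large"]

def largest_available(response: dict) -> str:
    """Return the most capable model the API reports as available."""
    available = set(response["available_models"])
    for name in reversed(ORDER):
        if name in available:
            return name
    raise ValueError("no models available")

# Against a running container this would be:
#   response = json.load(urlopen("http://localhost:7860/api/models"))
sample = {"available_models": ["tiny", "base", "small", "medium"]}
print(largest_available(sample))  # prints "medium"
```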

Transcribe Audio (Gradio API)

# Upload and transcribe (see Gradio API docs for file upload format)
curl -X POST http://localhost:7860/gradio_api/call/transcribe_audio \
  -H "Content-Type: application/json" \
  -d '{"data": ["<audio_file>", "base", "transcribe", "auto", "txt"]}'
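Gradio's call API is two-step: the POST above returns an event_id, and the result is then read from a follow-up GET on the same path. A hedged Python sketch of the request builder follows; `transcribe_request` is a hypothetical helper, and the audio file reference format is left as the placeholder used above.

```python
import json
from urllib.request import Request

def transcribe_request(base_url: str, audio_ref: str, model: str = "base") -> Request:
    """Build the POST that starts a transcription job; the response carries an event_id."""
    payload = {"data": [audio_ref, model, "transcribe", "auto", "txt"]}
    return Request(
        f"{base_url}/gradio_api/call/transcribe_audio",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = transcribe_request("http://localhost:7860", "<audio_file>")
# The result would then be fetched with:
#   GET {base_url}/gradio_api/call/transcribe_audio/{event_id}
```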

License

Whisper is released by OpenAI under the MIT License.
