# docker-whisper-rocm

OpenAI Whisper running on AMD GPUs with ROCm support. The container automatically detects the AMD GPU architecture at runtime (RDNA 2/3/4 supported).
## Features

- ROCm 7.1.1 GPU acceleration
- Web UI (Gradio) for easy transcription
- CLI interface for batch processing
- Persistent model caching
- Multiple output formats (txt, srt, vtt)
- All Whisper models supported (tiny to large)
- VRAM safeguard - automatically filters models based on available GPU memory
- API endpoint for querying available models programmatically
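The runtime architecture detection mentioned above amounts to mapping the GPU's gfx target (as reported by `rocminfo`) to a ROCm `HSA_OVERRIDE_GFX_VERSION` value. A minimal sketch of that mapping, assuming numeric RDNA-style targets; `gfx_to_override` is an illustrative helper, not necessarily how `entrypoint.sh` does it:

```python
def gfx_to_override(gfx: str) -> str:
    """Map an RDNA gfx target (e.g. from `rocminfo`) to an
    HSA_OVERRIDE_GFX_VERSION value, e.g. 'gfx1030' -> '10.3.0'."""
    digits = gfx.removeprefix("gfx")  # 'gfx1030' -> '1030'
    # First two digits are the major version, then minor, then patch
    # (RDNA 2/3/4 targets are purely numeric, unlike e.g. gfx90a).
    return f"{int(digits[:2])}.{int(digits[2])}.{int(digits[3])}"

# RDNA 2, 3, and 4 examples:
print(gfx_to_override("gfx1030"))  # 10.3.0
print(gfx_to_override("gfx1100"))  # 11.0.0
print(gfx_to_override("gfx1201"))  # 12.0.1
```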
## Requirements

- Docker installed
- AMD GPU with ROCm support
- Base image `docker-rocm:latest` built

## Quick Start

```bash
make build
make run
```

Access the web interface at: http://localhost:7860
The UI allows you to:
- Upload audio files
- Select model (tiny, base, small, medium, large, turbo)
- Choose task (transcribe or translate)
- Select language or auto-detect
- Pick output format (txt, srt, vtt)
## CLI Usage

```bash
# Place your audio file in ./data/input/
cp your-audio.mp3 ./data/input/

# Transcribe using the base model
make transcribe FILE=/data/input/your-audio.mp3 MODEL=base

# Output will be saved to ./data/output/
```

For an interactive shell inside the container:

```bash
make bash
```
```bash
# Inside container:
whisper /data/input/audio.mp3 --model base --output_dir /data/output
whisper /data/input/audio.mp3 --model medium --language en --task translate
```

## Models

| Model | Size | VRAM | Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Lowest |
| base | 74M | ~1GB | Very Fast | Good |
| small | 244M | ~2GB | Fast | Better |
| medium | 769M | ~5GB | Medium | Great |
| large | 1550M | ~10GB | Slow | Best |
| turbo | 809M | ~6GB | Fast | Very Good |
The web UI automatically filters models based on your GPU's available VRAM (with 1GB safety margin). Models that would exceed your VRAM are hidden from the dropdown.
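The VRAM safeguard can be sketched as a pure function over the table above. This is an illustrative implementation, not the exact code in `whisper_ui.py`; in practice the GPU total could be read via `torch.cuda.get_device_properties(0).total_memory`:

```python
# VRAM requirements (GB) from the model table above
MODEL_VRAM = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10, "turbo": 6}

def filter_models(gpu_vram_gb: float, margin_gb: float = 1.0) -> list[str]:
    """Return the models that fit in GPU memory, keeping a safety margin
    (sketch of the web UI's VRAM safeguard)."""
    usable = gpu_vram_gb - margin_gb
    return [name for name, need in MODEL_VRAM.items() if need <= usable]

print(filter_models(6.0))  # ['tiny', 'base', 'small', 'medium']
```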
## Project Structure

```
docker-whisper-rocm/
├── Dockerfile       # Container definition
├── whisper_ui.py    # Gradio web interface
├── entrypoint.sh    # Startup script
├── Makefile         # Build and run commands
├── models/          # Model cache (auto-downloaded)
├── data/
│   ├── input/       # Place audio files here
│   └── output/      # Transcriptions saved here
└── README.md        # This file
```
Basic transcription:

```bash
whisper audio.mp3 --model base --output_dir /data/output
```

Specify a language (Spanish):

```bash
whisper audio.mp3 --model medium --language es --output_dir /data/output
```

Translate to English:

```bash
whisper audio.mp3 --model medium --task translate --output_dir /data/output
```

Choose an output format:

```bash
# Text only
whisper audio.mp3 --model base --output_format txt

# Subtitles (SRT)
whisper audio.mp3 --model base --output_format srt

# WebVTT
whisper audio.mp3 --model base --output_format vtt

# All formats
whisper audio.mp3 --model base --output_format all
```

Models are automatically cached in the `./models/` directory and persist between container restarts.
## Batch Processing

```bash
# Process all files in the input directory
for file in ./data/input/*.mp3; do
  docker run --rm \
    --device=/dev/kfd --device=/dev/dri \
    -v "$(pwd)/models":/models \
    -v "$(pwd)/data":/data \
    docker-whisper-rocm \
    whisper "$file" --model base --output_dir /data/output
done
```

## Troubleshooting

Check if the GPU is detected:
```bash
docker run --rm --device=/dev/kfd --device=/dev/dri docker-whisper-rocm \
  python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"
```

Expected output:

```
GPU: AMD Radeon Graphics
```
If the GPU is not detected:

- Ensure `/dev/kfd` and `/dev/dri` are accessible
- Check that ROCm is properly installed on the host
- Verify the base image `docker-rocm:latest` is built

Model downloads:

- The first run downloads models (which can be large)
- Subsequent runs use cached models from `./models/`

If you run out of VRAM:

- Use a smaller model (tiny, base, small)
- Check GPU VRAM usage with `rocm-smi`
## Make Commands

- `make build` - Build the Docker image
- `make rebuild` - Rebuild without cache
- `make run` - Start the web UI on port 7860
- `make run-detached` - Start the web UI in the background
- `make stop` - Stop and remove the container
- `make logs` - View container logs
- `make bash` - Interactive shell for debugging
- `make transcribe FILE=<path> MODEL=<name>` - Transcribe a single file
## Environment

- Base: Ubuntu 24.04
- ROCm: 7.1.1
- Python: 3.11
- PyTorch: 2.9.1+rocm7.1.1
- GPU: Auto-detected at runtime (RDNA 2/3/4 supported)
## API Endpoints

The service exposes REST API endpoints for programmatic access.

### List available models

Returns models filtered by available VRAM:

```bash
curl http://localhost:7860/api/models
```

Response:

```json
{
  "available_models": ["tiny", "base", "small", "medium"],
  "all_models": ["tiny", "base", "small", "medium", "large", "turbo"],
  "model_vram": {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10, "turbo": 6},
  "gpu_vram": 8.0,
  "device": "cuda"
}
```
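A client can use this response to pick the best model the GPU can hold. A sketch, where the quality ordering and the `pick_best_model` helper are illustrative assumptions, not part of the API:

```python
import json

# One plausible quality ordering, worst to best (see the model table)
QUALITY_ORDER = ["tiny", "base", "small", "medium", "turbo", "large"]

def pick_best_model(response_json: str) -> str:
    """Choose the highest-quality model listed in an /api/models response."""
    available = json.loads(response_json)["available_models"]
    return max(available, key=QUALITY_ORDER.index)

# In practice the body would come from
# urllib.request.urlopen("http://localhost:7860/api/models").read()
sample = '{"available_models": ["tiny", "base", "small", "medium"]}'
print(pick_best_model(sample))  # medium
```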
### Transcribe

```bash
# Upload and transcribe (see Gradio API docs for file upload format)
curl -X POST http://localhost:7860/gradio_api/call/transcribe_audio \
  -H "Content-Type: application/json" \
  -d '{"data": ["<audio_file>", "base", "transcribe", "auto", "txt"]}'
```

## License

Whisper is released by OpenAI under the MIT License.