# docker-whisper-rocm

OpenAI Whisper running on AMD GPUs with ROCm support. The container automatically detects the AMD GPU architecture at runtime (RDNA 2/3/4 supported).
## Features

- ROCm 7.1.1 GPU acceleration
- Web UI (Gradio) for easy transcription
- CLI interface for batch processing
- Persistent model caching
- Multiple output formats (txt, srt, vtt)
- All Whisper models supported (tiny to large)
- VRAM safeguard - automatically filters models based on available GPU memory
- API endpoint for querying available models programmatically
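The runtime architecture detection mentioned above amounts to mapping the GPU's gfx target (as reported by `rocminfo`) to a ROCm `HSA_OVERRIDE_GFX_VERSION` value. A minimal sketch of that mapping, assuming numeric RDNA-style targets; `gfx_to_override` is an illustrative helper, not necessarily how `entrypoint.sh` does it:

```python
def gfx_to_override(gfx: str) -> str:
    """Map an RDNA gfx target (e.g. from `rocminfo`) to an
    HSA_OVERRIDE_GFX_VERSION value, e.g. 'gfx1030' -> '10.3.0'."""
    digits = gfx.removeprefix("gfx")  # 'gfx1030' -> '1030'
    # First two digits are the major version, then minor, then patch
    # (RDNA 2/3/4 targets are purely numeric, unlike e.g. gfx90a).
    return f"{int(digits[:2])}.{int(digits[2])}.{int(digits[3])}"

# RDNA 2, 3, and 4 examples:
print(gfx_to_override("gfx1030"))  # 10.3.0
print(gfx_to_override("gfx1100"))  # 11.0.0
print(gfx_to_override("gfx1201"))  # 12.0.1
```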
## Requirements

- Docker installed
- AMD GPU with ROCm support
- Base image `docker-rocm:latest` built

## Quick Start

```bash
make build
make run
```

Access the web interface at: http://localhost:7860
The UI allows you to:
- Upload audio files
- Select model (tiny, base, small, medium, large, turbo)
- Choose task (transcribe or translate)
- Select language or auto-detect
- Pick output format (txt, srt, vtt)
## CLI Usage

```bash
# Place your audio file in ./data/input/
cp your-audio.mp3 ./data/input/

# Transcribe using the base model
make transcribe FILE=/data/input/your-audio.mp3 MODEL=base

# Output will be saved to ./data/output/
```

For an interactive shell inside the container:

```bash
make bash
```
```bash
# Inside container:
whisper /data/input/audio.mp3 --model base --output_dir /data/output
whisper /data/input/audio.mp3 --model medium --language en --task translate
```

## Models

| Model | Size | VRAM | Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Lowest |
| base | 74M | ~1GB | Very Fast | Good |
| small | 244M | ~2GB | Fast | Better |
| medium | 769M | ~5GB | Medium | Great |
| large | 1550M | ~10GB | Slow | Best |
| turbo | 809M | ~6GB | Fast | Very Good |
The web UI automatically filters models based on your GPU's available VRAM (with 1GB safety margin). Models that would exceed your VRAM are hidden from the dropdown.
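The VRAM safeguard can be sketched as a pure function over the table above. This is an illustrative implementation, not the exact code in `whisper_ui.py`; in practice the GPU total could be read via `torch.cuda.get_device_properties(0).total_memory`:

```python
# VRAM requirements (GB) from the model table above
MODEL_VRAM = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10, "turbo": 6}

def filter_models(gpu_vram_gb: float, margin_gb: float = 1.0) -> list[str]:
    """Return the models that fit in GPU memory, keeping a safety margin
    (sketch of the web UI's VRAM safeguard)."""
    usable = gpu_vram_gb - margin_gb
    return [name for name, need in MODEL_VRAM.items() if need <= usable]

print(filter_models(6.0))  # ['tiny', 'base', 'small', 'medium']
```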
## Project Structure

```
docker-whisper-rocm/
├── Dockerfile       # Container definition
├── whisper_ui.py    # Gradio web interface
├── entrypoint.sh    # Startup script
├── Makefile         # Build and run commands
├── models/          # Model cache (auto-downloaded)
├── data/
│   ├── input/       # Place audio files here
│   └── output/      # Transcriptions saved here
└── README.md        # This file
```
Basic transcription:

```bash
whisper audio.mp3 --model base --output_dir /data/output
```

Specify a language (Spanish):

```bash
whisper audio.mp3 --model medium --language es --output_dir /data/output
```

Translate to English:

```bash
whisper audio.mp3 --model medium --task translate --output_dir /data/output
```

Choose an output format:

```bash
# Text only
whisper audio.mp3 --model base --output_format txt

# Subtitles (SRT)
whisper audio.mp3 --model base --output_format srt

# WebVTT
whisper audio.mp3 --model base --output_format vtt

# All formats
whisper audio.mp3 --model base --output_format all
```

Models are automatically cached in the `./models/` directory and persist between container restarts.
## Batch Processing

```bash
# Process all files in the input directory
for file in ./data/input/*.mp3; do
  docker run --rm \
    --device=/dev/kfd --device=/dev/dri \
    -v "$(pwd)/models":/models \
    -v "$(pwd)/data":/data \
    docker-whisper-rocm \
    whisper "$file" --model base --output_dir /data/output
done
```

## Troubleshooting

Check if the GPU is detected:
```bash
docker run --rm --device=/dev/kfd --device=/dev/dri docker-whisper-rocm \
  python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"
```

Expected output:

```
GPU: AMD Radeon Graphics
```
If the GPU is not detected:

- Ensure `/dev/kfd` and `/dev/dri` are accessible
- Check that ROCm is properly installed on the host
- Verify the base image `docker-rocm:latest` is built

Model downloads:

- The first run downloads models (which can be large)
- Subsequent runs use cached models from `./models/`

If you run out of VRAM:

- Use a smaller model (tiny, base, small)
- Check GPU VRAM usage with `rocm-smi`
## Make Commands

- `make build` - Build the Docker image
- `make rebuild` - Rebuild without cache
- `make run` - Start the web UI on port 7860
- `make run-detached` - Start the web UI in the background
- `make stop` - Stop and remove the container
- `make logs` - View container logs
- `make bash` - Interactive shell for debugging
- `make transcribe FILE=<path> MODEL=<name>` - Transcribe a single file
## Environment

- Base: Ubuntu 24.04
- ROCm: 7.1.1
- Python: 3.11
- PyTorch: 2.9.1+rocm7.1.1
- GPU: Auto-detected at runtime (RDNA 2/3/4 supported)
## API Endpoints

The service exposes REST API endpoints for programmatic access.

### List available models

Returns models filtered by available VRAM:

```bash
curl http://localhost:7860/api/models
```

Response:

```json
{
  "available_models": ["tiny", "base", "small", "medium"],
  "all_models": ["tiny", "base", "small", "medium", "large", "turbo"],
  "model_vram": {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10, "turbo": 6},
  "gpu_vram": 8.0,
  "device": "cuda"
}
```
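A client can use this response to pick the best model the GPU can hold. A sketch, where the quality ordering and the `pick_best_model` helper are illustrative assumptions, not part of the API:

```python
import json

# One plausible quality ordering, worst to best (see the model table)
QUALITY_ORDER = ["tiny", "base", "small", "medium", "turbo", "large"]

def pick_best_model(response_json: str) -> str:
    """Choose the highest-quality model listed in an /api/models response."""
    available = json.loads(response_json)["available_models"]
    return max(available, key=QUALITY_ORDER.index)

# In practice the body would come from
# urllib.request.urlopen("http://localhost:7860/api/models").read()
sample = '{"available_models": ["tiny", "base", "small", "medium"]}'
print(pick_best_model(sample))  # medium
```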
### Transcribe

```bash
# Upload and transcribe (see Gradio API docs for file upload format)
curl -X POST http://localhost:7860/gradio_api/call/transcribe_audio \
  -H "Content-Type: application/json" \
  -d '{"data": ["<audio_file>", "base", "transcribe", "auto", "txt"]}'
```

## License

Whisper is released by OpenAI under the MIT License.