The whisperX API is a service for processing and analyzing audio content. It provides a suite of endpoints for working with audio and video files, including transcription, alignment, diarization, and combining transcripts with diarization results.
Swagger UI is available at `/docs` for all services; a dump of the OpenAPI definition is also available in the `app/docs` folder, and you can explore it directly in the Swagger Editor.
See the WhisperX Documentation for details on whisperX functions.
In `.env` you can define the following settings:

- `DEFAULT_LANG`: default language; if not defined, `en` is used (you can also set it in the request)
- `WHISPER_MODEL`: Whisper model to use (you can also set it in the request)
- `LOG_LEVEL`: logging level; if not defined, `DEBUG` is used in development and `INFO` in production
- `ENVIRONMENT`: environment name; if not defined, `production` is used
- `DEV`: boolean indicating whether the environment is development; if not defined, `true` is used
- `FILTER_WARNING`: boolean to enable or disable filtering of specific warnings; if not defined, `true` is used
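The defaults above can be sketched as a small settings loader. This is an illustrative helper, not the application's actual configuration code; the function name and the assumption that `DEV` (rather than `ENVIRONMENT`) decides the `LOG_LEVEL` default are mine.

```python
import os

def load_settings(env=os.environ):
    """Hypothetical loader mirroring the .env defaults described above."""
    dev = env.get("DEV", "true").lower() == "true"
    return {
        "DEFAULT_LANG": env.get("DEFAULT_LANG", "en"),
        "WHISPER_MODEL": env.get("WHISPER_MODEL", "tiny"),
        "ENVIRONMENT": env.get("ENVIRONMENT", "production"),
        "DEV": dev,
        # DEBUG in development, INFO in production
        "LOG_LEVEL": env.get("LOG_LEVEL", "DEBUG" if dev else "INFO"),
        "FILTER_WARNING": env.get("FILTER_WARNING", "true").lower() == "true",
    }
```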
- Audio: `.oga`, `.m4a`, `.aac`, `.wav`, `.amr`, `.wma`, `.awb`, `.mp3`, `.ogg`
- Video: `.wmv`, `.mkv`, `.avi`, `.mov`, `.mp4`
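A client can pre-check a file against these lists before uploading. This is a convenience sketch built from the extensions listed above; the service performs its own validation, which may differ.

```python
from pathlib import Path

# Extensions as listed in this README; the service's own validation is authoritative.
AUDIO_EXTENSIONS = {".oga", ".m4a", ".aac", ".wav", ".amr", ".wma", ".awb", ".mp3", ".ogg"}
VIDEO_EXTENSIONS = {".wmv", ".mkv", ".avi", ".mov", ".mp4"}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is a supported audio/video format."""
    return Path(filename).suffix.lower() in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS
```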
- **Speech-to-Text** (`/speech-to-text`)
  - Upload audio/video files for transcription
  - Supports multiple languages and Whisper models
- **Speech-to-Text URL** (`/speech-to-text-url`)
  - Transcribe audio/video from URLs
  - Same features as direct upload
- **Individual Services:**
  - Transcribe (`/service/transcribe`): Convert speech to text
  - Align (`/service/align`): Align transcript with audio
  - Diarize (`/service/diarize`): Speaker diarization
  - Combine (`/service/combine`): Merge transcript with diarization
- **Task Management:**
  - Get all tasks (`/task/all`)
  - Get task status (`/task/{identifier}`)
- **Health Check Endpoints:**
  - Basic health check (`/health`): Simple service status check
  - Liveness probe (`/health/live`): Verifies the application is running
  - Readiness probe (`/health/ready`): Checks if the application is ready to accept requests (includes database connectivity check)
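Since processing runs asynchronously, a typical client submits a job and then polls `/task/{identifier}` until it finishes. The sketch below shows that polling pattern with the HTTP call injected as a callable; the status names (`processing`, `completed`) are illustrative assumptions — check the actual response shapes in the Swagger UI at `/docs`.

```python
import time

def poll_task(fetch_status, identifier, interval=2.0, timeout=60.0):
    """Poll a task until it leaves the (assumed) 'processing' state.

    `fetch_status` is any callable returning a status dict, e.g. a function
    that GETs /task/{identifier} and decodes the JSON body.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch_status(identifier)
        if task.get("status") != "processing":
            return task  # finished (or failed); caller inspects the result
        time.sleep(interval)
    raise TimeoutError(f"task {identifier} did not finish within {timeout}s")
```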
Task status and results are stored in a database via async SQLAlchemy. The DB connection
is configured with `DB_URL` (default: `sqlite:///records.db`).
See the SQLAlchemy Engine configuration documentation for supported database URLs.
Async drivers are required — the application rewrites the URL scheme automatically:
| `DB_URL` scheme | Async driver used |
|---|---|
| `sqlite://` | `aiosqlite` (included by default) |
| `postgresql://` | `asyncpg` (install with `--extra postgres`) |
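The scheme rewrite in the table can be sketched as a simple prefix substitution. This is an illustrative sketch of the behavior described above, not the application's actual implementation, which may differ in details.

```python
# Mapping from sync URL schemes to their async equivalents, per the table above.
ASYNC_SCHEMES = {
    "sqlite://": "sqlite+aiosqlite://",
    "postgresql://": "postgresql+asyncpg://",
}

def to_async_url(db_url: str) -> str:
    """Rewrite a sync DB_URL scheme to its async-driver form."""
    for sync_scheme, async_scheme in ASYNC_SCHEMES.items():
        if db_url.startswith(sync_scheme):
            return async_scheme + db_url[len(sync_scheme):]
    return db_url  # already async or unknown scheme: pass through unchanged
```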
For PostgreSQL, install the driver extra: `uv sync --no-dev --extra postgres`. The
Docker image includes it automatically.
Performance note: SQLite is suitable for development and low-concurrency use. For production or sustained concurrent load, use PostgreSQL — it sustains 350+ req/s at 200 concurrent users vs. ~15 req/s with SQLite. See Async SQLAlchemy concurrency guide for full load test results.
The structure of the database is described in DB Schema.
Configure compute options in `.env`:

- `DEVICE`: device for inference (`cuda` or `cpu`, default: `cuda`)
- `COMPUTE_TYPE`: computation type (`float16`, `float32`, `int8`, default: `float16`)

Note: when using CPU, `COMPUTE_TYPE` must be set to `int8`.
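The CPU constraint above can be expressed as a small validation, useful for failing fast before model load. This is a sketch of the rule as stated in this README; the function name is hypothetical and not taken from the application code.

```python
VALID_COMPUTE_TYPES = {"float16", "float32", "int8"}

def validate_compute(device: str, compute_type: str) -> None:
    """Raise ValueError for invalid DEVICE/COMPUTE_TYPE combinations."""
    if compute_type not in VALID_COMPUTE_TYPES:
        raise ValueError(f"unknown COMPUTE_TYPE: {compute_type}")
    # Per the note above: CPU inference requires int8 computation.
    if device == "cpu" and compute_type != "int8":
        raise ValueError("COMPUTE_TYPE must be 'int8' when DEVICE is 'cpu'")
```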
WhisperX supports these model sizes:

- `tiny`, `tiny.en`
- `base`, `base.en`
- `small`, `small.en`
- `medium`, `medium.en`
- `large`, `large-v1`, `large-v2`, `large-v3`, `large-v3-turbo`
- Distilled models: `distil-large-v2`, `distil-medium.en`, `distil-small.en`, `distil-large-v3`
- Custom models: `nyrahealth/faster_CrisperWhisper`
Set the default model in `.env` using `WHISPER_MODEL=` (default: `tiny`).
- NVIDIA GPU with CUDA 12.8+ support
- At least 8GB RAM (16GB+ recommended for large models)
- Storage space for models (varies by model size):
- tiny/base: ~1GB
- small: ~2GB
- medium: ~5GB
- large: ~10GB
To get started with the API, follow these steps:
1. Install the `uv` package manager

2. Create a virtual environment and install dependencies:

   ```bash
   # For production dependencies only
   uv sync --no-dev

   # For development (includes testing, linting, async SQLite driver)
   uv sync --all-extras
   ```

3. Configure your environment (see `.env` file setup below)

Note: This project uses `uv` for dependency management with platform-specific PyTorch configuration (CUDA 12.8 on Linux, CPU-only on macOS/Windows). All dependencies are defined in `pyproject.toml`.
The application uses two logging configuration files:

- `uvicorn_log_conf.yaml`: used by Uvicorn for logging configuration
- `gunicorn_logging.conf`: used by Gunicorn for logging configuration (located in the root of the `app` directory)

Ensure these files are correctly configured and placed in the `app` directory.
1. Create a `.env` file and define your Whisper model and Hugging Face token:

   ```
   HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
   WHISPER_MODEL=<<WHISPER MODEL SIZE>>
   LOG_LEVEL=<<LOG LEVEL>>
   ```

2. Run the FastAPI application:

   ```bash
   uvicorn app.main:app --reload --log-config uvicorn_log_conf.yaml --log-level $LOG_LEVEL
   ```

   The API will be accessible at http://127.0.0.1:8000.
1. Create a `.env` file and define your Whisper model and Hugging Face token:

   ```
   HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
   WHISPER_MODEL=<<WHISPER MODEL SIZE>>
   LOG_LEVEL=<<LOG LEVEL>>
   ```

2. Build the image using `docker-compose.yaml`:

   ```bash
   # build and start the image using the compose file
   docker-compose up
   ```

   Alternative approach:

   ```bash
   # build the image
   docker build -t whisperx-service .

   # run the container
   docker run -d --gpus all -p 8000:8000 --env-file .env whisperx-service
   ```

   The API will be accessible at http://127.0.0.1:8000.

Note: The Docker build uses `uv` for installing dependencies, as specified in the Dockerfile. The main entrypoint for the Docker container is Gunicorn (not Uvicorn directly), using the configuration in `app/gunicorn_logging.conf`.

Important: For GPU support in Docker, you must have CUDA drivers 12.8+ installed on your host system.
The models used by whisperX are stored in `root/.cache`. To avoid downloading the models each time the container starts, you can store this cache in persistent storage; `docker-compose.yaml` defines a volume `whisperx-models-cache` for this purpose.

- faster-whisper cache: `root/.cache/huggingface/hub`
- pyannote and other models cache: `root/.cache/torch`
1. **Environment Variables Not Loaded**
   - Ensure your `.env` file is correctly formatted and placed in the root directory.
   - Verify that all required environment variables are defined.

2. **Database Connection Issues**
   - Check the `DB_URL` environment variable for correctness.
   - Ensure the database server is running and accessible.
   - PostgreSQL driver: when using `DB_URL=postgresql://...` outside Docker, install the driver with `uv sync --extra postgres`.
   - Async driver mismatch: if you set a `DB_URL` with a sync scheme (e.g. `postgresql+psycopg2://`), the app will fail to start. Use the plain scheme (`postgresql://`) and let the app rewrite it to `postgresql+asyncpg://` automatically.

3. **Model Download Failures**
   - Verify your internet connection.
   - Ensure `HF_TOKEN` is correctly set in the `.env` file.

4. **GPU Not Detected**
   - Ensure NVIDIA drivers and CUDA are correctly installed.
   - Verify that Docker is configured to use the GPU (`nvidia-docker`).

5. **Warnings Not Filtered**
   - Ensure the `FILTER_WARNING` environment variable is set to `true` in the `.env` file.
- Check the logs for detailed error messages.
- Use the `LOG_LEVEL` environment variable to set the appropriate logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`).
The API provides built-in health check endpoints that can be used for monitoring and orchestration:
- **Basic Health Check** (`/health`)
  - Returns a simple status check with HTTP 200 if the service is running
  - Useful for basic availability monitoring

- **Liveness Probe** (`/health/live`)
  - Includes a timestamp with the status information
  - Designed for Kubernetes liveness probes or similar orchestration systems
  - Returns HTTP 200 if the application is running

- **Readiness Probe** (`/health/ready`)
  - Tests whether the application is fully ready to accept requests
  - Checks connectivity to the database
  - Returns HTTP 200 if all dependencies are available
  - Returns HTTP 503 if there is an issue with a dependency (e.g., the database connection)
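The readiness semantics above boil down to one rule: 200 when every dependency responds, 503 otherwise. A minimal sketch of that decision, with illustrative response bodies not taken from the actual API:

```python
def readiness_response(database_ok: bool):
    """Map dependency health to the HTTP status described above (sketch)."""
    if database_ok:
        return 200, {"status": "ready"}
    return 503, {"status": "not ready", "detail": "database unavailable"}
```

Orchestrators such as Kubernetes route traffic only to pods whose readiness endpoint returns 2xx, so a 503 here takes the instance out of rotation without restarting it.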
For further assistance, please open an issue on the GitHub repository.