This project is currently not working as intended and is not being developed further. Timings are not quite correct, and the model sometimes gets stuck.
Audio transcription using Mistral AI's Voxtral API. Supports long audio files and provides a REST API for job-based processing.
- Transcribes audio/video files using Mistral AI
- Handles long files by splitting into segments
- REST API with job queue
- Outputs SRT subtitles or JSON format
- Docker support
```bash
git clone <repository>
uv sync
```
```bash
docker build -t transcription-api .
docker run -d -p 8888:8888 -e MISTRAL_API_KEY=your_key transcription-api
```
Start the server:
```bash
MISTRAL_API_KEY=your_key uv run python web_server.py
```
```http
POST /transcription/job
Content-Type: application/json

{
  "path": "/path/to/audio.mp3",
  "language": "en",
  "format": "json",
  "priority": 1000
}
```
Response:
```json
{
  "job_id": "uuid-string",
  "status": "QUEUED"
}
```
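For example, a job can be submitted from Python with nothing but the standard library. This is a minimal sketch: the base URL assumes the default port 8888 from the Docker example above, and the file path is a placeholder.

```python
# Submit a transcription job (sketch; BASE_URL assumes the default port 8888).
import json
import urllib.request

BASE_URL = "http://localhost:8888"

payload = {
    "path": "/path/to/audio.mp3",  # placeholder path, replace with a real file
    "language": "en",
    "format": "json",
    "priority": 1000,
}

request = urllib.request.Request(
    f"{BASE_URL}/transcription/job",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    job = json.load(response)

print(job["job_id"], job["status"])  # e.g. "uuid-string QUEUED"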
```http
GET /transcription/job/{job_id}
```
Response (completed):
```json
{
  "job_id": "uuid-string",
  "status": "COMPLETED",
  "result": {
    "text": "Full transcription...",
    "segments": [
      {
        "text": "Segment text",
        "start": 0.0,
        "end": 5.2,
        "words": [...]
      }
    ]
  }
}
```
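A simple way to consume this endpoint is to poll until the job reports `COMPLETED`. The sketch below assumes the same base URL as above and uses a placeholder `job_id`; status values other than `QUEUED` and `COMPLETED` are not documented here, so it just gives up after a fixed number of attempts.

```python
# Poll a job until it completes, then print its segments (sketch).
import json
import time
import urllib.request

BASE_URL = "http://localhost:8888"
job_id = "uuid-string"  # replace with the job_id returned on submission

job = None
for _ in range(120):  # poll for up to ~10 minutes
    with urllib.request.urlopen(f"{BASE_URL}/transcription/job/{job_id}") as response:
        job = json.load(response)
    if job["status"] == "COMPLETED":
        break
    time.sleep(5)  # poll interval; tune to taste

if job and job["status"] == "COMPLETED":
    for segment in job["result"]["segments"]:
        print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s] {segment['text']}")
```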
```http
GET /transcription/jobs
GET /stats
```
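The exact response shapes of these two endpoints are not documented above, so the sketch below (same base-URL assumption) simply pretty-prints whatever the server returns.

```python
# Inspect the job list and server statistics (sketch).
import json
import urllib.request

BASE_URL = "http://localhost:8888"

with urllib.request.urlopen(f"{BASE_URL}/transcription/jobs") as response:
    print(json.dumps(json.load(response), indent=2))  # all known jobs

with urllib.request.urlopen(f"{BASE_URL}/stats") as response:
    print(json.dumps(json.load(response), indent=2))  # server statistics
```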
- Supported formats: `json`, `srt`
- Supported models: `voxtral-small-2507`, `voxtral-small-latest`
- Supported languages: All 67 Mistral AI language codes
- Priority: 1-499 (low), 500-9999 (high); see the example request after this list
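As a sketch combining these options, and using only the request fields documented above, a low-priority job requesting SRT output could be submitted with a body like:

```json
{
  "path": "/path/to/audio.mp3",
  "language": "en",
  "format": "srt",
  "priority": 100
}
```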
- Python 3.9+
- Mistral AI API key
- FFmpeg (for audio processing)