# voice-summarizer

CLI tool for automatic transcription of audio and video files, with optional creation of text summaries. Uses the OpenAI API for media processing.
🇷🇺 Russian version
## Requirements

- Python 3.12 or higher
- `uv` dependency manager
- Installed system utilities: `ffmpeg` and `ffprobe`
- Active OpenAI API key
- (Optional) AWS S3 credentials for downloading files from S3 buckets
## Installing ffmpeg

**macOS** (via Homebrew):

```bash
brew install ffmpeg
```

**Ubuntu/Debian**:

```bash
sudo apt update
sudo apt install ffmpeg
```

**Windows**: Download ffmpeg from the official website: https://ffmpeg.org/download.html
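Before running the tool, you can confirm that both utilities are on your `PATH`. This check is purely illustrative and not part of the project:

```python
import shutil

# Illustrative preflight check (not part of the project): verify that
# ffmpeg and ffprobe are discoverable on PATH before running the tool.
missing = [tool for tool in ("ffmpeg", "ffprobe") if shutil.which(tool) is None]
if missing:
    raise SystemExit(f"Missing required tools: {', '.join(missing)}")
print("ffmpeg and ffprobe found")
```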
## Installation

```bash
# Installing project dependencies
uv sync

# Copying configuration template
cp .env.example .env

# Editing configuration
nano .env
```

## Configuration

Fill in the following parameters in the `.env` file:

```
# OpenAI API Configuration (required)
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_WHISPER_MODEL=whisper-1
OPENAI_SUMMARY_MODEL=gpt-4o
# AWS S3 Configuration (optional, for S3 file downloads)
AWS_ACCESS_KEY_ID=your-aws-access-key-id
AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
AWS_DEFAULT_REGION=us-east-1
# AWS_ENDPOINT_URL=http://localhost:9000  # For MinIO or other S3-compatible services
```
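For context, values like these are typically consumed with `python-dotenv`. The following is a minimal sketch under that assumption, not the project's actual loading code:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # read key=value pairs from .env into the process environment

api_key = os.getenv("OPENAI_API_KEY")  # required
base_url = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
whisper_model = os.getenv("OPENAI_WHISPER_MODEL", "whisper-1")
summary_model = os.getenv("OPENAI_SUMMARY_MODEL", "gpt-4o")

if not api_key:
    raise SystemExit("OPENAI_API_KEY is not set")
```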
## Project Structure

```
voice-summarizer/
├── main.py                   # Main transcription script
├── summarization_prompt.md   # Prompt template for summarization
├── .env.example              # Configuration example
├── Dockerfile                # Docker image
├── docker-compose.yml        # Docker Compose configuration
├── input/                    # Directory for input files
└── output/                   # Directory for results
```
## Usage

```bash
# Transcribing a local file
python main.py input/recording.mp4
# Transcribing a file from S3 bucket
python main.py s3://your-bucket/path/to/recording.mp4
# Specifying custom output directory
python main.py input/recording.mp4 -o custom_output
```

```bash
# Transcription with automatic summary creation (local file) - default behavior
python main.py input/recording.mp4
# Transcription with automatic summary creation (S3 file) - default behavior
python main.py s3://your-bucket/path/to/recording.mp4
# Using custom prompt for summarization
python main.py input/recording.mp4 --prompt-file custom_prompt.md
# Disable summarization (transcription only)
python main.py input/recording.mp4 --no-summarize
```

```bash
# Specifying a specific Whisper model
python main.py input/recording.mp4 --whisper-model whisper-1
# Specifying summarization model
python main.py input/recording.mp4 --summary-model gpt-4
# Full configuration with custom parameters
python main.py input/recording.mp4 \
--whisper-model whisper-1 \
--summary-model gpt-4 \
--prompt-file my_prompt.md \
--api-key YOUR_API_KEY \
--base-url https://custom-endpoint.com/v1
# Transcription only (no summary)
python main.py input/recording.mp4 --no-summarize
```
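Under the hood, each audio segment is sent to the OpenAI audio API for transcription. The snippet below is a hedged sketch of that kind of call using the official `openai` Python SDK; the file path is hypothetical (it follows the output layout described later), and `main.py` may structure this differently:

```python
from openai import OpenAI

# The SDK picks up OPENAI_API_KEY (and OPENAI_BASE_URL, if set) from the environment.
client = OpenAI()

# Hypothetical segment path, following the output layout described below.
with open("output/recording.mp4/segments/recording_segment_001.mp3", "rb") as audio:
    transcription = client.audio.transcriptions.create(model="whisper-1", file=audio)

print(transcription.text)
```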
## Docker

Ready-made configurations are provided for users who prefer Docker.

```bash
# 1. Environment preparation
cp .env.example .env
# Edit the .env file with your OpenAI API key
# 2. Place audio/video files in the input/ directory
cp your-recording.mp4 input/
# 3. Build and run with Docker Compose
docker-compose build
docker-compose run --rm voice-summarizer input/your-recording.mp4
```

```bash
# Building the image
docker build -t voice-summarizer .
# Running the container
docker run -v $(pwd)/input:/app/input:ro \
-v $(pwd)/output:/app/output \
-v $(pwd)/.env:/app/.env:ro \
voice-summarizer input/your-file.mp4
```

```bash
# Show help
docker-compose run --rm voice-summarizer --help
# Basic transcription with summarization (default)
docker-compose run --rm voice-summarizer input/recording.mp4
# Transcription only (no summarization)
docker-compose run --rm voice-summarizer input/recording.mp4 --no-summarize
# Using custom model
docker-compose run --rm voice-summarizer input/recording.mp4 \
--whisper-model whisper-1 \
--summary-model gpt-4
# Development mode (interactive shell)
docker-compose run --rm voice-summarizer-dev
```

## Command-Line Parameters

| Parameter | Description |
|---|---|
| `input_file` | Path to input audio/video file or S3 URL (`s3://bucket/key`) |
| `-o, --output` | Directory for saving results (default: `output`) |
| `--api-key` | OpenAI API key (alternative to the `OPENAI_API_KEY` variable) |
| `--base-url` | Base URL for the OpenAI API (alternative to the `OPENAI_BASE_URL` variable) |
| `--whisper-model` | Whisper model for transcription (default: `whisper-1`) |
| `--summary-model` | Model for summary creation (default: `gpt-4o`) |
| `--no-summarize` | Disable text summary creation (summarization is enabled by default) |
| `--prompt-file` | Path to the summarization prompt file (default: `summarization_prompt.md`) |
## Output Structure

After processing, the following directory structure is created:

```
output/
└── filename.ext/
    ├── filename_combined.md          # Combined transcription
    ├── filename_summary.md           # Summary (if enabled)
    └── segments/
        ├── filename_segment_001.mp3  # Audio segment 1
        ├── filename_segment_001.md   # Segment 1 transcription
        ├── filename_segment_002.mp3  # Audio segment 2
        └── filename_segment_002.md   # Segment 2 transcription
```
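The combined file can be produced by concatenating the per-segment transcriptions in order. A minimal sketch assuming the layout above (not the tool's actual code):

```python
from pathlib import Path

# Sketch only: stitch per-segment transcriptions into the combined file,
# assuming the directory layout shown above for a file named recording.mp4.
seg_dir = Path("output/recording.mp4/segments")
parts = sorted(seg_dir.glob("recording_segment_*.md"))  # zero-padded names sort correctly
combined = "\n\n".join(p.read_text(encoding="utf-8") for p in parts)
Path("output/recording.mp4/recording_combined.md").write_text(combined, encoding="utf-8")
```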
## Supported Formats

The tool supports all formats compatible with ffmpeg:

- **Audio**: MP3, WAV, FLAC, AAC, OGG, M4A
- **Video**: MP4, AVI, MOV, MKV, WMV, FLV
## Features

- Maximum segment duration: 570 seconds (9.5 minutes)
- Automatic splitting of long files into segments (see the sketch after this list)
- Conversion of all segments to MP3 format
- S3 support for downloading files from AWS S3 or compatible services (MinIO, etc.)
- Intelligent caching: existing local files and segments are reused to avoid reprocessing
- Process logging with timestamps
- Comprehensive error handling for API, file operations, and S3 access
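As a rough illustration of the splitting step, the following sketch probes duration with `ffprobe` and splits with `ffmpeg`'s segment muxer; the exact flags `main.py` uses may differ:

```python
import subprocess

MAX_SEGMENT_SECONDS = 570  # 9.5 minutes, matching the limit above

def probe_duration(path: str) -> float:
    """Return the media duration in seconds using ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())

def split_to_mp3_segments(path: str, out_pattern: str) -> None:
    """Drop the video stream and split the audio into MP3 chunks of at
    most MAX_SEGMENT_SECONDS each, e.g. out_pattern='seg_%03d.mp3'."""
    subprocess.run(
        ["ffmpeg", "-i", path, "-vn", "-acodec", "libmp3lame",
         "-f", "segment", "-segment_time", str(MAX_SEGMENT_SECONDS),
         out_pattern],
        check=True,
    )

if probe_duration("input/recording.mp4") > MAX_SEGMENT_SECONDS:
    split_to_mp3_segments("input/recording.mp4", "recording_segment_%03d.mp3")
```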
## S3 Support

The tool supports downloading files from AWS S3 or S3-compatible services:

- Configure AWS credentials in the `.env` file:

  ```
  AWS_ACCESS_KEY_ID=your-access-key-id
  AWS_SECRET_ACCESS_KEY=your-secret-access-key
  AWS_DEFAULT_REGION=us-east-1
  ```

- Use S3 URLs in your commands:

  ```bash
  python main.py s3://your-bucket/path/to/file.mp4
  ```

For MinIO or other S3-compatible services, add the endpoint URL:

```
AWS_ENDPOINT_URL=http://localhost:9000
```

Notes:

- Files are downloaded to the `input/` directory and cached locally
- Existing local files are reused to avoid re-downloading (see the sketch below)
- Supports both AWS S3 and S3-compatible services
- Works without S3 configuration when using local files only
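A simplified sketch of this download-and-cache behavior using `boto3` (an assumed dependency; the function name is illustrative, not the tool's actual API):

```python
import os
from pathlib import Path
from urllib.parse import urlparse

import boto3  # assumed dependency for S3 access

def fetch_from_s3(url: str, dest_dir: str = "input") -> Path:
    """Download s3://bucket/key into dest_dir, reusing a cached local copy."""
    parsed = urlparse(url)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    local_path = Path(dest_dir) / Path(key).name
    if local_path.exists():
        return local_path  # cached copy found, skip the download
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # AWS_ENDPOINT_URL is only needed for MinIO and similar services.
    s3 = boto3.client("s3", endpoint_url=os.getenv("AWS_ENDPOINT_URL"))
    s3.download_file(bucket, key, str(local_path))
    return local_path
```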
## Custom Summarization Prompts

To create your own summarization prompt:

- Copy the `summarization_prompt.md` file
- Modify the content according to your requirements
- Specify the file path via the `--prompt-file` parameter or the `PROMPT_FILE` environment variable

The prompt is used as a system message; the transcription text is passed separately, as sketched below.
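To make the system-message behavior concrete, here is a hedged sketch of the kind of chat completion call involved (paths and model name are examples, not the tool's exact code):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = Path("summarization_prompt.md").read_text(encoding="utf-8")
transcript = Path("output/recording.mp4/recording_combined.md").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": prompt},    # the prompt file as system message
        {"role": "user", "content": transcript},  # the transcription passed separately
    ],
)
print(response.choices[0].message.content)
```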
## License

This project is licensed under the MIT License - see the LICENSE file for details.