Commit d505bb5

feat: Subgeneratorr v2.0.0 — AI-powered subtitle generation
Deepgram Nova-3 transcription with Docker-based web UI and CLI. 50+ languages, LLM-enhanced keyterm extraction, batch processing.
0 parents  commit d505bb5

45 files changed: +13433 -0 lines changed

.dockerignore

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Version control
+.git
+.gitignore
+
+# Python
+__pycache__
+*.pyc
+*.pyo
+.venv
+venv
+
+# IDE
+.vscode
+.idea
+
+# Logs and runtime data
+deepgram-logs/
+*.log
+
+# Documentation (not needed in image)
+docs/
+_devdocs/
+examples/
+CONTRIBUTING.md
+LICENSE
+README.md
+Makefile
+
+# Tests
+tests/
+
+# Environment files (secrets — never bake into image)
+.env
+.env.*
+!.env.example
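
The last three patterns in this file work together: `.env` and `.env.*` keep environment files out of the build context, and the `!.env.example` exception re-includes the template. The rule that the last matching pattern wins can be sketched in Python as follows. This is only an approximation: `fnmatch` stands in for Docker's own Go-based matcher, the `is_ignored` helper is hypothetical, and only flat file names are handled.

```python
from fnmatch import fnmatch

# Subset of the patterns from the .dockerignore above, in file order.
PATTERNS = [".env", ".env.*", "!.env.example"]


def is_ignored(name: str, patterns: list[str] = PATTERNS) -> bool:
    """Return True if `name` would be excluded from the build context.

    Patterns are evaluated top to bottom and the last match wins, so a
    later `!pattern` exception can re-include an earlier exclusion.
    """
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        if fnmatch(name, pattern.lstrip("!")):
            ignored = not negated
    return ignored


for candidate in (".env", ".env.local", ".env.example"):
    print(candidate, "ignored" if is_ignored(candidate) else "kept")
```

Order matters here: moving `!.env.example` above `.env.*` would exclude the template again.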

.env.example

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+# ============================================================================
+# Deepgram API Configuration (Required for both CLI and Web UI)
+# ============================================================================
+# Get your API key from: https://console.deepgram.com/
+DEEPGRAM_API_KEY=your_deepgram_api_key_here
+
+# ============================================================================
+# Media Library Path (Required)
+# ============================================================================
+# Path to your media library on the host machine.
+# This directory is mounted into the container at /media.
+#
+# Linux: MEDIA_PATH=/home/username/media
+# macOS: MEDIA_PATH=/Users/username/Movies
+# Windows: MEDIA_PATH=C:/Users/YourName/Videos
+MEDIA_PATH=/path/to/your/media
+
+# ============================================================================
+# CLI Tool Configuration
+# ============================================================================
+# Optional: Force regenerate existing SRT files (0=no, 1=yes)
+# FORCE_REGENERATE=0
+
+# Optional: Profanity filter mode - "off", "tag", or "remove" (default: off)
+# PROFANITY_FILTER=off
+
+# Optional: Save raw Deepgram JSON responses for debugging (0=no, 1=yes)
+# When enabled, saves raw API responses to Transcripts/JSON/ folder
+# SAVE_RAW_JSON=0
+
+# Nova-3 Quality Enhancements (optional)
+# Convert spoken numbers to digits (e.g., "twenty twenty four" → "2024")
+# NUMERALS=0
+
+# Include filler words like "uh", "um" in transcription (usually off for subtitles)
+# FILLER_WORDS=0
+
+# Convert spoken measurements (e.g., "fifty meters" → "50m")
+# MEASUREMENTS=0
+
+# ============================================================================
+# Language Configuration (50+ Languages with Regional Variants)
+# ============================================================================
+# Language code for transcription (default: en)
+# See docs/languages.md for 50+ supported languages and regional variants
+# Common: en, en-GB, es, es-419, fr, de, ja, ko, pt-BR, multi
+# LANGUAGE=en
+
+# Automatic Language Detection (35 languages supported, batch mode only)
+# When enabled, automatically detects the dominant language from audio
+# Returns detected language code and confidence score (0-1)
+# Overrides LANGUAGE setting when enabled
+# Note: Not available for streaming transcription, batch processing only
+# DETECT_LANGUAGE=0
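
The CLI options above follow two conventions: on/off switches are written as `0`/`1`, and `PROFANITY_FILTER` takes one of three named modes. A minimal sketch of how such variables could be read is shown below. The `env_flag` and `load_cli_settings` helpers are hypothetical and are not taken from the project's code; the only behaviour assumed from the comments is the documented defaults and that `DETECT_LANGUAGE` overrides `LANGUAGE`.

```python
import os
from typing import Mapping

VALID_PROFANITY_MODES = {"off", "tag", "remove"}


def env_flag(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Interpret a 0/1 style variable; unset or empty counts as 0."""
    return env.get(name, "0").strip() == "1"


def load_cli_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the CLI options documented in .env.example into a dict."""
    mode = env.get("PROFANITY_FILTER", "off").strip().lower()
    if mode not in VALID_PROFANITY_MODES:
        raise ValueError(f"PROFANITY_FILTER must be one of {sorted(VALID_PROFANITY_MODES)}")

    detect = env_flag("DETECT_LANGUAGE", env)
    return {
        "force_regenerate": env_flag("FORCE_REGENERATE", env),
        "profanity_filter": mode,
        "save_raw_json": env_flag("SAVE_RAW_JSON", env),
        "numerals": env_flag("NUMERALS", env),
        "filler_words": env_flag("FILLER_WORDS", env),
        "measurements": env_flag("MEASUREMENTS", env),
        "detect_language": detect,
        # Detection overrides LANGUAGE, so no fixed language is reported then.
        "language": None if detect else env.get("LANGUAGE", "en"),
    }
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check, for example `load_cli_settings({})["language"]` evaluates to `"en"`.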
+
+# ============================================================================
+# Web UI Configuration (Optional)
+# ============================================================================
+
+# Flask Security
+# Generate a secure random key for production: python -c "import secrets; print(secrets.token_hex(32))"
+SECRET_KEY=change-me-in-production
+
+# Redis Configuration
+REDIS_URL=redis://redis:6379/0
+
+# Paths (should match your docker-compose.yml volume mounts)
+MEDIA_ROOT=/media
+LOG_ROOT=/logs
+
+# Transcription Defaults
+DEFAULT_MODEL=nova-3
+DEFAULT_LANGUAGE=en
+# DEFAULT_PROFANITY_FILTER=off
+
+# Security: Email Allowlist (optional)
+# Comma-separated list of allowed email addresses for OAuth access
+# Leave empty to allow all authenticated Google OAuth users
+# Example: ALLOWED_EMAILS=user1@example.com,user2@example.com
+ALLOWED_EMAILS=
+
+# Bazarr Integration (optional)
+# Enable automatic subtitle rescan after batch completion
+# Leave BAZARR_BASE_URL empty to disable integration
+BAZARR_BASE_URL=
+BAZARR_API_KEY=
+
+# Worker Concurrency (optional)
+# Number of concurrent transcription jobs per worker
+# Start with 1, increase to 2-3 if your system can handle it
+WORKER_CONCURRENCY=1
+
+# ============================================================================
+# LLM API Keys for AI-Powered Keyterm Generation (Optional Feature)
+# ============================================================================
+# These keys enable AI-powered generation of keyterms in the Web UI.
+# Keyterms boost transcription accuracy for character names and terminology.
+# This feature is COMPLETELY OPTIONAL - manual keyterms work just as well.
+#
+# Get Anthropic API key: https://console.anthropic.com/
+# Get OpenAI API key: https://platform.openai.com/
+#
+# Get Gemini API key: https://aistudio.google.com/apikey (free tier available)
+#
+# Leave blank to disable AI keyterm generation (manual keyterms still work)
+# ANTHROPIC_API_KEY=
+# OPENAI_API_KEY=
+# GEMINI_API_KEY=
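
Two of the Web UI settings above treat an empty value as meaningful: an empty `ALLOWED_EMAILS` admits every authenticated user, and an empty `BAZARR_BASE_URL` disables the integration. The sketch below shows one way to honour those rules. All three helper names are hypothetical, and the case-insensitive comparison is an assumption, not something the file specifies.

```python
def parse_allowed_emails(raw: str) -> frozenset[str]:
    """Split a comma-separated ALLOWED_EMAILS value, dropping blanks."""
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def is_email_allowed(email: str, allowed: frozenset[str]) -> bool:
    """An empty allowlist means every authenticated user is accepted."""
    if not allowed:
        return True
    # Assumption: addresses are compared case-insensitively.
    return email.strip().lower() in allowed


def bazarr_enabled(base_url: str) -> bool:
    """The integration is off when BAZARR_BASE_URL is empty or whitespace."""
    return bool(base_url.strip())


allowed = parse_allowed_emails("user1@example.com, user2@example.com")
print(is_email_allowed("user1@example.com", allowed))
print(is_email_allowed("stranger@example.com", allowed))
```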
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+name: Bug Report
+about: Report a bug to help improve Subgeneratorr
+title: ''
+labels: bug
+assignees: ''
+---
+
+**Describe the bug**
+A clear description of what the bug is.
+
+**To reproduce**
+Steps to reproduce the behavior:
+1. ...
+2. ...
+
+**Expected behavior**
+What you expected to happen.
+
+**Environment**
+- OS: [e.g., Ubuntu 24.04, macOS 15, Windows 11]
+- Docker version: [e.g., 27.0]
+- Subgeneratorr version: [e.g., v2.0.0]
+- Interface: [CLI / Web UI]
+
+**Logs**
+Paste relevant logs from `docker compose logs web worker` or CLI output.
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+---
+name: Feature Request
+about: Suggest an enhancement or new feature
+title: ''
+labels: enhancement
+assignees: ''
+---
+
+**Is your feature request related to a problem?**
+A clear description of the problem. E.g., "I'm always frustrated when..."
+
+**Describe the solution you'd like**
+What you want to happen.
+
+**Alternatives considered**
+Any alternative solutions or workarounds you've considered.
+
+**Additional context**
+Any other context, screenshots, or examples.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+## What does this PR do?
+
+Brief description of the change.
+
+## Checklist
+
+- [ ] Tested with `python3 scripts/validate_setup.py`
+- [ ] Docker build succeeds (`docker compose build`)
+- [ ] Updated docs if configuration, CLI flags, or API endpoints changed
+- [ ] Follows code style (PEP 8 for Python, vanilla JS, CSS custom properties)
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+name: Build and Publish Docker Images
+
+on:
+  push:
+    tags:
+      - 'v*'
+
+env:
+  REGISTRY: ghcr.io
+
+jobs:
+  build-and-push:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    strategy:
+      matrix:
+        include:
+          - image: subgeneratorr-web
+            dockerfile: web/Dockerfile
+            context: .
+          - image: subgeneratorr-worker
+            dockerfile: web/Dockerfile
+            context: .
+          - image: subgeneratorr-cli
+            dockerfile: cli/Dockerfile
+            context: .
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Log in to GHCR
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract metadata
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.REGISTRY }}/${{ github.repository_owner }}/${{ matrix.image }}
+          tags: |
+            type=semver,pattern={{version}}
+            type=semver,pattern={{major}}.{{minor}}
+            type=raw,value=latest
+
+      - name: Build and push
+        uses: docker/build-push-action@v6
+        with:
+          context: ${{ matrix.context }}
+          file: ${{ matrix.dockerfile }}
+          platforms: linux/amd64,linux/arm64
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
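
The three `tags` rules in the metadata step turn one pushed git tag into several image tags. For a plain release tag such as `v2.0.0`, the expected result is a full version tag, a `major.minor` tag, and `latest`. The sketch below imitates that mapping for illustration only: the `image_tags` helper and the `example-owner` namespace are hypothetical, and pre-release tags, which `docker/metadata-action` treats specially, are deliberately rejected here.

```python
import re

# Plain releases only: an optional "v" prefix followed by MAJOR.MINOR.PATCH.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def image_tags(git_tag: str, image: str) -> list[str]:
    """Approximate the tag list produced for a plain semver release tag."""
    match = SEMVER.match(git_tag)
    if match is None:
        raise ValueError(f"not a plain semver release tag: {git_tag!r}")
    major, minor, patch = match.groups()
    return [
        f"{image}:{major}.{minor}.{patch}",  # type=semver,pattern={{version}}
        f"{image}:{major}.{minor}",          # type=semver,pattern={{major}}.{{minor}}
        f"{image}:latest",                   # type=raw,value=latest
    ]


for tag in image_tags("v2.0.0", "ghcr.io/example-owner/subgeneratorr-web"):
    print(tag)
```

Because `latest` is a raw tag, every tagged build moves it, including a patch release on an older minor line.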

.gitignore

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# Environment variables
+.env
+
+# Personal/internal documentation (keep local only)
+_devdocs/
+
+# Claude Code
+.claude/
+
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+ENV/
+build/
+dist/
+*.egg-info/
+.pytest_cache/
+
+# Logs
+deepgram-logs/*.json
+*.log
+
+# Personal video lists (users should create their own)
+video-list.txt
+batch-videos.txt
+test-videos.txt
+
+# Temporary files
+/tmp/
+*.tmp
+*.mp3
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Docker
+docker-compose.yml
+docker-compose.override.yml
+.mcp.json
+.worktrees/

CHANGELOG.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Changelog
+
+All notable changes to Subgeneratorr will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [2.0.0] - 2026-02-25
+
+Initial public release.
+
+### Added
+
+- **Core Transcription Engine** — Deepgram Nova-3 speech recognition with SRT subtitle output
+- **Nova-3 Full Feature Coverage** — Model selector (General/Medical), redaction (PCI/PII/numbers), find & replace, dictation mode, multichannel processing, utterance split threshold (0.1–5.0s), and request tagging
+- **Audio Intelligence** — Sentiment analysis, summarization, topic/intent/entity detection, and term search (English only, saved to Intelligence/ folder)
+- **Web UI** — Flask-based interface with dark/light themes, zone-based layout, gear popover for preferences, and collapsible Transcription Settings panel
+- **CLI** — Command-line tool for batch processing directories, individual files, or file lists
+- **LLM-Enhanced Keyterms** — Optional AI-powered generation of character names and terminology using Claude, GPT, or Gemini to improve transcription accuracy
+- **Multi-Language Support** — 50+ languages with regional variants (English, Spanish, French, German, Japanese, Korean, Hindi, and many more)
+- **Multilingual Model** — Special `multi` mode processes 10 languages simultaneously with automatic language detection
+- **Language-Aware Audio Selection** — Automatically selects the correct audio track in multi-language containers with surround sound center channel extraction
+- **Speaker Diarization** — Identify and label speakers in generated transcripts
+- **Subtitle Detection** — Sidecar file glob (`.en.srt`, `.ass`, `.vtt`) with ffprobe fallback to identify existing subtitles before processing
+- **File Browser** — Navigate media directories with client-side filtering and API-backed global search across the entire library
+- **Batch Processing** — Queue multiple files with Celery/Redis, real-time progress tracking, and polling watchdog for reliability
+- **Overwrite Protection** — Confirmation dialog before regenerating existing subtitles
+- **Cost Tracking** — Real-time per-file and session cost estimates with detailed logging (~$0.0043/min)
+- **Smart Skipping** — Automatically skip files that already have subtitles
+- **Docker Deployment** — Docker Compose with `MEDIA_PATH` env var, Dockerfile builds, health checks, and resource limits
+- **GHCR Docker Images** — Multi-arch (amd64 + arm64) pre-built images via GitHub Actions
+- **Media Server Integration** — Output `.eng.srt` files auto-recognized by Plex, Jellyfin, and Emby
+- **Sticky Action Bar** — Language selector and transcribe button remain accessible while scrolling
+- **iOS Safari Compatibility** — Fixed scroll bounce and viewport issues for mobile access
+- **Documentation** — Setup guide, technical reference, language support guide, API docs, contributing guidelines, and community files (CODE_OF_CONDUCT, SECURITY, issue/PR templates)
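
The Cost Tracking entry quotes an approximate rate of $0.0043 per minute of audio. A per-file estimate is then the duration in minutes multiplied by that rate, and a session estimate is the sum over files. The sketch below works through that arithmetic; the function names are hypothetical, and the rate is the approximate figure from the changelog, not a quoted price.

```python
RATE_PER_MINUTE_USD = 0.0043  # approximate rate quoted in the changelog


def estimate_cost(duration_seconds: float, rate: float = RATE_PER_MINUTE_USD) -> float:
    """Estimated cost in USD for one file of the given audio duration."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds / 60.0 * rate


def estimate_session_cost(durations_seconds: list[float]) -> float:
    """Estimated total cost in USD for a batch of files."""
    return sum(estimate_cost(d) for d in durations_seconds)


# A 45-minute episode: 45 min * $0.0043/min, roughly $0.19.
print(f"${estimate_cost(45 * 60):.4f}")
```

At this rate a ten-episode season of 45-minute episodes comes to a little under two dollars.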
+
+### Security
+
+- **Path traversal protection** — Input validation on file paths to prevent directory escape
+- **Error path hardening** — Removed bare excepts, added timeout guards, and safe handling of empty API responses
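
A common way to implement the path traversal check described above is to resolve the requested path against the media root and reject anything that ends up outside it. The sketch below shows that general technique, not the project's actual validation code; `resolve_under_root` is a hypothetical helper and requires Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_under_root(root: str, user_path: str) -> Path:
    """Resolve `user_path` inside `root`, refusing any directory escape.

    Resolving first collapses `..` segments and follows symlinks, so the
    containment check runs on the real target rather than the raw input.
    """
    root_path = Path(root).resolve()
    # Joining with an absolute `user_path` discards `root_path`, which the
    # containment check below then rejects.
    candidate = (root_path / user_path).resolve()
    if not candidate.is_relative_to(root_path):
        raise ValueError(f"path escapes media root: {user_path!r}")
    return candidate
```

With `/media` as the root, `resolve_under_root("/media", "tv/show.mkv")` returns a path under `/media`, while `"../etc/passwd"` and `"/etc/passwd"` both raise `ValueError`.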
+
+[Unreleased]: https://github.com/tylerbcrawford/subgeneratorr/compare/v2.0.0...HEAD
+[2.0.0]: https://github.com/tylerbcrawford/subgeneratorr/releases/tag/v2.0.0
