feat: video pipeline OCR quality fixes + two-pass AI enhancement
- Skip OCR on WEBCAM/OTHER frames (eliminates ~64 junk results per video)
- Add _clean_ocr_line() to strip line numbers, IDE decorations, collapse markers
- Add _fix_intra_line_duplication() for multi-engine OCR overlap artifacts
- Add _is_likely_code() filter to prevent UI junk in reference code fences
- Add language detection to get_text_groups() via LanguageDetector
- Apply OCR cleaning in _assemble_structured_text() pipeline
- Add two-pass AI enhancement: Pass 1 cleans reference Code Timeline
  using transcript context, Pass 2 generates SKILL.md from cleaned refs
- Update video-tutorial.yaml prompts for pre-cleaned references
- Add 17 new tests (197 total video tests), 2540 tests passing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
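
The intra-line duplication fix named in the commit message can be illustrated with a minimal sketch. This is not the project's `_fix_intra_line_duplication()`: the function name `dedupe_repeated_half`, the helper `_common_suffix`, and the one-character noise tolerance are assumptions made for illustration, and the sketch only handles a line whose token sequence appears exactly twice.

```python
def _common_suffix(a: str, b: str) -> str:
    """Return the longest string that both a and b end with."""
    i = 0
    while i < min(len(a), len(b)) and a[-1 - i] == b[-1 - i]:
        i += 1
    return a[len(a) - i:]


def dedupe_repeated_half(line: str, min_tokens: int = 2) -> str:
    """Collapse a line whose token sequence is repeated twice (hypothetical sketch).

    The first token of each half may carry one stray leading character,
    as in 'gpublic ...' / 'Jpublic ...'; all other tokens must match exactly.
    """
    tokens = line.split()
    n = len(tokens)
    if n < 2 * min_tokens or n % 2 != 0:
        return line
    half = n // 2
    first, second = tokens[:half], tokens[half:]
    # Everything after the first token of each half must be identical.
    if first[1:] != second[1:]:
        return line
    # The first tokens must agree except for at most one leading character.
    head = _common_suffix(first[0], second[0])
    if not head or len(head) < max(len(first[0]), len(second[0])) - 1:
        return line
    return " ".join([head] + first[1:])


print(dedupe_repeated_half("gpublic class Card Jpublic class Card"))
```

Lines that do not split into two matching halves are returned unchanged, so ordinary code such as `int x = 1;` passes through untouched.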
CHANGELOG.md: 13 additions & 3 deletions
@@ -7,7 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
-**Theme:** Video source support (BETA), Word document support, and quality improvements. 94 files changed, +23,037 lines since v3.1.3. **2,523 tests passing.**
+**Theme:** Video source support (BETA), Word document support, and quality improvements. 94 files changed, +23,500 lines since v3.1.3. **2,540 tests passing.**
### 🎬 Video Tutorial Scraping Pipeline (BETA)
@@ -23,7 +23,7 @@ Complete video tutorial extraction system that converts YouTube videos and local
- **`video_metadata.py`** (~270 lines) — YouTube metadata extraction (title, channel, views, chapters, duration) via yt-dlp; local file metadata via ffprobe
- **`video_transcript.py`** (~370 lines) — Multi-source transcript extraction with 3-tier fallback: YouTube Transcript API → yt-dlp subtitles → faster-whisper local transcription
- **`video_segmenter.py`** (~220 lines) — Chapter-based and time-window segmentation with configurable overlap
- Panel detection — splits IDE screenshots into independent sub-sections (code, terminal, file tree)
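
Time-window segmentation with overlap, as mentioned for `video_segmenter.py`, might look roughly like the following. This is a sketch under assumptions, not the module's code: the function name `time_windows` and the default window and overlap values are invented for illustration.

```python
from typing import List, Tuple


def time_windows(
    duration: float, window: float = 120.0, overlap: float = 15.0
) -> List[Tuple[float, float]]:
    """Split [0, duration] into windows where each one starts `overlap`
    seconds before the previous one ends (hypothetical sketch)."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        windows.append((start, end))
        if end >= duration:
            break  # the last window already reaches the end of the video
        start += step
    return windows


print(time_windows(300, window=120, overlap=20))
```

The overlap means speech or code that straddles a boundary appears in both neighbouring segments, at the cost of some duplicated content.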
@@ -37,11 +37,13 @@ Complete video tutorial extraction system that converts YouTube videos and local
- Tesseract circuit breaker (`_tesseract_broken` flag) — disables pytesseract after first failure
- **Audio-visual alignment** — Code blocks paired with narrator transcript for context
- **Video-specific AI enhancement** — Custom prompt for OCR denoising, code reconstruction, and tutorial narrative synthesis
+- **Two-pass AI enhancement** — Pass 1 cleans reference files (Code Timeline reconstruction from transcript context), Pass 2 generates SKILL.md from cleaned references
+- **`_ai_clean_reference()`** — Sends reference file to Claude to reconstruct code blocks using transcript context, fixing OCR noise before SKILL.md generation
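
The two-pass flow described in the added entries can be sketched as below. Everything here is an assumption made for illustration: the function `two_pass_enhance`, the prompt wording, and the injected `complete` callable (a stand-in for whatever model client the project actually uses) do not come from the repository.

```python
from typing import Callable, Dict, Tuple


def two_pass_enhance(
    references: Dict[str, str],
    transcript: str,
    complete: Callable[[str], str],
) -> Tuple[Dict[str, str], str]:
    """Clean each reference file, then build SKILL.md from the cleaned text.

    `complete` is a hypothetical prompt -> response function supplied by the caller.
    """
    # Pass 1: reconstruct code blocks in every reference file,
    # giving the model the transcript as context.
    cleaned: Dict[str, str] = {}
    for name, text in references.items():
        prompt = (
            "Reconstruct the code blocks in this reference file, using the "
            "transcript as context to fix OCR noise.\n\n"
            f"## Transcript\n{transcript}\n\n"
            f"## Reference file: {name}\n{text}"
        )
        cleaned[name] = complete(prompt)

    # Pass 2: generate SKILL.md from the cleaned references only,
    # so raw OCR noise never reaches the final prompt.
    body = "\n\n".join(f"### {name}\n{text}" for name, text in cleaned.items())
    skill_md = complete(
        "Generate SKILL.md from the cleaned reference files below.\n\n" + body
    )
    return cleaned, skill_md
```

Passing the model call in as a parameter keeps the sketch testable with a fake function and avoids assuming any particular client library.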
#### Video `--setup`: GPU Auto-Detection & Dependency Installation
- **`skill-seekers video --setup`** — One-command GPU auto-detection and dependency installation
@@ -80,6 +82,14 @@ Complete video tutorial extraction system that converts YouTube videos and local
### Fixed

+#### Video Pipeline OCR Quality Fixes (6)
+- **Webcam/OTHER frames skip OCR** — WEBCAM and OTHER frame types no longer get OCR'd, eliminating ~64 junk OCR results per video
+- **`_clean_ocr_line()` helper** — Strips leading line numbers, IDE tab bar text, Unity Inspector labels, and VS Code collapse markers from OCR output
+- **`_fix_intra_line_duplication()`** — Detects and removes token sequence repetition from multi-engine OCR overlap (e.g., `gpublic class Card Jpublic class Card` → `public class Card`)
+- **`_is_likely_code()` filter** — Reference file code fences now filtered to reject UI junk (Inspector, Hierarchy, Canvas labels) that passed frame classification
+- **Language detection on text groups** — `get_text_groups()` now runs `LanguageDetector.detect_from_code()` on each group, filling the previously-always-None `detected_language` field
+- **OCR cleaning in text assembly** — `_assemble_structured_text()` applies `_clean_ocr_line()` to every line before joining
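
A per-line cleaner in the spirit of `_clean_ocr_line()` could be sketched as follows. This is an illustration under assumptions rather than the helper itself: the name `clean_ocr_line_sketch`, the regular expression, and the set of fold-marker glyphs are guesses, and tab bar text and Unity Inspector labels are not handled.

```python
import re

# A leading gutter line number (1-4 digits), optionally followed by a fold
# marker, or a fold marker on its own. The glyphs (U+203A, U+2304, U+25B8,
# U+25BE) are an assumed set, not taken from the project.
_GUTTER = re.compile(
    r"^\s*(?:\d{1,4}\s+(?:[\u203a\u2304\u25b8\u25be]\s+)?"
    r"|[\u203a\u2304\u25b8\u25be]\s+)"
)


def clean_ocr_line_sketch(line: str) -> str:
    """Strip an OCR'd gutter prefix from one line (hypothetical sketch).

    Indentation after a stripped line number is lost, and a code line that
    genuinely starts with a short number (e.g. '3 + 4') is also trimmed.
    """
    return _GUTTER.sub("", line).rstrip()


def assemble(lines):
    """Clean every line before joining, dropping lines that end up empty."""
    cleaned = (clean_ocr_line_sketch(line) for line in lines)
    return "\n".join(line for line in cleaned if line)


print(assemble(["12 public class Card", "13 \u2304 void Start()", "    return x;"]))
```

Lines without a gutter prefix keep their indentation, since the pattern only matches when a line number or fold marker is actually present.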
+
#### Video Pipeline Fixes (15)
- **`extract_visual_data` returning 2-tuple instead of 3** — Caused `ValueError` crash when unpacking results
- **pytesseract in core deps** — Moved from core dependencies to `[video-full]` optional group