
Commit 97ef437

jeremymanning and claude committed
Add video transcripts, embeddings, catalog + update documentation
- 5,407 Whisper transcripts (44MB) from Khan Academy videos
- 5,044 sliding-window embedding .npy files (247MB)
- Video catalog.json (5,044 videos, 77K windows)
- AGENTS.md: complete rewrite for current Vite/ES6 modular architecture
- README.md: updated features (50 domains, 2,450 questions, 5,000+ videos)
- Session notes: pipeline refresh status and key insights
- .gitignore: un-ignore video data, ignore duplicate transcripts_raw/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a0c03dd commit 97ef437
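The commit message does not say how the sliding windows behind the .npy files are formed. As a rough illustration only, the sketch below splits a transcript's words into overlapping fixed-size windows. The function name `sliding_windows` and its `window` and `stride` parameters are hypothetical and are not taken from the pipeline. For scale, 5,044 videos and about 77K windows work out to roughly 15 windows per video on average.

```python
from typing import List


def sliding_windows(words: List[str], window: int, stride: int) -> List[List[str]]:
    """Split a token list into overlapping windows of `window` tokens.

    Hypothetical helper: `window` and `stride` are illustrative parameters,
    not the values used by the video pipeline in this commit.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(words) <= window:
        # A short transcript yields a single (possibly partial) window.
        return [words] if words else []
    starts = range(0, len(words) - window + 1, stride)
    windows = [words[s:s + window] for s in starts]
    # If the stride skipped past the final tokens, add one window ending at the tail.
    if starts[-1] + window < len(words):
        windows.append(words[-window:])
    return windows


if __name__ == "__main__":
    transcript = "the quick brown fox jumps over the lazy dog"
    for w in sliding_windows(transcript.split(), window=4, stride=2):
        print(" ".join(w))
```

Presumably each window is then embedded and the vectors for one video are stored in that video's .npy file, but the commit does not confirm this and the embedding step is not shown.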

10,610 files changed: +59,502 / -185 lines

.gitignore

Lines changed: 4 additions & 6 deletions
@@ -297,17 +297,15 @@ checkpoints.zip
 wikipedia_articles_level_0.json.zip
 level_0_concepts.json.zip
 data.zip
-.gitignore
 backups/embeddings.zip
 backups/large_checkpoints/level_0_final.json
 backups/large_checkpoints/level_1_after_download.json
 
-# Video pipeline working files (large binary data)
-data/videos/.working/embeddings/
+# Video pipeline working files
 data/videos/.working/coordinates/
-
-# Video catalog (generated by export_video_catalog.py)
-data/videos/catalog.json
+data/videos/.working/audio_cache/
+data/videos/transcripts_raw/
+data/videos/transcripts/
 
 # Python virtual environments
 .venv/
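The hunk header can be cross-checked against the "4 additions & 6 deletions" summary. In a unified diff, `@@ -a,b +c,d @@` gives the old line count `b` and the new line count `d`; `b` must equal the context lines plus deletions, and `d` the context lines plus additions. Here that is 17 = 11 + 6 and 15 = 11 + 4. The sketch below performs that check on this .gitignore hunk, transcribed by hand from the page; `hunk_counts` and `tally` are illustrative helper names, not part of any tool.

```python
import re
from typing import List, Tuple

# Matches "@@ -start,count +start,count @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_counts(header: str) -> Tuple[int, int]:
    """Return (old_count, new_count) parsed from a unified-diff hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_count = int(m.group(2)) if m.group(2) is not None else 1
    new_count = int(m.group(4)) if m.group(4) is not None else 1
    return old_count, new_count


def tally(body: List[str]) -> Tuple[int, int, int]:
    """Count (context, deletions, additions) lines in a hunk body."""
    context = deletions = additions = 0
    for line in body:
        if line.startswith("\\"):
            continue  # "\ No newline at end of file" markers are not lines
        if line.startswith("-"):
            deletions += 1
        elif line.startswith("+"):
            additions += 1
        else:
            context += 1  # " text" is context; " " is a blank context line
    return context, deletions, additions


HEADER = "@@ -297,17 +297,15 @@ checkpoints.zip"
BODY = [
    " wikipedia_articles_level_0.json.zip",
    " level_0_concepts.json.zip",
    " data.zip",
    "-.gitignore",
    " backups/embeddings.zip",
    " backups/large_checkpoints/level_0_final.json",
    " backups/large_checkpoints/level_1_after_download.json",
    " ",
    "-# Video pipeline working files (large binary data)",
    "-data/videos/.working/embeddings/",
    "+# Video pipeline working files",
    " data/videos/.working/coordinates/",
    "-",
    "-# Video catalog (generated by export_video_catalog.py)",
    "-data/videos/catalog.json",
    "+data/videos/.working/audio_cache/",
    "+data/videos/transcripts_raw/",
    "+data/videos/transcripts/",
    " ",
    " # Python virtual environments",
    " .venv/",
]

if __name__ == "__main__":
    old_count, new_count = hunk_counts(HEADER)
    context, deletions, additions = tally(BODY)
    # Both sides of the hunk must be fully accounted for.
    assert old_count == context + deletions
    assert new_count == context + additions
    print(f"context={context} deletions={deletions} additions={additions}")
```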

AGENTS.md

Lines changed: 186 additions & 134 deletions

README.md

Lines changed: 46 additions & 38 deletions

Binary files not shown (7): 54.1 KB, 78.1 KB, 36.1 KB, 75.1 KB, 24.1 KB, 30.1 KB, 3.13 KB
