YuhHearDem3 - Quick Reference
# Basic transcription
python transcribe.py --order-file order.txt
# With custom segment duration
python transcribe.py --order-file order.txt --segment-minutes 30
# From specific start time
python transcribe.py --order-file order.txt --start-minutes 60
# Limit segments for testing
python transcribe.py --order-file order.txt --max-segments 2
Knowledge Graph Extraction
# Extract KG from video
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID"
# Extract KG from bill excerpts
python scripts/kg_extract_from_bills.py --max-bills 10
# With custom window parameters
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID" --window-size 15 --stride 10
# Limit windows for testing
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID" --max-windows 5
# Enable debug mode
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID" --debug
# Start chat API
python -m uvicorn api.search_api:app --reload --host 0.0.0.0 --port 8000
# Enable tracing
CHAT_TRACE=1 python -m uvicorn api.search_api:app --reload
# Process watchlist
python scripts/cron_transcription.py --process
# List watchlist
python scripts/cron_transcription.py --list
# Add to watchlist
python scripts/cron_transcription.py --add "VIDEO_ID"
# Remove from watchlist
python scripts/cron_transcription.py --remove "VIDEO_ID"
# Clear KG tables
python scripts/clear_kg.py --yes
# Migrate chat schema
python scripts/migrate_chat_schema.py
# Backfill speaker roles
python scripts/backfill_speaker_video_roles.py
# Ingest transcript JSON into Postgres
python scripts/ingest_transcript_json.py --transcript-file transcription_output.json --youtube-video-id "VIDEO_ID"
# Ingest bills into Postgres
python scripts/ingest_bills.py --scrape
# Ingest order paper PDF
python scripts/ingest_order_paper_pdf.py --file "order_paper.pdf"
# Match papers to videos
python scripts/match_order_papers_to_videos.py
# Export order paper
python scripts/export_order_paper.py --id "ORDER_ID"
# Run all tests
python -m pytest tests/ -v
# Run specific test file
python -m pytest tests/test_chat_agent_v2_unit.py -v
python -m pytest tests/test_kg_agent_loop_unit.py -v
# Lint
ruff check .
ruff check . --fix
# Type check
mypy lib/
Base URL: http://localhost:8000
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /search | Hybrid search |
| POST | /search/temporal | Search with date/speaker/entity filters |
| GET | /search/trends | Trend analysis for entities |
| GET | /speakers | List speakers |
| GET | /speakers/{speaker_id} | Speaker details |
| GET | /videos/{youtube_video_id}/speakers/{speaker_id}/roles | Speaker roles for a video |
| POST | /chat/threads | Create thread |
| POST | /chat/threads/{thread_id}/messages | Send message |
| GET | /chat/threads/{thread_id}/messages/stream | Stream message response |
| GET | /health | Health check |
| GET | /api | API metadata |
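
As a rough illustration of calling the `/search` endpoint from Python, the sketch below builds a POST request against the base URL using only the standard library. The request body shape is not documented here, so the `query` field name and the JSON response handling are assumptions; check `api/search_api.py` for the actual schema before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_search_request(query, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /search endpoint."""
    # Assumption: the endpoint accepts a JSON body with a "query" field.
    # This field name is hypothetical and not confirmed by this reference.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running locally, `send(build_search_request("water rates"))` would issue the call. Keeping the build step separate from the send step makes the request easy to inspect without a live server.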
| Variable | Description |
|----------|-------------|
| GOOGLE_API_KEY | Google AI API key |
| CHAT_TRACE | Enable tracing (1/true/on) |
| ENABLE_THINKING | Enable model thinking |
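
For reference, a boolean flag such as `CHAT_TRACE` could be read along the following lines. This is a minimal sketch, not the project's actual parsing code: the helper name `env_flag` is hypothetical, and treating the value case-insensitively and stripping whitespace are assumptions. Only the accepted spellings (1/true/on) come from the table above.

```python
import os

# Truthy spellings listed for CHAT_TRACE in the table above.
TRUTHY = {"1", "true", "on"}


def env_flag(name, environ=None):
    """Return True when the named variable holds one of the truthy spellings."""
    env = os.environ if environ is None else environ
    # Assumption: comparison ignores case and surrounding whitespace.
    return env.get(name, "").strip().lower() in TRUTHY
```

Passing a plain dict as `environ` makes the helper easy to exercise without touching the real environment, for example `env_flag("CHAT_TRACE", {"CHAT_TRACE": "on"})`.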
| Component | Location |
|-----------|----------|
| Chat Agent | lib/chat_agent_v2.py |
| KG Agent Loop | lib/kg_agent_loop.py |
| Hybrid Graph-RAG | lib/kg_hybrid_graph_rag.py |
| Search API | api/search_api.py |
| Main Script | transcribe.py |
| KG Extraction | lib/knowledge_graph/ |
| Order Papers | lib/order_papers/ |
transcribe.py

| Option | Default | Description |
|--------|---------|-------------|
| --order-file | Required | Path to order file |
| --order-paper-id | None | Order paper ID from database |
| --segment-minutes | 30 | Segment duration |
| --overlap-minutes | 1 | Segment overlap |
| --start-minutes | 0 | Start position |
| --max-segments | None | Limit segments |
| --output-file | Varies | Output file path |
| --video | None | YouTube ID/URL or gs:// URI |
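
To see how the segment options might interact, the sketch below lays out segment boundaries for a video of known length. It assumes each segment starts `segment_minutes - overlap_minutes` after the previous one, which is one plausible reading of the options and not confirmed against `transcribe.py`. The function name `plan_segments` and the `total_minutes` parameter are hypothetical.

```python
def plan_segments(total_minutes, segment_minutes=30, overlap_minutes=1,
                  start_minutes=0, max_segments=None):
    """Return (start, end) minute pairs covering the video from start_minutes."""
    if overlap_minutes >= segment_minutes:
        raise ValueError("overlap must be shorter than the segment duration")

    # Assumption: consecutive segments share overlap_minutes of audio.
    step = segment_minutes - overlap_minutes
    segments = []
    start = start_minutes
    while start < total_minutes:
        if max_segments is not None and len(segments) >= max_segments:
            break
        end = min(start + segment_minutes, total_minutes)
        segments.append((start, end))
        if end >= total_minutes:
            break  # the final segment already reaches the end of the video
        start += step
    return segments
```

Under these assumptions, a 90-minute video with the defaults is split into four segments, and `max_segments=2` keeps only the first two, which matches the "limit segments for testing" use case.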
kg_extract_from_video.py
| Option | Default | Description |
|--------|---------|-------------|
| --youtube-video-id | Required | Video ID |
| --window-size | 30 | Utterances per window |
| --stride | 18 | Utterances between windows |
| --max-windows | None | Limit windows |
| --model | gemini-2.5-flash | Model to use |
| --debug | False | Save failed responses |
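
The window options describe a sliding window over utterances. The sketch below shows one way such windowing could work, assuming `--stride` is the offset between the starts of consecutive windows. The function name `sliding_windows` is hypothetical, and the actual extraction script may handle the final partial window differently.

```python
def sliding_windows(utterances, window_size=30, stride=18, max_windows=None):
    """Split a list of utterances into overlapping windows."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")

    windows = []
    for start in range(0, len(utterances), stride):
        if max_windows is not None and len(windows) >= max_windows:
            break
        windows.append(utterances[start:start + window_size])
        if start + window_size >= len(utterances):
            break  # this window already reaches the last utterance
    return windows
```

With the defaults, consecutive windows share 12 utterances (30 minus 18), so content near a window boundary appears in two windows.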
chat_threads - Conversation threads
chat_messages - Messages with role/content
chat_thread_state - Persisted state
kg_nodes - Canonical nodes
kg_aliases - Alias index
kg_edges - Edges with provenance
paragraphs - Paragraphs with embeddings
sentences - Sentences with provenance
speakers - Speaker information
speaker_video_roles - Speaker roles per video