YuhHearDem3 - Quick Reference

Essential Commands

Transcription

# Basic transcription
python transcribe.py --order-file order.txt

# With custom segment duration
python transcribe.py --order-file order.txt --segment-minutes 30

# From specific start time
python transcribe.py --order-file order.txt --start-minutes 60

# Limit segments for testing
python transcribe.py --order-file order.txt --max-segments 2

Knowledge Graph Extraction

# Extract KG from video
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID"

# Extract KG from bill excerpts
python scripts/kg_extract_from_bills.py --max-bills 10

# With custom window parameters
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID" --window-size 15 --stride 10

# Limit windows for testing
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID" --max-windows 5

# Enable debug mode
python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID" --debug

API Server

# Start chat API
python -m uvicorn api.search_api:app --reload --host 0.0.0.0 --port 8000

# Enable tracing
CHAT_TRACE=1 python -m uvicorn api.search_api:app --reload

Cron Transcription

# Process watchlist
python scripts/cron_transcription.py --process

# List watchlist
python scripts/cron_transcription.py --list

# Add to watchlist
python scripts/cron_transcription.py --add "VIDEO_ID"

# Remove from watchlist
python scripts/cron_transcription.py --remove "VIDEO_ID"

Database Management

# Clear KG tables
python scripts/clear_kg.py --yes

# Migrate chat schema
python scripts/migrate_chat_schema.py

# Backfill speaker roles
python scripts/backfill_speaker_video_roles.py

# Ingest transcript JSON into Postgres
python scripts/ingest_transcript_json.py --transcript-file transcription_output.json --youtube-video-id "VIDEO_ID"

# Ingest bills into Postgres
python scripts/ingest_bills.py --scrape

Order Papers

# Ingest order paper PDF
python scripts/ingest_order_paper_pdf.py --file "order_paper.pdf"

# Match papers to videos
python scripts/match_order_papers_to_videos.py

# Export order paper
python scripts/export_order_paper.py --id "ORDER_ID"

Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_chat_agent_v2_unit.py -v
python -m pytest tests/test_kg_agent_loop_unit.py -v

# Lint
ruff check .
ruff check . --fix

# Type check
mypy lib/

API Endpoints

Base URL: http://localhost:8000

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/search` | Hybrid search |
| POST | `/search/temporal` | Search with date/speaker/entity filters |
| GET | `/search/trends` | Trend analysis for entities |
| GET | `/speakers` | List speakers |
| GET | `/speakers/{speaker_id}` | Speaker details |
| GET | `/videos/{youtube_video_id}/speakers/{speaker_id}/roles` | Speaker roles for a video |
| POST | `/chat/threads` | Create thread |
| POST | `/chat/threads/{thread_id}/messages` | Send message |
| GET | `/chat/threads/{thread_id}/messages/stream` | Stream message response |
| GET | `/health` | Health check |
| GET | `/api` | API metadata |
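
A minimal sketch of calling these endpoints from Python with only the standard library. The paths and base URL come from this section; the helper names (`build_request`, `call`) and the `query` field used in the usage note are assumptions, since the request schema is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the listed endpoints."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # Assumption: POST endpoints accept a JSON request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call("GET", "/health")` should return the decoded health payload, and something like `call("POST", "/search", {"query": "water"})` would exercise hybrid search if the API accepts a `query` field (a hypothetical name; check `api/search_api.py` for the real schema).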

Environment Variables

| Variable | Description |
|----------|-------------|
| `GOOGLE_API_KEY` | Google AI API key |
| `CHAT_TRACE` | Enable tracing (1/true/on) |
| `ENABLE_THINKING` | Enable model thinking |
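
The `CHAT_TRACE` entry lists `1`, `true`, and `on` as enabling values. How the project parses the variable is not shown here; the sketch below is one plausible way to read such a flag, and the `env_flag` helper and its case-insensitive matching are assumptions.

```python
import os

# Values listed as "enabled" for CHAT_TRACE.
TRUTHY = {"1", "true", "on"}


def env_flag(name, environ=None):
    """Return True when the variable is set to 1, true, or on."""
    env = os.environ if environ is None else environ
    # Assumption: matching ignores case and surrounding whitespace.
    return env.get(name, "").strip().lower() in TRUTHY
```

Passing a plain dict as `environ` makes the helper easy to check without touching the real environment, for example `env_flag("CHAT_TRACE", {"CHAT_TRACE": "on"})`.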

Key Files

| Component | Location |
|-----------|----------|
| Chat Agent | `lib/chat_agent_v2.py` |
| KG Agent Loop | `lib/kg_agent_loop.py` |
| Hybrid Graph-RAG | `lib/kg_hybrid_graph_rag.py` |
| Search API | `api/search_api.py` |
| Main Script | `transcribe.py` |
| KG Extraction | `lib/knowledge_graph/` |
| Order Papers | `lib/order_papers/` |

Documentation

| Document | Description |
|----------|-------------|
| `README.md` | Project overview |
| `COMPLETE_GUIDE.md` | Full implementation guide |
| `CODE_MAP_AND_REVIEW.md` | Code structure |
| `CHAT_TRACE.md` | Debug tracing |

Common Options

transcribe.py

| Option | Default | Description |
|--------|---------|-------------|
| `--order-file` | Required | Path to order file |
| `--order-paper-id` | None | Order paper ID from database |
| `--segment-minutes` | 30 | Segment duration |
| `--overlap-minutes` | 1 | Segment overlap |
| `--start-minutes` | 0 | Start position |
| `--max-segments` | None | Limit segments |
| `--output-file` | Varies | Output file path |
| `--video` | None | YouTube ID/URL or gs:// URI |
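
To see how the segment options interact, the sketch below lays out segment boundaries from the defaults listed for `transcribe.py`. It assumes each segment starts `segment_minutes - overlap_minutes` after the previous one; the script's real logic may differ, and `plan_segments` is a hypothetical helper, not part of the project.

```python
def plan_segments(total_minutes, segment_minutes=30, overlap_minutes=1,
                  start_minutes=0, max_segments=None):
    """Return (start, end) minute pairs covering the video."""
    if overlap_minutes >= segment_minutes:
        raise ValueError("overlap must be shorter than the segment")
    # Assumption: consecutive segments share `overlap_minutes` of audio.
    step = segment_minutes - overlap_minutes
    segments = []
    start = start_minutes
    while start < total_minutes:
        if max_segments is not None and len(segments) >= max_segments:
            break
        end = min(start + segment_minutes, total_minutes)
        segments.append((start, end))
        if end >= total_minutes:
            break
        start += step
    return segments
```

Under these assumptions, a 70-minute video with the defaults yields three segments, and `max_segments=2` keeps only the first two, which mirrors the "limit segments for testing" use case.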

kg_extract_from_video.py

| Option | Default | Description |
|--------|---------|-------------|
| `--youtube-video-id` | Required | Video ID |
| `--window-size` | 30 | Utterances per window |
| `--stride` | 18 | Utterances between windows |
| `--max-windows` | None | Limit windows |
| `--model` | gemini-2.5-flash | Model to use |
| `--debug` | False | Save failed responses |
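
The window options describe a sliding window over transcript utterances. The sketch below shows one way such windows could be formed, assuming `--stride` is the distance between consecutive window starts; `sliding_windows` is a hypothetical helper, and the extraction script may build its windows differently.

```python
def sliding_windows(utterances, window_size=30, stride=18, max_windows=None):
    """Return overlapping slices of the utterance list."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    for start in range(0, len(utterances), stride):
        if max_windows is not None and len(windows) >= max_windows:
            break
        windows.append(utterances[start:start + window_size])
        # Stop once a window reaches the end of the transcript.
        if start + window_size >= len(utterances):
            break
    return windows
```

With the defaults, consecutive windows would share 12 utterances (30 minus 18), so an entity mentioned near a window boundary still appears with context in the next window. A smaller pair such as `--window-size 15 --stride 10` shares 5.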

Database Tables

Chat Schema

  • chat_threads - Conversation threads
  • chat_messages - Messages with role/content
  • chat_thread_state - Persisted state

KG Schema

  • kg_nodes - Canonical nodes
  • kg_aliases - Alias index
  • kg_edges - Edges with provenance

Transcript Schema

  • paragraphs - Paragraphs with embeddings
  • sentences - Sentences with provenance
  • speakers - Speaker information
  • speaker_video_roles - Speaker roles per video