@@ -26,6 +26,78 @@ YuhHearDem3 is a parliamentary transcription and knowledge graph system that pro
 └─────────────────────────────────────────────────────────────────────────────┘
 ```
 
+## Code Flow Diagram (Mermaid)
+
+```mermaid
+flowchart LR
+    subgraph Sources
+        YT[YouTube or GCS video]
+        OPDF[Order paper PDF]
+        BillsSite[Parliament bills site]
+    end
+
+    subgraph Transcription
+        Transcribe[transcribe.py]
+        JSONOut[transcription_output.json]
+    end
+
+    subgraph TranscriptIngest
+        IngestScript[scripts/ingest_transcript_json.py]
+        Ingestor[lib/transcripts/ingestor.py]
+    end
+
+    subgraph OrderPapers
+        OPIngest[scripts/ingest_order_paper_pdf.py]
+        OPParser[lib/order_papers/*.py]
+    end
+
+    subgraph Bills
+        BillIngest[scripts/ingest_bills.py]
+        BillScraper[lib/scraping/bill_scraper.py]
+        BillProcessor[lib/processors/bill_ingestor.py]
+    end
+
+    subgraph KGExtraction
+        KGVideo[scripts/kg_extract_from_video.py]
+        KGBills[scripts/kg_extract_from_bills.py]
+        WindowBuilder[lib/knowledge_graph/window_builder.py]
+        BillWindowBuilder[lib/knowledge_graph/bill_window_builder.py]
+        Extractor[lib/knowledge_graph/oss_kg_extractor.py + kg_extractor.py]
+        KGStore[lib/knowledge_graph/kg_store.py]
+    end
+
+    subgraph Storage["PostgreSQL + pgvector"]
+        Tables[Transcript + search + KG tables]
+    end
+
+    subgraph SearchAPI
+        API[api/search_api.py]
+        ChatAgent[lib/chat_agent_v2.py]
+        AgentLoop[lib/kg_agent_loop.py]
+        HybridRAG[lib/kg_hybrid_graph_rag.py]
+        AdvSearch[lib/advanced_search_features.py]
+    end
+
+    subgraph Frontend
+        UI["frontend/src (Vite + React)"]
+    end
+
+    YT --> Transcribe --> JSONOut --> IngestScript --> Ingestor --> Tables
+    OPDF --> OPIngest --> OPParser --> Tables
+    BillsSite --> BillIngest --> BillScraper --> BillProcessor --> Tables
+
+    Tables --> WindowBuilder --> KGVideo
+    Tables --> BillWindowBuilder --> KGBills
+    KGVideo --> Extractor --> KGStore --> Tables
+    KGBills --> Extractor
+
+    Tables --> API
+    API --> ChatAgent --> AgentLoop --> HybridRAG --> Tables
+    API --> AdvSearch --> Tables
+    UI --> API
+    API --> UI
+```
+
 ## Code Map
 
 ### Entry Points
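The flowchart added in this hunk implies an ordering between the batch scripts: KG extraction reads rows that the ingestion scripts have already written. Below is a minimal sketch that assumes the dependencies are exactly the edges drawn in the diagram. It encodes them for Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive one valid run order. `PIPELINE_DEPS` and `run_order` are hypothetical names, and the repository is not known to contain such a runner.

```python
from graphlib import TopologicalSorter

# Each script maps to the set of steps whose output it consumes, following the
# edges in the Mermaid diagram. Hypothetical runner data, not repository code.
PIPELINE_DEPS: dict[str, set[str]] = {
    "transcribe.py": set(),
    "scripts/ingest_transcript_json.py": {"transcribe.py"},
    "scripts/ingest_order_paper_pdf.py": set(),
    "scripts/ingest_bills.py": set(),
    "scripts/kg_extract_from_video.py": {"scripts/ingest_transcript_json.py"},
    "scripts/kg_extract_from_bills.py": {"scripts/ingest_bills.py"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order (raises graphlib.CycleError on a cycle)."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step in run_order(PIPELINE_DEPS):
        print(step)
```

The graph is keyed on scripts rather than on the diagram's nodes on purpose. The single `Tables` node is both an input to and an output of KG extraction, so feeding the diagram's edges in verbatim would make `static_order()` raise `graphlib.CycleError`.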
@@ -50,12 +122,15 @@ YuhHearDem3 is a parliamentary transcription and knowledge graph system that pro
 
 | File | Lines | Purpose |
 |------|-------|---------|
+| `oss_kg_extractor.py` | ~800 | OSS KG extraction (two-pass) |
 | `oss_two_pass.py` | 677 | OSS two-pass entity extraction |
 | `window_builder.py` | 287 | Window-based processing for transcripts |
+| `bill_window_builder.py` | ~200 | Bill excerpt window construction |
 | `kg_store.py` | ~350 | KG storage operations |
 | `kg_extractor.py` | ~550 | Main KG extraction logic |
 | `base_kg_seeder.py` | ~300 | Base KG seeding |
 | `model_compare.py` | ~300 | Model comparison utilities |
+| `window_benchmark.py` | ~160 | Window performance benchmarks |
 
 #### Order Papers (`lib/order_papers/`)
 
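The knowledge-graph table in this hunk says only that `window_builder.py` does window-based processing for transcripts and that `bill_window_builder.py` constructs bill excerpt windows. Window size and overlap are not documented here. The sketch below shows one common form of the idea: overlapping fixed-size windows over a list of utterances, with `size` and `stride` as hypothetical parameters. `Window` and `build_windows` are illustrative names, not the modules' API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start: int  # index of the first utterance in the window
    end: int    # exclusive end index
    text: str   # the window's utterances joined with newlines


def build_windows(utterances: list[str], size: int = 10, stride: int = 5) -> list[Window]:
    """Cut utterances into windows of `size` that start every `stride` items.

    The last window may be shorter than `size`, so the tail of the transcript
    is never dropped. Requiring stride <= size guarantees no utterance is skipped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if stride > size:
        raise ValueError("stride must not exceed size, or utterances would be skipped")
    windows: list[Window] = []
    start, n = 0, len(utterances)
    while start < n:
        end = min(start + size, n)
        windows.append(Window(start, end, "\n".join(utterances[start:end])))
        if end == n:  # this window already reaches the end of the transcript
            break
        start += stride
    return windows


if __name__ == "__main__":
    sample = [f"Speaker {i % 2}: utterance {i}" for i in range(12)]
    for w in build_windows(sample, size=5, stride=3):
        print(w.start, w.end)
```

With overlap (`stride < size`), a short exchange that straddles one window boundary can still appear whole in an adjacent window. The cost is that some utterances are sent to the extractor twice.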
@@ -74,6 +149,12 @@ YuhHearDem3 is a parliamentary transcription and knowledge graph system that pro
 |------|-------|---------|
 | `ingestor.py` | 433 | Transcript ingestion |
 
+#### Embeddings (`lib/embeddings/`)
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `google_client.py` | ~200 | Embedding generation client |
+
 #### Database (`lib/db/`)
 
 | File | Lines | Purpose |
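The new Embeddings section lists `google_client.py` as the embedding-generation client, and the flowchart stores everything in PostgreSQL with pgvector. Neither the embedding model nor the search SQL appears in this diff. As a hedged illustration of what vector search over those rows computes, the sketch ranks stored vectors by cosine distance, which is the quantity pgvector's `<=>` operator returns. `cosine_distance`, `top_k`, and the toy `rows` mapping are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity, the value pgvector's `<=>` operator computes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def top_k(query: list[float], rows: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank stored embeddings by ascending cosine distance to the query (closest first)."""
    scored = [(row_id, cosine_distance(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings have far more dimensions.
    rows = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
    print(top_k([1.0, 0.1, 0.0], rows, k=2))
```

In SQL, with a hypothetical `embedding` column, the same ranking would be `ORDER BY embedding <=> $1 LIMIT k`. This diff does not say whether the project's queries use cosine distance or pgvector's L2 (`<->`) or negative inner product (`<#>`) operators.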
@@ -101,18 +182,26 @@ YuhHearDem3 is a parliamentary transcription and knowledge graph system that pro
 | File | Lines | Purpose |
 |------|-------|---------|
 | `config.py` | 85 | Configuration management |
-| `roles.py` | ~50 | Role utilities |
+
+#### Utilities (`lib/`)
+
+| File | Lines | Purpose |
+|------|-------|---------|
 | `id_generators.py` | ~100 | ID generation utilities |
+| `roles.py` | ~120 | Speaker role normalization utilities |
 
 ### Scripts (`scripts/`)
 
 | File | Purpose |
 |------|---------|
 | `kg_extract_from_video.py` | Extract KG from video |
+| `kg_extract_from_bills.py` | Extract KG from bill excerpts |
 | `cron_transcription.py` | Automated transcription jobs |
 | `migrate_chat_schema.py` | Chat schema migration |
 | `clear_kg.py` | Clear KG tables |
 | `ingest_order_paper_pdf.py` | Ingest order paper PDFs |
+| `ingest_transcript_json.py` | Ingest transcript JSON into Postgres |
+| `ingest_bills.py` | Scrape/process bills and ingest |
 | `ingest_knowledge_graph.py` | Ingest KG data |
 | `list_channel_videos.py` | List channel videos |
 | `match_order_papers_to_videos.py` | Match papers to videos |
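This hunk moves `roles.py` into a new Utilities group and describes it as speaker role normalization utilities (about 120 lines, up from about 50). The rules it applies are not shown. Below is a minimal sketch of the idea, assuming transcript speaker labels arrive as free text with honorifics and punctuation. It cleans the string, strips leading honorifics, then looks the result up in an alias table. `normalize_role`, `ROLE_ALIASES`, and every alias entry are hypothetical.

```python
import re

# Hypothetical alias table; the real mappings in lib/roles.py are not shown in this diff.
ROLE_ALIASES: dict[str, str] = {
    "speaker": "Speaker",
    "mr speaker": "Speaker",
    "madam speaker": "Speaker",
    "deputy speaker": "Deputy Speaker",
    "pm": "Prime Minister",
    "prime minister": "Prime Minister",
}

# Matches a leading "the ", "rt " and "hon "/"honourable "/"honorable " once punctuation is gone.
_LEADING_HONORIFICS = re.compile(r"^(?:the )?(?:rt )?(?:hon(?:ou?rable)? )?")


def normalize_role(raw: str) -> str:
    """Map a free-text speaker role onto a canonical name, or return it trimmed."""
    key = raw.strip().lower()
    key = re.sub(r"[.,]", "", key)          # drop punctuation such as "Hon." or "Mr."
    key = re.sub(r"\s+", " ", key)          # collapse runs of whitespace
    key = _LEADING_HONORIFICS.sub("", key)  # strip leading honorifics
    return ROLE_ALIASES.get(key, raw.strip())


if __name__ == "__main__":
    for label in ["Mr. Speaker", "The Hon. Prime Minister", "Minister of Finance"]:
        print(label, "->", normalize_role(label))
```

The sketch returns unrecognized labels unchanged rather than raising or mapping them to a catch-all. That keeps unknown roles visible for later review.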