YuhHearDem3 - Barbados Parliament Search & Knowledge Graph

A comprehensive parliamentary transcription and search system that processes video recordings, extracts knowledge graphs, and enables conversational search over Barbados Parliament debates.

Features

Video Transcription

Iterative Processing: Breaks long videos into overlapping segments with context preservation
Speaker Consistency: Maintains speaker IDs across segments using fuzzy name matching
Legislation Tracking: Identifies bills and laws discussed in videos
Order Paper Integration: Uses order papers for context and speaker roles
Video Metadata: Automatically fetches title, duration, and upload date via yt-dlp
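
As an illustration of the first two points, the sketch below plans overlapping segments and reuses speaker IDs through fuzzy name matching. It is a minimal sketch, not the code in transcribe.py: the function names, the 60-second overlap, and the 0.8 similarity threshold are assumptions, and only the 30-minute segment length comes from the --segment-minutes 30 example in Quick Start.

```python
from difflib import SequenceMatcher


def plan_segments(duration_s, segment_s=1800.0, overlap_s=60.0):
    """Return (start, end) pairs covering the video.

    Consecutive segments share overlap_s seconds so that each segment
    starts with some context from the previous one (assumed scheme).
    """
    if not 0 <= overlap_s < segment_s:
        raise ValueError("overlap must be non-negative and shorter than a segment")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # step back so neighbours overlap
    return segments


def match_speaker(name, known, threshold=0.8):
    """Return the ID of the most similar known speaker, or None.

    known maps speaker ID -> canonical name. The threshold is a
    hypothetical value, not one taken from the project.
    """
    best_id, best_score = None, 0.0
    for speaker_id, known_name in known.items():
        score = SequenceMatcher(None, name.lower(), known_name.lower()).ratio()
        if score > best_score:
            best_id, best_score = speaker_id, score
    return best_id if best_score >= threshold else None


if __name__ == "__main__":
    print(plan_segments(4000.0))
    # [(0.0, 1800.0), (1740.0, 3540.0), (3480.0, 4000.0)]
    print(match_speaker("Jon Smith", {"spk_1": "John Smith", "spk_2": "Mary Jones"}))
    # spk_1
```

A name that matches nothing above the threshold returns None, which a caller could treat as a new speaker.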

Knowledge Graph Extraction

LLM-First Extraction: Uses Google Gemini to extract entities and relationships in a single pass
Window-Based Processing: Concept windows with configurable size and stride (default: 30 utterances, stride 18)
Semantic Relationships: Captures 15 predicates (11 conceptual + 4 discourse)
Canonical IDs: Hash-based stable node and edge IDs for consistency
OSS Two-Pass: Advanced extraction with improved entity resolution
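
The window and ID schemes above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the logic in window_builder.py or kg_store.py: the defaults of 30 utterances and stride 18 come from the list above, while the function names, the SHA-256 hash, the key layout, and the kg_/kge_ prefixes are all hypothetical.

```python
import hashlib


def build_windows(utterances, size=30, stride=18):
    """Slice utterances into overlapping windows.

    With size=30 and stride=18, neighbouring windows share 12 utterances.
    The final window may be shorter than size.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    for start in range(0, len(utterances), stride):
        windows.append(utterances[start:start + size])
        if start + size >= len(utterances):
            break  # this window already reaches the end
    return windows


def _digest(key):
    """Short, deterministic hex digest of a key string."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def node_id(entity_type, name):
    """Stable node ID: the same type and normalised name always hash alike."""
    normalised = " ".join(name.lower().split())
    return "kg_" + _digest(f"{entity_type}|{normalised}")


def edge_id(source_id, predicate, target_id):
    """Stable edge ID derived from its endpoints and predicate."""
    return "kge_" + _digest(f"{source_id}|{predicate}|{target_id}")


if __name__ == "__main__":
    windows = build_windows(list(range(50)))
    print([w[0] for w in windows])  # [0, 18, 36]
    print(node_id("person", "  Jane   Doe ") == node_id("person", "jane doe"))  # True
```

Because the IDs depend only on their inputs, re-running extraction over the same transcript would produce the same node and edge IDs, which is what makes upserts idempotent in a scheme like this.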

Conversational Search

Thread-Based Chat: Persistent conversation threads in PostgreSQL
Hybrid Graph-RAG: Retrieves compact subgraphs with citations
Follow-Up Suggestions: Generates contextual follow-up questions
Full Citation Tracing: Every answer grounded in transcript evidence

Frontend UI

React + Vite: Single-page app served from frontend/dist
Streaming Chat: SSE-based progress updates
Graph View: Explore entity connections visually
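
For readers unfamiliar with SSE, the sketch below parses a server-sent event stream into (event, data) pairs following the standard wire format (event: and data: fields, with a blank line ending each event). It is a generic illustration only: the function name and the "progress" event name in the usage example are hypothetical and are not taken from this project's API.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip the single optional space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


if __name__ == "__main__":
    stream = ["event: progress", "data: retrieving", "", "data: hello", "data: world", ""]
    for event, data in parse_sse(stream):
        print(event, repr(data))
```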

Search System

Hybrid Search: Combines vector similarity, BM25 full-text, and graph traversal
Temporal Filters: Search within date ranges
Speaker Filtering: Filter results by speaker
Graph Visualization: Interactive exploration of entity relationships
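
This README does not say how the vector, BM25, and graph results are merged. One common way to combine ranked lists from different retrievers is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of this project's method. The function name, the utterance IDs, and the constant k=60 are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list.

    Each list contributes 1 / (k + rank) per item, so items ranked
    highly by several retrievers rise to the top. Ties are broken by ID
    to keep the output deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["u3", "u1", "u2"]  # hypothetical utterance IDs
    bm25_hits = ["u1", "u4", "u3"]
    graph_hits = ["u1", "u3"]
    print(reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits]))
    # ['u1', 'u3', 'u4', 'u2']
```

RRF uses only rank positions, so it needs no score normalisation across retrievers whose raw scores are on different scales.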

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                           YuhHearDem3 System                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────────────────┐  │
│  │   Video     │───▶│  Transcription  │───▶│  Three-Tier Storage     │  │
│  │  (YouTube)  │    │  (Gemini 2.5)   │    │  (PostgreSQL + pgvector)│  │
│  └─────────────┘    └─────────────────┘    └─────────────────────────┘  │
│         │                   │                         │                 │
│         ▼                   ▼                         ▼                 │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────────────────┐  │
│  │ Order Papers│    │ Knowledge Graph │    │  Search API (FastAPI)   │  │
│  │  (PDF)      │───▶│    Extraction   │───▶│  - Hybrid Search        │  │
│  └─────────────┘    └─────────────────┘    │  - Conversational       │  │
│                                            │  - Graph Traversal      │  │
│                                            └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘

The frontend is served by FastAPI from frontend/dist and talks to the same API origin.

Quick Start

Prerequisites

Python 3.13+
PostgreSQL 16+ with pgvector
Google AI API key

Installation

# Clone the repository
git clone https://github.com/anomalyco/YuhHearDem3.git
cd YuhHearDem3

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set API key
export GOOGLE_API_KEY="your-api-key"

Transcribe a Video

python transcribe.py --order-file order.txt --segment-minutes 30

Extract Knowledge Graph

python scripts/kg_extract_from_video.py --youtube-video-id "VIDEO_ID"

Ingest Transcript JSON

python scripts/ingest_transcript_json.py --transcript-file transcription_output.json --youtube-video-id "VIDEO_ID"

Start Chat API

python -m uvicorn api.search_api:app --reload --host 0.0.0.0 --port 8000

Project Structure

YuhHearDem3/
├── api/
│   └── search_api.py              # FastAPI application with all endpoints
├── lib/
│   ├── chat_agent_v2.py           # Conversational AI agent
│   ├── kg_agent_loop.py           # KG-powered agent loop
│   ├── kg_hybrid_graph_rag.py     # Hybrid Graph-RAG retrieval
│   ├── advanced_search_features.py # Temporal search, trends, graph queries
│   ├── embeddings/                # Embedding clients
│   ├── knowledge_graph/
│   │   ├── oss_two_pass.py       # OSS two-pass extraction
│   │   ├── window_builder.py      # Window-based processing
│   │   ├── kg_store.py            # KG storage operations
│   │   └── kg_extractor.py        # Main KG extraction
│   ├── order_papers/
│   │   ├── pdf_parser.py          # PDF order paper parsing
│   │   ├── video_matcher.py       # Match papers to videos
│   │   └── ingestor.py            # Order paper ingestion
│   └── transcripts/
│       └── ingestor.py            # Transcript ingestion
├── scripts/
│   ├── kg_extract_from_video.py   # Extract KG from video
│   ├── cron_transcription.py      # Automated transcription
│   ├── migrate_chat_schema.py     # Chat schema migration
│   └── clear_kg.py                # Clear KG tables
├── frontend/                       # React frontend (Vite)
├── tests/                          # Unit tests
└── docs/                           # Documentation

Documentation

Document	Description
COMPLETE_GUIDE.md	Comprehensive implementation guide
QUICK_REFERENCE.md	Command quick reference
CHAT_TRACE.md	Debug tracing documentation
DATE_NORMALIZATION.md	Date handling
CODE_MAP_AND_REVIEW.md	Code map and flow diagram
README_SEARCH_SYSTEM.md	Search system details

Technology Stack

Backend: Python 3.13+, FastAPI, Pydantic
Database: PostgreSQL 16+, pgvector
AI: Google Gemini 2.5 Flash
Video: yt-dlp
Search: Hybrid retrieval (vector similarity, BM25 full-text, graph traversal)
Testing and linting: pytest, ruff, mypy

License

MIT
