OverSight - AI & Data Governance Platform

Enterprise AI control plane providing continuous, real-time visibility and governance over how data is used by AI systems.

Features

Data Ingestion: Multi-source ingestion from relational databases (RDBMS), NoSQL databases, SQLite, JSON, CSV, and more
AI Enrichment: Automatic metadata generation and classification using Gemini LLM
Multi-label Tagging: Intelligent categorization (product, sales, hr, finance, etc.)
Collection-based Queries: Organize data by business domains
Analytics Dashboard: Comprehensive statistics and insights
Review Workflow: Human-in-the-loop review of low-confidence records
AI Agents System: Intelligent agents powered by LangChain & Gemini
- Supervisor Agent for multi-agent coordination
- Data Discovery Agent for finding datasets
- Metadata Agent for querying metadata & lineage
- Compliance Agent for PII and governance checks
- Analytics Agent for insights and statistics
DataHub Integration: Query metadata from DataHub catalogs
AI Agent Compliance, Monitoring & Governance: Track AI agents, compute cost, full execution traces, and compliance violations
Real-time Chat: WebSocket support for live agent interactions
Conversation Memory: Persistent conversation history
REST API: Complete FastAPI backend with 20+ endpoints
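The Supervisor Agent's job of dispatching a request to one specialist can be illustrated with a toy router. This is a hypothetical sketch, not OverSight's LangChain implementation: the keyword table, `route_query`, and `supervise` are assumptions, and a real supervisor would presumably let Gemini choose the specialist instead of matching keywords.

```python
from typing import Callable, Dict, Tuple


def make_agent(name: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a specialist agent: it just labels the query."""
    def agent(query: str) -> str:
        return f"[{name}] {query}"
    return agent


AGENTS: Dict[str, Callable[[str], str]] = {
    name: make_agent(name)
    for name in ("data_discovery", "metadata", "compliance", "analytics")
}

# Assumed keyword table, checked in insertion order (compliance first).
ROUTING_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "compliance": ("pii", "gdpr", "sensitive", "violation"),
    "metadata": ("lineage", "schema", "metadata"),
    "analytics": ("statistics", "count", "trend", "insight"),
    "data_discovery": ("find", "dataset", "search"),
}


def route_query(query: str) -> str:
    """Return the name of the specialist that should handle the query."""
    text = query.lower()
    for agent_name, keywords in ROUTING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return agent_name
    return "data_discovery"  # fallback when no keyword matches


def supervise(query: str) -> str:
    """Route the query, then delegate it to the chosen specialist."""
    return AGENTS[route_query(query)](query)
```

Checking compliance keywords first gives PII questions priority when a query matches several specialists.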

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure API Key

Create a .env file:

GEMINI_API_KEY=your_gemini_api_key_here
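One way the backend could pick up this key at startup is sketched here using only the standard library, assuming plain `KEY=value` lines. The helpers `load_env_file` and `get_gemini_api_key` are hypothetical; the project may load `.env` differently (for example with python-dotenv).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=value pairs from a .env file into os.environ; existing variables win."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding whitespace and optional quotes from the value.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_gemini_api_key() -> str:
    """Return GEMINI_API_KEY, failing fast with a clear message if it is missing."""
    load_env_file()
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return key
```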

3. Run Tests

# Run all tests
pytest tests/

# Run specific test
python tests/integration/test_enrichment_api.py

4. Start API Server

python scripts/start_api_server.py

API available at: http://localhost:8000
Docs: http://localhost:8000/docs
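To confirm the server is up, call the health endpoint. A stdlib-only sketch: the base URL matches the default above, but the JSON returned by `/api/health` is not documented in this README, so the helper simply returns whatever it decodes. `api_url` and `check_health` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def api_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an API path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def check_health(base_url: str = BASE_URL, timeout: float = 5.0):
    """Return the decoded JSON from GET /api/health, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(api_url("/api/health", base_url), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except OSError:  # urllib.error.URLError and socket timeouts both subclass OSError
        return None
```

`check_health()` returns `None` when nothing is listening on port 8000.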

5. Run Complete Workflow

python scripts/run_ingestion_with_enrichment.py
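Conceptually, the workflow first normalizes rows from heterogeneous sources into a common record shape and only then enriches them. A hypothetical sketch of that normalization step: the `source_system` and `raw_data` field names borrow from the DataHub section of this README, while the loader functions and the `(format, text, source_system)` input triples are assumptions.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def from_csv(text: str, source_system: str) -> List[Record]:
    """Wrap each CSV row (keyed by the header line) as a record."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"source_system": source_system, "raw_data": dict(row)} for row in reader]


def from_json(text: str, source_system: str) -> List[Record]:
    """Accept either a JSON array of objects or a single object."""
    data = json.loads(text)
    rows = data if isinstance(data, list) else [data]
    return [{"source_system": source_system, "raw_data": row} for row in rows]


LOADERS: Dict[str, Callable[[str, str], List[Record]]] = {"csv": from_csv, "json": from_json}


def ingest(sources: Iterable[Tuple[str, str, str]]) -> List[Record]:
    """Normalize (format, text, source_system) triples into one flat record list."""
    records: List[Record] = []
    for fmt, text, source_system in sources:
        records.extend(LOADERS[fmt](text, source_system))
    return records
```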

Architecture

backend/
├── api/              # HTTP layer (routes, schemas)
├── services/         # Business logic
├── repositories/     # Data access layer
├── ingestion/        # Data ingestion pipeline
└── core/             # Shared utilities

Clean 3-layer architecture:
API Layer → Service Layer → Repository Layer → Database
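A minimal, self-contained sketch of that layering, built around the review queue for low-confidence records. The class names, the `records` table, and the 0.7 confidence threshold are hypothetical; only the API → Service → Repository split comes from this README.

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class EnrichedRecord:
    id: int
    source_system: str
    tags: List[str]
    confidence: float


class RecordRepository:
    """Repository layer: the only code that knows about SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "id INTEGER PRIMARY KEY, source_system TEXT, tags TEXT, confidence REAL)"
        )

    def add(self, source_system: str, tags: List[str], confidence: float) -> int:
        cur = self.conn.execute(
            "INSERT INTO records (source_system, tags, confidence) VALUES (?, ?, ?)",
            (source_system, ",".join(tags), confidence),
        )
        return cur.lastrowid

    def list_below(self, threshold: float) -> List[EnrichedRecord]:
        rows = self.conn.execute(
            "SELECT id, source_system, tags, confidence FROM records "
            "WHERE confidence < ? ORDER BY id",
            (threshold,),
        ).fetchall()
        return [EnrichedRecord(r[0], r[1], r[2].split(",") if r[2] else [], r[3]) for r in rows]


class ReviewService:
    """Service layer: business rules only, no SQL and no HTTP."""

    def __init__(self, repo: RecordRepository, threshold: float = 0.7) -> None:
        self.repo = repo
        self.threshold = threshold  # hypothetical cut-off for human review

    def review_queue(self) -> List[EnrichedRecord]:
        return self.repo.list_below(self.threshold)


def get_review_handler(service: ReviewService) -> Dict[str, Any]:
    """API layer: turn service results into a JSON-ready response body."""
    items = service.review_queue()
    return {"count": len(items), "items": [asdict(item) for item in items]}
```

Because only `RecordRepository` touches SQL, moving from SQLite to PostgreSQL would, in this sketch, change one class.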

API Endpoints

Data Enrichment

POST /api/enrich - Enrich records with AI
GET /api/enriched - Query enriched data
GET /api/collections - List collections
GET /api/collections/{name} - Get collection data
GET /api/analytics - View statistics
GET /api/review - Review queue
GET /api/taxonomy - Available tags
GET /api/health - Health check
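A stdlib-only sketch of calling the enrichment endpoints from Python. The `{"records": [...]}` body and the record fields are assumptions; the authoritative request schemas are in the interactive docs at http://localhost:8000/docs. The helpers only build requests, so they can be inspected before anything is sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Union

BASE_URL = "http://localhost:8000"


def build_enrich_request(records: List[Dict[str, Any]],
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) POST /api/enrich; the payload shape is an assumption."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/enrich",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_collection_url(name: str, base_url: str = BASE_URL) -> str:
    """URL for GET /api/collections/{name}, with the name percent-encoded."""
    return f"{base_url}/api/collections/{urllib.parse.quote(name, safe='')}"


def send(request: Union[str, urllib.request.Request], timeout: float = 30.0) -> Any:
    """Send a request (or GET a URL string) and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_enrich_request(records))` posts a batch, and `send(build_collection_url("sales"))` fetches a collection.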

AI Agents

POST /api/agents - Create new agent
GET /api/agents - List all agents
GET /api/agents/{id} - Get agent details
PUT /api/agents/{id} - Update agent configuration
DELETE /api/agents/{id} - Delete agent
POST /api/agents/query - Query an agent (supervisor, data_discovery, metadata, compliance, analytics)
GET /api/agents/{id}/conversations - Get conversation history
GET /api/agents/{id}/stats - Agent statistics
GET /api/agents/tools/stats - Tool execution statistics
WS /api/ws/agent/chat - Real-time agent chat (WebSocket)
WS /api/ws/agent/stream - Streaming responses (WebSocket)
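A sketch of querying an agent over HTTP, validating the agent type against the five types listed for `POST /api/agents/query`. The `agent_type` and `query` field names and the `build_agent_query` helper are assumptions; check http://localhost:8000/docs for the actual request model.

```python
import json
import urllib.request

# The five agent types listed for POST /api/agents/query.
AGENT_TYPES = ("supervisor", "data_discovery", "metadata", "compliance", "analytics")


def build_agent_query(query: str, agent_type: str = "supervisor",
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /api/agents/query request; the JSON field names are assumptions."""
    if agent_type not in AGENT_TYPES:
        raise ValueError(f"unknown agent type {agent_type!r}; expected one of {AGENT_TYPES}")
    payload = {"agent_type": agent_type, "query": query}
    return urllib.request.Request(
        f"{base_url}/api/agents/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send the request with `urllib.request.urlopen(req)`; defaulting to `supervisor` lets the coordinator pick the specialist.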

Documentation

Quick Start: docs/QUICKSTART.md
Full Docs: docs/README_ENRICHMENT.md
Implementation: docs/IMPLEMENTATION_COMPLETE.md
Refactoring: REFACTORING_SUMMARY.md

Project Structure

├── backend/            # Backend API and services
├── frontend/           # React frontend
├── tests/              # Test suite
├── scripts/            # Utility scripts
├── docs/               # Documentation
├── data/               # Data files and database
└── output/             # Ingestion output

Tech Stack

Backend: FastAPI, SQLAlchemy, Pydantic
AI: Google Gemini LLM
Database: SQLite (PostgreSQL-ready)
Frontend: React, Vite, TailwindCSS
Testing: Pytest

Development

Run Tests

pytest tests/ -v

Code Quality

# Type checking
mypy backend/

# Linting
flake8 backend/

Export Data

python scripts/export_enriched_output.py

DataHub Integration

OverSight integrates with DataHub for enhanced data discovery and governance. DataHub provides a web UI to browse, search, and visualize your enriched metadata.

Setup DataHub

1. Deploy DataHub Locally

# Install DataHub CLI (already included in requirements.txt)
pip install 'acryl-datahub[datahub-rest]'

# Deploy DataHub using Docker (requires 4GB+ RAM)
datahub docker quickstart

DataHub UI will be available at: http://localhost:9002 (username: datahub, password: datahub)

2. Initialize Tags and Domains

Create the OverSight taxonomy in DataHub:

python scripts/initialize_datahub.py

This creates:

16 tags: product, sales, hr, finance, marketing, operations, customer_data, transaction, analytics, logs, pii, sensitive, public, structured, unstructured, media
5 domains: Sales, HR, Finance, Operations, Product
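Each tag and domain becomes a DataHub entity identified by a URN. A stdlib-only sketch of the URNs an initialization script would need to create: the `urn:li:tag:<name>` and `urn:li:domain:<id>` forms follow DataHub's URN conventions, but the helper names and the lower-cased domain ids are assumptions, and actually emitting the entities requires the DataHub SDK or REST API.

```python
from typing import Dict, List

TAGS: List[str] = [
    "product", "sales", "hr", "finance", "marketing", "operations",
    "customer_data", "transaction", "analytics", "logs", "pii",
    "sensitive", "public", "structured", "unstructured", "media",
]
DOMAINS: List[str] = ["Sales", "HR", "Finance", "Operations", "Product"]


def tag_urn(tag: str) -> str:
    """DataHub tag URNs have the form urn:li:tag:<name>."""
    return f"urn:li:tag:{tag}"


def domain_urn(domain: str) -> str:
    """DataHub domain URNs have the form urn:li:domain:<id>; lower-casing is an assumption."""
    return f"urn:li:domain:{domain.lower()}"


def taxonomy_urns() -> Dict[str, List[str]]:
    """All URNs the one-time taxonomy setup would create."""
    return {
        "tags": [tag_urn(t) for t in TAGS],
        "domains": [domain_urn(d) for d in DOMAINS],
    }
```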

3. Sync Enriched Data

Push enriched data to DataHub:

python scripts/sync_to_datahub.py

Or use the API:

curl -X POST http://localhost:8000/api/datahub/sync

DataHub API Endpoints

POST /api/datahub/sync - Sync all enriched data to DataHub
GET /api/datahub/status - Check DataHub connectivity and sync statistics
POST /api/datahub/initialize - Initialize tags and domains (one-time setup)

Using DataHub UI

Open http://localhost:9002 and log in
Browse → Platform: "oversight"
View synced datasets with:
- AI-generated descriptions
- Applied tags from taxonomy
- Inferred schema from raw data
- Custom properties (confidence scores, record counts)
- Link to OverSight API for detailed records

DataHub Architecture

OverSight Enriched Data → DataHub Sync Service → DataHub GMS → DataHub UI
                                                              → Elasticsearch (Search)

Each source system (sqlite_products, json_sales, csv_users) becomes a separate dataset entity in DataHub with:

Aggregated metadata from all records
Unique tags collected from enriched data
Inferred schema from raw_data fields
Custom properties linking back to OverSight API
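The per-source aggregation described above can be sketched as follows. The dataset URN uses DataHub's `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` form with the `oversight` platform shown in the UI; the `PROD` environment, the coarse type inference, and the record layout (`source_system`, `raw_data`, `tags`, `confidence`) are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List


def dataset_urn(source_system: str, platform: str = "oversight", env: str = "PROD") -> str:
    """DataHub dataset URN: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{source_system},{env})"


def infer_type(value: Any) -> str:
    """Map a raw_data value to a coarse type name (bool first: bool subclasses int)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def aggregate(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Collapse enriched records into one dataset summary per source system."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for rec in records:
        grouped[rec["source_system"]].append(rec)

    datasets: Dict[str, Dict[str, Any]] = {}
    for source, recs in grouped.items():
        # Union of raw_data fields; the first type seen for a field wins.
        schema: Dict[str, str] = {}
        for rec in recs:
            for field, value in rec["raw_data"].items():
                schema.setdefault(field, infer_type(value))
        tags = sorted({tag for rec in recs for tag in rec.get("tags", [])})
        avg_conf = sum(rec.get("confidence", 0.0) for rec in recs) / len(recs)
        datasets[dataset_urn(source)] = {
            "tags": tags,
            "schema": schema,
            # DataHub custom properties are string-to-string maps.
            "custom_properties": {
                "record_count": str(len(recs)),
                "avg_confidence": f"{avg_conf:.2f}",
            },
        }
    return datasets
```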
