Enterprise AI control plane providing continuous, real-time visibility and governance over how data is used by AI systems.
- Data Ingestion: Multi-source data ingestion (RDBMS, NoSQL databases, SQLite, JSON, CSV, etc.)
- AI Enrichment: Automatic metadata generation and classification using Gemini LLM
- Multi-label Tagging: Intelligent categorization (product, sales, hr, finance, etc.)
- Collection-based Queries: Organize data by business domains
- Analytics Dashboard: Comprehensive statistics and insights
- Review Workflow: Human-in-the-loop for low-confidence records
- AI Agents System: Intelligent agents powered by LangChain & Gemini
- Supervisor Agent for multi-agent coordination
- Data Discovery Agent for finding datasets
- Metadata Agent for querying metadata & lineage
- Compliance Agent for PII and governance checks
- Analytics Agent for insights and statistics
- DataHub Integration: Query metadata from DataHub catalogs
- Compliance, Monitoring & Governance of AI Agents: Track AI agents, compute cost, full traces, and compliance violations
- Real-time Chat: WebSocket support for live agent interactions
- Conversation Memory: Persistent conversation history
- REST API: Complete FastAPI backend with 20+ endpoints
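The human-in-the-loop review workflow above can be sketched as a simple confidence gate. This is a hypothetical sketch: the field name `confidence` and the 0.7 threshold are assumptions, not OverSight's actual implementation.

```python
# Hypothetical sketch of the review routing described above.
# The "confidence" field and 0.7 threshold are assumptions.

def route_enriched_record(record: dict, threshold: float = 0.7) -> str:
    """Send low-confidence AI enrichments to the human review queue."""
    confidence = record.get("confidence", 0.0)
    return "auto_approve" if confidence >= threshold else "review_queue"

print(route_enriched_record({"confidence": 0.92}))  # auto_approve
print(route_enriched_record({"confidence": 0.41}))  # review_queue
```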
```bash
pip install -r requirements.txt
```

Create a `.env` file:

```env
GEMINI_API_KEY=your_gemini_api_key_here
```

```bash
# Run all tests
pytest tests/

# Run specific test
python tests/integration/test_enrichment_api.py
```

Start the API server:

```bash
python scripts/start_api_server.py
```

API available at: http://localhost:8000
Docs: http://localhost:8000/docs
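As a sanity check, the `GEMINI_API_KEY` from the `.env` file can be validated at startup. This is a minimal sketch; the project's actual settings loader may differ.

```python
import os

# Minimal sketch: fail fast when the Gemini key is missing. The variable
# name matches the .env example above; the actual config loader may differ.
def get_gemini_key() -> str:
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key

os.environ["GEMINI_API_KEY"] = "demo-key"  # simulate a configured environment
print(get_gemini_key())  # demo-key
```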
Run the ingestion pipeline:

```bash
python scripts/run_ingestion_with_enrichment.py
```

```
backend/
├── api/           # HTTP layer (routes, schemas)
├── services/      # Business logic
├── repositories/  # Data access layer
├── ingestion/     # Data ingestion pipeline
└── core/          # Shared utilities
```
Clean 3-layer architecture:
API Layer → Service Layer → Repository Layer → Database
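The flow above can be illustrated with a toy slice of one feature. All class and method names here are illustrative, not OverSight's actual code.

```python
# Toy illustration of the API → Service → Repository flow described above.
# Class and method names are illustrative, not OverSight's actual code.

class CollectionRepository:
    """Data access layer: talks to the database (here, an in-memory dict)."""
    def __init__(self):
        self._rows = {"sales": [{"id": 1, "tag": "sales"}]}

    def find_by_name(self, name: str) -> list:
        return self._rows.get(name, [])

class CollectionService:
    """Business logic layer: validation and shaping, no HTTP concerns."""
    def __init__(self, repo: CollectionRepository):
        self.repo = repo

    def get_collection(self, name: str) -> dict:
        records = self.repo.find_by_name(name.lower())
        return {"name": name, "count": len(records), "records": records}

# The API layer (a FastAPI route in backend/api/) would call the service:
service = CollectionService(CollectionRepository())
print(service.get_collection("sales")["count"])  # 1
```

Keeping each layer ignorant of the one above it is what makes the SQLite backend swappable for PostgreSQL without touching routes or services.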
- `POST /api/enrich` - Enrich records with AI
- `GET /api/enriched` - Query enriched data
- `GET /api/collections` - List collections
- `GET /api/collections/{name}` - Get collection data
- `GET /api/analytics` - View statistics
- `GET /api/review` - Review queue
- `GET /api/taxonomy` - Available tags
- `GET /api/health` - Health check
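For example, the enrichment endpoint can be called from Python's standard library. The payload shape here is an assumption; check http://localhost:8000/docs for the real schema.

```python
import json
from urllib import request

# Hypothetical payload; consult http://localhost:8000/docs for the real schema.
payload = {"records": [{"source": "csv_users", "raw_data": {"name": "Ada"}}]}

req = request.Request(
    "http://localhost:8000/api/enrich",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would send it once the server is running.
print(req.get_method(), req.full_url)
```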
- `POST /api/agents` - Create new agent
- `GET /api/agents` - List all agents
- `GET /api/agents/{id}` - Get agent details
- `PUT /api/agents/{id}` - Update agent configuration
- `DELETE /api/agents/{id}` - Delete agent
- `POST /api/agents/query` - Query an agent (supervisor, data_discovery, metadata, compliance, analytics)
- `GET /api/agents/{id}/conversations` - Get conversation history
- `GET /api/agents/{id}/stats` - Agent statistics
- `GET /api/agents/tools/stats` - Tool execution statistics
- `WS /api/ws/agent/chat` - Real-time agent chat (WebSocket)
- `WS /api/ws/agent/stream` - Streaming responses (WebSocket)
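A request body for `POST /api/agents/query` might be assembled like this. The field names `agent_type` and `message` are assumptions, not the documented schema; only the five agent type values come from the endpoint list above.

```python
# Sketch of building a query for POST /api/agents/query. The payload field
# names ("agent_type", "message") are assumptions; consult /docs for the
# real schema. The agent type values come from the endpoint list above.
AGENT_TYPES = {"supervisor", "data_discovery", "metadata", "compliance", "analytics"}

def build_agent_query(agent_type: str, message: str) -> dict:
    if agent_type not in AGENT_TYPES:
        raise ValueError(f"unknown agent type: {agent_type}")
    return {"agent_type": agent_type, "message": message}

print(build_agent_query("compliance", "Which datasets contain PII?"))
```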
- Quick Start: docs/QUICKSTART.md
- Full Docs: docs/README_ENRICHMENT.md
- Implementation: docs/IMPLEMENTATION_COMPLETE.md
- Refactoring: REFACTORING_SUMMARY.md
```
├── backend/   # Backend API and services
├── frontend/  # React frontend
├── tests/     # Test suite
├── scripts/   # Utility scripts
├── docs/      # Documentation
├── data/      # Data files and database
└── output/    # Ingestion output
```
- Backend: FastAPI, SQLAlchemy, Pydantic
- AI: Google Gemini LLM
- Database: SQLite (PostgreSQL-ready)
- Frontend: React, Vite, TailwindCSS
- Testing: Pytest
```bash
# Run tests verbosely
pytest tests/ -v

# Type checking
mypy backend/

# Linting
flake8 backend/
```

Export enriched output:

```bash
python scripts/export_enriched_output.py
```

OverSight integrates with DataHub for enhanced data discovery and governance. DataHub provides a web UI to browse, search, and visualize your enriched metadata.
1. Deploy DataHub Locally
```bash
# Install DataHub CLI (already included in requirements.txt)
pip install 'acryl-datahub[datahub-rest]'

# Deploy DataHub using Docker (requires 4GB+ RAM)
datahub docker quickstart
```

DataHub UI will be available at: http://localhost:9002 (username: datahub, password: datahub)
2. Initialize Tags and Domains
Create OverSight taxonomy in DataHub:
```bash
python scripts/initialize_datahub.py
```

This creates:
- 16 tags: product, sales, hr, finance, marketing, operations, customer_data, transaction, analytics, logs, pii, sensitive, public, structured, unstructured, media
- 5 domains: Sales, HR, Finance, Operations, Product
3. Sync Enriched Data
Push enriched data to DataHub:
```bash
python scripts/sync_to_datahub.py
```

Or use the API:

```bash
curl -X POST http://localhost:8000/api/datahub/sync
```

- `POST /api/datahub/sync` - Sync all enriched data to DataHub
- `GET /api/datahub/status` - Check DataHub connectivity and sync statistics
- `POST /api/datahub/initialize` - Initialize tags and domains (one-time setup)
- Open http://localhost:9002 and log in
- Browse → Platform: "oversight"
- View synced datasets with:
- AI-generated descriptions
- Applied tags from taxonomy
- Inferred schema from raw data
- Custom properties (confidence scores, record counts)
- Link to OverSight API for detailed records
```
OverSight Enriched Data → DataHub Sync Service → DataHub GMS → DataHub UI
                                                             → ElasticSearch (Search)
```
Each source system (sqlite_products, json_sales, csv_users) becomes a separate dataset entity in DataHub with:
- Aggregated metadata from all records
- Unique tags collected from enriched data
- Inferred schema from raw_data fields
- Custom properties linking back to OverSight API
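The aggregation above could be sketched roughly as follows. The function, field names, and type-inference logic are assumptions for illustration, not the actual sync service.

```python
# Hypothetical sketch: aggregate enriched records from one source system
# (e.g. "csv_users") into a single dataset entity with tags collected from
# the enrichment and a schema inferred from raw_data field types.
# Names and logic are illustrative, not the actual DataHub sync service.
def build_dataset_entity(source: str, records: list) -> dict:
    tags = sorted({t for r in records for t in r.get("tags", [])})
    schema = {}
    for r in records:
        for field, value in r.get("raw_data", {}).items():
            schema.setdefault(field, type(value).__name__)  # first type wins
    return {"source": source, "tags": tags,
            "schema": schema, "record_count": len(records)}

entity = build_dataset_entity("csv_users", [
    {"tags": ["hr", "pii"], "raw_data": {"name": "Ada", "age": 36}},
    {"tags": ["hr"], "raw_data": {"email": "ada@example.com"}},
])
print(entity["schema"])  # {'name': 'str', 'age': 'int', 'email': 'str'}
```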
Copyright © 2026 OverSight