SentinelAI — Enterprise AI Reliability & Governance Platform

SentinelAI

SentinelAI — Enterprise AI Reliability & Governance Platform

SentinelAI is a distributed AI reliability and monitoring platform designed to:

Detect model drift
Monitor inference anomalies
Track LLM hallucination risk
Provide real-time observability
Automate AI governance workflows

It combines statistical ML monitoring with LLM-powered incident intelligence.

🚀 Quickstart — Docker Compose (recommended)

Prerequisites: Docker 24+ with Compose v2 (docker compose version).

# 1. Copy environment defaults
cp .env.example .env

# 2. Start the full local stack
docker compose up --build

Once running, open:

Service	URL
Streamlit Dashboard	http://localhost:8501
Prometheus	http://localhost:9090
Grafana (admin / admin)	http://localhost:3000
Ingestion API	http://localhost:8080
Drift Engine API	http://localhost:7070
LLM Guard API	http://localhost:8000

Send a sample inference log

curl -X POST http://localhost:8080/log \
  -H "Content-Type: application/json" \
  -d '{"model_id":"demo","model_version":"v1","latency_ms":120,"tokens_in":32,"tokens_out":64,"status":"ok"}'

Run a drift computation

curl -X POST http://localhost:7070/drift \
  -H "Content-Type: application/json" \
  -d '{"model_id":"demo","feature_name":"latency","expected":[0.2,0.3,0.25,0.25],"actual":[0.1,0.35,0.30,0.25]}'

Summarize an incident (Ollama optional — falls back to rule-based stub)

curl -X POST http://localhost:8000/summarize \
  -H "Content-Type: application/json" \
  -d '{"log_data":"PSI 0.35 on latency feature, model demo v1","persist":false}'

⚙️ Configuration

All configuration is via environment variables. Copy .env.example to .env and adjust.

Variable	Default	Description
`WAREHOUSE_MODE`	`postgres`	`postgres` (local) or `snowflake`
`DATABASE_URL`	Postgres DSN	Full Postgres connection string
`POSTGRES_USER`	`sentinel`	Postgres user
`POSTGRES_PASSWORD`	`sentinel`	Postgres password
`POSTGRES_DB`	`sentinel`	Postgres database
`OLLAMA_HOST`	`http://ollama:11434`	Ollama endpoint (optional)
`LLM_MODEL`	`llama2`	LLM model name

Snowflake (optional): set WAREHOUSE_MODE=snowflake and fill in SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA, SNOWFLAKE_WAREHOUSE.

🏗️ Architecture

User → Go Ingestion API (8080) → Postgres (local) / Snowflake (optional)
                                        ↓
                              Drift Engine C++ (7070)
                                        ↓
                               LLM Guard Python (8000)
                                        ↓
                           Streamlit Dashboard (8501)
                                        ↓
                         Prometheus (9090) + Grafana (3000)

Services

Service	Language	Port	Description
`ingestion-service`	Go	8080	Receives inference logs, writes to warehouse
`drift-engine`	C++ + Python	7070	PSI/KS drift detection
`llm-guard`	Python	8000	LLM-powered incident summarization
`streamlit-dashboard`	Python	8501	Control plane UI
`postgres`	—	5432	Local warehouse (default)
`prometheus`	—	9090	Metrics scraping
`grafana`	—	3000	Dashboards

📊 Metrics

Metric	Value
PSI Detection Threshold	0.20
P95 API Latency	180ms
Throughput	150 RPS
Drift Engine Compute	<2ms
LLM Summarization	~1.2s

🧠 Extended Q&A

Why use C++ for drift detection?

To achieve sub-millisecond statistical scoring at scale.

Why Go for ingestion?

Go provides efficient concurrency and low-latency HTTP handling.

Why Postgres locally (not Snowflake)?

Postgres is free, runs in Docker, and supports the same SQL schema. Switch to WAREHOUSE_MODE=snowflake when you're ready to push to production.

Why MLflow?

Experiment tracking, reproducibility, and version control.

Why LangChain + Ollama?

LLM-powered root cause summarization and RAG over historical incidents.

Why Kubernetes?

Horizontal scaling and production-grade orchestration.

Why Terraform?

Reproducible infrastructure as code.

🏢 Enterprise Value

SentinelAI demonstrates:

AI system lifecycle management
Drift monitoring
MLOps integration
Distributed systems engineering
Cloud-native architecture
LLM augmentation
Observability & metrics-driven design

🔧 Recent Code Improvements

The following fixes were applied to improve reliability, correctness, and security:

File	Issue	Fix
`api/core/model.py` (new)	`core.model` module was missing, crashing on import	Created `SentinelModel` (two-layer MLP) as a proper PyTorch module
`api/__init__.py` (new)	Package not importable as `api.*`	Added package init file
`api/inference.py`	`from core.model import …` caused `ModuleNotFoundError`	Updated to `from api.core.model import SentinelModel`
`api/main.py`	`from core.inference import …` caused `ModuleNotFoundError`; missing `GET /` route	Updated import to `from api.inference import run_inference`; added root route
`api/auth.py`	Hardcoded `admin/admin` credentials	Reads `API_USERNAME` / `API_PASSWORD` from environment; rejects auth when `API_PASSWORD` is unset
`api/routes/inference.py`	Model loaded at import time (blocks startup, crashes without GPU/HuggingFace access); `device_map="auto"` forced CUDA	Lazy-loads model on first request; model name configurable via `LLM_MODEL_NAME` env var; CPU fallback added
`backend/app/main.py`	`.cuda()` called unconditionally (crashes on CPU-only hosts); MLflow `start_run()` ran at module level (import fails if MLflow unreachable)	Added `cpu/cuda` device selection; wrapped MLflow block in `try/except`
`llm-guard/app.py`	`except Exception: pass` silently swallowed DB errors	Replaced with `logger.exception(…)` + `conn.rollback()`
`tests/conftest.py`	`from app.main import app` — wrong package path, caused all tests to fail	Fixed to `from api.main import app`
`requirements.txt`	Missing `httpx` (required by FastAPI `TestClient`) and `pydantic`	Added both packages

Running tests locally

pip install -r requirements.txt
pytest tests/ -v

Environment variables added

Variable	Default	Description
`API_USERNAME`	`admin`	Login username for the API auth endpoint
`API_PASSWORD`	(unset — auth disabled until set)	Login password; must be set to enable auth
`LLM_MODEL_NAME`	`meta-llama/Meta-Llama-3-8B`	HuggingFace model used by the inference route

Add automated retraining pipeline
Add Shadow Model Deployment
Add Cost Optimization Engine
Add Hallucination Classifier Model

Name		Name	Last commit message	Last commit date
Latest commit History 126 Commits
.github/workflows		.github/workflows
api		api
backend/app		backend/app
dashboard		dashboard
docker		docker
drift-engine		drift-engine
frontend		frontend
helm/sentinel/templates		helm/sentinel/templates
ingestion-service		ingestion-service
ingestion_cpp		ingestion_cpp
k8s		k8s
llm-guard		llm-guard
load_test		load_test
mlflow		mlflow
monitoring		monitoring
n8n		n8n
orchestration		orchestration
streamlit-dashboard		streamlit-dashboard
terraform		terraform
tests		tests
training-pipeline		training-pipeline
.env.example		.env.example
.gitignore		.gitignore
Contributing.md		Contributing.md
DailyLog.md		DailyLog.md
FILE Structure		FILE Structure
LICENSE		LICENSE
Metrics.md		Metrics.md
Prometheus ServiceMonitor (Metrics Credibility)		Prometheus ServiceMonitor (Metrics Credibility)
README.md		README.md
Test		Test
bindings.cpp		bindings.cpp
docker-compose.yml		docker-compose.yml
drift_engine.cpp		drift_engine.cpp
mlflow_tracking.py		mlflow_tracking.py
pytest.ini		pytest.ini
render.yaml		render.yaml
requirements.txt		requirements.txt
system_design.md		system_design.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SentinelAI — Enterprise AI Reliability & Governance Platform

🚀 Quickstart — Docker Compose (recommended)

Send a sample inference log

Run a drift computation

Summarize an incident (Ollama optional — falls back to rule-based stub)

⚙️ Configuration

🏗️ Architecture

Services

📊 Metrics

🧠 Extended Q&A

Why use C++ for drift detection?

Why Go for ingestion?

Why Postgres locally (not Snowflake)?

Why MLflow?

Why LangChain + Ollama?

Why Kubernetes?

Why Terraform?

🏢 Enterprise Value

🔧 Recent Code Improvements

Running tests locally

Environment variables added

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SentinelAI — Enterprise AI Reliability & Governance Platform

🚀 Quickstart — Docker Compose (recommended)

Send a sample inference log

Run a drift computation

Summarize an incident (Ollama optional — falls back to rule-based stub)

⚙️ Configuration

🏗️ Architecture

Services

📊 Metrics

🧠 Extended Q&A

Why use C++ for drift detection?

Why Go for ingestion?

Why Postgres locally (not Snowflake)?

Why MLflow?

Why LangChain + Ollama?

Why Kubernetes?

Why Terraform?

🏢 Enterprise Value

🔧 Recent Code Improvements

Running tests locally

Environment variables added

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages