SentinelAI is a distributed AI reliability and monitoring platform designed to:
- Detect model drift
- Monitor inference anomalies
- Track LLM hallucination risk
- Provide real-time observability
- Automate AI governance workflows
It combines statistical ML monitoring with LLM-powered incident intelligence.
Prerequisites: Docker 24+ with Compose v2 (
docker compose version).
# 1. Copy environment defaults
cp .env.example .env
# 2. Start the full local stack
docker compose up --buildOnce running, open:
| Service | URL |
|---|---|
| Streamlit Dashboard | http://localhost:8501 |
| Prometheus | http://localhost:9090 |
| Grafana (admin / admin) | http://localhost:3000 |
| Ingestion API | http://localhost:8080 |
| Drift Engine API | http://localhost:7070 |
| LLM Guard API | http://localhost:8000 |
curl -X POST http://localhost:8080/log \
-H "Content-Type: application/json" \
-d '{"model_id":"demo","model_version":"v1","latency_ms":120,"tokens_in":32,"tokens_out":64,"status":"ok"}'curl -X POST http://localhost:7070/drift \
-H "Content-Type: application/json" \
-d '{"model_id":"demo","feature_name":"latency","expected":[0.2,0.3,0.25,0.25],"actual":[0.1,0.35,0.30,0.25]}'curl -X POST http://localhost:8000/summarize \
-H "Content-Type: application/json" \
-d '{"log_data":"PSI 0.35 on latency feature, model demo v1","persist":false}'All configuration is via environment variables. Copy .env.example to .env and adjust.
| Variable | Default | Description |
|---|---|---|
WAREHOUSE_MODE |
postgres |
postgres (local) or snowflake |
DATABASE_URL |
Postgres DSN | Full Postgres connection string |
POSTGRES_USER |
sentinel |
Postgres user |
POSTGRES_PASSWORD |
sentinel |
Postgres password |
POSTGRES_DB |
sentinel |
Postgres database |
OLLAMA_HOST |
http://ollama:11434 |
Ollama endpoint (optional) |
LLM_MODEL |
llama2 |
LLM model name |
Snowflake (optional): set WAREHOUSE_MODE=snowflake and fill in SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA, SNOWFLAKE_WAREHOUSE.
User β Go Ingestion API (8080) β Postgres (local) / Snowflake (optional)
β
Drift Engine C++ (7070)
β
LLM Guard Python (8000)
β
Streamlit Dashboard (8501)
β
Prometheus (9090) + Grafana (3000)
| Service | Language | Port | Description |
|---|---|---|---|
ingestion-service |
Go | 8080 | Receives inference logs, writes to warehouse |
drift-engine |
C++ + Python | 7070 | PSI/KS drift detection |
llm-guard |
Python | 8000 | LLM-powered incident summarization |
streamlit-dashboard |
Python | 8501 | Control plane UI |
postgres |
β | 5432 | Local warehouse (default) |
prometheus |
β | 9090 | Metrics scraping |
grafana |
β | 3000 | Dashboards |
| Metric | Value |
|---|---|
| PSI Detection Threshold | 0.20 |
| P95 API Latency | 180ms |
| Throughput | 150 RPS |
| Drift Engine Compute | <2ms |
| LLM Summarization | ~1.2s |
To achieve sub-millisecond statistical scoring at scale.
Go provides efficient concurrency and low-latency HTTP handling.
Postgres is free, runs in Docker, and supports the same SQL schema. Switch to WAREHOUSE_MODE=snowflake when you're ready to push to production.
Experiment tracking, reproducibility, and version control.
LLM-powered root cause summarization and RAG over historical incidents.
Horizontal scaling and production-grade orchestration.
Reproducible infrastructure as code.
SentinelAI demonstrates:
- AI system lifecycle management
- Drift monitoring
- MLOps integration
- Distributed systems engineering
- Cloud-native architecture
- LLM augmentation
- Observability & metrics-driven design
The following fixes were applied to improve reliability, correctness, and security:
| File | Issue | Fix |
|---|---|---|
api/core/model.py (new) |
core.model module was missing, crashing on import |
Created SentinelModel (two-layer MLP) as a proper PyTorch module |
api/__init__.py (new) |
Package not importable as api.* |
Added package init file |
api/inference.py |
from core.model import β¦ caused ModuleNotFoundError |
Updated to from api.core.model import SentinelModel |
api/main.py |
from core.inference import β¦ caused ModuleNotFoundError; missing GET / route |
Updated import to from api.inference import run_inference; added root route |
api/auth.py |
Hardcoded admin/admin credentials |
Reads API_USERNAME / API_PASSWORD from environment; rejects auth when API_PASSWORD is unset |
api/routes/inference.py |
Model loaded at import time (blocks startup, crashes without GPU/HuggingFace access); device_map="auto" forced CUDA |
Lazy-loads model on first request; model name configurable via LLM_MODEL_NAME env var; CPU fallback added |
backend/app/main.py |
.cuda() called unconditionally (crashes on CPU-only hosts); MLflow start_run() ran at module level (import fails if MLflow unreachable) |
Added cpu/cuda device selection; wrapped MLflow block in try/except |
llm-guard/app.py |
except Exception: pass silently swallowed DB errors |
Replaced with logger.exception(β¦) + conn.rollback() |
tests/conftest.py |
from app.main import app β wrong package path, caused all tests to fail |
Fixed to from api.main import app |
requirements.txt |
Missing httpx (required by FastAPI TestClient) and pydantic |
Added both packages |
pip install -r requirements.txt
pytest tests/ -v| Variable | Default | Description |
|---|---|---|
API_USERNAME |
admin |
Login username for the API auth endpoint |
API_PASSWORD |
(unset β auth disabled until set) | Login password; must be set to enable auth |
LLM_MODEL_NAME |
meta-llama/Meta-Llama-3-8B |
HuggingFace model used by the inference route |
- Add automated retraining pipeline
- Add Shadow Model Deployment
- Add Cost Optimization Engine
- Add Hallucination Classifier Model
