# llm-sentry

**One install. 12 diagnostic engines. Your AI pipeline's immune system.**

Stop guessing why your LLM app is broken. `llm-sentry` runs 12 specialized diagnostic engines across your entire AI stack — RAG pipelines, agent loops, chain-of-thought reasoning, prompt stability, model migrations, and output drift — in a single scan.

```bash
pip install llm-sentry
```

## Quick Start

```python
import llm_sentry as lg

# Run a full diagnostic scan
report = lg.scan(
    pipeline_name="my_app",
    checks=["rag", "coherence", "agents", "prompts"],
    # ... plus the check-specific inputs for each selected engine
)
print(report.summary())
```

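Each engine returns a normalized check result that the scan report rolls up into one health score. Here is a self-contained sketch of that shape, simplified from the pre-rename `llmguardrail` API (`CheckResult`, `HealthStatus.from_score`); the thresholds and aggregation below are illustrative, not the library's actual cutoffs:

```python
from dataclasses import dataclass, field
from enum import Enum

class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    CRITICAL = "critical"

    @classmethod
    def from_score(cls, score: float) -> "HealthStatus":
        # Illustrative thresholds; the real cutoffs are the library's choice.
        if score >= 0.7:
            return cls.HEALTHY
        if score >= 0.4:
            return cls.DEGRADED
        return cls.CRITICAL

@dataclass
class CheckResult:
    check_name: str
    score: float
    status: HealthStatus
    recommendations: list = field(default_factory=list)

# One result per engine that ran, rolled up into a pipeline-level score.
results = [
    CheckResult("rag", 0.90, HealthStatus.from_score(0.90)),
    CheckResult("coherence", 0.88, HealthStatus.from_score(0.88)),
    CheckResult("agents", 1.00, HealthStatus.from_score(1.00)),
]
overall = sum(r.score for r in results) / len(results)
print(f"Health: {HealthStatus.from_score(overall).name} ({overall:.0%})")
# -> Health: HEALTHY (93%)
```
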
---

## The 12 Diagnostic Engines

| # | Engine | What It Detects | Module |
|---|--------|----------------|--------|
| 1 | **RAG Pathology** | Retrieval failures by type and location (Four Soils classification) | `rag_pathology` |
| 2 | **Agent Patrol** | Agent loops, stalls, oscillation, drift, and abandonment | `agent_patrol` |
| 3 | **Chain Probe** | Root-cause step in multi-step pipeline failures (CASCADE analysis) | `chain_probe` |
| 4 | **Context Lens** | Lost-in-the-middle — the LLM failing to recall facts from certain context positions | `context_lens` |
| 5 | **LLM Mutation** | Gaps in prompt test coverage via semantic mutation testing | `llm_mutation` |
| 6 | **Prompt Shield** | Brittle prompts that break under paraphrase stress testing | `prompt_shield` |
| 7 | **LLM Contract** | Behavioral contract violations on LLM function calls | `llm_contract` |
| 8 | **Drift Guard** | PR intent drift — code changes that don't match their stated purpose | `drift_guard` |
| 9 | **Spec Drift** | Semantic specification drift even when structural validation passes | `spec_drift` |
| 10 | **Prompt Lock** | Prompt regression detection with judge calibration and a CI gate | `prompt_lock` |
| 11 | **Model Parity** | Behavioral divergence when swapping LLM providers (7 dimensions) | `model_parity` |
| 12 | **CoT Coherence** | Silent incoherence between chain-of-thought reasoning steps | `cot_coherence` |

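To make engine #1 concrete: the Four Soils idea triages a RAG failure by where it happened (the retriever, the context, or the generator). The labels below (retrieval miss, context noise, poor grounding) come from the project's own vocabulary, but the boolean-based logic is only an illustrative sketch, not the library's implementation:

```python
def classify_soil(retrieved: bool, grounded: bool, noisy: bool) -> str:
    """Toy triage of a single RAG query by failure location.

    The real engine derives these signals from the query, the
    retrieved chunks, and the generated answer, not from booleans.
    """
    if not retrieved:
        return "retrieval_miss"   # the right evidence never came back
    if noisy:
        return "context_noise"    # evidence retrieved, but buried in junk
    if not grounded:
        return "poor_grounding"   # evidence present, answer ignores it
    return "good"                 # healthy soil: retrieved, clean, grounded

print(classify_soil(retrieved=True, grounded=False, noisy=False))  # -> poor_grounding
```
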
---

## Why One Platform?

Most teams discover LLM failures in production, then stitch together 5+ tools with different APIs, install processes, and report formats.

**llm-sentry** gives you:
- **One install** — `pip install llm-sentry`
- **One API** — `lg.scan()` with check selection
- **One report** — unified diagnostics across all failure modes
- **One CI gate** — `llm-sentry ci` blocks merges on regressions

---

## Use Cases

- **RAG apps**: retrieval quality + generation faithfulness + context window coverage
- **Agent systems**: loop detection + drift monitoring + abandonment alerts
- **Prompt engineering**: brittleness testing + regression gating + mutation coverage
- **Model migrations**: behavioral parity certification across 7 dimensions
- **Production monitoring**: continuous semantic drift detection + contract enforcement

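The agent checks above can be pictured with a toy heuristic: flag a run when the same (action, result) pair keeps recurring inside a short window. This is only a sketch of the futile-cycle idea, not Agent Patrol's actual detector:

```python
from collections import Counter

def detect_futile_cycle(trace, window=6, threshold=3):
    """Flag an agent trace when the same (action, result) pair recurs
    `threshold` times within the last `window` steps.

    Illustrative heuristic only; the real detector is more involved.
    """
    counts = Counter(trace[-window:])
    repeats = {pair: n for pair, n in counts.items() if n >= threshold}
    return (len(repeats) > 0), repeats

trace = [
    ("search_docs", "no results"),
    ("rephrase_query", "ok"),
    ("search_docs", "no results"),
    ("search_docs", "no results"),
]
stuck, repeats = detect_futile_cycle(trace, window=4, threshold=3)
print(stuck)  # -> True: ("search_docs", "no results") repeated 3 times
```
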
---

## Requirements

- Python 3.10+
- Zero required dependencies (LLM-powered checks optional)

## License
