Commit 2cf2d39: Rewrite README as unified platform hub for all 12 engines (1 parent: bcdf8d3)

1 file changed: README.md (+40, −136)
# llm-sentry

**One install. 12 diagnostic engines. Your AI pipeline's immune system.**

Stop guessing why your LLM app is broken. `llm-sentry` runs 12 specialized diagnostic engines across your entire AI stack — RAG pipelines, agent loops, chain-of-thought reasoning, prompt stability, model migrations, and output drift — in a single scan.

```bash
pip install llm-sentry
```

```python
import llm_sentry as lg

report = lg.scan(
    pipeline_name="my_app",
    checks=["rag", "coherence", "agents", "prompts"],
    ...
)
print(report.summary())
```
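The `scan()` call above rolls every engine's score into a single report with one health percentage (earlier revisions of this README showed output like `Health: HEALTHY (85%)`). As a minimal, stdlib-only sketch of that kind of aggregation, with a hypothetical `CheckResult` and made-up thresholds rather than the actual llm-sentry internals:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class CheckResult:
    name: str
    score: float  # 0.0 (broken) .. 1.0 (healthy)

def summarize(pipeline: str, results: list[CheckResult]) -> str:
    """Aggregate per-engine scores into a single health summary."""
    overall = mean(r.score for r in results)
    # Hypothetical buckets; the real library may use different thresholds.
    status = "HEALTHY" if overall >= 0.8 else "DEGRADED" if overall >= 0.5 else "CRITICAL"
    lines = [f"Pipeline: {pipeline}", f"Health: {status} ({overall:.0%})"]
    lines += [f"[+] {r.name}: {r.score:.0%}" for r in results]
    return "\n".join(lines)

print(summarize("my_app", [CheckResult("rag", 0.90), CheckResult("coherence", 0.88)]))
```

The point of the design is that every engine reports on the same 0–1 scale, so one roll-up number is meaningful across very different failure modes.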

---

## The 12 Diagnostic Engines

| # | Engine | What It Detects | Module |
|---|--------|-----------------|--------|
| 1 | **RAG Pathology** | Retrieval failures by type and location (Four Soils classification) | `rag_pathology` |
| 2 | **Agent Patrol** | Agent loops, stalls, oscillation, drift, and abandonment | `agent_patrol` |
| 3 | **Chain Probe** | Root-cause step in multi-step pipeline failures (CASCADE analysis) | `chain_probe` |
| 4 | **Context Lens** | Lost-in-the-middle failures, where the LLM misses content at certain context positions | `context_lens` |
| 5 | **LLM Mutation** | Gaps in prompt test coverage via semantic mutation testing | `llm_mutation` |
| 6 | **Prompt Shield** | Brittle prompts that break under paraphrase stress testing | `prompt_shield` |
| 7 | **LLM Contract** | Behavioral contract violations on LLM function calls | `llm_contract` |
| 8 | **Drift Guard** | PR intent drift, i.e. code changes that don't match their stated purpose | `drift_guard` |
| 9 | **Spec Drift** | Semantic specification drift even when structural validation passes | `spec_drift` |
| 10 | **Prompt Lock** | Prompt regressions, with judge calibration and a CI gate | `prompt_lock` |
| 11 | **Model Parity** | Behavioral divergence when swapping LLM providers (7 dimensions) | `model_parity` |
| 12 | **CoT Coherence** | Silent incoherence between chain-of-thought reasoning steps | `cot_coherence` |
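To make one row of the table concrete: the Agent Patrol engine watches for futile cycles, where an agent keeps repeating the same action and getting the same result. The core signal can be sketched in a few lines of plain Python (this is an illustration of the idea, not the actual `agent_patrol` implementation):

```python
from collections import Counter

def detect_futile_cycle(actions: list[tuple[str, str]], threshold: int = 3) -> bool:
    """Flag an agent trace in which the same (action, result) pair recurs."""
    counts = Counter(actions)
    return any(n >= threshold for n in counts.values())

trace = [
    ("search_docs", "no results"),
    ("search_docs", "no results"),
    ("search_docs", "no results"),  # same step, same outcome: the agent is stuck
]
print(detect_futile_cycle(trace))  # True
```

The real engine also covers the subtler patterns in the table (oscillation, stall, drift, abandonment), which need more than exact-match counting, but the principle is the same: diagnose from the action trace alone, with no extra LLM calls.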

---

## Why One Platform?

Most teams discover LLM failures in production, then stitch together 5+ tools with different APIs, install processes, and report formats.

**llm-sentry** gives you:

- **One install** — `pip install llm-sentry`
- **One API** — `lg.scan()` with check selection
- **One report** — unified diagnostics across all failure modes
- **One CI gate** — `llm-sentry ci` blocks merges on regressions

---

## Use Cases

- **RAG apps**: retrieval quality + generation faithfulness + context window coverage
- **Agent systems**: loop detection + drift monitoring + abandonment alerts
- **Prompt engineering**: brittleness testing + regression gating + mutation coverage
- **Model migrations**: behavioral parity certification across 7 dimensions
- **Production monitoring**: continuous semantic drift detection + contract enforcement
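The production-monitoring case hinges on the Spec Drift idea: an output can keep passing structural validation while its meaning quietly shifts. A toy illustration in plain Python (hypothetical field and checks, not the `spec_drift` implementation):

```python
import re

def structurally_valid(payload: dict) -> bool:
    """Schema check: the required key exists with the right type."""
    return isinstance(payload.get("ship_date"), str)

def semantically_valid(payload: dict) -> bool:
    """Spec check: the spec promised ISO dates (YYYY-MM-DD)."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", payload["ship_date"]) is not None

old = {"ship_date": "2024-05-01"}   # matches the spec
new = {"ship_date": "May 1, 2024"}  # same schema, drifted format

print(structurally_valid(new), semantically_valid(new))  # True False
```

A schema validator alone would wave the drifted output through; catching the second kind of failure is what semantic drift detection adds.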

---

## Requirements

- Python 3.10+
- Zero required dependencies (LLM-powered checks optional)

## License
