docs: refresh READMEs with updated stats and new features

imran-siddique · Copilot · web-flow · commit b5478bb991b6 · 2026-03-15T10:46:39.000-07:00
- Root: add By The Numbers section (6,100+ tests, 7 packages, 12+ integrations)
- Root: update Known Limitations (observability now implemented, behavioral detection done)
- Root: add NIST RFI mapping and benchmarks to Documentation section
- agent-os: update test count 1,680+ -> 2,573+
- agent-mesh: update test count 1,300+ -> 1,669+
- agent-hypervisor: update test count 457+ -> 644+, add behavioral anomaly detection
- agent-sre: update test count 1,089+ -> 1,240+, add PagerDuty/Grafana/OTel to roadmap
- agent-sre: update observability platform count 11 -> 13

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
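
Review note: the four per-package counts in the message above can be cross-checked against the root README's "6,100+" figure. The sketch below is only a sanity check on the stated numbers, not part of the repository. The dictionary keys and variable names are illustrative, and only the four packages named in the commit message are included (the root README lists 7 packages, so the true total may be higher).

```python
# Updated per-package test counts, copied from the commit message.
updated_counts = {
    "agent-os": 2573,
    "agent-mesh": 1669,
    "agent-hypervisor": 644,
    "agent-sre": 1240,
}

# Lower bound claimed in the root README ("6,100+ across all packages").
ROOT_README_CLAIM = 6100

listed_total = sum(updated_counts.values())
is_consistent = listed_total >= ROOT_README_CLAIM

print(f"listed packages total: {listed_total}")   # 6126
print(f"consistent with 6,100+: {is_consistent}")  # True
```

The four listed packages alone sum to 6,126, so the "6,100+" claim holds without counting the remaining packages.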
diff --git a/README.md b/README.md
@@ -24,6 +24,17 @@
 > composes with container/VM isolation for defense-in-depth.
 > See [Architecture Notes](#architecture-notes) for details.
 
+## By The Numbers
+
+| Metric | Value |
+|---|---|
+| **Tests Passing** | 6,100+ across all packages |
+| **Packages** | 7 (kernel, trust mesh, runtime, SRE, compliance, marketplace, lightning) |
+| **Framework Integrations** | 12+ (LangChain, CrewAI, AutoGen, Dify, LlamaIndex, OpenAI Agents, Google ADK, …) |
+| **Policy Eval Latency** | 0.012 ms p50 — [full benchmarks](BENCHMARKS.md) |
+| **OWASP Coverage** | 10/10 Agentic Top 10 risks |
+| **Observability** | Prometheus, OpenTelemetry, PagerDuty, Grafana |
+
 ## Why Agent Governance?
 
 AI agent frameworks (LangChain, AutoGen, CrewAI, Google ADK, OpenAI Agents SDK) enable agents to call tools, spawn sub-agents, and take real-world actions — but provide **no runtime security model**. The Agent Governance Toolkit provides:
@@ -173,8 +184,10 @@ Full methodology, per-adapter breakdowns, and memory profiling: **[BENCHMARKS.md
 ## Documentation
 
 - **[Azure Deployment Guides](docs/deployment/README.md)** — AKS, Azure AI Foundry, Container Apps, OpenClaw sidecar
+- **[NIST RFI Mapping](docs/nist-rfi-mapping.md)** — Question-by-question mapping to NIST AI Agent Security RFI (2026-00206)
 - [OWASP Compliance Mapping](docs/OWASP-COMPLIANCE.md)
 - [CSA Agentic Trust Framework Mapping](docs/CSA-ATF-PROPOSAL.md)
+- [Performance Benchmarks](BENCHMARKS.md)
 - [Changelog](CHANGELOG.md)
 - [Contributing Guide](CONTRIBUTING.md)
 - [Security Policy](SECURITY.md)
@@ -218,10 +231,10 @@ Policy enforcement benchmarks are measured on a **30-scenario test suite** cover
 
 ### Known Limitations & Roadmap
 
-- **ASI-10 Behavioral Detection**: Fully implemented in Agent SRE — tool-call frequency analysis (z-score spike detection), action entropy scoring, and capability profile violation detection. See [`packages/agent-sre/src/agent_sre/anomaly/`](packages/agent-sre/src/agent_sre/anomaly/) (72 tests passing)
+- **ASI-10 Behavioral Detection**: Fully implemented — tool-call frequency analysis (z-score spike detection), action entropy scoring, capability profile violation detection, and behavioral anomaly detection with ring-distance amplification. See [`packages/agent-sre/src/agent_sre/anomaly/`](packages/agent-sre/src/agent_sre/anomaly/) and [`packages/agent-hypervisor/src/hypervisor/rings/breach_detector.py`](packages/agent-hypervisor/src/hypervisor/rings/breach_detector.py)
 - **Audit Trail Integrity**: Current hash-chain is in-process; external append-only log integration is planned
 - **Framework Integration Depth**: Current adapters wrap agent execution at the function level; deeper hooks into framework-native tool dispatch and sub-agent spawning are planned
-- **Observability**: OpenTelemetry integration for policy decision tracing is planned
+- **Observability**: Prometheus metrics collection, OpenTelemetry span export, PagerDuty alerting, and Grafana dashboards are implemented. See [`packages/agent-hypervisor/src/hypervisor/observability/`](packages/agent-hypervisor/src/hypervisor/observability/) and [`packages/agent-sre/src/agent_sre/integrations/`](packages/agent-sre/src/agent_sre/integrations/)
 
 ## Contributing
 
diff --git a/packages/agent-hypervisor/README.md b/packages/agent-hypervisor/README.md
@@ -22,7 +22,7 @@
 [![PyPI](https://img.shields.io/pypi/v/agent-hypervisor)](https://pypi.org/project/agent-hypervisor/)
 [![Downloads](https://img.shields.io/pypi/dm/agent-hypervisor)](https://pypi.org/project/agent-hypervisor/)
 [![OWASP](https://img.shields.io/badge/OWASP_Agentic_Top_10-ASI--05,_10-brightgreen)](https://github.com/microsoft/agent-governance-toolkit/blob/master/docs/OWASP-COMPLIANCE.md)
-[![Tests](https://img.shields.io/badge/tests-457%20passing-brightgreen)](https://github.com/microsoft/agent-governance-toolkit)
+[![Tests](https://img.shields.io/badge/tests-644%20passing-brightgreen)](https://github.com/microsoft/agent-governance-toolkit)
 [![Benchmark](https://img.shields.io/badge/latency-268%CE%BCs%20pipeline-orange)](benchmarks/)
 [![Discussions](https://img.shields.io/github/discussions/microsoft/agent-governance-toolkit)](https://github.com/microsoft/agent-governance-toolkit/discussions)
 
@@ -50,7 +50,7 @@
 
 <table>
 <tr>
-<td align="center"><h3>457+</h3><sub>Tests Passing</sub></td>
+<td align="center"><h3>644+</h3><sub>Tests Passing</sub></td>
 <td align="center"><h3>4</h3><sub>Execution Rings<br/>(Ring 0–3)</sub></td>
 <td align="center"><h3>268μs</h3><sub>Full Governance<br/>Pipeline Latency</sub></td>
 <td align="center"><h3>v2.0</h3><sub>Saga Compensation<br/>Kill Switch · Rate Limits</sub></td>
@@ -587,7 +587,7 @@ Forensic-grade delta trails — semantic diffs, hash-chained entries, summary co
 <td width="50%">
 
 ### 📡 Observability
-Structured event bus emits typed events for every action. Causal trace IDs with full delegation tree encoding. Version counters for causal consistency.
+Structured event bus emits typed events for every action. Causal trace IDs with full delegation tree encoding. Version counters for causal consistency. **Prometheus metrics collector** for ring transitions and breaches. **OpenTelemetry span exporter** for saga-to-span mapping with distributed trace context.
 
 </td>
 </tr>
@@ -605,7 +605,7 @@ Ring 2 (Standard)   — Reversible actions — requires eff_score > 0.60
 Ring 3 (Sandbox)    — Read-only / research — default for unknown agents
 ```
 
-**v2.0 additions:** Dynamic ring elevation (sudo with TTL), ring breach detection with circuit breakers, ring inheritance for spawned agents.
+**v2.0 additions:** Dynamic ring elevation (sudo with TTL), ring breach detection with circuit breakers, ring inheritance for spawned agents, **behavioral anomaly detection** with sliding-window rate analysis and ring-distance amplification.
 
 ### 🔄 Saga Orchestrator — Deep Dive
 
@@ -659,7 +659,7 @@ pip install agent-hypervisor
 | `hypervisor.integrations` | Nexus, Verification, IATP cross-module adapters | -- |
 | **Integration** | End-to-end lifecycle, edge cases, security | **24** |
 | **Scenarios** | Cross-module governance pipelines (7 suites) | **18** |
-| **Total** | | **457** |
+| **Total** | | **644** |
 
 ## Test Suite
 
@@ -728,7 +728,7 @@ graph TB
 | [Agent OS](https://github.com/microsoft/agent-governance-toolkit) | Policy enforcement kernel | 1,500+ tests |
 | [Agent Mesh](https://github.com/microsoft/agent-governance-toolkit) | Cryptographic trust network | 1,400+ tests |
 | [Agent SRE](https://github.com/microsoft/agent-governance-toolkit) | SLO, chaos, cost guardrails | 1,070+ tests |
-| **Agent Hypervisor** | Session isolation & governance runtime | 457+ tests |
+| **Agent Hypervisor** | Session isolation & governance runtime | 644+ tests |
 
 ## 🗺️ Roadmap
 
diff --git a/packages/agent-mesh/README.md b/packages/agent-mesh/README.md
@@ -62,7 +62,7 @@
 
 <table>
 <tr>
-<td align="center"><h3>1,300+</h3><sub>Tests Passing</sub></td>
+<td align="center"><h3>1,669+</h3><sub>Tests Passing</sub></td>
 <td align="center"><h3>6</h3><sub>Framework Integrations</sub></td>
 <td align="center"><h3>170K+</h3><sub>Combined Stars of<br/>Integrated Projects</sub></td>
 <td align="center"><h3>4</h3><sub>Protocol Bridges<br/>(A2A · MCP · IATP · AI Card)</sub></td>
diff --git a/packages/agent-os/README.md b/packages/agent-os/README.md
@@ -65,7 +65,7 @@
 
 <table>
 <tr>
-<td align="center"><h3>1,680+</h3><sub>Tests Passing</sub></td>
+<td align="center"><h3>2,573+</h3><sub>Tests Passing</sub></td>
 <td align="center"><h3>12</h3><sub>Framework Integrations</sub></td>
 <td align="center"><h3>170K+</h3><sub>Combined Stars of<br/>Integrated Projects</sub></td>
 <td align="center"><h3>&lt;0.1ms p99</h3><sub>Governance Latency<br/><a href="benchmarks/results/BENCHMARKS.md">Benchmarks</a></sub></td>
diff --git a/packages/agent-sre/README.md b/packages/agent-sre/README.md
@@ -50,9 +50,9 @@ Reliability layer across **170K+ combined GitHub stars** of integrated projects
 
 <table>
 <tr>
-<td align="center"><h3>1,089+</h3><sub>Tests Passing</sub></td>
+<td align="center"><h3>1,240+</h3><sub>Tests Passing</sub></td>
 <td align="center"><h3>12+</h3><sub>Framework Adapters<br/><sub>LangChain · CrewAI · AutoGen<br/>LangGraph · Dify · more</sub></sub></td>
-<td align="center"><h3>11</h3><sub>Observability Platforms<br/><sub>Langfuse · LangSmith · Arize<br/>Datadog · Prometheus · more</sub></sub></td>
+<td align="center"><h3>13</h3><sub>Observability Platforms<br/><sub>Langfuse · LangSmith · Arize<br/>Datadog · Prometheus · PagerDuty<br/>Grafana · OTel · more</sub></sub></td>
 <td align="center"><h3>OpenTelemetry</h3><sub>Native OTLP Export</sub></td>
 </tr>
 <tr>
@@ -479,7 +479,7 @@ agent-sre/
 ├── operator/              # Kubernetes CRDs (AgentSLO, CostBudget)
 ├── .github/actions/       # GitHub Actions (canary deployment)
 ├── examples/              # 4 runnable demos
-├── tests/                 # 1,089 tests
+├── tests/                 # 1,240 tests
 ├── docs/                  # Getting started, concepts, integration guide
 └── specs/                 # SLO templates (coming soon)
 ```
@@ -512,7 +512,7 @@ Agent SRE tells you *if it was within budget* and *what to do about it*.
 
 ## Status & Maturity
 
-### ✅ Fully Implemented (20,000+ lines, 1,089 tests)
+### ✅ Fully Implemented (20,000+ lines, 1,240 tests)
 
 | Component | Status | Description |
 |---|---|---|
@@ -650,7 +650,8 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
 | Quarter | Milestone |
 |---------|-----------|
 | **Q1 2026** | ✅ Core 7 engines, OTel integration, Prometheus dashboards |
-| **Q2 2026** | Kubernetes operator, PagerDuty/OpsGenie integration |
+| **Q1 2026** | ✅ PagerDuty alerting, Grafana SLO dashboards, org budget enforcement, bounded ErrorBudget events |
+| **Q2 2026** | Kubernetes operator, OpsGenie integration |
 | **Q3 2026** | ML-powered anomaly detection, auto-remediation |
 | **Q4 2026** | Managed cloud service, SOC2 compliance automation |
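
Review note: the root README hunk describes ASI-10 behavioral detection as including "tool-call frequency analysis (z-score spike detection)". For readers unfamiliar with the technique, here is a minimal sketch of a z-score spike check. It is an illustration under stated assumptions, not the code in `packages/agent-sre/src/agent_sre/anomaly/`: the function name `zscore_spike`, the per-window call counts, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_spike(history, current, threshold=3.0):
    """Flag `current` as a spike if it sits more than `threshold`
    population standard deviations above the mean of `history`.

    `history` is a list of past tool-call counts per time window
    (hypothetical input shape). Returns a (z_score, is_spike) tuple.
    """
    if len(history) < 2:
        # Too little data to estimate a baseline; do not flag.
        return 0.0, False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any deviation is infinitely unusual.
        return (0.0, False) if current == mu else (float("inf"), True)
    z = (current - mu) / sigma
    # Only upward deviations count as frequency spikes.
    return z, z > threshold


baseline = [10, 12, 11, 9, 10, 11, 10, 12]  # calls per window
print(zscore_spike(baseline, 11))  # near the mean, not flagged
print(zscore_spike(baseline, 40))  # far above the mean, flagged
```

A one-sided test is used here because a burst of tool calls is the concern; a production detector would likely also need a rolling window and a minimum-sample guard tuned to real traffic.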