Commit b0fb63e

docs(sdk): scope-honesty README pass for 0.11
- Cut 'Eval traces' section (module removed in 0.11)
- Cut governance-sdk/eval-trace, eval-scorer, eval-types from Export Paths
- Cut governance-sdk/plugins/mcp-annotations from Export Paths
- Demote behavioral-scorer mention; flag dynamic trust scoring as future work
- Demote metrics, otel-hooks under 'Optional observability primitives' header with explicit note that they are NOT OpenInference-compliant and a real OTel exporter is on the roadmap
- Move runWithOutcome to a one-liner under the Export Paths block
- Update Export Paths header: 49 → 44 targeted exports
- Update Project Stats: 1,348 → 1,328 tests; 49 → 44 export paths
- Replace stale 'Eval is in-memory' bullet in 'What this is NOT' with two sharper bullets: no built-in observability/eval pipeline; no built-in eval store
- Adapter footer: 10 featured + 3 specialty (was 4 before mcp-annotations cut)
- CHANGELOG: full 0.11.0 entry covering removals, demotions, migration
1 parent 500c832 commit b0fb63e

2 files changed

Lines changed: 82 additions & 47 deletions

README.md

Lines changed: 28 additions & 47 deletions
@@ -54,7 +54,14 @@ detection — nothing more. To pre-empt scope questions:
 ≈ 0.48 on the 6,931-sample LIB corpus. Layer in an ML classifier via the
 `InjectionClassifier` interface for production coverage.
 - **Compliance mapping is self-assessment**, not legal advice or certification.
-- **Eval is in-memory**, not a durable eval store.
+- **No built-in observability or eval pipeline.** The `metrics` and
+`otel-hooks` exports produce passive in-memory data structures you serialize
+to your own monitoring system; they are NOT OpenInference-compliant and NOT
+a replacement for Phoenix, Langfuse, Braintrust, or a real OpenTelemetry
+exporter. A first-class OTel/OpenInference exporter is on the roadmap.
+- **No built-in eval store.** `gov.eval.*` was removed in 0.11. Use inspect-ai,
+PyRIT, Garak, Phoenix, Langfuse, or your harness of choice and route results
+into your audit stream via `gov.audit.log()`.
 - **Simulator does not replay side effects** — it evaluates policy outcomes
 against synthetic scenarios, it does not execute tools.
 - **`enforce()` does not hash-chain by default** — opt in with
@@ -208,11 +215,12 @@ getGovernanceLevel(assessment.compositeScore);
 // => { level: 4, label: 'Certified', description: '...' }
 ```

-Behavioral signals (block rate, injection hits, approval misses) can be
-fed in via `behavioral-scorer.computeBehavioralAdjustments()` so the score
-reflects how an agent *has* behaved, not just its configured posture. This
-is an opt-in call — the SDK does not automatically ingest live audit
-events; you query your audit stream and pass results to the adjuster.
+Behavioral signals (block rate, injection hits, approval misses) are
+available via the optional `behavioral-scorer` module — feed them in to
+adjust the score against how the agent *has* behaved, not just its
+configured posture. This is opt-in and not wired by default; we plan to
+promote dynamic trust scoring as a first-class feature in a future
+release.

 **Weight rationale + inflation risk**: the default weights
 (identity/permissions 1.5; guardrails 1.3; observability 1.2;
@@ -457,32 +465,6 @@ const result = await simulateFleetPolicy(gov, scenarios);
 // => { fleetSummary: { agentsAffected: 11, blockRate: 0.12 }, results: [...] }
 ```

-### Eval traces
-
-Capture agent operation traces (spans, tool calls, LLM invocations) into an
-in-memory collector, retrieve them per-agent, and pipe them into your own
-metric evaluator. The SDK does **not** ship a built-in LLM-as-judge; metric
-generation is your responsibility (wire in your Claude/OpenAI/local model of
-choice).
-
-```typescript
-import { createTraceCollector, submitTrace } from 'governance-sdk/eval-trace';
-
-const traces = createTraceCollector({ maxTraces: 200 });
-submitTrace(traces, {
-  agentId: 'luna',
-  input: 'What deals closed this week?',
-  output: '3 deals totaling $45k',
-  spans: [{ operation: 'tool_call', toolName: 'search', success: true, latencyMs: 120 }],
-});
-```
-
-For adversarial-LLM / jailbreak testing, use a dedicated harness like
-[inspect-ai](https://github.com/UKGovernmentBEIS/inspect_ai),
-[PyRIT](https://github.com/Azure/PyRIT), or
-[Garak](https://github.com/leondz/garak) and submit results via your own
-pipeline.
-
 ## Framework Adapters

 Governance needs three things to be real: a **point of interception** (we sit
@@ -617,7 +599,7 @@ const middleware = createGovernanceMiddleware(gov, {

 ## Export Paths

-The SDK ships **49 targeted exports** so you can import only what you need:
+The SDK ships **44 targeted exports** so you can import only what you need:

 ```
 # Core
@@ -641,7 +623,6 @@ governance-sdk/injection-benchmark LIB — 6.9K-sample benchmark runner
 # Audit + identity
 governance-sdk/audit-integrity HMAC hash-chain primitives (createIntegrityAudit, verifyAuditIntegrity)
 governance-sdk/audit-integrity-verify standalone chain verifier (for offline audit)
-governance-sdk/action-recorder runWithOutcome() — record action success/failure into the chain
 governance-sdk/agent-identity agent identity tokens
 governance-sdk/agent-identity-ed25519 Ed25519 signing + verification
 governance-sdk/kill-switch priority-999 emergency halt
@@ -652,23 +633,21 @@ governance-sdk/owasp-agentic OWASP Top 10 for LLMs / Agentic
 governance-sdk/nist-ai-rmf NIST AI RMF (Govern/Map/Measure/Manage)
 governance-sdk/iso-42001 ISO/IEC 42001 controls

-# Eval
-governance-sdk/eval-types shared eval types
-governance-sdk/eval-scorer trace scoring
-governance-sdk/eval-trace trace submission
-
-# Runtime + storage
-governance-sdk/events typed event emitter
-governance-sdk/metrics in-memory counter / timing snapshots (serialize to your monitoring system)
-governance-sdk/otel-hooks OTel-compatible span data (zero OTel deps; wire to your own tracer)
+# Storage
 governance-sdk/storage-postgres PostgreSQL storage adapter
 governance-sdk/storage-postgres-schema schema DDL + migrations

+# Optional observability primitives — passive in-memory, host wires to its own
+# monitoring; NOT OpenInference-compliant. A real OTel exporter is on the roadmap.
+governance-sdk/events typed event emitter
+governance-sdk/metrics in-memory counter / timing snapshots
+governance-sdk/otel-hooks governance-prefixed span shape (passive — user must wire)
+
 # Scanner + type surface
 governance-sdk/scanner-plugins scanner plugin interface
 governance-sdk/token-types token type guards

-# Framework adapters (10 featured + 4 specialty)
+# Framework adapters (10 featured + 3 specialty)
 governance-sdk/plugins/mastra
 governance-sdk/plugins/mastra-processor
 governance-sdk/plugins/vercel-ai
@@ -680,17 +659,19 @@ governance-sdk/plugins/llamaindex
 governance-sdk/plugins/mistral
 governance-sdk/plugins/ollama
 governance-sdk/plugins/mcp
-governance-sdk/plugins/mcp-annotations
 governance-sdk/plugins/mcp-trust
 governance-sdk/plugins/mcp-chain-audit
 governance-sdk/plugins/bedrock
 ```

+`runWithOutcome()` (a thin helper around `gov.recordOutcome`) is exposed at the
+top-level package export — `import { runWithOutcome } from 'governance-sdk'`.
+
 ## Project Stats

 - **0** runtime dependencies
-- **1,348** tests, 0 failures (`npm test`)
-- **49** export paths — tree-shakeable, import only what you use
+- **1,328** tests, 0 failures (`npm test`)
+- **44** export paths — tree-shakeable, import only what you use
 - **TypeScript strict mode**, no `any` types in source
 - **MIT licensed**

packages/governance/CHANGELOG.md

Lines changed: 54 additions & 0 deletions
@@ -1,5 +1,59 @@
 # Changelog

+## [0.11.0] - 2026-04-15 — Scope honesty pass 2
+
+This release follows up the 0.10 cleanup with another round of cuts based on
+a feature-by-feature audit against actual `governance-cloud` consumers and
+the major competitors (Microsoft `agent-governance-toolkit`, NeMo Guardrails,
+Phoenix, Langfuse, Braintrust). Removes 5 modules with no consumers and no
+competitor treating them as load-bearing features, and clarifies framing
+around 4 more that ship but were oversold as built-in observability / eval
+infrastructure. **1,328 tests** pass with **0 failures**.
+
+### Removed (BREAKING)
+
+- **`governance-sdk/eval-trace`**, **`governance-sdk/eval-scorer`**,
+**`governance-sdk/eval-types`**, and the **`gov.eval`** field on
+`GovernanceInstance`. The in-memory trace ring buffer + naive
+eval-adjustment scoring loop was unused by every audited consumer and
+easily mistaken for a real eval pipeline. Use a dedicated harness
+(inspect-ai, PyRIT, Garak, Phoenix, Langfuse, Braintrust) and route
+results to your audit stream via `gov.audit.log()`.
+- **`governance-sdk/plugins/mcp-annotations`** — annotation-rule generator
+was a static template, not a runtime governance feature.
+- **`governance-sdk/supply-chain-sbom`** — proprietary `LuaAgentSBOM`
+capability manifest with no producers or consumers. The CycloneDX
+exporter (`governance-sdk/supply-chain-cyclonedx`) and the supply-chain
+policy primitive (`governance-sdk/supply-chain`) remain.
+- **`GovernMCPConfig.traceCollector`** field — removed alongside `gov.eval`.
+Tool-call audit events still fire via `gov.audit`.
+
+### Demoted (no API change — README framing only)
+
+- **`metrics`**, **`otel-hooks`**, **`action-recorder`**,
+**`behavioral-scorer`** — remain shipped, but no longer headlined as
+built-in observability / eval / dynamic-trust features. A real OTel +
+OpenInference exporter and a TrustEngine promotion of behavioral
+scoring are on the roadmap.
+
+### Migration
+
+- `gov.eval.submit(...)` callers: stop calling. Eval results should land
+in your existing audit stream or your harness's own store.
+- `import { generateAgentSBOM } from 'governance-sdk/supply-chain-sbom'`:
+if you need an SBOM, use `governance-sdk/supply-chain-cyclonedx` instead
+(CycloneDX 1.5, validates against the official schema).
+- `import { generateAnnotationRules } from 'governance-sdk/plugins/mcp-annotations'`:
+no replacement; build annotation-aware rules directly with `policy-builder`
+or `policy-yaml`.
+- `traceCollector` in `createGovernedMCP(...)` config: drop the field.
+
+### Stats
+
+- 49 → **44** export paths
+- 1,358 → **1,328** tests (drop of 30 from removed test files)
+- 0 runtime dependencies (unchanged)
+
 ## [0.10.0] - 2026-04-15 — Scope honesty release

 This release tightens the SDK to the surface we can defend, and is honest
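
The Migration notes in the 0.11.0 CHANGELOG entry tell former `gov.eval.submit(...)` callers to route harness results into the audit stream via `gov.audit.log()`. The diff does not show that method's signature, so the sketch below is only an illustration under stated assumptions: `AuditLike`, `AuditEntry`, `EvalResult`, `toAuditEntry`, `routeEvalResult`, and the `eval.result` event type are all hypothetical names, not part of the SDK.

```typescript
// ASSUMPTION: the real `gov.audit.log()` signature is not shown in this commit.
// `AuditLike` is a stand-in interface so the sketch is self-contained.
interface AuditEntry {
  type: string;
  agentId: string;
  payload: Record<string, unknown>;
  timestamp: string;
}

interface AuditLike {
  log(entry: AuditEntry): Promise<void> | void;
}

// Hypothetical summary an external harness (inspect-ai, PyRIT, Garak, ...) might emit.
interface EvalResult {
  harness: string;
  agentId: string;
  suite: string;
  passed: number;
  failed: number;
}

// Convert a harness result into one audit entry; passRate is null for empty runs.
function toAuditEntry(result: EvalResult, now: Date = new Date()): AuditEntry {
  const total = result.passed + result.failed;
  return {
    type: "eval.result", // hypothetical event type
    agentId: result.agentId,
    payload: {
      harness: result.harness,
      suite: result.suite,
      passed: result.passed,
      failed: result.failed,
      passRate: total === 0 ? null : result.passed / total,
    },
    timestamp: now.toISOString(),
  };
}

// Write the entry to whatever audit sink the host provides and return it.
async function routeEvalResult(audit: AuditLike, result: EvalResult): Promise<AuditEntry> {
  const entry = toAuditEntry(result);
  await audit.log(entry);
  return entry;
}
```

If the real `gov.audit` object accepts a compatible entry, it could be passed as the `audit` argument; otherwise a small wrapper that maps `AuditEntry` onto the actual call would be needed.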
