@@ -54,7 +54,14 @@ detection — nothing more. To pre-empt scope questions:
    ≈ 0.48 on the 6,931-sample LIB corpus. Layer in an ML classifier via the
    `InjectionClassifier` interface for production coverage.
  - **Compliance mapping is self-assessment**, not legal advice or certification.
- - **Eval is in-memory**, not a durable eval store.
+ - **No built-in observability or eval pipeline.** The `metrics` and
+   `otel-hooks` exports produce passive in-memory data structures you serialize
+   to your own monitoring system; they are NOT OpenInference-compliant and NOT
+   a replacement for Phoenix, Langfuse, Braintrust, or a real OpenTelemetry
+   exporter. A first-class OTel/OpenInference exporter is on the roadmap.
+ - **No built-in eval store.** `gov.eval.*` was removed in 0.11. Use inspect-ai,
+   PyRIT, Garak, Phoenix, Langfuse, or your harness of choice and route results
+   into your audit stream via `gov.audit.log()`.
  - **Simulator does not replay side effects** — it evaluates policy outcomes
    against synthetic scenarios, it does not execute tools.
  - **`enforce()` does not hash-chain by default** — opt in with
@@ -208,11 +215,12 @@ getGovernanceLevel(assessment.compositeScore);
  // => { level: 4, label: 'Certified', description: '...' }
  ```

- Behavioral signals (block rate, injection hits, approval misses) can be
- fed in via `behavioral-scorer.computeBehavioralAdjustments()` so the score
- reflects how an agent *has* behaved, not just its configured posture. This
- is an opt-in call — the SDK does not automatically ingest live audit
- events; you query your audit stream and pass results to the adjuster.
+ Behavioral signals (block rate, injection hits, approval misses) are
+ available via the optional `behavioral-scorer` module — feed them in to
+ adjust the score against how the agent *has* behaved, not just its
+ configured posture. This is opt-in and not wired by default; we plan to
+ promote dynamic trust scoring as a first-class feature in a future
+ release.

  **Weight rationale + inflation risk**: the default weights
  (identity/permissions 1.5; guardrails 1.3; observability 1.2;
@@ -457,32 +465,6 @@ const result = await simulateFleetPolicy(gov, scenarios);
  // => { fleetSummary: { agentsAffected: 11, blockRate: 0.12 }, results: [...] }
  ```

- ### Eval traces
-
- Capture agent operation traces (spans, tool calls, LLM invocations) into an
- in-memory collector, retrieve them per-agent, and pipe them into your own
- metric evaluator. The SDK does **not** ship a built-in LLM-as-judge; metric
- generation is your responsibility (wire in your Claude/OpenAI/local model of
- choice).
-
- ```typescript
- import { createTraceCollector, submitTrace } from 'governance-sdk/eval-trace';
-
- const traces = createTraceCollector({ maxTraces: 200 });
- submitTrace(traces, {
-   agentId: 'luna',
-   input: 'What deals closed this week?',
-   output: '3 deals totaling $45k',
-   spans: [{ operation: 'tool_call', toolName: 'search', success: true, latencyMs: 120 }],
- });
- ```
-
- For adversarial-LLM / jailbreak testing, use a dedicated harness like
- [inspect-ai](https://github.com/UKGovernmentBEIS/inspect_ai),
- [PyRIT](https://github.com/Azure/PyRIT), or
- [Garak](https://github.com/leondz/garak) and submit results via your own
- pipeline.
-
  ## Framework Adapters

  Governance needs three things to be real: a **point of interception** (we sit
@@ -617,7 +599,7 @@ const middleware = createGovernanceMiddleware(gov, {

  ## Export Paths

- The SDK ships **49 targeted exports** so you can import only what you need:
+ The SDK ships **44 targeted exports** so you can import only what you need:

  ```
  # Core
@@ -641,7 +623,6 @@ governance-sdk/injection-benchmark LIB — 6.9K-sample benchmark runner
  # Audit + identity
  governance-sdk/audit-integrity           HMAC hash-chain primitives (createIntegrityAudit, verifyAuditIntegrity)
  governance-sdk/audit-integrity-verify    standalone chain verifier (for offline audit)
- governance-sdk/action-recorder           runWithOutcome() — record action success/failure into the chain
  governance-sdk/agent-identity            agent identity tokens
  governance-sdk/agent-identity-ed25519    Ed25519 signing + verification
  governance-sdk/kill-switch               priority-999 emergency halt
@@ -652,23 +633,21 @@ governance-sdk/owasp-agentic OWASP Top 10 for LLMs / Agentic
  governance-sdk/nist-ai-rmf               NIST AI RMF (Govern/Map/Measure/Manage)
  governance-sdk/iso-42001                 ISO/IEC 42001 controls

- # Eval
- governance-sdk/eval-types                shared eval types
- governance-sdk/eval-scorer               trace scoring
- governance-sdk/eval-trace                trace submission
-
- # Runtime + storage
- governance-sdk/events                    typed event emitter
- governance-sdk/metrics                   in-memory counter / timing snapshots (serialize to your monitoring system)
- governance-sdk/otel-hooks                OTel-compatible span data (zero OTel deps; wire to your own tracer)
+ # Storage
  governance-sdk/storage-postgres          PostgreSQL storage adapter
  governance-sdk/storage-postgres-schema   schema DDL + migrations

+ # Optional observability primitives — passive in-memory, host wires to its own
+ # monitoring; NOT OpenInference-compliant. A real OTel exporter is on the roadmap.
+ governance-sdk/events                    typed event emitter
+ governance-sdk/metrics                   in-memory counter / timing snapshots
+ governance-sdk/otel-hooks                governance-prefixed span shape (passive — user must wire)
+
  # Scanner + type surface
  governance-sdk/scanner-plugins           scanner plugin interface
  governance-sdk/token-types               token type guards

- # Framework adapters (10 featured + 4 specialty)
+ # Framework adapters (10 featured + 3 specialty)
  governance-sdk/plugins/mastra
  governance-sdk/plugins/mastra-processor
  governance-sdk/plugins/vercel-ai
@@ -680,17 +659,19 @@ governance-sdk/plugins/llamaindex
  governance-sdk/plugins/mistral
  governance-sdk/plugins/ollama
  governance-sdk/plugins/mcp
- governance-sdk/plugins/mcp-annotations
  governance-sdk/plugins/mcp-trust
  governance-sdk/plugins/mcp-chain-audit
  governance-sdk/plugins/bedrock
  ```

+ `runWithOutcome()` (a thin helper around `gov.recordOutcome`) is exposed at the
+ top-level package export — `import { runWithOutcome } from 'governance-sdk'`.
+
  ## Project Stats

  - **0** runtime dependencies
- - **1,348** tests, 0 failures (`npm test`)
- - **49** export paths — tree-shakeable, import only what you use
+ - **1,328** tests, 0 failures (`npm test`)
+ - **44** export paths — tree-shakeable, import only what you use
  - **TypeScript strict mode**, no `any` types in source
  - **MIT licensed**

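
The README text added in the first hunk suggests running evals in an external harness and routing the results into the audit stream via `gov.audit.log()`. The diff does not show that method's signature, so the following is only a rough sketch under stated assumptions: `AuditSink`, `AuditEntry`, `HarnessResult`, `createMemoryAudit`, and `routeEvalResults` are hypothetical names invented for illustration, and the in-memory sink stands in for the real audit call.

```typescript
// Hypothetical shapes: the README names gov.audit.log() but does not show its
// signature, so these interfaces are illustrative stand-ins, not the SDK's API.
interface AuditEntry {
  agentId: string;
  action: string;
  detail: Record<string, unknown>;
  timestamp: string;
}

interface AuditSink {
  log(entry: AuditEntry): void;
}

// Assumed summary shape for results exported from an external eval harness.
interface HarnessResult {
  agentId: string;
  suite: string;
  passed: number;
  failed: number;
}

// In-memory stand-in for the audit stream, used here for demonstration only.
function createMemoryAudit(): AuditSink & { entries: AuditEntry[] } {
  const entries: AuditEntry[] = [];
  return {
    entries,
    log(entry: AuditEntry): void {
      entries.push(entry);
    },
  };
}

// Writes one audit entry per harness result and returns how many were written.
function routeEvalResults(
  audit: AuditSink,
  results: HarnessResult[],
  now: () => Date = () => new Date(),
): number {
  for (const r of results) {
    const total = r.passed + r.failed;
    audit.log({
      agentId: r.agentId,
      action: 'external_eval_result',
      detail: {
        suite: r.suite,
        passed: r.passed,
        failed: r.failed,
        // null when the suite ran no cases, to avoid dividing by zero
        passRate: total === 0 ? null : r.passed / total,
      },
      timestamp: now().toISOString(),
    });
  }
  return results.length;
}

const audit = createMemoryAudit();
const written = routeEvalResults(audit, [
  { agentId: 'luna', suite: 'jailbreak-basic', passed: 18, failed: 2 },
]);
console.log(written, audit.entries[0]?.detail);
```

Keeping the sink behind a small interface means the in-memory version can be swapped for the real audit call once its actual signature has been checked against the SDK.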