Commit cde6636

feat(demo): real LLM governance demo with multi-backend support (#186)
* feat(demo): replace mock demo with real LLM governance demo

  - Replace simulated agents with real LLM API calls (OpenAI, Azure OpenAI, Google Gemini)
  - Auto-detect backend from environment variables (GOOGLE_API_KEY, OPENAI_API_KEY, AZURE_OPENAI_*)
  - Add Scenario 4: pre-LLM content filtering (blocks dangerous prompts before API call)
  - Normalize responses across backends for consistent tool-call handling
  - Add --model and --verbose flags, remove --live flag
  - Remove internal references (Show & Tell, VP review)
  - Update demo README with new architecture and usage docs

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(demo): add Gemini backend support and graceful LLM error handling

  - Add Google Gemini as LLM backend (auto-detect GOOGLE_API_KEY)
  - Normalize tool calls across OpenAI/Azure/Gemini backends
  - Gracefully catch API errors (quota, auth) with simulated fallback
  - Governance middleware always runs real, even when LLM fails

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
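
The backend auto-detection named in the commit message could look roughly like the sketch below. The function name `detect_backend`, the returned labels, and the precedence order are assumptions made for illustration; the commit message only lists the environment variables and does not say which one wins when several are set.

```python
import os
from typing import Mapping, Optional


def detect_backend(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a label for the first backend whose credentials are present.

    Hypothetical helper: the precedence (Gemini, then OpenAI, then Azure)
    simply follows the order the variables appear in the commit message.
    """
    if env.get("GOOGLE_API_KEY"):
        return "gemini"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # Azure needs both a key and an endpoint to be usable.
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    return None  # no credentials found; the caller decides how to fall back
```

Passing the environment as a mapping keeps the function testable without touching the real process environment.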
1 parent dc26188 commit cde6636

2 files changed: +796 -291 lines changed

demo/README.md

Lines changed: 35 additions & 89 deletions
@@ -1,129 +1,75 @@
1-
# Agent Governance Toolkit × MAF — Runtime Demo
1+
# Agent Governance Toolkit — Live Governance Demo
22

3-
> **Show & Tell demo** for the Microsoft Agent Framework VP and team.
4-
> Demonstrates real-time governance enforcement across a multi-agent
5-
> research pipeline using four composable middleware layers.
3+
> Demonstrates real-time governance enforcement using **real LLM calls**
4+
> (OpenAI / Azure OpenAI) with the full governance middleware stack.
5+
> Every API call, policy decision, and audit entry is real.
66
77
## What This Shows
88

99
| Scenario | Layer | What Happens |
1010
|----------|-------|--------------|
11-
| **1. Policy Enforcement** | `GovernancePolicyMiddleware` | Declarative YAML policy allows web search but blocks access to `**/internal/**` paths |
12-
| **2. Capability Sandboxing** | `CapabilityGuardMiddleware` | Ring-2 tool guard allows `run_code` but denies `write_file` |
13-
| **3. Rogue Detection** | `RogueDetectionMiddleware` | Behavioral anomaly engine detects a 50-call email burst and auto-quarantines the agent |
14-
| **Audit Trail** | `AuditTrailMiddleware` + Merkle chain | Every decision is logged with cryptographic integrity verification |
15-
16-
All governance decisions are made by **real middleware** from the Agent
17-
Governance Toolkit — the same code that runs in production.
11+
| **1. Policy Enforcement** | `GovernancePolicyMiddleware` | YAML policy allows a search prompt but blocks `**/internal/**` **before the LLM is called** |
12+
| **2. Capability Sandboxing** | `CapabilityGuardMiddleware` | LLM requests tool calls; governance allows `run_code` but denies `write_file` |
13+
| **3. Rogue Detection** | `RogueDetectionMiddleware` | Behavioral anomaly engine detects a 50-call burst and auto-quarantines |
14+
| **4. Content Filtering** | `GovernancePolicyMiddleware` | Multiple prompts evaluated — dangerous ones blocked, safe ones forwarded |
15+
| **Audit Trail** | `AuditLog` + Merkle chain | Every decision is cryptographically chained and verifiable |
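
To illustrate what "cryptographically chained and verifiable" can mean for the audit trail, here is a minimal sketch of a linear hash chain. It is a simplification and not the toolkit's `AuditLog`: the class `ChainedAuditLog`, its fields, and the record layout are hypothetical, and a real Merkle structure may differ.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ChainedAuditLog:
    """Hypothetical append-only log where each entry commits to the previous one."""

    entries: List[dict] = field(default_factory=list)

    def append(self, agent: str, action: str, decision: str) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"agent": agent, "action": action, "decision": decision, "prev": prev_hash}
        record["hash"] = _digest(record)  # hashed before the "hash" key exists
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute every hash and check each link to its predecessor."""
        prev_hash = GENESIS
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev"] != prev_hash or _digest(body) != record["hash"]:
                return False
            prev_hash = record["hash"]
        return True
```

Because each entry's hash covers the previous hash, editing any earlier decision invalidates every later link, which is what makes tampering detectable.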
1816

1917
## Architecture
2018

2119
```
22-
┌───────────────────────────────────────────────────┐
23-
│ MAF Agent (agent_framework.Agent) │
24-
│ ┌─────────────────────────────────────────────┐ │
25-
│ │ AuditTrailMiddleware (AgentMiddleware) │◄── Tamper-proof logging
26-
│ │ GovernancePolicyMiddleware (AgentMiddleware)│◄── YAML policy eval
27-
│ │ CapabilityGuardMiddleware (FuncMiddleware) │◄── Tool allow/deny
28-
│ │ RogueDetectionMiddleware (FuncMiddleware) │◄── Anomaly scoring
29-
│ └─────────────────────────────────────────────┘ │
30-
│ ▼ │
31-
│ Agent / Tool Execution │
32-
└───────────────────────────────────────────────────┘
33-
│ │
34-
▼ ▼
20+
+-------------------------------------------------------+
21+
| Agent (with real OpenAI / Azure OpenAI backend) |
22+
| +--------------------------------------------------+ |
23+
| | GovernancePolicyMiddleware (YAML policy eval) | <-- Blocks before LLM
24+
| | CapabilityGuardMiddleware (tool allow/deny) | <-- Intercepts tools
25+
| | RogueDetectionMiddleware (anomaly scoring) | <-- Behavioral SRE
26+
| +--------------------------------------------------+ |
27+
| | |
28+
| Real LLM API Call (gpt-4o-mini) |
29+
+-------------------------------------------------------+
30+
| |
31+
v v
3532
AuditLog (Merkle) RogueAgentDetector
3633
agentmesh.governance agent_sre.anomaly
3734
```
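
The ordering in the diagram above (policy check first, LLM call second, tool guard on whatever the model requests) can be sketched as follows. This is an assumption-laden illustration, not the middleware's real API: `governed_call`, `policy_allows`, the use of `fnmatchcase` for pattern matching, and the tool allow-list are all hypothetical; only the `**/internal/**` pattern and the `run_code` / `write_file` tool names come from the scenario table.

```python
from fnmatch import fnmatchcase
from typing import Callable, List

BLOCKED_PATTERNS = ["**/internal/**"]  # pattern taken from the scenario table
ALLOWED_TOOLS = {"run_code"}           # write_file is deliberately absent


def policy_allows(prompt: str) -> bool:
    """Deny any prompt that matches a blocked glob pattern."""
    return not any(fnmatchcase(prompt, pattern) for pattern in BLOCKED_PATTERNS)


def governed_call(prompt: str, call_llm: Callable[[str], List[str]]) -> dict:
    """Run the policy check before the LLM, then filter requested tools.

    `call_llm` stands in for a real backend and returns the tool names
    the model asked to invoke.
    """
    if not policy_allows(prompt):
        # Blocked before any API call is made, so no tokens are spent.
        return {"status": "denied", "permitted": [], "blocked": []}
    requested = call_llm(prompt)
    permitted = [tool for tool in requested if tool in ALLOWED_TOOLS]
    blocked = [tool for tool in requested if tool not in ALLOWED_TOOLS]
    return {"status": "allowed", "permitted": permitted, "blocked": blocked}
```

Injecting `call_llm` as a parameter makes it easy to confirm that a denied prompt never reaches the backend.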
3835

3936
## Prerequisites
4037

4138
```bash
42-
# From the repo root (packages are already installed in editable mode)
43-
pip install agent-os-kernel agentmesh-platform agent-sre agent-hypervisor
44-
45-
# Or install everything at once
46-
pip install ai-agent-compliance[full]
39+
pip install agent-os-kernel agentmesh-platform agent-sre openai
4740

48-
# For YAML policy loading
49-
pip install pyyaml
41+
# Set your API key (pick one)
42+
export OPENAI_API_KEY="sk-..."
43+
# or for Azure OpenAI:
44+
export AZURE_OPENAI_API_KEY="..."
45+
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
5046
```
5147

5248
## Running
5349

5450
```bash
5551
cd agent-governance-toolkit
5652

57-
# Default mode — simulated agents, REAL governance middleware
53+
# Default (gpt-4o-mini, ~$0.01 per run)
5854
python demo/maf_governance_demo.py
5955

60-
# Live mode — uses real LLM calls (requires OPENAI_API_KEY)
61-
python demo/maf_governance_demo.py --live
62-
```
63-
64-
## Expected Output
65-
66-
You'll see colourful terminal output with three scenarios:
67-
68-
```
69-
╔════════════════════════════════════════════════════════════════════╗
70-
║ Agent Governance Toolkit × Microsoft Agent Framework ║
71-
║ Runtime Governance Demo — Show & Tell Edition ║
72-
╚════════════════════════════════════════════════════════════════════╝
73-
74-
━━━ Scenario 1: Policy Enforcement ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
75-
76-
🤖 Research Agent → "Search for AI governance papers"
77-
├── ✅ Policy Check: ALLOWED (web_search permitted)
78-
├── 🔧 Tool: web_search("AI governance papers")
79-
├── 📝 Audit: Entry #audit_a1b2c3 logged
80-
└── 📦 Result: "Found 15 papers on AI governance..."
81-
82-
🤖 Research Agent → "Read /internal/secrets/api_keys.txt"
83-
├── ⛔ Policy Check: DENIED (blocked pattern: **/internal/**)
84-
├── 📝 Audit: Entry #audit_d4e5f6 logged (VIOLATION)
85-
└── 📦 Agent received: "Policy violation: Access restricted"
86-
87-
━━━ Scenario 2: Capability Sandboxing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
88-
89-
🤖 Analysis Agent → run_code("import pandas; ...")
90-
├── ✅ Capability Guard: ALLOWED
91-
└── 📦 Result: "DataFrame loaded: 1,000 rows × 5 columns"
92-
93-
🤖 Analysis Agent → write_file("/output/results.csv")
94-
├── ⛔ Capability Guard: DENIED (not in permitted tools)
95-
└── 📦 Agent received: "Tool not permitted by governance policy"
96-
97-
━━━ Scenario 3: Rogue Agent Detection ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
98-
99-
🤖 Report Agent → send_email (normal)
100-
├── ✅ Rogue Check: LOW RISK (score: 0.00)
101-
└── 📦 Result: "Email sent"
102-
103-
🤖 Report Agent → send_email × 50 — rapid burst
104-
├── 🚨 Rogue Check: CRITICAL (score: 3.42)
105-
├── 🛑 Action: QUARANTINED — Agent execution halted
106-
└── 📦 Agent received: "Agent quarantined: anomalous frequency"
107-
108-
━━━ Audit Trail Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
56+
# Use a specific model
57+
python demo/maf_governance_demo.py --model gpt-4o
10958

110-
📋 Total entries: 8
111-
✅ Allowed: 4 │ ⛔ Denied: 2 │ 🚨 Quarantined: 1 │ 📝 Info: 1
112-
🔒 Merkle chain integrity: VERIFIED ✓
113-
🔗 Root hash: a3f7c2...b2d1e8
59+
# Show raw LLM responses
60+
python demo/maf_governance_demo.py --verbose
11461
```
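
For readers who want to see how the `--model` and `--verbose` flags shown above might be wired up, here is a minimal `argparse` sketch. The function `build_parser` and the help strings are hypothetical; only the flag names and the `gpt-4o-mini` default come from the README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI definition mirroring the flags listed in the README."""
    parser = argparse.ArgumentParser(description="Live governance demo (sketch)")
    parser.add_argument(
        "--model",
        default="gpt-4o-mini",  # default model named in the README
        help="model name passed to the selected backend",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="print raw LLM responses",
    )
    return parser
```

Calling `build_parser().parse_args()` with no arguments yields the default model with verbose output off.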
11562

11663
## Key Files
11764

11865
| File | Purpose |
11966
|------|---------|
120-
| `demo/maf_governance_demo.py` | Main demo script |
67+
| `demo/maf_governance_demo.py` | Main demo script (real LLM calls) |
12168
| `demo/policies/research_policy.yaml` | Declarative governance policy |
122-
| `packages/agent-os/src/agent_os/integrations/maf_adapter.py` | MAF middleware integration |
69+
| `packages/agent-os/src/agent_os/integrations/maf_adapter.py` | Governance middleware |
12370
| `packages/agent-mesh/src/agentmesh/governance/audit.py` | Merkle-chained audit log |
12471
| `packages/agent-sre/src/agent_sre/anomaly/rogue_detector.py` | Rogue agent detector |
12572

12673
## Links
12774

128-
- **Agent Governance Toolkit**: [github.com/imran-siddique/agent-governance-toolkit](https://github.com/imran-siddique/agent-governance-toolkit)
129-
- **Microsoft Agent Framework**: [github.com/microsoft/agent-framework](https://github.com/microsoft/agent-framework)
75+
- [Agent Governance Toolkit](https://github.com/microsoft/agent-governance-toolkit)
