Commit 5e704ef

Merge pull request #134 from dylan-mccarthy/copilot/add-c4-diagrams
E8-T2: Add C4 architecture diagrams
2 parents 94afeb3 + 11f6f99 commit 5e704ef

3 files changed, +332 / -0 lines

README.md

Lines changed: 1 addition & 0 deletions
@@ -1172,6 +1172,7 @@ cd agents
## Documentation

- [System Architecture Document (SAD)](sad.md) - High-level system design and architecture
- [Architecture Diagrams](docs/ARCHITECTURE_DIAGRAMS.md) - C4 context and container diagrams
- [Invoice Classifier Agent](docs/INVOICE_CLASSIFIER.md) - Technical documentation for the MVP Invoice Classifier agent
- [Agent Definitions Guide](agents/README.md) - Guide to agent definitions and seeding agents
- [Agent Versioning and Validation](docs/VERSIONING.md) - Guide to agent versioning, semantic versioning, and spec validation

docs/ARCHITECTURE_DIAGRAMS.md

Lines changed: 329 additions & 0 deletions
@@ -0,0 +1,329 @@
# Architecture Diagrams

This document contains C4 architecture diagrams for the Business Process Agents MVP system.

## Overview

The C4 model provides a hierarchical view of the system architecture:

- **Context**: System context and external dependencies
- **Container**: High-level technology choices and container interactions

---

## C4 Context Diagram

The context diagram shows how the Business Process Agents platform fits into the broader enterprise ecosystem, including external actors and systems.

```mermaid
C4Context
    title Business Process Agents MVP - System Context

    Person(admin, "Platform Admin", "Manages agents, monitors system health, configures deployments")
    Person(developer, "Agent Developer", "Creates and deploys business process agents")

    System(bpa, "Business Process Agents Platform", "Orchestrates AI agents for business process automation using Microsoft Agent Framework")

    System_Ext(azureai, "Azure AI Foundry", "Provides LLM models (GPT-4, etc.) for agent reasoning")
    System_Ext(servicebus, "Azure Service Bus", "Message queue for input events and DLQ")
    System_Ext(keyvault, "Azure Key Vault", "Stores secrets and connection strings")
    System_Ext(targetapi, "Target Business APIs", "Downstream systems that agents interact with (e.g., Invoice API)")
    System_Ext(identity, "Identity Provider", "OIDC authentication (Keycloak/Entra)")

    Rel(admin, bpa, "Monitors and manages", "HTTPS/UI")
    Rel(developer, bpa, "Deploys agents", "API/UI")

    Rel(bpa, azureai, "Calls LLM", "HTTPS/OpenAI SDK")
    Rel(bpa, servicebus, "Consumes messages, publishes to DLQ", "AMQP")
    Rel(bpa, keyvault, "Retrieves secrets", "HTTPS")
    Rel(bpa, targetapi, "Invokes business logic", "HTTPS")
    Rel(bpa, identity, "Authenticates users", "OIDC")

    Rel(servicebus, bpa, "Triggers agent runs", "Event notification")
```

**Key External Dependencies:**

- **Azure AI Foundry**: Hosts the LLMs (e.g., GPT-4) that agents call through Microsoft Agent Framework for reasoning and decision-making
- **Azure Service Bus**: Input queue for business events (invoices, orders, etc.) and dead-letter queue for failed messages
- **Azure Key Vault**: Secure storage for connection strings, API keys, and certificates
- **Target Business APIs**: External REST APIs that agents call to perform business actions (e.g., creating invoices, updating records)
- **Identity Provider**: OIDC provider for admin and developer authentication (Keycloak in development, Entra ID in production)

---

## C4 Container Diagram

The container diagram shows the internal structure of the Business Process Agents platform, including key components and their interactions.

```mermaid
C4Container
    title Business Process Agents MVP - Container View

    Person(admin, "Platform Admin", "Manages agents and monitors system")

    Container_Boundary(control, "Control Plane (Kubernetes)") {
        Container(api, "Control API", "ASP.NET Core + gRPC", "Manages agents, nodes, runs, and deployments")
        Container(scheduler, "Scheduler", "Hosted Service", "Least-loaded scheduling with placement constraints")
        ContainerDb(database, "PostgreSQL", "Relational Database", "Stores agents, versions, deployments, nodes, runs")
        Container(cache, "Redis", "In-Memory Store", "Manages leases, locks, and rate limits")
        Container(otel, "OTel Collector", "Telemetry Hub", "Collects and exports metrics, traces, logs")
        Container(ui, "Admin UI", "Next.js SPA", "Fleet dashboard, runs viewer, agent editor")
    }

    Container_Boundary(worker, "Worker Node") {
        Container(runtime, "Node Runtime", ".NET Worker Service", "Pulls leases, executes agents in sandboxes, reports results")
        Container(connectors, "Connectors SDK", ".NET Libraries", "Service Bus input, HTTP output, DLQ handling")
    }

    Container_Boundary(observability, "Observability Stack") {
        Container(prometheus, "Prometheus", "Metrics Store", "Stores time-series metrics")
        Container(tempo, "Tempo", "Trace Store", "Stores distributed traces")
        Container(loki, "Loki", "Log Aggregation", "Stores structured logs")
        Container(grafana, "Grafana", "Visualization", "Dashboards for metrics, traces, logs")
    }

    System_Ext(servicebus, "Azure Service Bus", "Message queue and DLQ")
    System_Ext(azureai, "Azure AI Foundry", "LLM inference")
    System_Ext(targetapi, "Target Business API", "Downstream systems")
    System_Ext(keycloak, "Keycloak/Entra", "Identity provider")

    Rel(admin, ui, "Uses", "HTTPS")
    Rel(ui, api, "Calls", "REST/gRPC")
    Rel(api, scheduler, "Invokes", "In-process")
    Rel(api, database, "Reads/Writes", "SQL")
    Rel(scheduler, cache, "Manages leases", "Redis protocol")
    Rel(scheduler, database, "Queries nodes/runs", "SQL")

    Rel(runtime, api, "Registers, heartbeats", "gRPC")
    Rel(api, runtime, "Streams leases", "gRPC")
    Rel(runtime, connectors, "Orchestrates", "In-process")
    Rel(connectors, servicebus, "Receives/Acks/Nacks", "AMQP")
    Rel(runtime, azureai, "Calls LLM via MAF", "HTTPS")
    Rel(connectors, targetapi, "Posts results", "HTTPS")

    Rel(runtime, otel, "Sends telemetry", "OTLP")
    Rel(api, otel, "Sends telemetry", "OTLP")
    Rel(otel, prometheus, "Exports metrics", "Prometheus Remote Write")
    Rel(otel, tempo, "Exports traces", "OTLP")
    Rel(otel, loki, "Exports logs", "Loki API")
    Rel(grafana, prometheus, "Queries", "PromQL")
    Rel(grafana, tempo, "Queries", "TraceQL")
    Rel(grafana, loki, "Queries", "LogQL")

    Rel(ui, keycloak, "Authenticates", "OIDC")
    Rel(api, keycloak, "Validates tokens", "JWT")
```

**Key Containers:**

### Control Plane
- **Control API**: REST and gRPC endpoints for managing agents, nodes, and runs; integrates with Microsoft Agent Framework SDK
- **Scheduler**: Selects, for each run, the least-loaded node that has free capacity and satisfies the run's placement constraints
- **PostgreSQL**: Persistent storage for all system state
- **Redis**: Distributed locks, rate limits, and lease management with TTL
- **OTel Collector**: Central telemetry aggregation point
- **Admin UI**: Web interface for operators and developers
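
The Scheduler's least-loaded placement can be illustrated with a short sketch. It is written in Python for brevity rather than the platform's .NET, and everything in it is an assumption for illustration: the hypothetical `Node` record (`capacity`, `active_runs`, `labels`), the exact-match label model for placement constraints, and the tie-break on `node_id`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical worker-node record used only for this sketch."""
    node_id: str
    capacity: int                 # maximum concurrent runs (assumed field)
    active_runs: int = 0          # runs currently leased to the node
    labels: Dict[str, str] = field(default_factory=dict)

    @property
    def load(self) -> float:
        # Fraction of capacity in use; 1.0 means the node is full.
        return self.active_runs / self.capacity


def satisfies(node: Node, constraints: Dict[str, str]) -> bool:
    # Assumed constraint model: every required label must match exactly.
    return all(node.labels.get(key) == value for key, value in constraints.items())


def select_node(nodes: List[Node], constraints: Dict[str, str]) -> Optional[Node]:
    """Return the least-loaded node with free capacity that meets the constraints."""
    eligible = [
        n for n in nodes
        if 0 <= n.active_runs < n.capacity and satisfies(n, constraints)
    ]
    if not eligible:
        return None  # no valid placement; the run stays pending
    # Break ties on node_id so the choice is deterministic.
    return min(eligible, key=lambda n: (n.load, n.node_id))


# Sample fleet used in the usage note.
nodes = [
    Node("node-a", capacity=4, active_runs=3, labels={"region": "westeurope"}),
    Node("node-b", capacity=4, active_runs=1, labels={"region": "westeurope"}),
    Node("node-c", capacity=8, active_runs=0, labels={"region": "eastus"}),
]
```

With this sample fleet, a `{"region": "westeurope"}` constraint picks `node-b` (load 0.25 versus 0.75 for `node-a`), an empty constraint set picks the idle `node-c`, and a region no node carries returns `None`. Filtering on free capacity before ranking also guarantees the load ratio is never computed for a zero-capacity node.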

### Worker Node
- **Node Runtime**: Long-running worker service that pulls leases, executes agents via MAF, and reports status
- **Connectors SDK**: Pluggable input/output adapters (Service Bus, HTTP, DLQ)
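
The lease lifecycle the Node Runtime relies on can be sketched with an in-memory store: a lease is held for a TTL (per the Redis entry under Control Plane) and extended while the node stays alive. Tying renewal to heartbeats is an assumption, as are the hypothetical `LeaseStore` class and its methods. The `acquire` step only mimics the semantics of Redis `SET key value NX PX <ttl>`. Against a real Redis, the check-then-renew and check-then-release steps would need to be atomic (for example via a Lua script), because another node could take the lease between the two operations.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LeaseStore:
    """Hypothetical in-memory stand-in for Redis-backed run leases."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # run_id -> (owner node id, absolute expiry time)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def _current_owner(self, run_id: str) -> Optional[str]:
        entry = self._leases.get(run_id)
        if entry is None:
            return None
        owner, expires_at = entry
        if self._clock() >= expires_at:
            del self._leases[run_id]  # expired: behave as if the Redis key timed out
            return None
        return owner

    def acquire(self, run_id: str, owner: str, ttl_s: float) -> bool:
        # Succeeds only when no live lease exists (like SET ... NX PX).
        if self._current_owner(run_id) is not None:
            return False
        self._leases[run_id] = (owner, self._clock() + ttl_s)
        return True

    def renew(self, run_id: str, owner: str, ttl_s: float) -> bool:
        # Only the current holder may extend its lease (e.g. on each heartbeat).
        if self._current_owner(run_id) != owner:
            return False
        self._leases[run_id] = (owner, self._clock() + ttl_s)
        return True

    def release(self, run_id: str, owner: str) -> bool:
        # Only the current holder may release; a stale holder gets False.
        if self._current_owner(run_id) != owner:
            return False
        del self._leases[run_id]
        return True
```

Injecting the clock keeps expiry testable: advancing a fake clock past the TTL without a renewal lets a second node acquire the same run, which is how work held by a node that stopped heartbeating would be reclaimed in this model.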

### Observability Stack
- **Prometheus**: Metrics storage and querying (runs, latency, tokens, cost)
- **Tempo**: Distributed tracing backend
- **Loki**: Log aggregation with trace correlation
- **Grafana**: Unified dashboards for all telemetry
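
Loki's trace correlation rests on each log line carrying the same trace ID that Tempo stores for the corresponding trace. A minimal Python sketch, assuming version-00 W3C Trace Context `traceparent` headers and a JSON log format with a `trace_id` field (the field name and both helper functions are hypothetical). In practice the OpenTelemetry SDK can attach the active trace context to log records itself, so this only shows the shape of the data.

```python
import json
import re
from typing import Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags in lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def trace_id_from_traceparent(header: str) -> Optional[str]:
    """Return the 32-hex-digit trace ID, or None if the header is not valid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # The spec treats version ff and all-zero trace or parent IDs as invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def log_line(message: str, traceparent: str, **fields: str) -> str:
    """Render a JSON log line that carries trace_id when one is available."""
    record = {"msg": message, **fields}
    trace_id = trace_id_from_traceparent(traceparent)
    if trace_id is not None:
        record["trace_id"] = trace_id
    return json.dumps(record, sort_keys=True)
```

Grafana's Loki data source can then turn a field like `trace_id` into a link to the matching trace in Tempo (a derived field), which is what makes the logs and traces navigable from one dashboard.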

---

## Additional Diagrams

### Sequence Diagram: Agent Run Flow

Shows the end-to-end flow of processing a message through an agent run.

```mermaid
sequenceDiagram
    participant SB as Azure Service Bus
    participant API as Control API
    participant Sched as Scheduler
    participant Node as Node Runtime
    participant Agent as Agent (MAF)
    participant LLM as Azure AI Foundry
    participant TargetAPI as Target Business API

    SB->>API: Queue depth notification
    API->>Sched: Create run request
    Sched->>Sched: Select node (least-loaded)
    Sched->>API: Return lease assignment
    API->>Node: Stream lease (gRPC)
    Node->>Node: Start sandbox process
    Node->>SB: Receive message
    Node->>Agent: Execute with message payload
    Agent->>LLM: LLM reasoning call
    LLM-->>Agent: Response with tool calls
    Agent->>TargetAPI: POST with idempotency key
    TargetAPI-->>Agent: 200 OK
    Agent-->>Node: Execution complete
    Node->>SB: Complete message (ack)
    Node->>API: Report run complete
    API->>Sched: Release lease
```
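
The `POST with idempotency key` step only makes retries safe if the key is stable across attempts and the target API deduplicates on it. A minimal Python sketch under assumptions: the key is derived from the agent ID and the Service Bus message ID (a hypothetical scheme), and the target side is modeled by a hypothetical `IdempotentEndpoint` that replays the stored response for a repeated key. Only successful (2xx) responses are stored, so a request that failed with a 500 can still be retried. How the key travels (commonly an `Idempotency-Key` HTTP header) is also an assumption here.

```python
import hashlib
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


def idempotency_key_for(agent_id: str, message_id: str) -> str:
    # Hypothetical scheme: the same agent and message always yield the same key,
    # so a retried or redelivered run reuses the key of its first attempt.
    return hashlib.sha256(f"{agent_id}:{message_id}".encode("utf-8")).hexdigest()


class IdempotentEndpoint:
    """Target-API stand-in that performs the side effect at most once per key."""

    def __init__(self, handler: Callable[[dict], Response]) -> None:
        self._handler = handler
        self._responses: Dict[str, Response] = {}
        self.executions = 0  # number of times the handler actually ran

    def post(self, key: str, payload: dict) -> Response:
        if key in self._responses:
            return self._responses[key]  # duplicate request: replay, no second side effect
        self.executions += 1
        status, body = self._handler(payload)
        if 200 <= status < 300:
            self._responses[key] = (status, body)  # remember only successes
        return status, body
```

A production key store would also need to expire old keys and to handle two concurrent requests carrying the same key, both of which this sketch ignores.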

### Sequence Diagram: Failure and DLQ Flow

Shows how failures are handled and how messages are routed to the dead-letter queue once retries are exhausted.

```mermaid
sequenceDiagram
    participant SB as Azure Service Bus
    participant Node as Node Runtime
    participant Agent as Agent (MAF)
    participant TargetAPI as Target Business API
    participant DLQ as Dead Letter Queue
    participant API as Control API

    SB->>Node: Receive message
    Node->>Agent: Execute agent run
    Agent->>TargetAPI: POST /api/endpoint
    TargetAPI-->>Agent: 500 Internal Server Error
    Agent-->>Node: Attempt 1/3 failed
    Node->>Agent: Execute agent run (retry)
    Agent->>TargetAPI: POST /api/endpoint (retry)
    TargetAPI-->>Agent: 500 Internal Server Error
    Agent-->>Node: Attempt 2/3 failed
    Node->>Agent: Execute agent run (retry)
    Agent->>TargetAPI: POST /api/endpoint (retry)
    TargetAPI-->>Agent: 500 Internal Server Error
    Agent-->>Node: Attempt 3/3 failed (retries exhausted)
    Node->>SB: Dead-letter message
    SB->>DLQ: Move to dead-letter sub-queue
    Node->>API: Report run failed
```
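
The retry-then-dead-letter policy in this flow reduces to a bounded loop. The Python sketch below assumes a hypothetical `TransientError` for retryable failures such as the HTTP 500s shown, a fixed budget of three attempts, and no backoff between attempts (a real policy would usually add one). It only decides the outcome: the caller maps `completed` to completing the message, and `dead-lettered` to dead-lettering it and reporting the run as failed. In the Azure Service Bus .NET SDK those settlement calls are `ServiceBusReceiver.CompleteMessageAsync` and `ServiceBusReceiver.DeadLetterMessageAsync`; abandoning a message instead returns it to the queue with an incremented delivery count, and Service Bus dead-letters it automatically once that count exceeds the queue's `MaxDeliveryCount`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical retryable failure, e.g. an HTTP 500 from the target API."""


@dataclass
class RunOutcome:
    status: str                  # "completed" or "dead-lettered"
    attempts: int                # handler invocations actually made
    errors: List[str] = field(default_factory=list)


def run_with_retries(handler: Callable[[dict], None], message: dict,
                     max_attempts: int = 3) -> RunOutcome:
    """Invoke handler up to max_attempts times and report how to settle the message.

    Non-retryable exceptions are not caught and propagate to the caller.
    """
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
        except TransientError as exc:
            errors.append(f"attempt {attempt}/{max_attempts} failed: {exc}")
            continue  # retry immediately (no backoff in this sketch)
        return RunOutcome("completed", attempt, errors)  # caller completes the message
    # Every attempt failed: caller dead-letters the message and reports the run failed.
    return RunOutcome("dead-lettered", max_attempts, errors)
```

Keeping the retry decision separate from the Service Bus calls makes the policy easy to test and lets the same loop sit in front of any connector.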

---

## Deployment View

### Local Development (k3d)

```mermaid
graph TB
    subgraph "k3d Cluster"
        subgraph "Control Plane Namespace"
            API[Control API]
            Sched[Scheduler]
            UI[Admin UI]
            PG[PostgreSQL]
            Redis[Redis Cache]
            OTel[OTel Collector]
        end

        subgraph "Worker Namespace"
            Node1[Node Runtime 1]
            Node2[Node Runtime 2]
        end

        subgraph "Observability Namespace"
            Prom[Prometheus]
            Tempo[Tempo]
            Loki[Loki]
            Grafana[Grafana]
        end
    end

    subgraph "External Services"
        SB[Azure Service Bus]
        AzureAI[Azure AI Foundry]
        KC[Keycloak]
    end

    UI --> API
    API --> PG
    API --> Redis
    Sched --> Redis
    Node1 --> API
    Node2 --> API
    Node1 --> SB
    Node2 --> SB
    Node1 --> AzureAI
    Node2 --> AzureAI
    API --> OTel
    Node1 --> OTel
    Node2 --> OTel
    OTel --> Prom
    OTel --> Tempo
    OTel --> Loki
    UI --> KC
```

### Production (AKS)

```mermaid
graph TB
    subgraph "Azure"
        subgraph "AKS Cluster"
            subgraph "Control Plane"
                API[Control API<br/>2 replicas]
                Sched[Scheduler]
                UI[Admin UI]
            end

            subgraph "Worker Nodes"
                Node1[Node 1]
                Node2[Node 2]
                NodeN[Node N]
            end

            subgraph "Observability"
                OTel[OTel Collector]
                Grafana[Grafana]
            end
        end

        PG[Azure Database<br/>for PostgreSQL]
        Redis[Azure Cache<br/>for Redis]
        SB[Azure Service Bus]
        AzureAI[Azure AI Foundry]
        KV[Azure Key Vault]
        Monitor[Azure Monitor]
        Entra[Entra ID]
    end

    UI --> API
    API --> PG
    API --> Redis
    API --> KV
    Sched --> Redis
    Node1 --> API
    Node2 --> API
    NodeN --> API
    Node1 --> SB
    Node1 --> AzureAI
    API --> OTel
    Node1 --> OTel
    OTel --> Monitor
    UI --> Entra
```

---

## Technology Stack Summary

| Layer | Technologies |
|-------|-------------|
| **Control Plane** | ASP.NET Core, gRPC, Microsoft Agent Framework SDK |
| **Worker Runtime** | .NET Worker Service, Microsoft Agent Framework |
| **Storage** | PostgreSQL, Redis |
| **Messaging** | Azure Service Bus, NATS JetStream |
| **AI/LLM** | Azure AI Foundry (GPT-4, etc.) |
| **Observability** | OpenTelemetry, Prometheus, Tempo, Loki, Grafana |
| **UI** | Next.js, React, Tailwind CSS, shadcn/ui |
| **Auth** | Keycloak (dev), Entra ID (prod), OIDC |
| **Infrastructure** | Kubernetes (k3d/AKS), Helm, Docker |
| **Secrets** | Azure Key Vault, External Secrets Operator |

---

## References

- [System Architecture Document (SAD)](../sad.md)
- [Microsoft Agent Framework Documentation](https://learn.microsoft.com/en-us/agent-framework/)
- [C4 Model](https://c4model.com/)
- [Azure AI Foundry Integration](./AZURE_AI_FOUNDRY_INTEGRATION.md)

sad.md

Lines changed: 2 additions & 0 deletions
@@ -37,6 +37,8 @@ Prove the core concepts of a **business process agents** platform built on the *

## 3. High‑Level Architecture (MVP)

> **Note**: For detailed C4 diagrams including context and container views, see [Architecture Diagrams](docs/ARCHITECTURE_DIAGRAMS.md).

```mermaid
C4Container
title MVP – Container View
