Commit ef3ff32

Add design doc for debugging-focused MCP server pivot
Captures the design to replace manifest-generation tools with debugging prompts and embedded resources for troubleshooting Kuadrant installations.
1 parent 748a158 commit ef3ff32

1 file changed: 180 additions, 0 deletions
@@ -0,0 +1,180 @@
# Feature: Debugging-Focused MCP Server

## Summary

Pivot the kuadrant-mcp-server from a manifest-generation MCP server to a debugging-focused MCP server that serves structured prompts and embedded resources for troubleshooting Kuadrant installations. The server provides guided debugging workflows via MCP prompts, backed by embedded markdown documentation, and delegates all cluster interaction to a companion Kubernetes MCP server.

## Goals

- Provide structured debugging prompts for all Kuadrant policy types and installation issues
- Embed debugging guides into the binary for offline, zero-network operation
- Cover both platform engineers (production issues) and developers (local/staging issues)
- Direct the LLM to use a companion Kubernetes MCP server for cluster queries (no kubectl instructions)
- Support Istio as the primary gateway provider with Istio-specific debugging guidance

## Non-Goals

- Active cluster interaction (no tools that query Kubernetes directly)
- Manifest generation (removed entirely)
- Runtime doc fetching from upstream GitHub repos
- Supporting gateway providers other than Istio (may be added later)

## Design

### Backwards Compatibility

This is a complete pivot with no backwards compatibility for existing clients: all existing tools and resources are removed. Users relying on the manifest-generation tools will need to stop using this server for that purpose. The server name and transport options remain the same.

### Architecture Changes

```
kuadrant-mcp-server (before)
├── main.go                          - 6 manifest-generation tools + transport
├── resources.go                     - 13 resources fetched via HTTP from GitHub
└── process-docs.go / update-docs.sh - doc extraction tooling

kuadrant-mcp-server (after)
├── main.go      - server setup, prompt + resource registration, transport
├── prompts.go   - 10 prompt definitions and handlers
├── resources.go - 10 resources served from embedded filesystem
└── docs/
    └── debugging/ - embedded markdown debugging guides (10 files)
```

Key changes:
- `go:embed` bundles `docs/debugging/*.md` into the binary
- No HTTP client, no caching, no network dependency for serving content
- `gopkg.in/yaml.v3` dependency removed (no YAML generation)
- `process-docs.go` and `update-docs.sh` removed (no upstream doc extraction)

### API Changes

No CRD or Kubernetes API changes. The MCP API surface changes as follows:

**Tools: all removed**

The 6 manifest-generation tools (`create_gateway`, `create_httproute`, `create_dnspolicy`, `create_tlspolicy`, `create_ratelimitpolicy`, `create_authpolicy`) are removed.

**Prompts: 10 new**

| Prompt Name | Description | Arguments |
|---|---|---|
| `debug-installation` | Verify operator, CRDs, Kuadrant CR, Istio, Limitador, Authorino | `namespace` (default: `kuadrant-system`) |
| `debug-gateway` | Gateway not accepting traffic, listeners, Istio proxy | `gateway-name` (required), `namespace` (default: `kuadrant-system`) |
| `debug-dnspolicy` | DNS records not created, provider config, zone issues | `policy-name` (required), `namespace` (optional) |
| `debug-tlspolicy` | Certificates not issuing, issuer problems, cert-manager | `policy-name` (required), `namespace` (optional) |
| `debug-ratelimitpolicy` | Rate limits not enforced, Limitador health, targeting | `policy-name` (required), `namespace` (optional) |
| `debug-authpolicy` | Auth not enforced, Authorino health, rule matching | `policy-name` (required), `namespace` (optional) |
| `debug-telemetrypolicy` | Custom metrics not appearing, CEL expression issues | `policy-name` (required), `namespace` (optional) |
| `debug-tokenratelimitpolicy` | Token-based rate limiting not working | `policy-name` (required), `namespace` (optional) |
| `debug-policy-status` | Interpret status conditions on any policy | `policy-name` (required), `namespace` (optional), `policy-kind` (required) |
| `debug-policy-conflicts` | Override/default conflicts, policy hierarchy | `namespace` (optional) |

Each prompt follows a consistent output structure:
1. **Context**: what the component/policy does and its common failure modes
2. **Prerequisites**: what should be checked first
3. **Diagnostic steps**: ordered steps directing the LLM to use the Kubernetes MCP server (get resource, list pods, get events, read logs)
4. **Resource references**: URIs of the embedded debugging resources
5. **Common fixes**: the most frequent resolutions

Prompts use generic Kubernetes MCP tool patterns (e.g. "use the kubernetes MCP server to get the resource") rather than tool names from a specific implementation.

When an argument that has no default is not provided, the prompt instructs the LLM to ask the user for it.

**Resources: 10 new (replacing 13 old)**

| URI | Embedded File | Description |
|---|---|---|
| `kuadrant://debug/installation` | `docs/debugging/installation.md` | Operator, CRDs, Kuadrant CR, Istio health |
| `kuadrant://debug/gateway-istio` | `docs/debugging/gateway-istio.md` | Istio gateway proxy, listeners, envoy config |
| `kuadrant://debug/dnspolicy` | `docs/debugging/dnspolicy.md` | DNS provider, zone config, record creation |
| `kuadrant://debug/tlspolicy` | `docs/debugging/tlspolicy.md` | cert-manager, issuer, certificate lifecycle |
| `kuadrant://debug/ratelimitpolicy` | `docs/debugging/ratelimitpolicy.md` | Limitador health, rate limit enforcement |
| `kuadrant://debug/authpolicy` | `docs/debugging/authpolicy.md` | Authorino health, auth rule matching |
| `kuadrant://debug/telemetrypolicy` | `docs/debugging/telemetrypolicy.md` | Custom metrics, CEL expressions |
| `kuadrant://debug/tokenratelimitpolicy` | `docs/debugging/tokenratelimitpolicy.md` | Token-based rate limiting |
| `kuadrant://debug/status-conditions` | `docs/debugging/status-conditions.md` | All status conditions across all policy types |
| `kuadrant://debug/policy-conflicts` | `docs/debugging/policy-conflicts.md` | Override/default hierarchy, multi-policy resolution |
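
The URI-to-file mapping in the table is regular enough to derive from the filesystem. The Go sketch below shows one possible way to do that against the `io/fs` interface. It is an assumption, not the planned implementation: `defsFromFS`, `readResource`, and `demoFS` are hypothetical names, and `fstest.MapFS` stands in for the `embed.FS` that a `//go:embed docs/debugging/*.md` directive would populate, so the example runs without the real files.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

const (
	uriPrefix = "kuadrant://debug/"
	docsDir   = "docs/debugging/"
)

// resourceDef maps an MCP resource URI to a file in the embedded filesystem.
type resourceDef struct {
	URI         string
	Path        string
	Name        string
	Description string // left empty here; the real table carries one per resource
}

// defsFromFS builds one resourceDef per markdown file under docs/debugging.
func defsFromFS(fsys fs.FS) ([]resourceDef, error) {
	paths, err := fs.Glob(fsys, docsDir+"*.md")
	if err != nil {
		return nil, err
	}
	defs := make([]resourceDef, 0, len(paths))
	for _, p := range paths {
		name := strings.TrimSuffix(strings.TrimPrefix(p, docsDir), ".md")
		defs = append(defs, resourceDef{URI: uriPrefix + name, Path: p, Name: name})
	}
	return defs, nil
}

// readResource returns the markdown body for a URI, or an error if it is unknown.
func readResource(fsys fs.FS, defs []resourceDef, uri string) (string, error) {
	for _, d := range defs {
		if d.URI == uri {
			data, err := fs.ReadFile(fsys, d.Path)
			return string(data), err
		}
	}
	return "", fmt.Errorf("unknown resource %q", uri)
}

// demoFS is an in-memory stand-in for the embedded docs (placeholder content).
func demoFS() fs.FS {
	return fstest.MapFS{
		"docs/debugging/installation.md": {Data: []byte("# Installation\n")},
		"docs/debugging/authpolicy.md":   {Data: []byte("# AuthPolicy\n")},
	}
}

func main() {
	fsys := demoFS()
	defs, err := defsFromFS(fsys)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range defs {
		fmt.Println(d.URI, "->", d.Path)
	}
	body, err := readResource(fsys, defs, uriPrefix+"authpolicy")
	fmt.Printf("%q %v\n", body, err)
}
```

Writing the helpers against `fs.FS` rather than `embed.FS` directly also keeps them easy to unit test with in-memory fixtures.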

### Component Changes

**main.go:**
- Remove all tool parameter structs and handler functions
- Remove `server.AddTools()` calls
- Remove `validateWindow` helper
- Remove `gopkg.in/yaml.v3` import
- Add calls to `addDebugPrompts(server)` and `addDebugResources(server)`
- Transport handling (stdio/sse/http) remains unchanged

**resources.go (rewrite):**
- Remove `docCache`, `cachedDoc`, HTTP client, `fetch()`, `fallbackOrError()`
- Remove `docSource` struct and HTTP-based `resourceMapping`
- Add `//go:embed docs/debugging/*.md` directive
- New `resourceDef` struct mapping URI to embedded path, name, description
- `addDebugResources()` reads from `embed.FS` and registers as MCP resources

**prompts.go (new):**
- Prompt definitions with argument schemas
- Handler functions that build the structured debugging output
- Template substitution for arguments (policy-name, namespace)
- `addDebugPrompts()` registers all prompts with the server

**Removed files:**
- `process-docs.go`
- `update-docs.sh`

### Security Considerations

- No cluster access: the server never touches Kubernetes directly
- Embedded content is static and compiled in, so there is no risk of fetching malicious content at runtime
- No credentials or secrets involved
- The companion Kubernetes MCP server handles its own RBAC/auth

## Testing Strategy

- **Unit tests**: Prompt handlers return the expected output structure for given arguments. Resource handlers serve the correct embedded content for each URI.
- **Integration tests**: MCP protocol-level tests that initialize the server, call `prompts/list`, invoke a prompt, call `resources/list`, and read a resource. Verify the JSON-RPC responses.
- **Manual testing**: End-to-end with Claude Code + Kubernetes MCP server against a live cluster.
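
As a sketch of what the protocol-level checks might look like, the Go snippet below builds JSON-RPC 2.0 requests for the MCP methods named above and validates a response envelope. The `rpcRequest`, `rpcResponse`, `newRequest`, and `checkResponse` names are hypothetical, the `initialize` handshake is omitted for brevity, and the transport used to reach the server is out of scope.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope, as used by MCP.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// rpcError is the error object of a JSON-RPC 2.0 response.
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// rpcResponse holds the fields a protocol-level test would typically inspect.
type rpcResponse struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      int             `json:"id"`
	Result  json.RawMessage `json:"result"`
	Error   *rpcError       `json:"error"`
}

// newRequest marshals a request for the given method and params.
func newRequest(id int, method string, params any) ([]byte, error) {
	return json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
}

// checkResponse verifies the envelope: version, matching ID, no error, a result.
func checkResponse(raw []byte, wantID int) error {
	var resp rpcResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return err
	}
	if resp.JSONRPC != "2.0" || resp.ID != wantID {
		return fmt.Errorf("unexpected envelope: version %q, id %d", resp.JSONRPC, resp.ID)
	}
	if resp.Error != nil {
		return fmt.Errorf("server error %d: %s", resp.Error.Code, resp.Error.Message)
	}
	if len(resp.Result) == 0 {
		return fmt.Errorf("missing result")
	}
	return nil
}

func main() {
	calls := []struct {
		method string
		params any
	}{
		{"prompts/list", nil},
		{"prompts/get", map[string]any{
			"name":      "debug-authpolicy",
			"arguments": map[string]string{"policy-name": "my-auth"},
		}},
		{"resources/list", nil},
		{"resources/read", map[string]string{"uri": "kuadrant://debug/authpolicy"}},
	}
	for i, c := range calls {
		b, err := newRequest(i+1, c.method, c.params)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(b))
	}
}
```

A real integration test would send each request over the chosen transport and pass the server's reply to something like `checkResponse` before asserting on the result payload.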

## Open Questions

- None currently

## Execution

### Todo

- [ ] Scaffold embedded docs structure and write debugging guides
  - [ ] `docs/debugging/installation.md`
  - [ ] `docs/debugging/gateway-istio.md`
  - [ ] `docs/debugging/dnspolicy.md`
  - [ ] `docs/debugging/tlspolicy.md`
  - [ ] `docs/debugging/ratelimitpolicy.md`
  - [ ] `docs/debugging/authpolicy.md`
  - [ ] `docs/debugging/telemetrypolicy.md`
  - [ ] `docs/debugging/tokenratelimitpolicy.md`
  - [ ] `docs/debugging/status-conditions.md`
  - [ ] `docs/debugging/policy-conflicts.md`
- [ ] Rewrite `resources.go` to serve from embedded FS
- [ ] Unit tests for resource serving
- [ ] Create `prompts.go` with all prompt definitions and handlers
- [ ] Unit tests for prompt handlers
- [ ] Simplify `main.go`: remove tools, wire up prompts + resources
- [ ] Remove `process-docs.go`, `update-docs.sh`, `gopkg.in/yaml.v3` dependency
- [ ] Integration tests (MCP protocol-level)
- [ ] Update `CLAUDE.md` and `README.md`
- [ ] Update Dockerfile if needed

### Completed

## Change Log

### 2026-03-20: Initial design

- Decided to pivot from manifest-generation to debugging-focused MCP
- Chose Approach A: prompts as structured workflows + resources as reference material
- Chose embedded docs (`go:embed`) over runtime HTTP fetching for offline reliability
- No tools: all cluster interaction goes via the companion Kubernetes MCP server
- Istio as primary gateway provider
- Generic Kubernetes MCP tool references in prompts (not vendor-specific)
- Namespace arguments default sensibly but are always overridable
