| 1 | +# Feature: Debugging-Focused MCP Server |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Pivot the kuadrant-mcp-server from a manifest-generation MCP to a debugging-focused MCP that serves structured prompts and embedded resources for troubleshooting Kuadrant installations. The server provides guided debugging workflows via MCP prompts, backed by embedded markdown documentation, and delegates all cluster interaction to a companion Kubernetes MCP server. |
| 6 | + |
| 7 | +## Goals |
| 8 | + |
| 9 | +- Provide structured debugging prompts for all Kuadrant policy types and installation issues |
| 10 | +- Embed debugging guides into the binary for offline, zero-network operation |
| 11 | +- Cover both platform engineers (production issues) and developers (local/staging issues) |
| 12 | +- Direct the LLM to use a companion Kubernetes MCP server for cluster queries — no kubectl instructions |
| 13 | +- Support Istio as the primary gateway provider with Istio-specific debugging guidance |
| 14 | + |
| 15 | +## Non-Goals |
| 16 | + |
| 17 | +- Active cluster interaction (no tools that query Kubernetes directly) |
| 18 | +- Manifest generation (removed entirely) |
| 19 | +- Runtime doc fetching from upstream GitHub repos |
| 20 | +- Supporting gateway providers other than Istio (may be added later) |
| 21 | + |
| 22 | +## Design |
| 23 | + |
| 24 | +### Backwards Compatibility |
| 25 | + |
| 26 | +This is a complete pivot with no backwards compatibility for the MCP API surface. All existing tools and resources are removed, so users who rely on the manifest-generation tools can no longer use this server for that purpose. The server name and transport options remain the same. |
| 27 | + |
| 28 | +### Architecture Changes |
| 29 | + |
| 30 | +``` |
| 31 | +kuadrant-mcp-server (before) |
| 32 | +├── main.go — 6 manifest-generation tools + transport |
| 33 | +├── resources.go — 13 resources fetched via HTTP from GitHub |
| 34 | +└── process-docs.go / update-docs.sh — doc extraction tooling |
| 35 | + |
| 36 | +kuadrant-mcp-server (after) |
| 37 | +├── main.go — Server setup, prompt + resource registration, transport |
| 38 | +├── prompts.go — 10 prompt definitions and handlers |
| 39 | +├── resources.go — 10 resources served from embedded filesystem |
| 40 | +└── docs/ |
| 41 | + └── debugging/ — Embedded markdown debugging guides (10 files) |
| 42 | +``` |
| 43 | + |
| 44 | +Key changes: |
| 45 | +- `go:embed` bundles `docs/debugging/*.md` into the binary |
| 46 | +- No HTTP client, no caching, no network dependency for serving content |
| 47 | +- `gopkg.in/yaml.v3` dependency removed (no YAML generation) |
| 48 | +- `process-docs.go` and `update-docs.sh` removed (no upstream doc extraction) |
| 49 | + |
| 50 | +### API Changes |
| 51 | + |
| 52 | +No CRD or Kubernetes API changes. The MCP API surface changes as follows: |
| 53 | + |
| 54 | +**Tools: all removed** |
| 55 | + |
| 56 | +The 6 manifest-generation tools (`create_gateway`, `create_httproute`, `create_dnspolicy`, `create_tlspolicy`, `create_ratelimitpolicy`, `create_authpolicy`) are removed. |
| 57 | + |
| 58 | +**Prompts: 10 new** |
| 59 | + |
| 60 | +| Prompt Name | Description | Arguments | |
| 61 | +|---|---|---| |
| 62 | +| `debug-installation` | Verify operator, CRDs, Kuadrant CR, Istio, Limitador, Authorino | `namespace` (default: `kuadrant-system`) | |
| 63 | +| `debug-gateway` | Gateway not accepting traffic, listeners, Istio proxy | `gateway-name` (required), `namespace` (default: `kuadrant-system`) | |
| 64 | +| `debug-dnspolicy` | DNS records not created, provider config, zone issues | `policy-name` (required), `namespace` (optional) | |
| 65 | +| `debug-tlspolicy` | Certificates not issuing, issuer problems, cert-manager | `policy-name` (required), `namespace` (optional) | |
| 66 | +| `debug-ratelimitpolicy` | Rate limits not enforced, Limitador health, targeting | `policy-name` (required), `namespace` (optional) | |
| 67 | +| `debug-authpolicy` | Auth not enforced, Authorino health, rule matching | `policy-name` (required), `namespace` (optional) | |
| 68 | +| `debug-telemetrypolicy` | Custom metrics not appearing, CEL expression issues | `policy-name` (required), `namespace` (optional) | |
| 69 | +| `debug-tokenratelimitpolicy` | Token-based rate limiting not working | `policy-name` (required), `namespace` (optional) | |
| 70 | +| `debug-policy-status` | Interpret status conditions on any policy | `policy-name` (required), `namespace` (optional), `policy-kind` (required) | |
| 71 | +| `debug-policy-conflicts` | Override/default conflicts, policy hierarchy | `namespace` (optional) | |
| 72 | + |
| 73 | +Each prompt follows a consistent output structure: |
| 74 | +1. **Context** — what the component/policy does and common failure modes |
| 75 | +2. **Prerequisites** — what should be checked first |
| 76 | +3. **Diagnostic steps** — ordered steps directing the LLM to use the Kubernetes MCP server (get resource, list pods, get events, read logs) |
| 77 | +4. **Resource references** — URIs to embedded debugging resources |
| 78 | +5. **Common fixes** — most frequent resolutions |
| 79 | + |
| 80 | +Prompts use generic Kubernetes MCP tool patterns (e.g. "use the kubernetes MCP server to get the resource") rather than tool names from a specific implementation. |
| 81 | + |
| 82 | +When an argument that has no default is not provided, the prompt instructs the LLM to ask the user for it. |
| 83 | + |
| 84 | +**Resources: 10 new (replacing 13 old)** |
| 85 | + |
| 86 | +| URI | Embedded File | Description | |
| 87 | +|---|---|---| |
| 88 | +| `kuadrant://debug/installation` | `docs/debugging/installation.md` | Operator, CRDs, Kuadrant CR, Istio health | |
| 89 | +| `kuadrant://debug/gateway-istio` | `docs/debugging/gateway-istio.md` | Istio gateway proxy, listeners, envoy config | |
| 90 | +| `kuadrant://debug/dnspolicy` | `docs/debugging/dnspolicy.md` | DNS provider, zone config, record creation | |
| 91 | +| `kuadrant://debug/tlspolicy` | `docs/debugging/tlspolicy.md` | cert-manager, issuer, certificate lifecycle | |
| 92 | +| `kuadrant://debug/ratelimitpolicy` | `docs/debugging/ratelimitpolicy.md` | Limitador health, rate limit enforcement | |
| 93 | +| `kuadrant://debug/authpolicy` | `docs/debugging/authpolicy.md` | Authorino health, auth rule matching | |
| 94 | +| `kuadrant://debug/telemetrypolicy` | `docs/debugging/telemetrypolicy.md` | Custom metrics, CEL expressions | |
| 95 | +| `kuadrant://debug/tokenratelimitpolicy` | `docs/debugging/tokenratelimitpolicy.md` | Token-based rate limiting | |
| 96 | +| `kuadrant://debug/status-conditions` | `docs/debugging/status-conditions.md` | All status conditions across all policy types | |
| 97 | +| `kuadrant://debug/policy-conflicts` | `docs/debugging/policy-conflicts.md` | Override/default hierarchy, multi-policy resolution | |
| 98 | + |
| 99 | +### Component Changes |
| 100 | + |
| 101 | +**main.go:** |
| 102 | +- Remove all tool parameter structs and handler functions |
| 103 | +- Remove `server.AddTools()` calls |
| 104 | +- Remove `validateWindow` helper |
| 105 | +- Remove `gopkg.in/yaml.v3` import |
| 106 | +- Add calls to `addDebugPrompts(server)` and `addDebugResources(server)` |
| 107 | +- Transport handling (stdio/sse/http) remains unchanged |
| 108 | + |
| 109 | +**resources.go (rewrite):** |
| 110 | +- Remove `docCache`, `cachedDoc`, HTTP client, `fetch()`, `fallbackOrError()` |
| 111 | +- Remove `docSource` struct and HTTP-based `resourceMapping` |
| 112 | +- Add `//go:embed docs/debugging/*.md` directive |
| 113 | +- Add `resourceDef` struct that maps each URI to its embedded path, name, and description |
| 114 | +- Add `addDebugResources()`, which reads from `embed.FS` and registers each file as an MCP resource |
| 115 | + |
| 116 | +**prompts.go (new):** |
| 117 | +- Prompt definitions with argument schemas |
| 118 | +- Handler functions that build the structured debugging output |
| 119 | +- Template substitution for arguments (policy-name, namespace) |
| 120 | +- `addDebugPrompts()` registers all prompts with the server |
| 121 | + |
| 122 | +**Removed files:** |
| 123 | +- `process-docs.go` |
| 124 | +- `update-docs.sh` |
| 125 | + |
| 126 | +### Security Considerations |
| 127 | + |
| 128 | +- No cluster access — the server never touches Kubernetes directly |
| 129 | +- Embedded content is static and compiled in — no risk of fetching malicious content at runtime |
| 130 | +- No credentials or secrets involved |
| 131 | +- The companion Kubernetes MCP server handles its own RBAC/auth |
| 132 | + |
| 133 | +## Testing Strategy |
| 134 | + |
| 135 | +- **Unit tests**: Prompt handlers return expected output structure for given arguments. Resource handlers serve correct embedded content for each URI. |
| 136 | +- **Integration tests**: MCP protocol-level tests — initialize server, call `prompts/list`, invoke a prompt, call `resources/list`, read a resource. Verify JSON-RPC responses. |
| 137 | +- **Manual testing**: End-to-end with Claude Code + Kubernetes MCP server against a live cluster. |
| 138 | + |
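For the protocol-level tests, the JSON-RPC 2.0 requests could be built with a small helper like the one below. This is a sketch only: `rpcRequest` and `newRequest` are hypothetical names, the `my-policy` argument value is made up, and the transport and the `initialize` handshake that precedes these calls are not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a minimal JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// newRequest marshals one request; nil params are omitted from the output.
func newRequest(id int, method string, params any) ([]byte, error) {
	return json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
}

func main() {
	requests := []struct {
		method string
		params any
	}{
		{"prompts/list", nil},
		{"prompts/get", map[string]any{
			"name":      "debug-authpolicy",
			"arguments": map[string]string{"policy-name": "my-policy"},
		}},
		{"resources/list", nil},
		{"resources/read", map[string]string{"uri": "kuadrant://debug/authpolicy"}},
	}
	for i, r := range requests {
		b, err := newRequest(i+1, r.method, r.params)
		if err != nil {
			fmt.Println("marshal error:", err)
			return
		}
		fmt.Println(string(b))
	}
}
```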
| 139 | +## Open Questions |
| 140 | + |
| 141 | +- None currently |
| 142 | + |
| 143 | +## Execution |
| 144 | + |
| 145 | +### Todo |
| 146 | + |
| 147 | +- [ ] Scaffold embedded docs structure and write debugging guides |
| 148 | + - [ ] `docs/debugging/installation.md` |
| 149 | + - [ ] `docs/debugging/gateway-istio.md` |
| 150 | + - [ ] `docs/debugging/dnspolicy.md` |
| 151 | + - [ ] `docs/debugging/tlspolicy.md` |
| 152 | + - [ ] `docs/debugging/ratelimitpolicy.md` |
| 153 | + - [ ] `docs/debugging/authpolicy.md` |
| 154 | + - [ ] `docs/debugging/telemetrypolicy.md` |
| 155 | + - [ ] `docs/debugging/tokenratelimitpolicy.md` |
| 156 | + - [ ] `docs/debugging/status-conditions.md` |
| 157 | + - [ ] `docs/debugging/policy-conflicts.md` |
| 158 | +- [ ] Rewrite `resources.go` to serve from embedded FS |
| 159 | + - [ ] Unit tests for resource serving |
| 160 | +- [ ] Create `prompts.go` with all prompt definitions and handlers |
| 161 | + - [ ] Unit tests for prompt handlers |
| 162 | +- [ ] Simplify `main.go` — remove tools, wire up prompts + resources |
| 163 | +- [ ] Remove `process-docs.go`, `update-docs.sh`, `gopkg.in/yaml.v3` dependency |
| 164 | +- [ ] Integration tests (MCP protocol-level) |
| 165 | +- [ ] Update `CLAUDE.md` and `README.md` |
| 166 | +- [ ] Update Dockerfile if needed |
| 167 | + |
| 168 | +### Completed |
| 169 | + |
| 170 | +## Change Log |
| 171 | + |
| 172 | +### 2026-03-20 — Initial design |
| 173 | + |
| 174 | +- Decided to pivot from manifest-generation to debugging-focused MCP |
| 175 | +- Chose Approach A: prompts as structured workflows + resources as reference material |
| 176 | +- Embedded docs (go:embed) over runtime HTTP fetching for offline reliability |
| 177 | +- No tools — all cluster interaction via companion Kubernetes MCP server |
| 178 | +- Istio as primary gateway provider |
| 179 | +- Generic Kubernetes MCP tool references in prompts (not vendor-specific) |
| 180 | +- Namespace arguments default sensibly but are always overridable |