Key architectural choices and trade-offs behind the Claude Prompts MCP Server. Read this to understand why things are built this way.
Effective LLM interaction is personal. There's no universal prompt that works for everyone.
The system is an unopinionated engine for composability:
- Workflow Atomization: Split workflows into discrete units—single prompts or multi-step chains. You choose the granularity.
- Focus on Context: We handle the plumbing (parsing, routing, validation). You focus on Context Engineering—curating templates and logic.
- Agent-First Navigation: Strict typing and Zod schemas make the codebase navigable by LLM coding agents. The AI is a first-class maintainer.
| Aspect | Decision | Rationale |
|---|---|---|
| Runtime | Node.js (v18+) | I/O-bound workload (file watching, hot-reload). Mature fs ecosystem. |
| Language | TypeScript (strict mode) | Enables contract-driven development. Zod schemas bridge deterministic runtime ↔ probabilistic LLM. |
| Module System | ESM | Modern, tree-shakeable, better tooling support. |
| Transport | Protocol | Use Case | Status |
|---|---|---|---|
| STDIO | Line-based JSON | Claude Desktop, Cursor, CLI tools. Server feels like a local extension. | Active |
| Streamable HTTP | HTTP POST/GET with SSE streams | Web dashboards, remote APIs. One `/mcp` endpoint. | Recommended |
| SSE | HTTP Server-Sent Events | Legacy integrations. | Deprecated |
Transport auto-detects at startup. For HTTP, use Streamable HTTP—SSE is deprecated.
| Aspect | Decision | Trade-off |
|---|---|---|
| Storage | JSON files + Markdown templates | Pro: Zero-dependency deployment. Git-versionable prompts. Con: Parsing overhead at scale. |
| State | `runtime-state/*.json` | Sessions survive STDIO process restarts. |
| Hot-Reload | File watchers with debouncing | Changes propagate without server restart. |
Why file-based?
- `git clone && npm start` — no database setup
- Version prompts alongside code
- Human-readable: debug by reading files, not SQL queries
- File watchers work natively for hot-reload
The in-memory registry caches parsed content. JSON parsing (~5-20ms for hundreds of prompts) is negligible for single-user MCP servers.
Instead of monolithic execution functions, requests flow through a staged pipeline:
Request → Normalize → Parse → Plan → Enhance → Execute → Format → Response
Why Stages?
- Safety: LLM interactions have many "soft" failure points (syntax errors, missing files, validation). Stages enforce interfaces and provide diagnostics at each step.
- Observability: Each stage logs entry/exit with timing and memory metrics. Debugging is straightforward.
- Extensibility: Add a stage file, register it in the orchestrator, done.
Why Not Middleware?
Traditional middleware (like Express) uses next() callbacks. Our pipeline uses explicit stage registration with controlled execution order. This provides:
- Predictable ordering (stage 1 always runs before stage 2)
- Type-safe context passing between stages
- Early exit when response is ready
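The registration model can be sketched like this. It is an illustrative reduction, assuming a shared mutable context object; the real stages and context shape differ.

```typescript
// Hypothetical sketch of explicit stage registration: each stage receives a
// shared context, and the pipeline exits early once a response exists.
interface Ctx {
  input: string;
  output?: string;
  log: string[];
}

type Stage = { name: string; run: (ctx: Ctx) => void };

class Pipeline {
  private stages: Stage[] = [];

  register(stage: Stage): this {
    this.stages.push(stage); // registration order = execution order
    return this;
  }

  execute(ctx: Ctx): Ctx {
    for (const stage of this.stages) {
      ctx.log.push(`enter:${stage.name}`); // per-stage observability
      stage.run(ctx);
      if (ctx.output !== undefined) break; // early exit: response is ready
    }
    return ctx;
  }
}

const pipeline = new Pipeline()
  .register({ name: "normalize", run: (c) => { c.input = c.input.trim(); } })
  .register({ name: "cacheLookup", run: (c) => { if (c.input === "ping") c.output = "pong"; } })
  .register({ name: "execute", run: (c) => { c.output = c.input.toUpperCase(); } });
```

Unlike `next()` chains, ordering is visible in one place, and the typed context is the only channel between stages.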
We expose 3 MCP tools instead of 20+ specialized tools:
| Tool | Purpose |
|---|---|
| `prompt_engine` | Execute prompts and chains |
| `resource_manager` | CRUD for prompts, gates, methodologies |
| `system_control` | Status, framework switching, analytics |
Why Consolidation?
- Token Economy: Every tool definition consumes context window. 3 tools vs 20+ is ~85% reduction in tool schema overhead.
- Intent Accuracy: LLMs route better through a single "Manager" tool with distinct actions than guessing parameters for 20 functions.
- Maintainability: Internal structure can evolve without changing external API.
Internally, resource_manager routes to specialized handlers (PromptResourceHandler, GateToolHandler, FrameworkToolHandler) based on resource_type.
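That dispatch pattern can be sketched as below. The handler names mirror the ones above, but the request shape and signatures are illustrative, not the real API.

```typescript
// Hypothetical sketch of the consolidated-tool pattern: one public tool,
// internal dispatch on resource_type.
type ResourceType = "prompt" | "gate" | "framework";

interface ResourceRequest {
  resource_type: ResourceType;
  action: "create" | "read" | "update" | "delete";
  payload?: unknown;
}

type Handler = (req: ResourceRequest) => string;

const handlers: Record<ResourceType, Handler> = {
  prompt: (req) => `PromptResourceHandler:${req.action}`,
  gate: (req) => `GateToolHandler:${req.action}`,
  framework: (req) => `FrameworkToolHandler:${req.action}`,
};

// The single resource_manager entry point routes internally, so new resource
// types extend the Record without changing the external MCP tool surface.
function resourceManager(req: ResourceRequest): string {
  const handler = handlers[req.resource_type];
  if (!handler) throw new Error(`Unknown resource_type: ${req.resource_type}`);
  return handler(req);
}
```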
Tool parameters and descriptions are generated from JSON contract files:
`server/tooling/contracts/*.json` → `npm run generate:contracts` → `mcp-contracts/schemas/_generated/mcp-schemas.ts`
Why Contracts?
- Single Source of Truth: No drift between validation, types, and documentation.
- Type Safety: Zod schemas ensure runtime validation matches compile-time types.
- LLM Consumption: Contracts inform tool descriptions that LLMs read.
- Versioning: Contracts enable tracking breaking changes.
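The single-source-of-truth idea can be sketched without the real toolchain: one contract object drives both the compile-time type and the runtime check, so they cannot drift. The contract shape and names here are hypothetical; the actual project generates Zod schemas from JSON contract files.

```typescript
// Illustrative contract: in the real system this lives in a JSON file and
// feeds code generation.
const promptEngineContract = {
  name: "prompt_engine",
  params: { command: "string", session_id: "string" },
} as const;

// Compile-time view derived from the contract
type Params = { [K in keyof typeof promptEngineContract.params]: string };

// Runtime view derived from the same contract: validation and types cannot
// disagree, because both are computed from one object.
function validate(input: Record<string, unknown>): input is Params {
  return Object.entries(promptEngineContract.params).every(
    ([key, kind]) => typeof input[key] === kind,
  );
}
```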
We implemented a custom parser for symbolic commands:
```
>>analysis --> >>summary :: "strict" @CAGEERF #analytical
```
| Operator | Purpose | Example |
|---|---|---|
| `>>` | Prompt reference | `>>my_prompt` |
| `-->` | Chain steps | `>>a --> >>b --> >>c` |
| `::` | Inline gate | `>>prompt :: "validate citations"` |
| `@` | Framework override | `>>prompt @CAGEERF` |
| `#` | Style override | `#analytical >>report` |
Why a DSL?
- Developer Experience: JSON payloads break flow. Symbolic syntax reads naturally.
- Composability: Operators combine: `#lean >>a --> >>b :: "quality" @ReACT`
- Discoverability: Syntax is self-documenting in tool descriptions.
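A minimal tokenizer for this syntax might look like the sketch below. It handles the five operators on a single line; the real parser also deals with quoting edge cases, nesting, and error recovery.

```typescript
// Sketch: extract prompt steps, gates, and overrides from a symbolic command.
interface ParsedCommand {
  steps: string[];    // >> prompt references, in --> order
  gates: string[];    // :: inline gate criteria
  framework?: string; // @ override
  style?: string;     // # override
}

function parseCommand(src: string): ParsedCommand {
  const out: ParsedCommand = { steps: [], gates: [] };
  // Split chain steps first, then scan each segment for the other operators.
  for (const segment of src.split("-->")) {
    const gate = segment.match(/::\s*"([^"]+)"/);
    if (gate) out.gates.push(gate[1]);
    const fw = segment.match(/@(\w+)/);
    if (fw) out.framework = fw[1];
    const style = segment.match(/#(\w+)/);
    if (style) out.style = style[1];
    const step = segment.match(/>>(\w+)/);
    if (step) out.steps.push(step[1]);
  }
  return out;
}
```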
Instead of expecting users to memorize resource_manager parameters, we provide wizard-style prompts:
- `>>create_gate` — Guided gate creation
- `>>create_prompt` — Prompt/chain authoring
- `>>create_methodology` — Framework authoring
Two-Phase UX:
- Design phase: Partial args → template shows guidance and examples
- Validation phase: Complete args → script validates → auto-executes creation
Why Meta-Prompts?
Users don't read documentation—they explore interactively. The prompts teach their own API.
| Type | Lifecycle | Storage | Use Case |
|---|---|---|---|
| Ephemeral | Dies after request | `ExecutionContext` | Pipeline state, intermediate results |
| Session | Survives requests within a session | `chain-sessions.json` | Chain step progress, gate reviews |
| Global | Survives restarts | `runtime-state/*.json` | Framework selection, system config |
Key Insight: The most common state bug is storing cross-request state in ExecutionContext. Use session managers for persistence.
Three components prevent distributed state bugs:
| Component | Purpose | Anti-Pattern Prevented |
|---|---|---|
| `GateAccumulator` | Priority-based gate deduplication | Duplicate gates from multiple sources |
| `DiagnosticAccumulator` | Audit trail across stages | Lost diagnostics in async flows |
| `FrameworkDecisionAuthority` | Single framework resolution | Multiple stages making conflicting framework decisions |
| Resource | Watch Location | Manager |
|---|---|---|
| Prompts | `server/prompts/**/*.md` | `FileObserver` → `PromptAssetManager` |
| Gates | `server/resources/gates/*/gate.yaml` | `GateHotReloadCoordinator` |
| Styles | `server/resources/styles/*/style.yaml` | `StyleHotReloadCoordinator` |
| Methodologies | `server/resources/methodologies/*/*.yaml` | `MethodologyHotReload` |
| Tool Descriptions | `_generated/tool-descriptions.contracts.json` | `ToolDescriptionLoader` |
- Debouncing: Multiple rapid changes trigger single reload (100ms window)
- Validation First: Parse and validate before swapping registry
- Atomic Swap: Old registry → new registry in single operation
- Graceful Degradation: Invalid files logged, valid files still loaded
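The four properties above can be sketched in one small class. This is an illustrative reduction assuming a synchronous rebuild function; the real coordinators are asynchronous and per-resource.

```typescript
// Sketch of debounce + validate-first + atomic swap + graceful degradation.
class HotReloader<T> {
  private timer: ReturnType<typeof setTimeout> | undefined;
  registry: T;

  constructor(
    initial: T,
    private readonly rebuild: () => T, // parse + validate; throws on bad input
    private readonly windowMs = 100,
  ) {
    this.registry = initial;
  }

  onFileChanged(): void {
    // Debounce: rapid successive events collapse into one reload.
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.reloadNow(), this.windowMs);
  }

  reloadNow(): void {
    try {
      const next = this.rebuild(); // validation first...
      this.registry = next;        // ...then a single atomic swap
    } catch (err) {
      // Graceful degradation: keep serving the old registry.
      console.error("hot-reload skipped:", err);
    }
  }
}
```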
| Type | Content | Default Frequency |
|---|---|---|
| `system-prompt` | Methodology guidance (CAGEERF, ReACT) | Every 2 chain steps |
| `gate-guidance` | Quality validation criteria | Every step |
| `style-guidance` | Response formatting | First step only |
Modifier → Runtime Override → Step Config → Chain Config → Category Config → Global Config → System Default
Why Hierarchical?
Different granularities need different defaults:
- Quick ad-hoc prompt: Use global defaults
- Specific chain step: Override for that step
- Entire category: Set category-wide config
The hierarchy resolves independently per injection type, allowing fine-grained control.
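Per-injection-type resolution reduces to first-defined-wins over an ordered list of layers. A minimal sketch, with a hypothetical lookup shape:

```typescript
// Layer names mirror the chain above; each layer may define a frequency for
// any subset of injection types.
type InjectionType = "system-prompt" | "gate-guidance" | "style-guidance";
type Layer = Partial<Record<InjectionType, number>>; // frequency in steps

function resolveFrequency(
  type: InjectionType,
  layers: Layer[], // ordered most-specific first: modifier, runtime, step, chain, category, global
  fallback: number, // system default
): number {
  for (const layer of layers) {
    const value = layer[type];
    if (value !== undefined) return value; // first layer that defines it wins
  }
  return fallback;
}
```

Because resolution runs once per injection type, a step can override gate guidance without disturbing the inherited system-prompt frequency.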
```
server/resources/gates/
└── {gate-id}/
    ├── gate.yaml    # Configuration (id, criteria, severity)
    └── guidance.md  # Guidance content (inlined at load)
```
| Priority | Source | Example |
|---|---|---|
| 100 | Inline operator (`::`) | `>>prompt :: "validate citations"` |
| 90 | Client selection | `gates: ["research-quality"]` |
| 80 | Temporary request | Request-scoped gates |
| 60 | Prompt config | Gates in prompt metadata |
| 50 | Chain-level | Gates for entire chain |
| 40 | Methodology | Framework-specific gates |
| 20 | Registry auto | Default gates |
Why Priority-Based?
User intent should override defaults. Higher-priority sources (inline, client) represent explicit user decisions.
| Layer | Responsibility |
|---|---|
| Services | Throw on failure (no swallowing) |
| Stages | Propagate errors (don't catch) |
| Pipeline | Catch, log, format response |
| Transport | Format MCP error response |
```typescript
// WRONG: Swallow and log
await persist().catch(e => log(e)); // Caller thinks it succeeded!

// RIGHT: Let errors propagate
await persist(); // Throws on failure
```

State operations that fail silently cause in-memory/file state divergence—bugs that are nearly impossible to reproduce.
| Layer | Purpose | Location |
|---|---|---|
| Unit | Edge cases, complex logic | tests/unit/ |
| Integration | Module boundaries | tests/integration/ |
| E2E | Full MCP transport | tests/e2e/ |
For new features, write integration tests first:
- Integration tests catch boundary bugs (where most issues live)
- Unit tests add coverage for edge cases
- E2E validates complete user journeys
Why Integration-First?
Unit tests with mocked dependencies can pass while real integration fails. Integration tests use real collaborators, mock only I/O.
| Operation | Target | Actual |
|---|---|---|
| Server startup | <3s | ~2s |
| Tool response | <500ms | ~200-400ms |
| Hot-reload | <100ms | ~50ms |
| Framework switch | <100ms | ~20ms |
- Session cleanup: 24h default expiry
- Argument history: Configurable retention (default: 1000 entries)
- Template cache: LRU with 100-entry limit
- Temporary gates: Auto-expire after execution
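The bounded template cache can be sketched as a Map-based LRU, exploiting the insertion-order iteration of JavaScript `Map`. This is an illustrative sketch, not the project's actual cache class.

```typescript
// Sketch of an LRU cache capped at a fixed entry count (100 by default).
class LruCache<K, V> {
  private map = new Map<K, V>();

  constructor(private readonly limit = 100) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Refresh recency: re-insert so the entry moves to the back.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.limit) {
      // Evict least-recently-used: the first key in iteration order.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}
```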
This codebase balances strict software engineering patterns (pipelines, contracts, Zod validation) with the flexible nature of AI workflows. It prioritizes:
- User autonomy: Define your own process, don't inherit ours
- Observability: Every stage, every decision is traceable
- Safety: Validation at boundaries, graceful degradation on errors
- Evolvability: Internal structure changes without breaking external API
The architecture enables experimentation (try different methodologies, gates, styles) while maintaining the guard rails that make production use safe.