Add adr

crivetimihai · crivetimihai · commit a0807d89df96 · 2025-06-01T18:35:38.000+01:00
diff --git a/docs/docs/.pages b/docs/docs/.pages
@@ -6,3 +6,4 @@ nav:
   - "🛡️ Manage": manage
   - "💻 Development": development
   - "🧪 Testing": testing
+  - "📐 Architecture": architecture
diff --git a/docs/docs/architecture/.pages b/docs/docs/architecture/.pages
@@ -0,0 +1,4 @@
+title: Architecture
+nav:
+  - Overview: index.md
+  - Decision Records: adr
diff --git a/docs/docs/architecture/adr/.pages b/docs/docs/architecture/adr/.pages
@@ -0,0 +1,12 @@
+title: Decision Records
+nav:
+  - 1 Adopt FastAPI + Pydantic: 001-adopt-fastapi-pydantic.md
+  - 2 Use Async SQLAlchemy ORM: 002-use-async-sqlalchemy-orm.md
+  - 3 Expose Multi-Transport Endpoints: 003-expose-multi-transport-endpoints.md
+  - 4 Combine JWT & Basic Auth: 004-combine-jwt-and-basic-auth.md
+  - 5 Structured JSON Logging: 005-structured-json-logging.md
+  - 6 Gateway & Tool-Level Rate Limiting: 006-gateway-tool-rate-limiting.md
+  - 7 Pluggable Cache Backend: 007-pluggable-cache-backend.md
+  - 8 Federation & Auto-Discovery via DNS-SD: 008-federation-discovery.md
+  - 9 Built-in Health Checks: 009-built-in-health-checks.md
+  - 10 Observability via Prometheus: 010-observability-prometheus.md
diff --git a/docs/docs/architecture/adr/001-adopt-fastapi-pydantic.md b/docs/docs/architecture/adr/001-adopt-fastapi-pydantic.md
@@ -0,0 +1,41 @@
+# ADR-0001: Adopt FastAPI + Pydantic
+
+- *Status:* Accepted
+- *Date:* 2025-02-01
+- *Deciders:* Mihai Criveti
+
+## Context
+
+The MCP Gateway must serve both human and machine clients with low-latency HTTP and WebSocket endpoints. Payloads require runtime validation and schema documentation, while internal data types must align with environment-driven settings and JSON models.
+
+We explored Python-native frameworks that support async-first operation, data validation, OpenAPI generation, and modular service layout.
+
+## Decision
+
+We will adopt:
+
+- **FastAPI** as the core web framework for routing HTTP, WebSocket, and streaming endpoints.
+- **Pydantic v2** for all settings, schemas, and typed data models (e.g., `Tool`, `Resource`, `GatewayMetadata`, etc.).
+
+These will form the foundation for the application layer and public API.
+
+## Consequences
+
+- ✨ Strong typing, runtime validation, and auto-generated OpenAPI specs.
+- 🧩 Unified model structure across internal logic, external APIs, and config parsing.
+- 🚀 Excellent async performance with Uvicorn and Starlette compatibility.
+- 🔒 Tight coupling to Pydantic means future transitions (e.g., to dataclasses or attrs) would be non-trivial.
+
+## Alternatives Considered
+
+| Option | Why Not |
+|--------|---------|
+| **Flask + Marshmallow** | Sync-first architecture, weak async support, manual OpenAPI generation. |
+| **Django REST Framework** | Heavyweight, monolithic, tightly bound to Django ORM, not async-native. |
+| **Tornado or Starlette alone** | More boilerplate to assemble middlewares, validators, and routing. |
+| **Node.js + Fastify** | Excellent performance but requires a split language/runtime and loss of shared model code. |
+| **Pure `httpx` + `uvicorn` + `pydantic-core`** | Too low-level; duplicating FastAPI features manually. |
+
+## Status
+
+This decision has been implemented in the current architecture.
diff --git a/docs/docs/architecture/adr/002-use-async-sqlalchemy-orm.md b/docs/docs/architecture/adr/002-use-async-sqlalchemy-orm.md
@@ -0,0 +1,47 @@
+# ADR-0002: Use Async SQLAlchemy ORM
+
+- *Status:* Accepted
+- *Date:* 2025-02-01
+- *Deciders:* Mihai Criveti
+
+## Context
+
+The gateway must persist:
+
+- Tool metadata
+- Resource configurations
+- Usage metrics
+- Peer discovery and federation state
+
+We require a relational database with schema evolution, strong typing, and async support. The current codebase already uses SQLAlchemy ORM models with an async engine and declarative mapping style.
+
+## Decision
+
+We will use:
+
+- **SQLAlchemy 2.x (async)** for all data persistence.
+- **AsyncSession** and `async with` scoped transactions.
+- **Alembic** for migrations, with autogeneration and CLI support.
+- **SQLite** for development; **PostgreSQL or MySQL** for production via `DATABASE_URL`.
+
+This provides consistent, well-understood relational behavior and integrates cleanly with FastAPI.
+
+## Consequences
+
+- 🧱 Mature and reliable ORM with a wide developer base.
+- 🔄 Fully async I/O stack without thread-pools or blocking.
+- 🔧 Migrations handled declaratively using Alembic.
+- 📄 Pydantic models can be derived from or synchronized with SQLAlchemy models if needed.
+
+## Alternatives Considered
+
+| Option | Why Not |
+|--------|---------|
+| **Raw asyncpg / aiosqlite** | Manual query strings, error-prone joins, no built-in migrations. |
+| **Tortoise ORM / GINO** | Less widely used, more magic, lower confidence in long-term maintainability. |
+| **Django ORM** | Not async-native, tightly coupled to Django ecosystem, too heavyweight. |
+| **NoSQL (e.g., MongoDB)** | No relational guarantees, weaker query language, major refactor from current SQL-based model. |
+
+## Status
+
+This decision is in place and all gateway persistence uses SQLAlchemy 2.x with async support.
diff --git a/docs/docs/architecture/adr/003-expose-multi-transport-endpoints.md b/docs/docs/architecture/adr/003-expose-multi-transport-endpoints.md
@@ -0,0 +1,47 @@
+# ADR-0003: Expose Multi-Transport Endpoints (HTTP / WebSocket / SSE / STDIO)
+
+- *Status:* Accepted
+- *Date:* 2025-02-01
+- *Deciders:* Mihai Criveti
+
+## Context
+
+The MCP Gateway must serve diverse clients: web browsers, CLIs, language-specific SDKs, and headless daemons.
+Different use cases require support for both **request/response** and **streaming** interactions.
+
+Requirements:
+
+- Human-readable RPC over HTTP for developers
+- Low-latency streaming for long-running tools
+- IPC-style invocations for local CLI integration
+- Unified business logic regardless of transport
+
+## Decision
+
+The gateway will support the following built-in transports:
+
+- **HTTP JSON-RPC** (primary RPC interface)
+- **WebSocket** (bidirectional messaging)
+- **SSE (Server-Sent Events)** (for push-only event streaming)
+- **STDIO** (optional local CLI / subprocess transport)
+
+Transport selection is dynamic, based on environment (`TRANSPORT_TYPE`) and route grouping. All transports share the same service layer and authentication mechanisms.
+
+## Consequences
+
+- ✅ Maximum client flexibility, supporting modern browsers and legacy CLI tools.
+- 🔄 Business logic remains decoupled from transport implementation.
+- 📶 Streaming transports (WS, SSE) require timeout, reconnection, and back-pressure handling. New transports can be added easily as MCP standards evolve.
+
+## Alternatives Considered
+
+| Option | Why Not |
+|--------|---------|
+| **HTTP-only JSON API** | Poor fit for long-lived streaming tasks; requires polling. |
+| **gRPC (HTTP/2)** | Not browser-friendly; requires generated stubs; less discoverable. |
+| **Separate microservices per transport** | Code duplication, diverging implementations, and operational complexity. |
+| **Single transport abstraction** | Reduces explicitness; transport-specific needs get buried in generic interfaces. |
+
+## Status
+
+All four transports are implemented in the current FastAPI application and are toggleable via configuration.
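Illustration for ADR-0003 (not part of the committed file): a minimal sketch of how several transports might share one service layer. The `METHODS` registry, `dispatch`, `handle_http_body`, and `serve_stdio` are hypothetical names chosen for this example and are not taken from the gateway code; only the JSON-RPC 2.0 envelope and its `-32601` "Method not found" code come from the public specification.

```python
import json
from typing import Any, Callable, Dict, TextIO

# Hypothetical service layer: one registry of methods shared by every transport.
METHODS: Dict[str, Callable[..., Any]] = {
    "ping": lambda: "pong",
    "add": lambda a, b: a + b,
}


def dispatch(request: Dict[str, Any]) -> Dict[str, Any]:
    """Run one JSON-RPC 2.0 request against the shared service layer."""
    req_id = request.get("id")
    method = METHODS.get(request.get("method", ""))
    if method is None:
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "error": {"code": -32601, "message": "Method not found"},
        }
    params = request.get("params", [])
    # JSON-RPC allows positional (list) or named (object) parameters.
    result = method(**params) if isinstance(params, dict) else method(*params)
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


def handle_http_body(body: bytes) -> bytes:
    """HTTP adapter: decode the request body, dispatch, encode the response."""
    return json.dumps(dispatch(json.loads(body))).encode("utf-8")


def serve_stdio(stdin: TextIO, stdout: TextIO) -> None:
    """STDIO adapter: one JSON-RPC message per input line, one reply per line."""
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(dispatch(json.loads(line))) + "\n")
```

Both adapters only translate framing; all behaviour lives in `dispatch`, which is the property the ADR describes as business logic staying decoupled from the transport.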
diff --git a/docs/docs/architecture/adr/004-combine-jwt-and-basic-auth.md b/docs/docs/architecture/adr/004-combine-jwt-and-basic-auth.md
@@ -0,0 +1,51 @@
+# ADR-0004: Combine JWT & Basic Auth
+
+- *Status:* Accepted
+- *Date:* 2025-02-01
+- *Deciders:* Core Engineering Team
+
+## Context
+
+The gateway needs to support two types of clients:
+
+- **Browser-based users** using the Admin UI
+- **Headless clients** such as scripts, services, and tools
+
+These use cases require different authentication workflows:
+
+- Browsers prefer form-based login and session cookies.
+- Automation prefers stateless, token-based access.
+
+The current config exposes both:
+
+- `BASIC_AUTH_USER` and `BASIC_AUTH_PASSWORD`
+- `JWT_SECRET_KEY`, `JWT_EXPIRY_SECONDS`, and cookie settings
+
+## Decision
+
+We will combine both authentication modes as follows:
+
+- **Basic Auth** secures access to `/admin`. Upon success, a short-lived **JWT cookie** is issued.
+- **JWT Bearer token** (via header or cookie) is required for all API, WebSocket, and SSE requests.
+- Tokens are signed using the shared `JWT_SECRET_KEY` and include standard claims (`sub`, `exp`, `scopes`).
+- When `AUTH_REQUIRED=false`, the gateway allows unauthenticated access (dev only).
+
+## Consequences
+
+- ✅ Developers can log in once via browser and obtain an authenticated session.
+- ✅ Scripts can use a generated JWT directly, with no credential storage.
+- ❌ Tokens must be signed, rotated, and verified securely (TLS required).
+- 🔄 JWTs expire and must be refreshed periodically by clients.
+
+## Alternatives Considered
+
+| Option | Why Not |
+|--------|---------|
+| **JWT only** | CLI tools need a pre-acquired token; not friendly for interactive login. |
+| **Basic only** | Password sent on every request; cannot easily revoke or expire credentials. |
+| **OAuth2 / OpenID Connect** | Too complex for self-hosted setups; requires external identity provider. |
+| **mTLS client auth** | Secure but heavy; not usable in browsers or simple HTTP clients. |
+
+## Status
+
+This combined authentication mechanism is implemented and enabled by default in the gateway.
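Illustration for ADR-0004 (not part of the committed file): a standard-library sketch of signing and verifying a token that carries the `sub`, `exp`, and `scopes` claims named in the Decision. HS256 is an assumption (the ADR only says tokens are signed with a shared secret), and `issue_token` / `verify_token` are hypothetical helpers; a real deployment would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, List


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"),
                    hashlib.sha256).digest()


def issue_token(secret: str, sub: str, scopes: List[str], expiry_seconds: int) -> str:
    """Build a compact HS256 JWT with sub, exp, and scopes claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": sub, "exp": int(time.time()) + expiry_seconds, "scopes": scopes}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def verify_token(secret: str, token: str) -> Dict[str, Any]:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, signature_b64 = parts
    expected = _sign(secret, f"{header_b64}.{claims_b64}")
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= time.time():
        raise ValueError("token expired")
    return claims
```

A headless client would send the result of `issue_token(...)` as a Bearer token, while the admin flow would place the same string in a cookie after Basic Auth succeeds.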
diff --git a/docs/docs/architecture/adr/005-structured-json-logging.md b/docs/docs/architecture/adr/005-structured-json-logging.md
@@ -0,0 +1,50 @@
+# ADR-0005: Structured JSON Logging
+
+- *Status:* Accepted
+- *Date:* 2025-02-21
+- *Deciders:* Core Engineering Team
+
+## Context
+
+The gateway must emit logs that:
+
+- Are machine-readable and parseable by tools like ELK, Loki, or Datadog
+- Include rich context (e.g., request ID, auth user, duration)
+- Can be viewed in plaintext locally and JSON in production
+
+Our configuration supports:
+
+- `LOG_FORMAT`: `json` or `plain`
+- `LOG_LEVEL`: standard Python levels
+- `LOG_FILE`: optional log file destination
+
+Logs are initialized at startup via `LoggingService`.
+
+## Decision
+
+Use the Python standard `logging` module with:
+
+- A **custom JSON formatter** for structured logs (e.g. `{"level": "INFO", "msg": ..., "request_id": ...}`)
+- **Plain text output** when `LOG_FORMAT=plain`
+- Per-request context via filters or middleware
+- Global setup at app startup to avoid late binding issues
+
+## Consequences
+
+- 📋 Easily parsed logs suitable for production observability pipelines
+- ⚙️ Compatible with `stdout`, file, or syslog targets
+- 🧪 Local development uses plain logs for readability
+- 🧱 Minimal dependency footprint (no third-party logging libraries)
+
+## Alternatives Considered
+
+| Option | Why Not |
+|--------|---------|
+| **loguru** | Elegant syntax, but non-standard; poor compatibility with Python ecosystem. |
+| **structlog** | Adds context pipeline complexity; not needed for current log volume. |
+| **External sidecar (e.g. Fluent Bit)** | Useful downstream but doesn't solve app-side structure. |
+| **Raw print() statements** | Unstructured, difficult to manage at scale. |
+
+## Status
+
+Structured logging is implemented in `LoggingService`, configurable via environment variables.
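Illustration for ADR-0005 (not part of the committed file): a sketch of a JSON formatter plus a per-request context filter built only on the standard `logging` module. `JsonFormatter`, `RequestIdFilter`, `request_id_var`, and `build_logger` are hypothetical names and do not describe the actual `LoggingService`; the field names follow the example in the Decision.

```python
import contextvars
import json
import logging
from typing import TextIO

# Hypothetical per-request context; middleware would set this at request start.
request_id_var: contextvars.ContextVar = contextvars.ContextVar(
    "request_id", default=None
)


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "msg": record.getMessage(),
            "logger": record.name,
            "request_id": getattr(record, "request_id", None),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, log_format: str, stream: TextIO) -> logging.Logger:
    """Configure a logger once at startup; log_format mirrors LOG_FORMAT."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    if log_format == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Switching `log_format` between `"json"` and `"plain"` changes only the formatter, so call sites stay identical in development and production.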
diff --git a/docs/docs/architecture/adr/006-gateway-tool-rate-limiting.md b/docs/docs/architecture/adr/006-gateway-tool-rate-limiting.md
@@ -0,0 +1,50 @@
+# ADR-0006: Gateway & Tool-Level Rate Limiting
+
+- *Status:* Accepted
+- *Date:* 2025-02-21
+- *Deciders:* Core Engineering Team
+
+## Context
+
+The MCP Gateway may serve hundreds of concurrent clients accessing multiple tools.
+Without protection, a single client or misbehaving tool could monopolize resources or overwhelm upstream services.
+
+The configuration includes:
+
+- `TOOL_RATE_LIMIT`: default limit in requests/min per tool/client
+- Planned support for Redis-based or database-backed counters
+
+The current implementation is an in-memory token bucket.
+
+## Decision
+
+Implement a **rate limiter at the tool invocation level**, keyed by:
+
+- Tool name
+- Authenticated user / client identity (JWT or Basic)
+- Time window (per-minute by default)
+
+Backend options:
+
+- **Memory** (default for dev / single instance)
+- **Redis** (planned for clustering / shared limits)
+- **Database** (eventually consistent fallback)
+
+## Consequences
+
+- ✅ Prevents abuse, controls cost, and provides predictable fairness
+- 📉 Failed requests return `429 Too Many Requests` with retry headers
+- ❌ Memory backend does not scale across instances; Redis required for HA
+- 🔄 Optional override of limits via config/env for testing
+
+## Alternatives Considered
+
+| Option | Why Not |
+|--------|---------|
+| **No rate limiting** | Leaves gateway and tools vulnerable to overload or accidental DoS. |
+| **Global rate limit only** | Heavy tools can starve lightweight tools; no fine-grained control. |
+| **Proxy-level throttling (e.g. NGINX, Envoy)** | Can’t distinguish tools or users inside payload; lacks granularity. |
+
+## Status
+
+Rate limiting is implemented for tool routes, with `TOOL_RATE_LIMIT` as the default policy.
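Illustration for ADR-0006 (not part of the committed file): a sketch of an in-memory token bucket keyed by tool name and client identity. `TokenBucketLimiter` and its `allow` method are hypothetical and are not the gateway's implementation; the `limit_per_minute` argument stands in for `TOOL_RATE_LIMIT`, and the injectable clock exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-(tool, client) token bucket refilled continuously over one minute."""

    def __init__(self, limit_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit_per_minute = limit_per_minute
        self.clock = clock
        # key -> (tokens remaining, timestamp of last refill)
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def allow(self, tool: str, client: str) -> bool:
        """Consume one token if available; False means respond with HTTP 429."""
        now = self.clock()
        capacity = float(self.limit_per_minute)
        tokens, last = self._buckets.get((tool, client), (capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(capacity, tokens + (now - last) * self.limit_per_minute / 60.0)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[(tool, client)] = (tokens, now)
        return allowed
```

Because the buckets live in a process-local dictionary, two gateway instances would each grant the full limit, which is the scaling caveat listed under Consequences.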
diff --git a/docs/docs/architecture/adr/007-pluggable-cache-backend.md b/docs/docs/architecture/adr/007-pluggable-cache-backend.md
@@ -0,0 +1,51 @@
+# ADR-0007: Pluggable Cache Backend (memory / Redis / database)
+
+- *Status:* Accepted
+- *Date:* 2025-02-21
+- *Deciders:* Core Engineering Team
+
+## Context
+
+The MCP Gateway uses short-lived caching for:
+
+- Tool responses and resource lookups
+- Peer discovery metadata
+- Temporary session state and rate-limiting
+
+Different deployments require different caching characteristics:
+
+- Dev mode: no external services (in-memory only)
+- Production: clustered and persistent (Redis)
+- Air-gapped: embedded fallback (database table)
+
+The config exposes `CACHE_TYPE=memory|redis|database`.
+
+## Decision
+
+Abstract the caching system via a `CacheBackend` interface and support the following pluggable backends:
+
+- `MemoryCacheBackend`: simple `dict` with TTL, for dev and unit tests
+- `RedisCacheBackend`: shared, centralized cache for multi-node clusters
+- `DatabaseCacheBackend`: uses SQLAlchemy ORM to persist TTL-based records
+
+Selection is driven by the `CACHE_TYPE` environment variable. Code paths use a consistent interface regardless of backend.
+
+## Consequences
+
+- 🔄 Easy to switch cache backend per environment or load profile
+- 🚀 Redis allows horizontal scaling and persistent shared state
+- ❌ Memory cache does not survive restarts or share state
+- 🐢 Database cache is slower, but useful in restricted networks
+
+## Alternatives Considered
+
+| Option | Why Not |
+|--------|---------|
+| **Hardcoded Redis** | Adds operational overhead and single point of failure for dev. |
+| **Memory-only cache** | Incompatible with horizontal scale or restart resilience. |
+| **External CDN or HTTP cache** | Doesn’t address in-process sessions, discovery, or tool state. |
+| **Disk-based cache (e.g., shelve, pickle)** | Complex invalidation and concurrency issues; not cloud-ready. |
+
+## Status
+
+All three cache backends are implemented and the gateway selects one dynamically based on configuration.
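Illustration for ADR-0007 (not part of the committed file): a sketch of the backend abstraction with only the in-memory variant filled in. The class names come from the Decision, but the method signatures (`get`, `set` with `ttl_seconds`), the synchronous style, and the `create_cache` factory are assumptions made for this example; the real interface may well be async.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional, Tuple


class CacheBackend(ABC):
    """Interface every cache backend is assumed to satisfy in this sketch."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""

    @abstractmethod
    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        """Store a value that expires after ttl_seconds."""


class MemoryCacheBackend(CacheBackend):
    """Plain dict with per-entry expiry; state is lost on restart."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # Evict lazily on read.
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def create_cache(cache_type: str) -> CacheBackend:
    """Select a backend from a CACHE_TYPE-style string."""
    if cache_type == "memory":
        return MemoryCacheBackend()
    # Redis and database backends are omitted here to keep the sketch stdlib-only.
    raise ValueError(f"cache type {cache_type!r} is not covered by this sketch")
```

Callers depend only on `CacheBackend`, so swapping the value passed to `create_cache` is the single change needed per environment.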
diff --git a/docs/docs/architecture/adr/008-federation-discovery.md b/docs/docs/architecture/adr/008-federation-discovery.md
diff --git a/docs/docs/architecture/adr/009-built-in-health-checks.md b/docs/docs/architecture/adr/009-built-in-health-checks.md
diff --git a/docs/docs/architecture/adr/010-observability-prometheus.md b/docs/docs/architecture/adr/010-observability-prometheus.md
diff --git a/docs/docs/architecture/adr/index.md b/docs/docs/architecture/adr/index.md
diff --git a/docs/docs/architecture/index.md b/docs/docs/architecture/index.md