21 changes: 21 additions & 0 deletions .cursor/rules/mcp-auth.mdc
@@ -0,0 +1,21 @@
---
description: MCP OAuth auth verification and CIMD conventions
globs: subgraphs/users/src/index.ts, deploy/apollo-mcp-server/mcp.yaml, tests/mcp-auth-verification.md
alwaysApply: false
---

# MCP Auth Configuration

## After making changes to OAuth or MCP auth

When modifying `subgraphs/users/src/index.ts` (the authorization server) or `deploy/apollo-mcp-server/mcp.yaml`, verify changes are working by following `tests/mcp-auth-verification.md`.

Verification requires active port-forwards on `localhost:5001` (MCP) and `localhost:4001` (auth). Use the curl commands from the test doc to validate each section. All checks must pass before the change is considered complete.

## Key architecture facts

- The Apollo MCP Server binary handles Protected Resource Metadata (RFC 9728) automatically via the `resource` config field in `mcp.yaml`. Do not add `/.well-known/oauth-protected-resource` to the users subgraph.
- The authorization server supports both Client ID Metadata Documents (CIMD) and Dynamic Client Registration (RFC 7591). CIMD is the preferred approach per the MCP spec.
- URL-formatted `client_id` values (HTTPS with a path) trigger the CIMD flow. Non-URL values fall back to the `registeredClients` map.
- `allow_anonymous_mcp_discovery: true` lets clients call `initialize` and `tools/list` without auth. All tool invocations still require a valid OAuth token.
- After rebuilding the users subgraph image, tag it to match the deployment tag in `.image-tag` and restart the deployment.
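
The `client_id` routing rule above can be sketched as follows. This is only an illustration, not the repository's implementation: `looksLikeCimdClientId` and `resolveClientSource` are hypothetical names, the value type of `registeredClients` is assumed, and "HTTPS with a path" is interpreted here as "an `https:` URL whose path is not just `/`".

```typescript
// Hypothetical helper, not the repository's actual code.
// Returns true when a client_id looks like a CIMD URL: HTTPS with a non-root path.
function looksLikeCimdClientId(clientId: string): boolean {
  let url: URL;
  try {
    url = new URL(clientId);
  } catch {
    return false; // not parseable as an absolute URL
  }
  return url.protocol === "https:" && url.pathname.length > 1;
}

type ClientSource = "cimd" | "registered" | "unknown";

// `registeredClients` stands in for the map named in the rule above;
// the shape of its values is an assumption.
function resolveClientSource(
  clientId: string,
  registeredClients: Map<string, unknown>,
): ClientSource {
  if (looksLikeCimdClientId(clientId)) return "cimd";
  return registeredClients.has(clientId) ? "registered" : "unknown";
}
```

Under these assumptions, a plain string such as `my-client` never triggers a metadata fetch; it is only ever looked up in the registered-client map.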
8 changes: 8 additions & 0 deletions README.md
@@ -140,6 +140,14 @@ Learn about the authorization implementation, including:
- Resource-level authorization patterns
- Testing authorization scenarios

### [MCP Production Guide](/docs/mcp-production.md)

Guidance for deploying the Apollo MCP Server in production with a real OAuth 2.1 identity provider:
- Configuring Auth0, Okta, Keycloak, or other IdPs
- Scope strategy and per-operation access control
- Security considerations (HTTPS, token passthrough, audience validation)
- Networking and DNS (no `/etc/hosts` workarounds)

### [Response Caching Guide](/docs/response-caching-guide.md)

Learn about response caching in this architecture, including:
6 changes: 6 additions & 0 deletions deploy/apollo-mcp-server/Chart.yaml
@@ -0,0 +1,6 @@
apiVersion: v2
name: apollo-mcp-server
description: A Helm chart for the Apollo MCP Server in the reference architecture
type: application
version: 0.1.0
appVersion: "latest"
32 changes: 32 additions & 0 deletions deploy/apollo-mcp-server/mcp.yaml
@@ -0,0 +1,32 @@
endpoint: ${env.ROUTER_ENDPOINT:-http://reference-architecture-dev.apollo.svc.cluster.local:80}

transport:
  type: streamable_http
  port: 8000
  stateful_mode: false  # Required for mcp-remote compatibility; can be enabled in production
  host_validation:
    enabled: false  # Local dev only — enable with allowed_hosts in production
  auth:
    servers:
      - http://graphql.users.svc.cluster.local:4001
    audiences:
      - apollo-mcp
    allow_any_audience: false
    resource: ${env.MCP_RESOURCE_URL:-http://localhost:5001/mcp}
    scopes:
      - user:read:email
    scope_mode: require_any
    allow_anonymous_mcp_discovery: true

logging:
  level: debug

introspection:
  introspect:
    enabled: true

operations:
  source: local
  paths:
    - /data/operations/myCart.graphql
    - /data/operations/myProfileDetails.graphql
31 changes: 31 additions & 0 deletions deploy/apollo-mcp-server/operations/myCart.graphql
@@ -0,0 +1,31 @@
# Fetches the authenticated user's shopping cart with full product details.
# Requires an Authorization header with a valid Bearer token.
query MyCart {
  me {
    id
    cart {
      items {
        product {
          id
          upc
          title
          description
          mediaUrl
          releaseDate
          variants {
            id
            price
            colorway
            size
            dimensions
            weight
          }
          reviews {
            id
            body
          }
        }
      }
    }
  }
}
12 changes: 12 additions & 0 deletions deploy/apollo-mcp-server/operations/myProfileDetails.graphql
@@ -0,0 +1,12 @@
# Fetches the authenticated user's profile information.
# Requires an Authorization header with a valid Bearer token.
query MyProfileDetails {
  me {
    id
    shippingAddress
    username
    email
    previousSessions
    loyaltyPoints
  }
}
7 changes: 7 additions & 0 deletions deploy/apollo-mcp-server/templates/configmap-mcp-config.yaml
@@ -0,0 +1,7 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: apollo-mcp-config
data:
  mcp.yaml: |-
{{ .Files.Get "mcp.yaml" | indent 4 }}
9 changes: 9 additions & 0 deletions deploy/apollo-mcp-server/templates/configmap-operations.yaml
@@ -0,0 +1,9 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: apollo-mcp-operations
data:
  {{- range $path, $_ := .Files.Glob "operations/**.graphql" }}
  {{ base $path }}: |-
{{ $.Files.Get $path | indent 4 }}
  {{- end }}
45 changes: 45 additions & 0 deletions deploy/apollo-mcp-server/templates/deployment.yaml
@@ -0,0 +1,45 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apollo-mcp-server
  labels:
    app: apollo-mcp-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apollo-mcp-server
  template:
    metadata:
      labels:
        app: apollo-mcp-server
      annotations:
        checksum/config: {{ .Files.Get "mcp.yaml" | sha256sum }}
    spec:
      enableServiceLinks: false
      containers:
        - name: apollo-mcp-server
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command: ["apollo-mcp-server", "/data/mcp.yaml"]
          ports:
            - containerPort: {{ .Values.service.port }}
              protocol: TCP
          envFrom:
            - secretRef:
                name: apollo-mcp-credentials
          volumeMounts:
            - name: config-volume
              mountPath: /data/mcp.yaml
              subPath: mcp.yaml
            - name: operations-volume
              mountPath: /data/operations
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
      volumes:
        - name: config-volume
          configMap:
            name: apollo-mcp-config
        - name: operations-volume
          configMap:
            name: apollo-mcp-operations
14 changes: 14 additions & 0 deletions deploy/apollo-mcp-server/templates/service.yaml
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
  name: apollo-mcp-server
  labels:
    app: apollo-mcp-server
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: {{ .Values.service.port }}
      protocol: TCP
  selector:
    app: apollo-mcp-server
16 changes: 16 additions & 0 deletions deploy/apollo-mcp-server/values.yaml
@@ -0,0 +1,16 @@
image:
  repository: ghcr.io/apollographql/apollo-mcp-server
  tag: latest
  pullPolicy: Always

service:
  type: ClusterIP
  port: 8000

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
76 changes: 76 additions & 0 deletions docs/TODO-split-auth-from-users.md
@@ -0,0 +1,76 @@
# TODO: Split Auth Service from Users Subgraph

## Problem

The users subgraph (`subgraphs/users/`) currently serves three distinct roles: user data, identity/login, and OAuth 2.0 authorization server. Its `index.ts` is 287 lines, roughly 60% of which is auth infrastructure. Because OAuth state is held in memory, the deployment is pinned to `replicaCount: 1`.
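
To illustrate why in-memory OAuth state rules out more than one replica, here is a minimal sketch. `AuthReplica`, `issueCode`, and `redeemCode` are hypothetical names standing in for whatever stores the users subgraph actually keeps; the point is only that each process has its own private map.

```typescript
// Hypothetical model of one pod holding OAuth state in memory.
class AuthReplica {
  // Authorization codes live only in this process's memory.
  private codes = new Map<string, string>();

  issueCode(code: string, userId: string): void {
    this.codes.set(code, userId);
  }

  // Codes are single-use: redeeming removes them from the store.
  redeemCode(code: string): string | undefined {
    const userId = this.codes.get(code);
    this.codes.delete(code);
    return userId;
  }
}

// Two replicas behind a load balancer: /authorize lands on A, /token on B.
const replicaA = new AuthReplica();
const replicaB = new AuthReplica();
replicaA.issueCode("abc123", "user:1");

const redeemedOnB = replicaB.redeemCode("abc123"); // B never saw the code
const redeemedOnA = replicaA.redeemCode("abc123"); // only A can redeem it
```

With a single replica every request reaches the same map, which is why the current deployment works; moving this state into the auth service keeps the constraint there (the task list below still gives it `replicaCount 1`) and frees the users subgraph to scale.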

## Architecture After Split

```mermaid
graph TB
subgraph authService["auth subgraph (new)"]
direction TB
oauthServer["OAuth 2.0 Server<br/>/register, /authorize, /token"]
jwksEndpoint["JWKS + OAuth metadata"]
loginMutation["login mutation"]
end

subgraph usersSubgraph["users subgraph (cleaned)"]
direction TB
userType["User type, me/user queries<br/>__resolveReference"]
end

Router -->|"JWKS"| jwksEndpoint
MCPServer -->|"OAuth flow"| oauthServer
OrdersSubgraph -->|"JWKS"| jwksEndpoint
CheckoutSubgraph -->|"JWKS"| jwksEndpoint
loginMutation -.->|"federation entity ref"| userType
```

The auth subgraph participates in the supergraph (contributes `login` mutation, `LoginResponse` types) so the client app needs zero changes. It references `User` via `@key(fields: "id", resolvable: false)` -- the users subgraph resolves the full entity.

## Tasks

### 1. Create the auth subgraph (`subgraphs/auth/`)

- [ ] Create `subgraphs/auth/` with `schema.graphql` (login mutation, LoginResponse types, User entity stub), `src/index.ts` (Express with all OAuth routes moved from users, login mutation resolver, JWKS endpoint, renderLoginPage), `keys/` (copy from users), `package.json`, `tsconfig.json`, `Dockerfile`, `deploy/` Helm chart (port 4011, replicaCount 1)
- [ ] Create `subgraphs/auth/src/credentials.ts` with minimal user credential data (id, username, scopes only) for login validation -- keeps domain boundary clean vs importing full user profile data
- [ ] Carry over Client ID Metadata Document (CIMD) support: the `isUrlClientId`, `fetchClientMetadata`, `cimdCache`, SSRF guards, `CimdDisplayInfo`, and the CIMD-aware logic in `/authorize` and `/token` handlers. Ensure `client_id_metadata_document_supported: true` is included in the AS metadata endpoint

### 2. Clean up the users subgraph

- [ ] Clean `subgraphs/users/src/index.ts`: remove all OAuth routes, renderLoginPage, getIssuer, OAuthParams, in-memory OAuth stores, crypto/readFile/createPrivateKey imports, users data import. Revert from Express to `startStandaloneServer`. Keep JWT verification in context middleware (same keys work)
- [ ] Clean `subgraphs/users/src/resolvers/index.ts`: remove login mutation resolver, LoginResponse type resolver, jose/readFile/createPrivateKey imports. Keep `Query.user`, `Query.me`, `User.__resolveReference`
- [ ] Clean `subgraphs/users/schema.graphql`: remove Mutation type (login), LoginResponse union, LoginSuccessful, LoginFailed types
- [ ] Set users subgraph back to `replicaCount: 3` in `values.yaml` since it no longer holds in-memory OAuth state

### 3. Update JWKS and auth references

- [ ] Update JWKS URL from `graphql.users.svc.cluster.local:4001` to `graphql.auth.svc.cluster.local:4011` in: `deploy/operator-resources/supergraph-dev.yaml`, `deploy/operator-resources/supergraph-prod.yaml`, `subgraphs/orders/src/index.ts`, `subgraphs/checkout/src/index.ts`
- [ ] Update `deploy/apollo-mcp-server/mcp.yaml` `auth.servers` and `scripts/minikube/12-deploy-mcp-server.sh` to reference auth service instead of users

### 4. Update deployment scripts

- [ ] Add `auth` to SUBGRAPHS array in `scripts/minikube/05-deploy-subgraphs.sh` and image build list in `scripts/minikube/04-build-images.sh`

### 5. Update documentation

- [ ] Update `docs/setup.md` (port-forward auth:4011 instead of users:4001, `/etc/hosts` entry), `docs/mcp-production.md`, `README.md` references

## Key Files Reference

| File | Change |
|------|--------|
| `subgraphs/users/src/index.ts` | Remove OAuth routes, revert to `startStandaloneServer` |
| `subgraphs/users/src/resolvers/index.ts` | Remove `login` mutation, `LoginResponse` |
| `subgraphs/users/schema.graphql` | Remove `login`, `LoginResponse`, `LoginSuccessful`, `LoginFailed` |
| `deploy/operator-resources/supergraph-dev.yaml` | Change JWKS URL to auth service |
| `deploy/operator-resources/supergraph-prod.yaml` | Change JWKS URL to auth service |
| `deploy/apollo-mcp-server/mcp.yaml` | Change `auth.servers` to auth service |
| `subgraphs/orders/src/index.ts` | Update `JWKS_URL` |
| `subgraphs/checkout/src/index.ts` | Update `JWKS_URL` |
| `scripts/minikube/04-build-images.sh` | Add `auth` to build list |
| `scripts/minikube/05-deploy-subgraphs.sh` | Add `auth` to SUBGRAPHS array |
| `scripts/minikube/12-deploy-mcp-server.sh` | Update port-forward instructions |
| `docs/setup.md` | Update `/etc/hosts` and port-forward instructions |
| `docs/mcp-production.md` | Update references |
84 changes: 84 additions & 0 deletions docs/debugging.md
@@ -10,6 +10,7 @@ This guide covers common issues and debugging steps for the reference architectu
- [Schema Not Pushed to Registry](#schema-not-pushed-to-registry)
- [Image Tag Issues](#image-tag-issues)
- [Network and DNS Issues](#network-and-dns-issues)
- [Composition Failures](#composition-failures)
- [Quick Debug Scripts](#quick-debug-scripts)

## Registry Setup Issues
Expand Down Expand Up @@ -455,6 +456,85 @@ Response caching requires the router to have a configured TTL **and** for subgra
- [Response Caching Quickstart](https://www.apollographql.com/docs/graphos/routing/performance/caching/response-caching/quickstart)
- [Response Cache Customization](https://www.apollographql.com/docs/graphos/routing/performance/caching/response-caching/customization)

## Composition Failures

### `MISSING_TRANSITIVE_AUTH_REQUIREMENTS`

**Symptoms:**
- `SupergraphSchema` condition shows `MalformedSchema` / `CompositionPending: False`
- `kubectl describe supergraphschema reference-architecture-dev -n apollo` shows an error like:
```
composition failures: [CompositionError { message: "[shipping-shipping] Field \"Order.shippingCost\"
does not specify necessary @authenticated, @requiresScopes and/or @policy auth requirements to
access the transitive field \"Order.buyer\" data from @requires selection set.",
code: Some("MISSING_TRANSITIVE_AUTH_REQUIREMENTS") }]
```

**Cause:**

Federation composition enforces a transitive authorization rule: if a field uses `@requires` to read data that is transitively protected by `@authenticated`, `@requiresScopes`, or `@policy` in another subgraph, the field itself must declare the matching auth directive.

In this case:
- `users` subgraph declares `type User @authenticated` — the entire `User` type is auth-gated.
- `shipping` subgraph's `Order.shippingCost` uses `@requires(fields: "... buyer { shippingAddress }")`, which reads `User.shippingAddress` from the auth-protected `User` entity.
- Because `shippingCost` reads through an `@authenticated` boundary, it must also declare `@authenticated`.
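
The rule can be modelled with a deliberately simplified sketch. This is a toy, not how federation composition is implemented: it only tracks `@authenticated` (not `@requiresScopes` or `@policy`), treats type-level auth as already copied onto each field, and uses a hypothetical `FieldDef` shape keyed by `Type.field` coordinates.

```typescript
// Toy model: each field records whether it is auth-gated and which
// "Type.field" coordinates its @requires selection set reads.
interface FieldDef {
  authenticated: boolean;
  requires: string[];
}

type ToySchema = Record<string, FieldDef>;

// Returns the coordinates of fields that read an authenticated field
// through @requires without being authenticated themselves.
function missingTransitiveAuth(schema: ToySchema): string[] {
  const violations: string[] = [];
  for (const [coordinate, def] of Object.entries(schema)) {
    if (def.authenticated) continue;
    const readsProtected = def.requires.some(
      (dep) => schema[dep]?.authenticated === true,
    );
    if (readsProtected) violations.push(coordinate);
  }
  return violations;
}

const toySchema: ToySchema = {
  // `type User @authenticated` gates every User field.
  "User.shippingAddress": { authenticated: true, requires: [] },
  "Order.shippingCost": {
    authenticated: false,
    requires: ["User.shippingAddress"],
  },
};

const violations = missingTransitiveAuth(toySchema);
```

In this model `Order.shippingCost` is reported until its `authenticated` flag is set, which mirrors the composition error above.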

**Debug Steps:**

1. **Identify the failing subgraph and field from the error message** — the format is `[subgraph-name] Field "Type.field" ...`.

2. **Inspect the `@requires` selection set on the failing field:**
```bash
cat subgraphs/shipping/schema.graphql
```
Look for the field's `@requires(fields: "...")` — note which external types/fields it reads.

3. **Find where those referenced fields are defined and check their auth directives:**
```bash
# Example: check if User type has @authenticated in users subgraph
grep -n "@authenticated\|@requiresScopes\|@policy" subgraphs/users/schema.graphql
```

4. **Check the `SupergraphSchema` status for the full composition error:**
```bash
kubectl describe supergraphschema reference-architecture-dev -n apollo
```
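
For scripting step 1, the subgraph and field can be pulled out of the error text with a small parser. This is a sketch that assumes the message keeps the `[subgraph-name] Field "Type.field" ...` shape shown above; `parseCompositionError` and `CompositionErrorLocation` are hypothetical names, and other composition error codes may use different wording.

```typescript
interface CompositionErrorLocation {
  subgraph: string;
  typeName: string;
  fieldName: string;
}

// Extracts the subgraph and field coordinate from a composition error message.
// Returns null when the message does not match the assumed format.
function parseCompositionError(message: string): CompositionErrorLocation | null {
  // kubectl output shows quotes escaped as \" so normalize them first.
  const normalized = message.replace(/\\"/g, '"');
  const match = /\[([^\]]+)\]\s+Field\s+"([^".]+)\.([^".]+)"/.exec(normalized);
  if (!match) return null;
  return {
    subgraph: match[1] ?? "",
    typeName: match[2] ?? "",
    fieldName: match[3] ?? "",
  };
}
```

Only the first `Field "Type.field"` occurrence is captured, which is the failing field; the transitive field named later in the message is ignored.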

**Solution:**

Add the matching auth directive to the field that has `@requires`, and import it in the subgraph's `@link` declaration.

Example fix for `Order.shippingCost` in `subgraphs/shipping/schema.graphql`:

```graphql
# Before
extend schema
  @link(
    url: "https://specs.apollo.dev/federation/v2.5"
    import: ["@key", "@external", "@requires"]
  )

type Order @key(fields: "id") {
  shippingCost: Float
    @requires(fields: "items { weight } buyer { shippingAddress }")
}

# After
extend schema
  @link(
    url: "https://specs.apollo.dev/federation/v2.5"
    import: ["@key", "@external", "@requires", "@authenticated"]
  )

type Order @key(fields: "id") {
  shippingCost: Float
    @authenticated
    @requires(fields: "items { weight } buyer { shippingAddress }")
}
```

After updating the schema, rebuild and redeploy the affected subgraph. The operator will detect the new SDL hash on the `Subgraph` CRD, re-run composition, and the `SupergraphSchema` condition should transition from `MalformedSchema` to `Available`.

## Quick Debug Scripts

### Complete Registry Debug
Expand Down Expand Up @@ -568,6 +648,10 @@ cat .image-tag 2>/dev/null || echo ".image-tag file not found"
**Cause:** Tag in `.image-tag` is empty or too short
**Solution:** Re-run `04-build-images.sh` to regenerate tag

### `MISSING_TRANSITIVE_AUTH_REQUIREMENTS` (composition error)
**Cause:** A field's `@requires` selection set transitively reads data protected by `@authenticated`, `@requiresScopes`, or `@policy` in another subgraph, but the field itself does not declare the same auth directive.
**Solution:** Add the matching auth directive to the field and import it in the subgraph's `@link`. See [Composition Failures](#composition-failures).

## Getting Help

If you're still stuck after trying these debugging steps: