-
Notifications
You must be signed in to change notification settings - Fork 149
Add proposal for operator registry integration via Gateway API #2591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
364577d to
7bf59f7
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2591 +/- ##
=======================================
Coverage 56.45% 56.45%
=======================================
Files 319 319
Lines 30882 30882
=======================================
Hits 17433 17433
Misses 11950 11950
Partials 1499 1499 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
rdimitrov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM but let's make sure we get this reviewed by the others too before going through with the implementation 👍
| } | ||
| }, | ||
| "_meta": { | ||
| "io.modelcontextprotocol.registry/official": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part is probably obsolete as this metadata is supposed to come if we were to get an MCP server from the upstream registry.
dmartinol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice proposal, reminds me a full MCP gateway now 🤔
|
|
||
| **Metadata Annotations** (on HTTPRoute or MCP resource): | ||
| 4. **`toolhive.stacklok.dev/registry-tier`**: Server classification ("Official", "Community", "Partner") | ||
| 5. **`toolhive.stacklok.dev/registry-tools`**: Comma-separated list of tool names (e.g., "create_pr,merge_pr,list_issues") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why should we repeat the tools and tags if the registry already advertise them in the registered server? (e.g. the server with OCI package should be already there, right?)
maybe the annotations should connect the MCPServer to the registered server package with annotations sharing the name and version of the server?
BTW: of course also possible tool configs should be taken into account for filtering and renaming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please review the possible overlaps with #2105
e.g., how many remotes are we going to advertise in the registry for a single MCPServer? I guess 1 for in-cluster consumers and, optionally, 1 for external consumers (this proposal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the tools and tags from the proposal.
| 2. Extracts the base MCP path from HTTPRoute rules | ||
| 3. Traverses to parent Gateway resource(s) to get external hostname/IP | ||
| 4. Constructs the full external endpoint URL | ||
| 5. Creates an entry in the MCPRegistry pointing to this external endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the detail we need to hash out. For the fallback approach, I think things are slightly simpler because the time which the Operator creates the MCP server, it is aware of what the URL is because its provided as an annotation. This way, the Operator can send a single request to the registry server, publishing information about the newly deployed and running MCP server, including it's externally accessible URL.
With the primary approach, I'm not sure of the sequence flow. The Operator will publish information to the registry server when it creates the MCP server, but the publishing of information around the externally accessible URL may not be known at the time because the Gateway API resources may be created by the user later on. This will result in a subsequent request to the registry server that denotes that there is now an externally accessible URL and to save it in the database.
I'm not saying this is a problem, I just think we need to iron out the specifics in the lower levels of the requests that will be sent in the scenarios. Possibly a sequence diagram will help here? This will also need to be factored into the registry server API design that @blkt has been steering so that if we do want to allow for the subsequent requests, what does the API look like for that.
|
|
||
| 1. Detects the annotated HTTPRoute and validates it's accepted by the Gateway controller | ||
| 2. Extracts the base MCP path from HTTPRoute rules | ||
| 3. Traverses to parent Gateway resource(s) to get external hostname/IP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do we think users will be ok with the Operator being able to read the Gateway API resources and traversing them to build a URL? For MCP related resources its perhaps not a problem, but for others, it seems a bit "grabby"
- Do we need to traverse to the Gateway resource to get hostname? The hostnames are provided in the
HTTPRouteright? Perhaps we don't need to go any further to avoid grabbing what we don't need (see point 1). Or are you saying that we may need to traverse because thehostsare optional in theHTTPRouteand can contain wildcards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- yes, I think so. We can make this optional via a flag as @dmartinol suggested.
- yes we do, I don't see any other way to get a canonical name given the optionality of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1), I'm wondering how we feature flag the RBAC roles that the Operator has. Currently they are auto-generated using kubebuilder, if we have any Helm conditionals they will be overwritten.
docs/proposals/THV-2591-operator-registry-integration-via-gateway-api.md
Outdated
Show resolved
Hide resolved
docs/proposals/THV-2591-operator-registry-integration-via-gateway-api.md
Outdated
Show resolved
Hide resolved
|
@dmartinol I mean the Gateway API from Kubernetes. The idea is to get MCPServers auto-enrolled in the registry using HTTPRoutes. |
Auto-enrollment was also planned by #2105 |
|
Finally, please consider that this feature should be disabled accordingly to #2564 |
7bf59f7 to
339d165
Compare
Right, this basically provides a way of implementing that which sysadmins to choose what to expose and what not. |
Right, that makes sense. Though this doesn't really introduce new CRDs |
This proposal enables the ToolHive Kubernetes operator to automatically populate the MCP registry with externally accessible endpoints for MCPServer, MCPRemoteProxy, and VirtualMCPServer resources. Key features: - Gateway API HTTPRoute annotation-based discovery (primary approach) - Direct URL annotation fallback (for non-Gateway API users) - Explicit opt-in model for administrator control - Upstream MCP registry format with ToolHive publisher extensions - Kubernetes Events for discovery tracking The approach maintains clear separation between ingress configuration and MCP resource definitions, following ToolHive's design principles. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
339d165 to
c99a64a
Compare
I see, but the auto-registration feature must be disabled if the registry is not enabled |
Implements Phase 1 of operator registry export (PR #2591). The controller watches MCPServer, MCPRemoteProxy, and VirtualMCPServer resources for registry export annotations and aggregates them into a ConfigMap per namespace in the upstream MCP registry format. Key features: - Annotation-based export using `toolhive.stacklok.dev/registry-url` - Per-namespace ConfigMap with `{namespace}-registry-export` naming - Feature toggle via Helm value `operator.features.registryExport` or env var `ENABLE_REGISTRY_EXPORT` (disabled by default) - Stable checksums computed only on server entries (not timestamps) - Deterministic output with sorted entries - Efficient ConfigMap watches using label predicate filter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Implements Phase 1 of operator registry export (PR #2591). The controller watches MCPServer, MCPRemoteProxy, and VirtualMCPServer resources for registry export annotations and aggregates them into a ConfigMap per namespace in the upstream MCP registry format. Key features: - Annotation-based export using `toolhive.stacklok.dev/registry-url` - Per-namespace ConfigMap with `{namespace}-registry-export` naming - Feature toggle via Helm value `operator.features.registryExport` or env var `ENABLE_REGISTRY_EXPORT` (disabled by default) - Stable checksums computed only on server entries (not timestamps) - Deterministic output with sorted entries - Efficient ConfigMap watches using label predicate filter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Implements Phase 1 of operator registry export (PR #2591). The controller watches MCPServer, MCPRemoteProxy, and VirtualMCPServer resources for registry export annotations and aggregates them into a ConfigMap per namespace in the upstream MCP registry format. Key features: - Annotation-based export using `toolhive.stacklok.dev/registry-url` - Per-namespace ConfigMap with `{namespace}-registry-export` naming - Feature toggle via Helm value `operator.features.registryExport` or env var `ENABLE_REGISTRY_EXPORT` (disabled by default) - Stable checksums computed only on server entries (not timestamps) - Deterministic output with sorted entries - Efficient ConfigMap watches using label predicate filter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Implements Phase 1 of operator registry export (PR #2591). The controller watches MCPServer, MCPRemoteProxy, and VirtualMCPServer resources for registry export annotations and aggregates them into a ConfigMap per namespace in the upstream MCP registry format. Key features: - Annotation-based export using `toolhive.stacklok.dev/registry-url` - Per-namespace ConfigMap with `{namespace}-registry-export` naming - Feature toggle via Helm value `operator.features.registryExport` or env var `ENABLE_REGISTRY_EXPORT` (disabled by default) - Stable checksums computed only on server entries (not timestamps) - Deterministic output with sorted entries - Efficient ConfigMap watches using label predicate filter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Implements Phase 1 of operator registry export (PR #2591). The controller watches MCPServer, MCPRemoteProxy, and VirtualMCPServer resources for registry export annotations and aggregates them into a ConfigMap per namespace in the upstream MCP registry format. Key features: - Annotation-based export using `toolhive.stacklok.dev/registry-url` - Per-namespace ConfigMap with `{namespace}-registry-export` naming - Feature toggle via Helm value `operator.features.registryExport` or env var `ENABLE_REGISTRY_EXPORT` (disabled by default) - Stable checksums computed only on server entries (not timestamps) - Deterministic output with sorted entries - Efficient ConfigMap watches using label predicate filter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
| **For servers FROM a known registry:** | ||
| - The operator looks up the image reference in configured MCPRegistry sources | ||
| - If found, it adds a `remotes` entry to the existing server entry | ||
| - Metadata (name, description, version) comes from the source registry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concern: I believe we need to clarify the assumption underlying this scenario.
This assumes that
- it is possible to find out whether an MCP server comes from a known Registry
- it is possible to update an existing entry
Point 1. might be complicated but achievable, but point 2. is not currently possible.
Regarding point 2., as of now we implement Upstream API in ToolHive Registry Server in pretty strictly and only add support for deleting MCP server entries (which is in Upstream Registry API spec but they do not support it). This entails that updates can only be implemented as delete-create cycles. This might not be ideal but we might live with it for now.
Regarding point 1., it is not clear to me how this can be achieved, could you clarify? It would be great if you could do it here, meanwhile I'll have a look at #2792.
This proposal enables automatic MCP registry population based on Gateway API resources, making it easy for administrators to expose and discover MCP servers.
Summary
The operator will watch Gateway API HTTPRoute resources (annotated with
toolhive.stacklok.dev/registry-export: "true") to discover which MCP servers are externally accessible. It constructs the full endpoint URL by traversing to the parent Gateway resource and creates registry entries in the upstream MCP Registry format.Key Features
registry-exportannotation are discoveredTwo Discovery Methods
Annotations
Discovery control:
toolhive.stacklok.dev/registry-export: Enable discoverytoolhive.stacklok.dev/registry-url: Direct URL (fallback)toolhive.stacklok.dev/registry-name: Override entry nameMetadata:
toolhive.stacklok.dev/registry-tier: "Official" | "Community" | "Partner"toolhive.stacklok.dev/registry-tools: Comma-separated tool namestoolhive.stacklok.dev/registry-tags: Comma-separated tagstoolhive.stacklok.dev/registry-description: Human-readable descriptionDesign Decisions
Implementation Phases
🤖 Generated with Claude Code