-
Notifications
You must be signed in to change notification settings - Fork 149
Add option to skip Virtual MCP CRDs installation #2729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
JAORMX
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add an environment variable to disable the controllers if this is enabled.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2729 +/- ##
==========================================
- Coverage 56.48% 56.38% -0.11%
==========================================
Files 319 319
Lines 30943 30992 +49
==========================================
- Hits 17479 17475 -4
- Misses 11960 12012 +52
- Partials 1504 1505 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
JAORMX
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amirejaz what about the virtual MCP controllers? shouldn't those be handled via a boolean as well? I was thinking the helm boolean would set the ENABLE_VMCP (or similar) environment variable in the main operator deployment and thus allow us to toggle easily. I'm not sure we should expose environment variables like this, instead, we should use helm flags with proper documentation.
|
@dmartinol FYI you were interested in reviewing this on our call |
| # -- Install Virtual MCP CRDs (VirtualMCPServer and VirtualMCPCompositeToolDefinition). | ||
| # Users who only need core MCP server management can set this to false to skip | ||
| # installing Virtual MCP aggregation features (saves ~54KB of CRDs). | ||
| virtualMCP: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add the same implementation to disable the registry and server CRDs according to the specs at #2564
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will update the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dmartinol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my concern is that there are mixed CRDs and controllers that seem not to be managed, at least not as defined in the #2564
and I fear we should also anticipate the removal of unneeded controllers from the platform, as requested in #2662 so that we can reduce the number of controllers to handle by these flags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these files are generated, it's not enough renaming them, you should update the operator-manifests task.
anyway, these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated operator-manifest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a file to be edited IMO, ask @ChrisJBurns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to modify this file unless you're providing information that you want generated onto the README.md. In this case it may be ok. Otherwise if it's only on the readme.md it will get overridden by the gotmpl
| helm uninstall <release_name> | ||
| ``` | ||
|
|
||
| ### Skipping Virtual MCP CRDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the scope should be extended to disable registry CRD
dmartinol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume @jhrozek will also share his thoughts here.
Once we agree on the design, pls also update the ITs to re-run the existing tests with the other features disabled
.github/workflows/operator-ci.yml
Outdated
| id: git-check | ||
| run: | | ||
| git diff --exit-code deploy/charts/operator-crds/crds || echo "crd-changes=true" >> $GITHUB_OUTPUT | ||
| git diff --exit-code deploy/charts/operator-crds/crds deploy/charts/operator-crds/crds-server deploy/charts/operator-crds/crds-registry deploy/charts/operator-crds/crds-virtualmcp || echo "crd-changes=true" >> $GITHUB_OUTPUT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about using regex like:
... deploy/charts/operator-crds/crds* ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| @@ -0,0 +1,5 @@ | |||
| {{- if .Values.crds.install.server }} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, external auth can also be used with virtual MCPs, so this should be in OR with .Values.crds.install.virtualMCP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.github/workflows/operator-ci.yml
Outdated
| id: git-check | ||
| run: | | ||
| git diff --exit-code deploy/charts/operator-crds/crds || echo "crd-changes=true" >> $GITHUB_OUTPUT | ||
| git diff --exit-code deploy/charts/operator-crds/crds deploy/charts/operator-crds/crds-server deploy/charts/operator-crds/crds-registry deploy/charts/operator-crds/crds-virtualmcp || echo "crd-changes=true" >> $GITHUB_OUTPUT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: why aren't we keeping all the original CRDs in the same folder instead? This would simplify all the involved steps (of course, ITs should list all the individual files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we keep all of them under crds/, it automatically installs all the crds. I moved all of them in one folder crd-files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I meant: the same folder but different from the one used as the chart template
| CRDDirectoryPaths: []string{filepath.Join("..", "..", "..", "..", "deploy", "charts", "operator-crds", "crds")}, | ||
| CRDDirectoryPaths: []string{ | ||
| filepath.Join("..", "..", "..", "..", "deploy", "charts", "operator-crds", "crds-server"), | ||
| filepath.Join("..", "..", "..", "..", "deploy", "charts", "operator-crds", "crds-virtualmcp"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see MCPGroup is needed here because the association is bottom-up, from server to group: the server controller should take this into account and avoid, e.g., using validateGroupRef
| testEnv = &envtest.Environment{ | ||
| CRDDirectoryPaths: []string{filepath.Join("..", "..", "..", "..", "deploy", "charts", "operator-crds", "crds")}, | ||
| CRDDirectoryPaths: []string{ | ||
| filepath.Join("..", "..", "..", "..", "deploy", "charts", "operator-crds", "crds-server"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if this is logically correct, I imagine the tests would work even w/o the server CRD.
Anyway, I imagine that the vMCP controllers should fail to start if the server functionality is disabled
cmd/thv-operator/main.go
Outdated
| // Check feature flags | ||
| enableServer := isFeatureEnabled("ENABLE_SERVER", true) | ||
| enableRegistry := isFeatureEnabled("ENABLE_REGISTRY", true) | ||
| enableAggregation := isFeatureEnabled("ENABLE_AGGREGATION", true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think enableAggregation also depends on enableServer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to become quite brittle quite quickly. Could we have a simple dependency Go map where we express the dependencies?
Another possible simplification would be to not implement ENABLE_SERVER for now but only ENABLE_REGISTRY and ENABLE_AGGREGATION and treat ENABLE_SERVER as a dependency.
@dmartinol did you have a use case for ENABLE_REGISTRY only? I know it was in the original issue but I wasn't sure if it was just for completeness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmartinol did you have a use case for ENABLE_REGISTRY only? I know it was in the original issue but I wasn't sure if it was just for completeness
I imagine it would be for those wanting to deploy an "official" registry in k8s without caring about the toolhive tools. The same could be probably achieved via an ad-hoc deployment that someone tracked in the design document as:
In Kubernetes via a Helm chart (Optional, if we have enough time)
(I'm not sure this is tracked for the MVP, anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to become quite brittle quite quickly. Could we have a simple dependency Go map where we express the dependencies?
Move the dependencies to the go map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jhrozek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to digest the changes, better, submitting a first pass
| --set crds.install.virtualMCP=false | ||
| ``` | ||
|
|
||
| This saves approximately 54KB of CRDs and is useful for deployments that don't require Virtual MCP aggregation features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not include the size, it screams "AI generated".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
cmd/thv-operator/main.go
Outdated
|
|
||
| // setupServerControllers sets up server-related controllers (MCPServer, MCPExternalAuthConfig, MCPRemoteProxy, ToolConfig) | ||
| func setupServerControllers(mgr ctrl.Manager, enableRegistry bool) error { | ||
| // Create a shared platform detector for all controllers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least the comment is a bit misleading. I wonder if there is a reason not to pass the detector down from the caller, IIRC it already uses sync.Once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passed platform detector from caller
cmd/thv-operator/main.go
Outdated
| if !found { | ||
| return defaultValue | ||
| } | ||
| return strings.EqualFold(value, "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strconv.ParseBool ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cmd/thv-operator/main.go
Outdated
| // Check feature flags | ||
| enableServer := isFeatureEnabled("ENABLE_SERVER", true) | ||
| enableRegistry := isFeatureEnabled("ENABLE_REGISTRY", true) | ||
| enableAggregation := isFeatureEnabled("ENABLE_AGGREGATION", true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to become quite brittle quite quickly. Could we have a simple dependency Go map where we express the dependencies?
Another possible simplification would be to not implement ENABLE_SERVER for now but only ENABLE_REGISTRY and ENABLE_AGGREGATION and treat ENABLE_SERVER as a dependency.
@dmartinol did you have a use case for ENABLE_REGISTRY only? I know it was in the original issue but I wasn't sure if it was just for completeness
| value: {{ .Values.operator.features.server | quote }} | ||
| - name: ENABLE_REGISTRY | ||
| value: {{ .Values.operator.features.registry | quote }} | ||
| - name: ENABLE_AGGREGATION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just call this ENABLE_VMCP ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this move is correct, MCPGroup is usable even without vMCP, moving it here would break the use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what use case is it?
cmd/thv-operator/Taskfile.yml
Outdated
| # Move server CRDs | ||
| - cmd: | | ||
| mv deploy/charts/operator-crds/crds/toolhive.stacklok.dev_mcpservers.yaml deploy/charts/operator-crds/crds-server/ 2>/dev/null || true | ||
| mv deploy/charts/operator-crds/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml deploy/charts/operator-crds/crds-server/ 2>/dev/null || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we ignoring errors?
2ceca6b to
215e33a
Compare
215e33a to
d2c3613
Compare
| helm uninstall <release_name> | ||
| ``` | ||
|
|
||
| ### Skipping CRDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great but maybe a summary table could be more effective.
Can be fixed in another PR if we think it's needed.
| # Users who only need server management without registry features can set this to false | ||
| # to skip installing the registry CRD. | ||
| registry: true | ||
| # -- Install Virtual MCP CRDs (VirtualMCPServer and VirtualMCPCompositeToolDefinition). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and MCPGroup
| - get | ||
| - list | ||
| - watch | ||
| {{- if not .Values.operator.testMode }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain where this operator.testMode come from now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operator.testMode comes from our chart-testing setup and is used in CI to disable operator RBAC/resources to avoid Helm ownership conflicts during ct install.
dmartinol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm leaving final approval to @jhrozek
Thanks!
Add Feature Flags for Controller Groups and Optional Virtual MCP CRD Installation
Summary
This PR implements feature flags to enable/disable controller groups in the ToolHive operator and adds the ability to skip installing Virtual MCP CRDs. This allows users to deploy a minimal operator configuration when they only need core MCP server management features.
Related Issues
Changes
1. Feature Flags Implementation
Added three environment variable flags to control controller groups:
ENABLE_SERVER(default:true)ENABLE_REGISTRY(default:true)ENABLE_AGGREGATION(default:true)ENABLE_SERVER=true(validated with warning log)2. Optional Virtual MCP CRD Installation
Added
crds.install.virtualMCPoption to thetoolhive-operator-crdsHelm chart:false: Skips installing VirtualMCPServer and VirtualMCPCompositeToolDefinition CRDsENABLE_AGGREGATION=falsein the operator to prevent controller errors3. Code Refactoring
Refactored
setupControllersAndWebhooksfunction into smaller, focused functions:setupServerControllers()- Sets up server-related controllerssetupRegistryController()- Sets up registry controllersetupAggregationControllers()- Sets up aggregation controllers and webhooksThis improves maintainability and makes the code easier to test.
4. Image Validation Logic
Fixed image validation mode selection:
ENABLE_REGISTRY=true: MCPServer usesImageValidationRegistryEnforcingENABLE_REGISTRY=false: MCPServer usesImageValidationAlwaysAllowUsage Examples
Minimal Deployment (Core Server Management Only)
Registry-Only Deployment
# Enable only server and registry features helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \ -n toolhive-system --create-namespace \ --set operator.env.ENABLE_AGGREGATION=falseBackward Compatibility
truewhen not settrue(installs all CRDs)Documentation
operator-crds/README.mdwith feature flags documentationoperator/values.yamlwith feature flag commentsTesting