-
Notifications
You must be signed in to change notification settings - Fork 69
SPLAT-2341: Add feature gate support in the cloud-config sync controller #400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@mtulio: This pull request references SPLAT-2341 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Skipping CI for Draft Pull Request. |
/test all |
@mtulio: This pull request references SPLAT-2341 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
daa3141
to
15fff4a
Compare
Local |
Hey Mike and Joel, PTAL if I followed the expected approach for FG in the config sync controller? I will use it to gate the CCM feature NLB+SG. thanks /assign @JoelSpeed @elmiko |
@mtulio: This pull request references SPLAT-2341 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@mtulio: This pull request references SPLAT-2341 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
15fff4a
to
3b032b3
Compare
The last version was introducing a panic when closing the channel twice, I just fixed it in the last commit as we are now calling /test e2e-aws-ovn |
e2e-aws-ovn is green, testing all others. /test all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General direction here looks fine, I guess individual transformers will need to be updated to handle enabling/disabling certain features?
When the configmap is changed, do we somehow reload/restart the deployment with the new config?
3b032b3
to
9b983c1
Compare
Thanks for review, @JoelSpeed .
The idea of PR #391, this commit specifically will enhance a feature to the AWS provider to the NLB+SG feature we are working on. Is it ok for you?
Honestly I did not follow the question. Would you mind sharing which configuration your are referring to? I thought the assessor would take care of that restart when the FeatureGate API is changed, but it was an assumption based in the initial exploration. Let me explore how the |
/test all |
Running locally the test is passing, checking if test |
yeah, that's a flake. Still investigating it. The test failed now is /test unit |
9b983c1
to
c50b1b5
Compare
After some agent-peering review I am observing test 'flaking'/leaking cloud-config, the latest version is preventing randomly failures by enforcing deleting it. I am not sure if this is a parallel execution issue, still checking. /test unit |
/test unit |
1 similar comment
/test unit |
okay, looks like unit arent flake now. I am investigating if new tests were introducing it. |
Confirming that the controller is restarting after config gate is changed ( in this case the CCCMO container): $ oc get pods -n openshift-cloud-controller-manager-operator -l k8s-app=cloud-manager-operator -o json | jq -r '.items[].status.containerStatuses[] | select(.name=="cluster-cloud-controller-manager").state'
{
"running": {
"startedAt": "2025-08-27T18:37:55Z"
}
}
$ oc patch featuregate cluster --type='merge' -p='{
"spec": {
"featureSet": "CustomNoUpgrade",
"customNoUpgrade": {
"disabled": [
"VSphereMultiNetworks"
]
}
}
}'
....
$ oc logs -n openshift-cloud-controller-manager-operator cluster-cloud-controller-manager-operator-696bf58c7b-xfzw7 -c cluster-cloud-controller-manager
....
0827 19:39:28.274659 1 main.go:203] "FeatureGates changed" enabled=...
I0827 19:39:28.274747 1 main.go:214] "FeatureGates initialized" logger="CCMOperator.setup"...
I0827 19:39:28.274995 1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cloud-controller-manager", Name:"aws-cloud-controller-manager", UID:"06bc6486-c285-482f-aad7-108d30e47085"...
I0827 19:39:28.275043 1 main.go:245] "starting manager" logger="CCMOperator.setup"
I0827 19:39:28.275137 1 server.go:208] "Starting metrics server" logger="CCMOperator.controller-runtime.metrics"
....
$ oc get pods -n openshift-cloud-controller-manager-operator -l k8s-app=cloud-manager-operator -o json | jq -r '.items[].status.containerStatuses[] | select(.name=="cluster-cloud-controller-manager").state'
{
"running": {
"startedAt": "2025-08-27T19:39:28Z"
}
} LMK if this address your questions. |
So you've shown that we restart the operator, but what happened to the operands? Did the CCM pods get updated because of the change to the configmap when the feature was rolled out? |
Hi @JoelSpeed - which configmap are you referring to?
Am I missing something in your questions or not following your thoughts here? if so, would you mind providing specific examples or eventual misunderstanding? |
Injecting features to allow transformers for each cloud provider implementation so enabled feature gates can be eavaluated.
c50b1b5
to
ce021d5
Compare
ce021d5
to
45349ca
Compare
What triggers the CCM pods to be restarted? |
Deployment rools out the CCM pods after updating the hash of cloud-config's configmap: $ oc get deployment.apps/aws-cloud-controller-manager -n openshift-cloud-controller-manager -o yaml | yq ea .spec.template.metadata.annotations -
operator.openshift.io/config-hash: 3a3cd981ecd5560fc7c4405e9d1af9b23f51cf343a740dc3bc0127a94799e0e0
# update/patch ConfigMap cloud-provider-config/openshift-config
$ oc get deployment.apps/aws-cloud-controller-manager -n openshift-cloud-controller-manager -o yaml | yq ea .spec.template.metadata.annotations -
operator.openshift.io/config-hash: c58e7513224b2164d101fb8428c198a78386f4bba20f5d52ab955be78197be9e |
Thanks for the reminder, in which case, yep, I think my concerns are alleviated, this should be good |
Enabling feature gate acessor to the cloud-config sync controller.
45349ca
to
c63ea02
Compare
/lgtm |
@mtulio: This pull request references SPLAT-2341 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@mtulio: This pull request references SPLAT-2341 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@mtulio: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/approve @mtulio Is there something observable as an output of this PR that we can leverage for pre-merge testing to check the functionality? Or is this all just plumbing at this point? |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test ? |
@mtulio: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@JoelSpeed I didn't increase verbosity to this PR right now, the plan is to work on a new optional presubmit to drive the #400, which will test this PR completely. Let me know if this makes sense, otherwise I can think in some logging output in a provider transformer just to output here. |
Introducing feature gate support to the cloudConfigTransformer of cloud-config sync controller, so we can ensure custom configuration depending of OCP feature gate.
The idea is to provide the interface to enable the CCM feature of default SG provisioning on NLBs, SPLAT-2137.
The feature gate will added to providers in follow up PRs, starting to AWS config transformer: c61b48d , related to #391
Ref: SPLAT-2341