Fix leaked managed/owned security group on Service update with BYO SG #1209
base: master
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the … label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @mtulio. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the … label. I understand the commands that are listed here.
Force-pushed 03f9775 to 83c92f2.

/ok-to-test
Force-pushed 83c92f2 to 23ba0b3.

/test all
Force-pushed 23ba0b3 to 0fec46d.

Fixed doc strings and the unit tests that were failing because of the previous unexpected behavior:
/test all
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.
Force-pushed 0fec46d to 1907542.

/test pull-cloud-provider-aws-e2e-kubetest2

/test all

I can't find a connection between the failures in pull-cloud-provider-aws-e2e-kubetest2 and the changes in this PR. I am going to convert it to a regular PR and ask for reviewers while we observe whether this is a CI flake. PTAL?
FWIW, an interim update: this PR is still alive and still needs to be fixed, and the proposal could be reused in the BYO SG logic for NLBs. I plan to return to it next week to rebase and ask for a final review, including the recent updates to the Service NLB and e2e.
Force-pushed f0b38b6 to e236025.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: …
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files: …
Approvers can indicate their approval by writing …
Rebased the PR on top of the managed NLB SG changes, with new e2e tests validating the fix:
/test pull-cloud-provider-aws-e2e
Force-pushed e236025 to 3cf5c63.

Fixed conflicts.
Force-pushed 3cf5c63 to e98e826.

Small fixes in the e2e for PSP and logging:
/test pull-cloud-provider-aws-e2e
The e2e is now passing; I am going to polish it. Let's see how the other jobs do overall:
/test all
Force-pushed e98e826 to ca1ef98.
Code Review - Claude Assistant (claude-sonnet-4.5@20250929)
Summary
This PR addresses security group leakage (#1208) when updating Classic Load Balancer services from controller-managed to user-specified security groups via the …
Core Change: Introduces lifecycle management for controller-owned security groups during load balancer updates, ensuring proper cleanup when security groups are replaced.
Problem Analysis
Current Behavior: …
Impact: …
Root Cause: …
Solution Design
New Functions
1. …
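To make the problem concrete: on update, the groups that may need cleanup are the ones attached today but no longer requested by the Service. The Go sketch below is illustrative only; `replacedSecurityGroups` is a hypothetical helper, not a function from this PR, and the SG IDs are made up. Ownership still has to be checked before anything is deleted.

```go
package main

import "fmt"

// replacedSecurityGroups returns the IDs attached to the load balancer today
// (current) that are absent from the set the Service now asks for (desired).
// These are only cleanup candidates; ownership must be verified separately.
func replacedSecurityGroups(current, desired []string) []string {
	keep := make(map[string]struct{}, len(desired))
	for _, id := range desired {
		keep[id] = struct{}{}
	}
	var replaced []string
	for _, id := range current {
		if _, ok := keep[id]; !ok {
			replaced = append(replaced, id)
		}
	}
	return replaced
}

func main() {
	// Hypothetical IDs: sg-managed was created by the controller,
	// sg-byo is the user-provided group from the annotation.
	current := []string{"sg-managed"}
	desired := []string{"sg-byo"}
	fmt.Println(replacedSecurityGroups(current, desired)) // [sg-managed]
}
```

Computing the difference first keeps the cleanup scoped to groups the update actually dropped, so a BYO group present in both sets is never touched.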
Force-pushed ca1ef98 to 63e7396.

The initial investigation points to a flaky test; checking:
/test pull-cloud-provider-aws-test
This commit fixes a security group leak vulnerability in NLB (Network Load Balancer) when updating a service from controller-managed security groups to BYO (Bring Your Own) security groups.

Problem: When updating an NLB service to use BYO security groups via the service.beta.kubernetes.io/aws-load-balancer-security-groups annotation, the old managed security groups were not being deleted from AWS, causing resource leaks.

Solution:
1. Added security group drift detection in ensureLoadBalancerv2()
   - Compares expected vs actual SGs on every reconciliation
   - Calls AWS SetSecurityGroups API when drift is detected
   - Triggers cleanup of replaced managed security groups
2. Added helper function buildSecurityGroupRuleReferences()
   - Finds security groups with ingress rules referencing target SG
   - Identifies cluster-owned vs external security groups
   - Returns permissions that need revocation before deletion
3. Added helper function removeOwnedSecurityGroups()
   - Verifies SG ownership via cluster tags
   - Revokes dependent ingress rules from other SGs
   - Deletes owned SGs with exponential backoff retry

Testing:
- Added comprehensive unit tests (TestEnsureLoadBalancerv2_SecurityGroupUpdate)
- Added E2E test for managed→BYO SG update scenario
- All existing tests pass

This fix follows the same pattern used for CLB in PR kubernetes#1209.

Fixes: Security group leak when updating NLB from managed to BYO SG
Related: kubernetes#1209 (CLB security group leak fix)
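The drift check in step 1 has to compare SG sets rather than ordered lists; otherwise a mere reordering could trigger a needless SetSecurityGroups call on every reconciliation. A minimal sketch, assuming the expected and actual IDs are already available as string slices; `securityGroupsDrifted` is a hypothetical name and does not reflect the actual code in ensureLoadBalancerv2():

```go
package main

import (
	"fmt"
	"sort"
)

// securityGroupsDrifted reports whether the SGs attached to the NLB (actual)
// differ from what the Service spec resolves to (expected). Order and
// duplicate IDs are ignored, since only set membership matters here.
func securityGroupsDrifted(expected, actual []string) bool {
	normalize := func(ids []string) []string {
		seen := make(map[string]struct{}, len(ids))
		out := make([]string, 0, len(ids))
		for _, id := range ids {
			if _, dup := seen[id]; !dup {
				seen[id] = struct{}{}
				out = append(out, id)
			}
		}
		sort.Strings(out)
		return out
	}
	e, a := normalize(expected), normalize(actual)
	if len(e) != len(a) {
		return true
	}
	for i := range e {
		if e[i] != a[i] {
			return true
		}
	}
	return false
}

func main() {
	// Same set, different order: no drift, so no API call is needed.
	fmt.Println(securityGroupsDrifted([]string{"sg-1", "sg-2"}, []string{"sg-2", "sg-1"})) // false
	// Managed SG still attached after the BYO annotation was added: drift.
	fmt.Println(securityGroupsDrifted([]string{"sg-byo"}, []string{"sg-managed"})) // true
}
```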
The introduced test is passing:
The failed tests, both hairpinning traffic tests, are failing to resolve DNS:
Checking whether this is transient ('cloudability' issues) or related to the e2e updates (which are mostly debug output and the new e2e).
/test pull-cloud-provider-aws-e2e

In the last attempt, only the NLB test failed, for the same reason as before (timeout):
I wonder whether I should increase the timeout, although I think it is already high.
/test pull-cloud-provider-aws-e2e
The hairpinning traffic test(s) keep flaking due to a timeout (waiting for the LB to become available and its name to resolve). I had a good sync with @elmiko today; one approach would be to increase the timeout of the hairpinning traffic tests. I am also considering moving the e2e improvements added in this PR to a dedicated one, so we can focus here on the fix itself while we investigate and isolate the e2e improvements. Open to thoughts.
Fix the managed (controller-owned) security group leak when a user-provided security group is added to an existing Service type-loadBalancer CLB.
fix/byosg/tests: unit tests to handle managed SG removal on BYOSG.
Introduce unit tests for the added functions to validate a Service update from a managed SG state to the BYO Security Group annotation.
Introduce a BYO Security Group (SG) update scenario for Service CLB to validate the SG leak when a user has created a Service CLB with the default SG and later updated it to a user-provided one. kubernetes#1208
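The ownership guard is what makes the managed SG removal safe to unit test: only groups tagged as owned by this cluster may be deleted, never a BYO group. The sketch below assumes the `kubernetes.io/cluster/<cluster-id>` tag with the value `owned` marks controller-owned SGs; `isOwnedByCluster` and the cluster ID `demo` are hypothetical, and the cases mirror a table-driven Go test, run from main for brevity.

```go
package main

import "fmt"

// isOwnedByCluster reports whether a security group carries the cluster
// ownership tag. Assumption: ownership is "kubernetes.io/cluster/<clusterID>"
// set to "owned"; "shared", a missing tag, or another cluster's tag means
// the controller must not delete the group.
func isOwnedByCluster(tags map[string]string, clusterID string) bool {
	return tags["kubernetes.io/cluster/"+clusterID] == "owned"
}

func main() {
	// Table-driven cases in the style of a Go unit test.
	cases := []struct {
		name string
		tags map[string]string
		want bool
	}{
		{"managed SG created by the controller", map[string]string{"kubernetes.io/cluster/demo": "owned"}, true},
		{"BYO SG without cluster tags", map[string]string{"Name": "user-sg"}, false},
		{"shared SG", map[string]string{"kubernetes.io/cluster/demo": "shared"}, false},
		{"owned by another cluster", map[string]string{"kubernetes.io/cluster/other": "owned"}, false},
	}
	for _, tc := range cases {
		got := isOwnedByCluster(tc.tags, "demo")
		fmt.Printf("%-40s got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```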
Force-pushed 63e7396 to 3e497b2.

Increased the e2e timeout in the load balancer curler/poller to check whether the CI issues are related. This is expected to reduce the flakes in the hairpin traffic tests (CLB and NLB).
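For reference, the poller pattern boils down to retrying a check at a fixed interval until a deadline, so raising the timeout only gives a slow LB (for example, one whose DNS name has not propagated yet) more time. This is a standard-library sketch with a hypothetical `pollUntil` helper; the actual e2e framework may rely on its own wait utilities, and the durations are illustrative, not the values used in CI.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil calls check immediately and then every interval until it
// returns true, returns an error, or the timeout elapses.
func pollUntil(timeout, interval time.Duration, check func() (bool, error)) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("condition not met within %s: %w", timeout, ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	attempts := 0
	// Hypothetical check: pretend the LB answers on the third attempt.
	err := pollUntil(time.Second, 10*time.Millisecond, func() (bool, error) {
		attempts++
		return attempts >= 3, nil
	})
	fmt.Println(err, attempts) // <nil> 3
	// A timeout shorter than the LB needs fails with DeadlineExceeded.
	err = pollUntil(20*time.Millisecond, 50*time.Millisecond, func() (bool, error) { return false, nil })
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
```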
/test pull-cloud-provider-aws-e2e
@mtulio: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. I understand the commands that are listed here.
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR proposes a fix for the leaked security group (SG) when a Service type-loadBalancer (CLB) is updated to add the BYO SG annotation (service.beta.kubernetes.io/aws-load-balancer-security-groups). The annotation replaces all SGs attached to the Load Balancer, but the linked rules are not removed and the managed SG (created by the controller) is not deleted.
Which issue(s) this PR fixes:
Fixes #1208
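Before the attached groups can be compared with the requested ones, the annotation value has to be parsed into SG IDs. A minimal sketch, assuming a comma-separated value with optional whitespace; `parseBYOSecurityGroups` is a hypothetical helper, not the controller's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

const byoSGAnnotation = "service.beta.kubernetes.io/aws-load-balancer-security-groups"

// parseBYOSecurityGroups extracts user-provided SG IDs from the Service
// annotations, assuming a comma-separated list. Blank entries and
// surrounding whitespace are ignored; a missing annotation yields nil.
func parseBYOSecurityGroups(annotations map[string]string) []string {
	raw, ok := annotations[byoSGAnnotation]
	if !ok {
		return nil
	}
	var ids []string
	for _, part := range strings.Split(raw, ",") {
		if id := strings.TrimSpace(part); id != "" {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	svc := map[string]string{byoSGAnnotation: "sg-0aaa, sg-0bbb,"}
	fmt.Println(parseBYOSecurityGroups(svc)) // [sg-0aaa sg-0bbb]
}
```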
Special notes for your reviewer:
We decided to create isolated, dedicated methods to discover the SGs whose rules reference the managed SG and to remove those rules, targeting:
The unit tests and the function documentation comments were written with the assistance of Cursor AI (model claude-4-sonnet): AIA HAb SeCeNc Hin R v1.0
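The order of the discover-then-remove methods matters: EC2 rejects deleting a security group that is still referenced by other groups' rules or still attached, so the referencing ingress rules are revoked first and deletion is retried with exponential backoff, as the NLB commit message describes. The sketch uses a hypothetical `sgAPI` interface and an in-memory fake instead of the AWS SDK; `errDependencyViolation` only stands in for the real EC2 error code, and none of these names come from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDependencyViolation stands in for the EC2 "DependencyViolation" error.
var errDependencyViolation = errors.New("DependencyViolation")

// sgAPI is a hypothetical, minimal view of the EC2 calls involved.
type sgAPI interface {
	ReferencingGroups(target string) []string     // groups whose ingress rules reference target
	RevokeIngressFrom(group, target string) error // drop those rules from group
	DeleteSecurityGroup(id string) error
}

// removeOwnedSecurityGroup revokes every ingress rule that points at id,
// then deletes id, retrying with exponential backoff while a dependency remains.
func removeOwnedSecurityGroup(api sgAPI, id string, attempts int, base time.Duration) error {
	for _, g := range api.ReferencingGroups(id) {
		if err := api.RevokeIngressFrom(g, id); err != nil {
			return fmt.Errorf("revoking rules in %s: %w", g, err)
		}
	}
	delay := base
	for i := 0; i < attempts; i++ {
		err := api.DeleteSecurityGroup(id)
		if err == nil || !errors.Is(err, errDependencyViolation) {
			return err // success, or an error that retrying will not fix
		}
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("deleting %s: gave up after %d attempts", id, attempts)
}

// fakeAPI fails deletion failDeletes times before succeeding.
type fakeAPI struct {
	refs        map[string][]string
	revoked     []string
	failDeletes int
}

func (f *fakeAPI) ReferencingGroups(t string) []string { return f.refs[t] }
func (f *fakeAPI) RevokeIngressFrom(g, t string) error {
	f.revoked = append(f.revoked, g)
	return nil
}
func (f *fakeAPI) DeleteSecurityGroup(string) error {
	if f.failDeletes > 0 {
		f.failDeletes--
		return errDependencyViolation
	}
	return nil
}

func main() {
	api := &fakeAPI{refs: map[string][]string{"sg-managed": {"sg-node"}}, failDeletes: 2}
	err := removeOwnedSecurityGroup(api, "sg-managed", 5, time.Millisecond)
	fmt.Println(err, api.revoked) // <nil> [sg-node]
}
```

Keeping revocation and deletion behind a small interface is also what allows the cleanup path to be unit tested without AWS credentials.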
Does this PR introduce a user-facing change?: