@mtulio mtulio commented Jan 5, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

This proposal implements support for the Bring Your Own Security Group (BYO SG) annotation on Services of type LoadBalancer that use a Network Load Balancer (NLB).

The feature reuses the BYO SG annotation already supported for Classic Load Balancers, allowing users to create or update a Network Load Balancer with their own SGs attached to it.

Note: An update works only when the NLB was created with a managed SG (TODO: review this statement).

BYO SG is an alternative to managed SGs for NLBs when the feature is enabled through cloud-config.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Add support for the `service.beta.kubernetes.io/aws-load-balancer-security-groups` annotation on Services that use a Network Load Balancer, so that users can provide their own security groups.

@k8s-ci-robot
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 5, 2026
@k8s-ci-robot
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 5, 2026
@k8s-ci-robot
This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 5, 2026
Comment on lines +1909 to +1927
// buildSecurityGroupRuleReferences finds all security groups that have ingress rules
// referencing the specified security group ID. This is needed to safely delete security
// groups by first removing all dependent ingress rules.
//
// Parameters:
// - ctx: The context for AWS API calls
// - sgID: The security group ID to search for in ingress rules
//
// Returns:
// - map[*ec2types.SecurityGroup]bool: Map of security groups to cluster tag ownership status
// - map[*ec2types.SecurityGroup]IPPermissionSet: Map of security groups to their ingress rules that reference sgID
// - error: An error if the operation fails
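
The lookup this comment describes can be sketched without the AWS SDK. The version below is a minimal illustration, not the PR's code: the `securityGroup` and `ingressRule` types are simplified stand-ins for `ec2types.SecurityGroup` and its permissions, `findRuleReferences` is a hypothetical name, results are keyed by group ID instead of by pointer, and the cluster tag key is an assumed example value.

```go
package main

import "fmt"

// ingressRule is a simplified stand-in for an EC2 IP permission (assumption).
type ingressRule struct {
	Protocol     string
	Port         int
	SourceGroups []string // IDs of security groups allowed as traffic sources
}

// securityGroup is a simplified stand-in for ec2types.SecurityGroup (assumption).
type securityGroup struct {
	ID      string
	Tags    map[string]string
	Ingress []ingressRule
}

// findRuleReferences scans groups for ingress rules that name sgID as a source.
// For every group with at least one such rule it returns whether the group
// carries clusterTagKey, and the rules that reference sgID.
func findRuleReferences(groups []securityGroup, sgID, clusterTagKey string) (map[string]bool, map[string][]ingressRule) {
	owned := map[string]bool{}
	refs := map[string][]ingressRule{}
	for _, g := range groups {
		for _, r := range g.Ingress {
			for _, src := range r.SourceGroups {
				if src == sgID {
					refs[g.ID] = append(refs[g.ID], r)
					break // one match is enough to record this rule
				}
			}
		}
		if len(refs[g.ID]) > 0 {
			_, hasTag := g.Tags[clusterTagKey]
			owned[g.ID] = hasTag
		}
	}
	return owned, refs
}

func main() {
	const tagKey = "kubernetes.io/cluster/demo" // example tag key, assumed
	groups := []securityGroup{
		{
			ID:      "sg-node",
			Tags:    map[string]string{tagKey: "owned"},
			Ingress: []ingressRule{{Protocol: "tcp", Port: 80, SourceGroups: []string{"sg-lb"}}},
		},
		{
			ID:      "sg-other",
			Ingress: []ingressRule{{Protocol: "tcp", Port: 22, SourceGroups: []string{"sg-bastion"}}},
		},
	}
	owned, refs := findRuleReferences(groups, "sg-lb", tagKey)
	fmt.Println(owned, refs)
}
```

Returning the ownership map separately from the rule map lets a caller revoke rules only on groups it owns and merely report references held by external groups.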
@@ -0,0 +1,236 @@
/*
mtulio commented Jan 6, 2026

/test pull-cloud-provider-aws-test

mtulio commented Jan 6, 2026

/test pull-cloud-provider-aws-check

mtulio added 4 commits January 6, 2026 11:54
doc/nlb-sg: added instructions to use NLB+SG feature

Introduce the documentation to use the feature Service type-LoadBalancer
with Security Group by opt-in through the cloud-config.

doc(nlb byo-sg): update validation steps

Implement validation for Service annotation
`service.beta.kubernetes.io/aws-load-balancer-security-groups`
on NLBs.

Key changes:

1. Validation layer (aws_validations.go):
   - Removed blanket blocking of BYO SG annotation for NLB
   - Added NLB-specific constraints:
     * Only one security group allowed (NLB AWS limitation)
     * Security group ID must start with "sg-" (AWS format)
     * Extra security groups annotation remains blocked for NLB
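
The NLB-specific constraints listed above can be sketched as a small validation function. This is an illustration only: `validateNLBSecurityGroupAnnotation` is a hypothetical name rather than the function in `aws_validations.go`, and the sketch assumes the annotation value is a comma-separated list of security group IDs.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateNLBSecurityGroupAnnotation checks the value of the BYO SG annotation
// for an NLB Service: exactly one ID, and the ID must start with "sg-".
// Hypothetical helper; the comma-separated format is an assumption.
func validateNLBSecurityGroupAnnotation(value string) (string, error) {
	var ids []string
	for _, part := range strings.Split(value, ",") {
		if id := strings.TrimSpace(part); id != "" {
			ids = append(ids, id)
		}
	}
	switch {
	case len(ids) == 0:
		return "", errors.New("no security group ID provided")
	case len(ids) > 1:
		return "", fmt.Errorf("NLB accepts exactly one security group, got %d", len(ids))
	case !strings.HasPrefix(ids[0], "sg-"):
		return "", fmt.Errorf("invalid security group ID %q: must start with \"sg-\"", ids[0])
	}
	return ids[0], nil
}

func main() {
	for _, v := range []string{"sg-0123456789abcdef0", "sg-a,sg-b", "my-group", ""} {
		id, err := validateNLBSecurityGroupAnnotation(v)
		fmt.Printf("value=%q id=%q err=%v\n", v, id, err)
	}
}
```
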

Implement support for Service updates that add the BYO SG annotation on NLBs,
detecting drift and preventing leaks of managed SGs.

NOTE: The BYO SG annotation can be added to an NLB only when it was created
with managed SG support.

This commit fixes a security group leak vulnerability in NLB (Network Load
Balancer) when updating a service from controller-managed security groups
to BYO (Bring Your Own) security groups.

Problem:
When updating an NLB service to use BYO security groups via the
service.beta.kubernetes.io/aws-load-balancer-security-groups annotation,
the old managed security groups were not being deleted from AWS, causing
resource leaks.

Solution:
1. Added security group drift detection in ensureLoadBalancerv2()
   - Compares expected vs actual SGs on every reconciliation
   - Calls AWS SetSecurityGroups API when drift detected
   - Triggers cleanup of replaced managed security groups

2. Added helper function buildSecurityGroupRuleReferences()
   - Finds security groups with ingress rules referencing target SG
   - Identifies cluster-owned vs external security groups
   - Returns permissions that need revocation before deletion

3. Added helper function removeOwnedSecurityGroups()
   - Verifies SG ownership via cluster tags
   - Revokes dependent ingress rules from other SGs
   - Deletes owned SGs with exponential backoff retry
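
The drift detection step above boils down to an order-insensitive comparison of two sets of security group IDs. A minimal sketch follows, assuming plain string IDs: `securityGroupDrift` is a hypothetical helper, not the code in `ensureLoadBalancerv2()`, and the AWS calls that would follow (`SetSecurityGroups`, then cleanup of replaced managed SGs) are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// securityGroupDrift compares the expected and the actual security group IDs
// of a load balancer, ignoring order and duplicates. It reports whether the
// sets differ and which attached groups are no longer expected; those are the
// candidates for cleanup once the load balancer has been updated.
func securityGroupDrift(expected, actual []string) (bool, []string) {
	want := make(map[string]struct{}, len(expected))
	for _, id := range expected {
		want[id] = struct{}{}
	}
	have := make(map[string]struct{}, len(actual))
	for _, id := range actual {
		have[id] = struct{}{}
	}

	var replaced []string
	for id := range have {
		if _, ok := want[id]; !ok {
			replaced = append(replaced, id)
		}
	}
	sort.Strings(replaced) // deterministic order for logging and tests

	drifted := len(replaced) > 0
	for id := range want {
		if _, ok := have[id]; !ok {
			drifted = true // an expected group is not attached yet
		}
	}
	return drifted, replaced
}

func main() {
	drifted, replaced := securityGroupDrift([]string{"sg-byo"}, []string{"sg-managed"})
	fmt.Println(drifted, replaced)
}
```

Only the groups returned in `replaced` would be considered for deletion, and only after an ownership check such as the cluster tag verification described in step 3.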

Testing:
- Added comprehensive unit tests (TestEnsureLoadBalancerv2_SecurityGroupUpdate)
This commit enables users to bring their own security groups (BYO SG)
for Network Load Balancers (NLB) through annotations or global config.

Key changes:

2. Implementation (aws.go):
   - Updated ensureNLBSecurityGroup() with priority-based SG selection:
     Priority 1: Existing NLB - return current SGs
     Priority 2: BYO SG from annotation
     Priority 3: BYO SG from global config (ElbSecurityGroup)
     Priority 4: Managed mode - create new managed SG
     Priority 5: None - return empty list
   - Added comprehensive logging for each decision path

3. Tests:
   - Updated validation tests with NLB BYO SG success/error cases
   - Fixed implementation tests to reflect new behavior
   - Removed duplicate validation tests from implementation layer

Supported configurations:
- Annotation: service.beta.kubernetes.io/aws-load-balancer-security-groups
- Global config: ElbSecurityGroup (fallback)
- Managed mode: NLBSecurityGroupMode=Managed (fallback)

All tests passing:
- TestValidateServiceAnnotations: PASS
- TestEnsureNLBSecurityGroup: PASS
@mtulio mtulio force-pushed the feat-svc-nlb-byosg branch from 59bba33 to 66728d1 Compare January 6, 2026 15:42
mtulio added 2 commits January 6, 2026 12:43
Ensure e2e tests cover the BYO SG scenario on create and update.

Introduce an e2e AWS helper that extends the e2e test setup with AWS-specific
operations, especially for simulating BYO scenarios and validating
the load balancer's AWS resources.
@mtulio mtulio force-pushed the feat-svc-nlb-byosg branch from 66728d1 to 1e861e4 Compare January 6, 2026 15:44
mtulio commented Jan 6, 2026

Commits have been cleaned up.

/test all

@k8s-ci-robot
@mtulio: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cloud-provider-aws-e2e | 1e861e4 | link | true | `/test pull-cloud-provider-aws-e2e` |

mtulio commented Jan 6, 2026

I am observing two issues in this PR so far:

[It] [cloud-provider-aws-e2e] loadbalancer NLB with BYO SG - create with BYO from star

  1. The e2e identity is missing permissions for the scenario that creates a Service with BYO SG on an NLB:
I0106 16:39:58.845514 32245 loadbalancer.go:1082]   [Warning] SyncLoadBalancerFailed/lbconfig-test-nlb-byo-c: Error 
syncing load balancer: failed to ensure load balancer: error ensuring NLB security group rules: error while updating rules 
to security group "sg-0e755bbafb6239e3d": creating ingress rules for security group "sg-0e755bbafb6239e3d": error 
authorizing security group ingress: "operation error EC2: AuthorizeSecurityGroupIngress, https response error 
StatusCode: 403, RequestID: 0f08cf56-95bd-4b10-8ccc-cb6361739970, api error UnauthorizedOperation: 
You are not authorized to perform this operation. 
User: arn:aws:sts::209411653980:assumed-role/aws-cloud-controller-manager.kube-system.sa.test-cluster--iugf5b/1767716035453412230
 is not authorized to perform: ec2:AuthorizeSecurityGroupIngress on resource: arn:aws:ec2:us-west-2:209411653980:security-group/sg-0e755bbafb6239e3d 
because no identity-based policy allows the ec2:AuthorizeSecurityGroupIngress action. 

Action item: investigate why the permission is not granted in CI, since it is listed as a prerequisite.

[It] [cloud-provider-aws-e2e] loadbalancer NLB with BYO SG update - managed to BYO scenario

  1. The update e2e test for BYO SG on NLB can't be implemented in the existing e2e scenario because the test setup does not support managed SG.

Proposed action items: remove this test from upstream, or implement a cloud-config patch for the managed SG scenario.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
