Skip to content

Add TCPRoute and UDPRoute Support for L4 Load Balancing #3688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Skcey
Copy link

@Skcey Skcey commented Aug 5, 2025

Proposed changes

Problem:
NGF currently lacks support for TCPRoute and UDPRoute resources, which are essential for managing Layer 4 (TCP/UDP) traffic via the Gateway API.

Solution:
This PR adds basic support for TCPRoute and UDPRoute to NGF, enabling Layer 4 load balancing. Key implementations include:

  1. Added controllers to watch and process TCPRoute/UDPRoute resources, following the pattern used for HTTPRoute/TLSRoute.
  2. Implemented route construction logic in the state graph to resolve listeners, backend services, and reference grants for L4 routes.
  3. Extended the NGINX stream configuration generator to create upstream groups and server blocks for TCP/UDP traffic, mapping routes to their backend services.

Testing:
Packaged into an image and tested in my environment.

Closes #3687

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • [ ✔️] I have read the CONTRIBUTING doc
  • [✔️ ] I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • [ ✔️] I have rebased my branch onto main
  • [ ✔️] I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


Copy link

nginx-bot bot commented Aug 5, 2025

Hi @Skcey! Welcome to the project! 🎉

Thanks for opening this pull request!
Be sure to check out our Contributing Guidelines while you wait for someone on the team to review this.

@nginx-bot nginx-bot bot added the community label Aug 5, 2025
@github-actions github-actions bot added the enhancement New feature or request label Aug 5, 2025
Copy link
Contributor

github-actions bot commented Aug 5, 2025

✅ All required contributors have signed the F5 CLA for this PR. Thank you!
Posted by the CLA Assistant Lite bot.

@Skcey
Copy link
Author

Skcey commented Aug 5, 2025

I have hereby read the F5 CLA and agree to its terms

@Skcey
Copy link
Author

Skcey commented Aug 5, 2025

recheck

@sjberman
Copy link
Collaborator

@Skcey Can you please rebase this PR? There are a couple changes that went into main that are causing compilation errors.

Also, the clusterrole in the helm chart needs to be updated to support these routes and their statuses so the controller can actually handle them.

Comment on lines +46 to +50
type PortInfo struct {
Port int32
Protocol corev1.Protocol
}

Copy link
Contributor

@shaun-nx shaun-nx Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why did you make this struct?
From the way this is being used, it feels like you can reference Protocol on its own.

For example, on line 144, we can define ports := make(map[int32]corev1.Protocol)
Line 155 then becomes ports[int32(listener.Port)] = protocol

Then the loop on line 473 becomes this

for port, protocol := range ports {
  servicePort := corev1.ServicePort{
	  Name:       fmt.Sprintf("port-%d", port),
	  Port:       port,
	  TargetPort: intstr.FromInt32(port),
	  Protocol:   protocol,

Would love to know what you think though. Do please tell me if I'm overlooking anything.

Comment on lines +184 to +191
if len(l.L4Routes) > 1 {
fmt.Printf(
"WARN: Listener %s has %d TCPRoutes, which is not supported. Skipping.",
l.Name,
len(l.L4Routes),
)
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this warning occurs, will the user see this information in their resource status?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be using print statements in the code anyway. We have a logger to use. Regardless, we should be doing validation and setting status conditions before we even get to this point in the code. It would be when we build the graph, which is the initial step. If we're building the configuration here, this validation should have already occurred.

Copy link
Contributor

@ciarams87 ciarams87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Skcey thanks so much for this contribution! The core architecture looks good and follows the existing patterns well. As a first pass, I'd like to see a few improvements:

  1. Replace fmt.Printf debug statements with proper structured logging using logr.Logger
  2. Fix the serviceResolver.Resolve() calls in TCP/UDP upstream functions to include the missing logger parameter
  3. Add comprehensive unit test coverage for all new functions, following existing test patterns
  4. As Saylor pointed out above, the RBAC permissions need to be updated and your branch needs to be rebased
  5. If possible, could we reduce code duplication between TCP and UDP implementations

Please feel free to reach out with any questions or if you need any help with anything!

"k8s.io/apimachinery/pkg/util/validation/field"
"sigs.k8s.io/gateway-api/apis/v1alpha2"

"github.com/nginx/nginx-gateway-fabric/internal/controller/state/conditions"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"github.com/nginx/nginx-gateway-fabric/internal/controller/state/conditions"
"github.com/nginx/nginx-gateway-fabric/v2/internal/controller/state/conditions"

"k8s.io/apimachinery/pkg/util/validation/field"
"sigs.k8s.io/gateway-api/apis/v1alpha2"

"github.com/nginx/nginx-gateway-fabric/internal/controller/state/conditions"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"github.com/nginx/nginx-gateway-fabric/internal/controller/state/conditions"
"github.com/nginx/nginx-gateway-fabric/v2/internal/controller/state/conditions"


allowedAddressType := getAllowedAddressType(ipFamily)

eps, err := serviceResolver.Resolve(ctx, br.SvcNsName, br.ServicePort, allowedAddressType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function call needs to be updated as it's missing the logger parameter - see the pattern in buildUpstreams:

eps, err := svcResolver.Resolve(ctx, logger, br.SvcNsName, br.ServicePort, allowedAddressType)


allowedAddressType := getAllowedAddressType(ipFamily)

eps, err := serviceResolver.Resolve(ctx, br.SvcNsName, br.ServicePort, allowedAddressType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above - this function call needs to be updated as it's missing the logger parameter - see the pattern in buildUpstreams:

eps, err := svcResolver.Resolve(ctx, logger, br.SvcNsName, br.ServicePort, allowedAddressType)

@github-project-automation github-project-automation bot moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Aug 13, 2025
@Skcey
Copy link
Author

Skcey commented Aug 14, 2025

@Skcey Can you please rebase this PR? There are a couple changes that went into main that are causing compilation errors.

Also, the clusterrole in the helm chart needs to be updated to support these routes and their statuses so the controller can actually handle them.

Thanks for the reminder! I’ll keep these issues in mind when I modify the code later. It’s just that I might not have time to update the code for about a week.

@Skcey
Copy link
Author

Skcey commented Aug 14, 2025

Hi @Skcey thanks so much for this contribution! The core architecture looks good and follows the existing patterns well. As a first pass, I'd like to see a few improvements:

  1. Replace fmt.Printf debug statements with proper structured logging using logr.Logger
  2. Fix the serviceResolver.Resolve() calls in TCP/UDP upstream functions to include the missing logger parameter
  3. Add comprehensive unit test coverage for all new functions, following existing test patterns
  4. As Saylor pointed out above, the RBAC permissions need to be updated and your branch needs to be rebased
  5. If possible, could we reduce code duplication between TCP and UDP implementations

Please feel free to reach out with any questions or if you need any help with anything!

Thank you so much for your feedback! I will definitely make improvements based on these points. However, I’ll be quite busy over the next week and may not be able to work on this task. I expect to have time to make the changes and refinements the week after next

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community enhancement New feature or request
Projects
Status: 🏗 In Progress
Development

Successfully merging this pull request may close these issues.

Add TCPRoute and UDPRoute Support for L4 Load Balancing
4 participants