
Conversation

@sjberman
Collaborator

Problem: For NGINX to get the endpoint of the AI workload from the EndpointPicker, it needs to send a gRPC request using the proper protobuf protocol.

Solution: When the inference extension feature is enabled, a simple Go server is injected as an additional container. It listens for a request from our (upcoming) NJS module, forwards it to the configured EPP, and returns the chosen endpoint in a response header.

Testing: Manually sent a request to the Golang app and received the endpoint header in the response.

Closes #3837

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


@sjberman sjberman requested a review from Copilot September 17, 2025 14:27
@github-actions github-actions bot added enhancement New feature or request dependencies Pull requests that update a dependency file labels Sep 17, 2025

Copilot AI left a comment


Pull Request Overview

This PR introduces a Go-based HTTP shim server that enables NGINX to communicate with the Gateway API Inference Extension Endpoint Picker via gRPC. The implementation adds new command-line functionality and container injection capabilities to support inference workload routing.

  • Adds an endpoint-picker command to the gateway binary that runs an HTTP server
  • Introduces container injection logic for the inference extension feature
  • Implements gRPC client functionality to communicate with the Endpoint Picker (EPP) via the Envoy External Processing (ext_proc) protocol

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

File Description
cmd/gateway/endpoint_picker.go Core HTTP server implementation for EPP communication
cmd/gateway/endpoint_picker_test.go Comprehensive test suite for the endpoint picker functionality
cmd/gateway/commands.go Adds the new endpoint-picker command to the CLI
cmd/gateway/main.go Registers the endpoint-picker command in the root command
internal/controller/provisioner/objects.go Implements container injection logic for inference extension
internal/controller/provisioner/objects_test.go Tests for container injection functionality
internal/controller/provisioner/provisioner.go Adds InferenceExtension configuration field
internal/controller/manager.go Passes InferenceExtension config to provisioner
go.mod Adds required gRPC and protobuf dependencies


@nginx nginx deleted a comment from Copilot AI Sep 17, 2025
@sjberman sjberman marked this pull request as ready for review September 17, 2025 14:38
@sjberman sjberman requested a review from a team as a code owner September 17, 2025 14:38
@sjberman sjberman changed the title Add golang shim for comms wth EPP Add golang shim for comms with EPP Sep 17, 2025
Contributor

@bjee19 bjee19 left a comment


There are a couple of nits related to spacing / newlines before return statements, but it seems so common that perhaps it's no longer in our style guidelines. Otherwise LGTM.

@bjee19
Contributor

bjee19 commented Sep 17, 2025

Actually, as a question: I thought #3841 says that the communication between the Go app and the EPP will use TLS, but here it sends a gRPC request. Am I missing something, or has something changed?

Or does gRPC already satisfy the TLS requirement? Or are these things not mutually exclusive?

@sjberman
Collaborator Author

@bjee19 Not mutually exclusive. Right now this sends an insecure gRPC request, which we'll need to secure with a certificate/key.

@sjberman sjberman merged commit 211d13b into feat/inference-extension Sep 18, 2025
38 checks passed
@sjberman sjberman deleted the feat/golang-inference-app branch September 18, 2025 18:38
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in NGINX Gateway Fabric Sep 18, 2025
sjberman added a commit that referenced this pull request Sep 18, 2025
salonichf5 pushed a commit that referenced this pull request Oct 2, 2025
salonichf5 pushed a commit that referenced this pull request Oct 15, 2025
salonichf5 pushed a commit that referenced this pull request Oct 15, 2025
ciarams87 pushed a commit that referenced this pull request Oct 16, 2025
ciarams87 pushed a commit that referenced this pull request Oct 17, 2025
ciarams87 pushed a commit that referenced this pull request Oct 17, 2025