Add golang shim for comms with EPP #3930
Conversation
Problem: In order for NGINX to get the endpoint of the AI workload from the EndpointPicker, it needs to send a gRPC request using the proper protobuf protocol.
Solution: When the inference extension feature is enabled, a simple Go server is injected as an additional container. It listens for requests from our (upcoming) NJS module and forwards them to the configured EPP, returning the picked endpoint in a response header.
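To make the flow concrete, here is a minimal sketch of what such a shim could look like. Everything specific in it is an assumption for illustration, not this PR's actual code: the EPP_ADDRESS environment variable, the :8080 listen port, the X-Endpoint header name, and the pickEndpoint helper (stubbed here; a fuller sketch of the gRPC exchange follows the review summary below).

```go
// Minimal sketch of the shim's shape; the env var, port, and header name
// below are illustrative assumptions, not this PR's actual values.
package main

import (
	"context"
	"log"
	"net/http"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// pickEndpoint is stubbed here; a fuller sketch of the gRPC exchange with
// the EPP appears after the review summary below.
func pickEndpoint(ctx context.Context, conn *grpc.ClientConn, headers http.Header) (string, error) {
	return "placeholder-endpoint:8000", nil
}

func main() {
	// Address of the EndpointPicker (EPP); assumed to be provided via env/config.
	eppAddr := os.Getenv("EPP_ADDRESS")

	// Insecure for now; see the TLS discussion later in this thread.
	conn, err := grpc.NewClient(eppAddr,
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("creating gRPC client for EPP: %v", err)
	}
	defer conn.Close()

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Forward the incoming request's headers to the EPP and get back the
		// endpoint that NGINX should route the AI workload request to.
		endpoint, err := pickEndpoint(r.Context(), conn, r.Header)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}

		// The (upcoming) NJS module reads this header to pick the upstream.
		w.Header().Set("X-Endpoint", endpoint)
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```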
Pull Request Overview
This PR introduces a Go-based HTTP shim server that enables NGINX to communicate with the Gateway API Inference Extension Endpoint Picker via gRPC. The implementation adds new command-line functionality and container injection capabilities to support inference workload routing.
- Adds an endpoint-picker command to the gateway binary that runs an HTTP server
- Introduces container injection logic for the inference extension feature
- Implements gRPC client functionality to communicate with the Endpoint Picker (EPP) via the External Processing protocol (sketched below)
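As a rough illustration of that gRPC client piece, here is a hedged, standalone sketch of the exchange using the Envoy External Processing (ext_proc) protobufs from go-control-plane. The package name, the pickEndpoint signature, and the x-gateway-destination-endpoint header name are assumptions, not a copy of this PR's code.

```go
// Hypothetical sketch of the ext_proc exchange with the EPP; the destination
// header name and message shape are assumptions, not this PR's implementation.
package shim

import (
	"context"
	"errors"
	"fmt"
	"net/http"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	extprocv3 "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3"
	"google.golang.org/grpc"
)

// pickEndpoint streams the incoming request headers to the EPP as an ext_proc
// ProcessingRequest and reads the chosen endpoint out of the header mutation
// in the ProcessingResponse.
func pickEndpoint(ctx context.Context, conn *grpc.ClientConn, headers http.Header) (string, error) {
	stream, err := extprocv3.NewExternalProcessorClient(conn).Process(ctx)
	if err != nil {
		return "", fmt.Errorf("opening ext_proc stream to EPP: %w", err)
	}

	// Copy the HTTP request headers into the ext_proc HeaderMap.
	headerMap := &corev3.HeaderMap{}
	for name, values := range headers {
		for _, value := range values {
			headerMap.Headers = append(headerMap.Headers, &corev3.HeaderValue{
				Key:      name,
				RawValue: []byte(value),
			})
		}
	}

	req := &extprocv3.ProcessingRequest{
		Request: &extprocv3.ProcessingRequest_RequestHeaders{
			RequestHeaders: &extprocv3.HttpHeaders{Headers: headerMap, EndOfStream: true},
		},
	}
	if err := stream.Send(req); err != nil {
		return "", fmt.Errorf("sending request headers to EPP: %w", err)
	}

	resp, err := stream.Recv()
	if err != nil {
		return "", fmt.Errorf("receiving EPP response: %w", err)
	}

	// The EPP replies with a header mutation naming the endpoint to route to;
	// "x-gateway-destination-endpoint" is assumed here as the header name.
	for _, h := range resp.GetRequestHeaders().GetResponse().GetHeaderMutation().GetSetHeaders() {
		if h.GetHeader().GetKey() == "x-gateway-destination-endpoint" {
			if raw := h.GetHeader().GetRawValue(); len(raw) > 0 {
				return string(raw), nil
			}
			return h.GetHeader().GetValue(), nil
		}
	}

	return "", errors.New("EPP response did not contain a destination endpoint")
}
```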
Reviewed Changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cmd/gateway/endpoint_picker.go | Core HTTP server implementation for EPP communication |
| cmd/gateway/endpoint_picker_test.go | Comprehensive test suite for the endpoint picker functionality |
| cmd/gateway/commands.go | Adds the new endpoint-picker command to the CLI |
| cmd/gateway/main.go | Registers the endpoint-picker command in the root command |
| internal/controller/provisioner/objects.go | Implements container injection logic for inference extension |
| internal/controller/provisioner/objects_test.go | Tests for container injection functionality |
| internal/controller/provisioner/provisioner.go | Adds InferenceExtension configuration field |
| internal/controller/manager.go | Passes InferenceExtension config to provisioner |
| go.mod | Adds required gRPC and protobuf dependencies |
There are a couple of nits related to spacing / new lines before return statements, but it seems so common that perhaps that's no longer in our style guidelines; otherwise LGTM.
Actually, as a question: I thought #3841 says that the communication between the Go app and the EPP will use TLS, but here it sends a gRPC request. Am I missing something, or has something changed?
@bjee19 Not mutually exclusive; right now this sends an insecure gRPC request, which we'll need to secure with a certificate/key.
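For context on that follow-up, here is a hedged sketch of what securing the connection could look like; the certificate/key paths and the dialEPPWithTLS helper are assumptions, not part of this PR.

```go
// Hypothetical sketch of securing the shim-to-EPP connection; certificate
// paths and mount locations are assumptions.
package shim

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialEPPWithTLS would replace the insecure dial once a certificate/key are
// provisioned for the shim container.
func dialEPPWithTLS(eppAddr string) (*grpc.ClientConn, error) {
	// Client certificate/key plus the CA that signed the EPP's serving cert.
	cert, err := tls.LoadX509KeyPair("/etc/epp-client/tls.crt", "/etc/epp-client/tls.key")
	if err != nil {
		return nil, fmt.Errorf("loading client certificate/key: %w", err)
	}

	caPEM, err := os.ReadFile("/etc/epp-client/ca.crt")
	if err != nil {
		return nil, fmt.Errorf("reading EPP CA certificate: %w", err)
	}
	caPool := x509.NewCertPool()
	if !caPool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("parsing EPP CA certificate")
	}

	creds := credentials.NewTLS(&tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      caPool,
		MinVersion:   tls.VersionTLS13,
	})

	// Swaps out grpc.WithTransportCredentials(insecure.NewCredentials()).
	return grpc.NewClient(eppAddr, grpc.WithTransportCredentials(creds))
}
```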
Testing: Manually sent a request to the Golang app and received the endpoint header in the response.
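A hedged version of that manual check in Go, reusing the port and header name assumed in the sketches above:

```go
// Hypothetical manual check: hit the shim and print the endpoint header.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://localhost:8080/") // port is an assumption
	if err != nil {
		log.Fatalf("request to shim failed: %v", err)
	}
	defer resp.Body.Close()

	// Header name matches the assumption used in the sketches above.
	fmt.Println("picked endpoint:", resp.Header.Get("X-Endpoint"))
}
```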
Closes #3837
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.