Fix: Merge Extra field during VirtualService merge for multiple Infer…#647
Closed
aslakknutsen wants to merge 1 commit intoopenshift-service-mesh:release-1.27from
Closed
Conversation
44d843c to
6af3efe
Compare
Contributor
Author
|
/test integ-pilot |
Contributor
Author
|
/test integ-security |
Contributor
Author
|
/test integ-pilot |
…… (#58393) * Fix: Merge Extra field during VirtualService merge for multiple InferencePools When multiple HTTPRoutes referencing different InferencePools are attached to the same Gateway, only the first InferencePool's ext_proc configuration was applied. This fix ensures all InferencePool configurations are preserved during VirtualService merge. Root cause: The mergeHTTPRoutes() function merged HTTP routes and annotations but ignored the Extra field where InferencePool ext_proc configurations are stored. As a result, only the first HTTPRoute's InferencePool configuration was retained. Changes: 1. route_collections.go: Add Extra field merge logic in mergeHTTPRoutes() - Merge InferencePool config maps instead of overwriting - Add detailed logging for debugging and verification - Log final merged result with InferencePool count 2. conversion.go: Improve shadow service lookup for InferencePools - Use KRT collection (ctx.Services) for direct K8s service lookup - Add fallback to K8s service labels when PushContext not yet updated - Add comprehensive logging for troubleshooting Impact: - Fixes multi-InferencePool deployments on same Gateway - No performance impact (configuration-time only, not runtime) - Enables scalable inference routing with multiple models per Gateway Testing: - Verified with 2 InferencePools on 1 Gateway - Both routes now have ext_proc configuration in Envoy - Confirmed via config_dump and istiod logs Fixes #58392 * Revert conversion.go changes - focus only on VirtualService merge fix Reverted conversion.go changes as they are unnecessary. The InferencePool dependency registered(krt.FetchOne on InferencePools collection) is sufficient to trigger HTTPRoute recomputation when shadow services are created during InferencePool reconciliation. Related to #58392 * Address code review feedback for InferencePool merge - Refactor route_collections.go for better readability * Pull base.Extra nil check out one level * Use early return for non-InferencePool configs * Extract type assertions to variables * Change log.Infof → log.Debugf * Add comment on route name conflict prevention - Fix test infrastructure in conversion_test.go * Add mock services for infpool-model1/model2 * Enables proper multi-route InferencePool testing - Update golden files with correct expected behavior * BackendNotFound → All references resolved * Empty destinations → Properly populated hosts All reviewer feedback addressed. Tests passing. * Add test case for multiple HTTPRoutes with InferencePools on same gateway * Add release notes for PR #58393 * Fix race condition in VirtualService merge by deep copying InferencePool configs The Config.DeepCopy() method only performs a shallow copy of the Extra field. When merging multiple VirtualServices with InferencePool configs, this caused race conditions as multiple goroutines could modify the same underlying map. This fix adds an explicit deep copy of the InferencePool configs map before merging to ensure thread safety when running with race detector enabled. Fixes unit-tests-arm64_istio test failures in CI. * fix: preserve first Extra field value and add type safety check * chore: remove unnecessary comments * fix: preserve first Extra field value, add type safety, and log unexpected types (cherry picked from commit 1cbc5c7) [Agent Backport Notes] * What/How: (1) Resolved merge conflict in http.status.yaml.golden by inserting the two new InferencePool status entries (infpool-model1, infpool-model2) with creationTimestamp: null and keeping v1beta1 for GatewayClass, and by inserting the two new HTTPRoute status entries (multi-route-infpool-1, multi-route-infpool-2) with v1beta1 and creationTimestamp: null. (2) Changed new HTTPRoute test inputs in http.yaml from gateway.networking.k8s.io/v1 to v1beta1. (3) Fixed VirtualService name in http.yaml.golden from tilde-separated (gateway~istio-autogenerated-k8s-gateway~default) to dash-separated (gateway-istio-autogenerated-k8s-gateway-default) to match the release-1.27 gateway naming convention, and added creationTimestamp: null. * Why: The release-1.27 branch uses gateway.networking.k8s.io/v1beta1 throughout (not v1), includes creationTimestamp: null in all golden file resource metadata, and uses a different gateway name encoding scheme (dashes vs tildes) compared to the upstream branch. Made-with: Cursor
6af3efe to
412295c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…… (#58393)
When multiple HTTPRoutes referencing different InferencePools are attached to the same Gateway, only the first InferencePool's ext_proc configuration was applied. This fix ensures all InferencePool configurations are preserved during VirtualService merge.
Root cause: The mergeHTTPRoutes() function merged HTTP routes and annotations but ignored the Extra field where InferencePool ext_proc configurations are stored. As a result, only the first HTTPRoute's InferencePool configuration was retained.
Changes:
route_collections.go: Add Extra field merge logic in mergeHTTPRoutes()
conversion.go: Improve shadow service lookup for InferencePools
Impact:
Testing:
Fixes #58392
Reverted conversion.go changes as they are unnecessary. The InferencePool dependency registered(krt.FetchOne on InferencePools collection) is sufficient to trigger HTTPRoute recomputation when shadow services are created during InferencePool reconciliation.
Related to #58392
Refactor route_collections.go for better readability
Fix test infrastructure in conversion_test.go
Update golden files with correct expected behavior
All reviewer feedback addressed. Tests passing.
Add test case for multiple HTTPRoutes with InferencePools on same gateway
Add release notes for PR #58393
Fix race condition in VirtualService merge by deep copying InferencePool configs
The Config.DeepCopy() method only performs a shallow copy of the Extra field. When merging multiple VirtualServices with InferencePool configs, this caused race conditions as multiple goroutines could modify the same underlying map.
This fix adds an explicit deep copy of the InferencePool configs map before merging to ensure thread safety when running with race detector enabled.
Fixes unit-tests-arm64_istio test failures in CI.
fix: preserve first Extra field value and add type safety check
chore: remove unnecessary comments
fix: preserve first Extra field value, add type safety, and log unexpected types
Please provide a description of this PR:
To help us figure out who should review this PR, please put an X in all the areas that this PR affects.
Please check any characteristics that apply to this pull request.
Required PR Label: Please add one of the following labels to this PR:
permanent-change: For OSSM-specific changes that should be cherry-picked to release branchesno-permanent-change: For temporary changes that will be removed and should NOT be cherry-pickedThis labeling helps release maintainers identify which changes to include in new release branches.