Skip to content

Fix: Merge Extra field during VirtualService merge for multiple Infer…#645

Open
aslakknutsen wants to merge 1 commit intoopenshift-service-mesh:release-1.26from
aslakknutsen:ossm_release-1.26-extra-vs
Open

Fix: Merge Extra field during VirtualService merge for multiple Infer…#645
aslakknutsen wants to merge 1 commit intoopenshift-service-mesh:release-1.26from
aslakknutsen:ossm_release-1.26-extra-vs

Conversation

@aslakknutsen
Copy link
Contributor

…… (#58393) (#59303)

  • Fix: Merge Extra field during VirtualService merge for multiple InferencePools

When multiple HTTPRoutes referencing different InferencePools are attached to the same Gateway, only the first InferencePool's ext_proc configuration was applied. This fix ensures all InferencePool configurations are preserved during VirtualService merge.

Root cause: The mergeHTTPRoutes() function merged HTTP routes and annotations but ignored the Extra field where InferencePool ext_proc configurations are stored. As a result, only the first HTTPRoute's InferencePool configuration was retained.

Changes:

  1. route_collections.go: Add Extra field merge logic in mergeHTTPRoutes()

    • Merge InferencePool config maps instead of overwriting
    • Add detailed logging for debugging and verification
    • Log final merged result with InferencePool count
  2. conversion.go: Improve shadow service lookup for InferencePools

    • Use KRT collection (ctx.Services) for direct K8s service lookup
    • Add fallback to K8s service labels when PushContext not yet updated
    • Add comprehensive logging for troubleshooting

Impact:

  • Fixes multi-InferencePool deployments on same Gateway
  • No performance impact (configuration-time only, not runtime)
  • Enables scalable inference routing with multiple models per Gateway

Testing:

  • Verified with 2 InferencePools on 1 Gateway
  • Both routes now have ext_proc configuration in Envoy
  • Confirmed via config_dump and istiod logs

Fixes #58392

  • Revert conversion.go changes - focus only on VirtualService merge fix

Reverted conversion.go changes as they are unnecessary. The InferencePool dependency registered(krt.FetchOne on InferencePools collection) is sufficient to trigger HTTPRoute recomputation when shadow services are created during InferencePool reconciliation.

Related to #58392

  • Address code review feedback for InferencePool merge
  • Refactor route_collections.go for better readability

    • Pull base.Extra nil check out one level
    • Use early return for non-InferencePool configs
    • Extract type assertions to variables
    • Change log.Infof → log.Debugf
    • Add comment on route name conflict prevention
  • Fix test infrastructure in conversion_test.go

    • Add mock services for infpool-model1/model2
    • Enables proper multi-route InferencePool testing
  • Update golden files with correct expected behavior

    • BackendNotFound → All references resolved
    • Empty destinations → Properly populated hosts

All reviewer feedback addressed. Tests passing.

  • Add test case for multiple HTTPRoutes with InferencePools on same gateway

  • Add release notes for PR #58393

  • Fix race condition in VirtualService merge by deep copying InferencePool configs

The Config.DeepCopy() method only performs a shallow copy of the Extra field. When merging multiple VirtualServices with InferencePool configs, this caused race conditions as multiple goroutines could modify the same underlying map.

This fix adds an explicit deep copy of the InferencePool configs map before merging to ensure thread safety when running with race detector enabled.

Fixes unit-tests-arm64_istio test failures in CI.

  • fix: preserve first Extra field value and add type safety check

  • chore: remove unnecessary comments

  • fix: preserve first Extra field value, add type safety, and log unexpected types

Please provide a description of this PR:

To help us figure out who should review this PR, please put an X in all the areas that this PR affects.

  • Ambient
  • Configuration Infrastructure
  • Docs
  • Dual Stack
  • Installation
  • Networking
  • Performance and Scalability
  • Extensions and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure
  • Upgrade
  • Multi Cluster
  • Virtual Machine
  • Control Plane Revisions

Please check any characteristics that apply to this pull request.

  • Does not have any user-facing changes. This may include CLI changes, API changes, behavior changes, performance improvements, etc.

Required PR Label: Please add one of the following labels to this PR:

  • permanent-change: For OSSM-specific changes that should be cherry-picked to release branches
  • no-permanent-change: For temporary changes that will be removed and should NOT be cherry-picked

This labeling helps release maintainers identify which changes to include in new release branches.

…… (#58393) (#59303)

* Fix: Merge Extra field during VirtualService merge for multiple InferencePools

When multiple HTTPRoutes referencing different InferencePools are attached
to the same Gateway, only the first InferencePool's ext_proc configuration
was applied. This fix ensures all InferencePool configurations are preserved
during VirtualService merge.

Root cause: The mergeHTTPRoutes() function merged HTTP routes and annotations
but ignored the Extra field where InferencePool ext_proc configurations are
stored. As a result, only the first HTTPRoute's InferencePool configuration
was retained.

Changes:
1. route_collections.go: Add Extra field merge logic in mergeHTTPRoutes()
   - Merge InferencePool config maps instead of overwriting
   - Add detailed logging for debugging and verification
   - Log final merged result with InferencePool count

2. conversion.go: Improve shadow service lookup for InferencePools
   - Use KRT collection (ctx.Services) for direct K8s service lookup
   - Add fallback to K8s service labels when PushContext not yet updated
   - Add comprehensive logging for troubleshooting

Impact:
- Fixes multi-InferencePool deployments on same Gateway
- No performance impact (configuration-time only, not runtime)
- Enables scalable inference routing with multiple models per Gateway

Testing:
- Verified with 2 InferencePools on 1 Gateway
- Both routes now have ext_proc configuration in Envoy
- Confirmed via config_dump and istiod logs

Fixes #58392

* Revert conversion.go changes - focus only on VirtualService merge fix

Reverted conversion.go changes as they are unnecessary. The InferencePool
dependency registered(krt.FetchOne on InferencePools collection)
is sufficient to trigger HTTPRoute recomputation when shadow services are
created during InferencePool reconciliation.

Related to #58392

* Address code review feedback for InferencePool merge

- Refactor route_collections.go for better readability
  * Pull base.Extra nil check out one level
  * Use early return for non-InferencePool configs
  * Extract type assertions to variables
  * Change log.Infof → log.Debugf
  * Add comment on route name conflict prevention

- Fix test infrastructure in conversion_test.go
  * Add mock services for infpool-model1/model2
  * Enables proper multi-route InferencePool testing

- Update golden files with correct expected behavior
  * BackendNotFound → All references resolved
  * Empty destinations → Properly populated hosts

All reviewer feedback addressed. Tests passing.

* Add test case for multiple HTTPRoutes with InferencePools on same gateway

* Add release notes for PR #58393

* Fix race condition in VirtualService merge by deep copying InferencePool configs

The Config.DeepCopy() method only performs a shallow copy of the Extra field.
When merging multiple VirtualServices with InferencePool configs, this caused
race conditions as multiple goroutines could modify the same underlying map.

This fix adds an explicit deep copy of the InferencePool configs map before
merging to ensure thread safety when running with race detector enabled.

Fixes unit-tests-arm64_istio test failures in CI.

* fix: preserve first Extra field value and add type safety check

* chore: remove unnecessary comments

* fix: preserve first Extra field value, add type safety, and log unexpected types

Co-authored-by: jazzsir <jazzsir@gmail.com>
@openshift-ci openshift-ci bot added the size/L label Mar 6, 2026
@aslakknutsen aslakknutsen requested a review from dgn March 6, 2026 16:51
@aslakknutsen aslakknutsen added the no-permanent-change OSSM-specific temporary change label Mar 6, 2026
@FilipB
Copy link
Collaborator

FilipB commented Mar 6, 2026

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-permanent-change OSSM-specific temporary change size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants