sarthyparty
Contributor

What type of PR is this?

/kind test

What this PR does / why we need it:

Adds a conformance test for weighted GRPCRoute backendRefs, increasing conformance feature coverage of GRPCRoute.

Which issue(s) this PR fixes:

Fixes #2901

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 28, 2025
@k8s-ci-robot
Contributor

Welcome @sarthyparty!

It looks like this is your first PR to kubernetes-sigs/gateway-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 28, 2025
@k8s-ci-robot
Contributor

Hi @sarthyparty. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sjberman
Contributor

Question for the maintainers: since weight is a Core feature of GRPCRoute, should this test live in the Core GRPC tests, or do users need to opt in to run it? I don't want to break any users who are running GRPCRoute tests without this support, but I also want to make sure that Core is treated properly.

@LiorLieberman
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 7, 2025
Member

@LiorLieberman LiorLieberman left a comment

Can you also add the tests in conformance/tests/mesh/?

@LiorLieberman
Member

Question for the maintainers: since weight is a Core feature of GRPCRoute, should this test live in the Core GRPC tests, or do users need to opt in to run it? I don't want to break any users who are running GRPCRoute tests without this support, but I also want to make sure that Core is treated properly.

Good question. I think this is a core feature of GRPCRoute, meaning everyone who supports GRPCRoute is expected to support this. You are right that we did not have conformance for that (which is sub-optimal) but the implementors of this should have already supported this feature.

So I think adding coverage is good, and if we break someone, that's a good sign they should fix it, because the promise we give to our users is that weights are supported with GRPCRoute.

/cc @mikemorris @robscott for thoughts

@k8s-ci-robot
Contributor

@LiorLieberman: GitHub didn't allow me to request PR reviews from the following users: for, thoughts.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

@youngnick
Contributor

Yes, agreed that weighted load-balancing is a Core feature, meaning that support is not optional. So if we break existing implementations, they'll need to fix it before they can claim support for the release this is included in (that is, v1.4).

@LiorLieberman
Member

/retest

@LiorLieberman
Member

Can you also add the tests in conformance/tests/mesh/?

@sarthyparty ping on this in case you missed that part of the comment

@sarthyparty
Contributor Author

Can you also add the tests in conformance/tests/mesh/?

@sarthyparty ping on this in case you missed that part of the comment

Oh shoot, thanks for the ping, I missed it. I'll also fix the lint issue.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 11, 2025
@sarthyparty sarthyparty force-pushed the grpcroute-weight-test branch from 41fa6c3 to bef25ff Compare August 11, 2025 22:54
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 11, 2025
@sarthyparty
Contributor Author

@LiorLieberman Do you want me to add a gRPC mesh setup for testing? There don't seem to be any gRPC tests there. Or did you just mean refactoring the existing HTTP weight test to use the shared func?

@LiorLieberman
Member

@LiorLieberman Do you want me to add a gRPC mesh setup for testing? There don't seem to be any gRPC tests there. Or did you just mean refactoring the existing HTTP weight test to use the shared func?

I don't think you need any special setup beyond what we already have in the mesh base manifests for conformance (feel free to shout if I am missing something here).

I did mean adding a gRPC test to this folder as well, so that it adds mesh coverage for that.

@rikatz
Member

rikatz commented Aug 21, 2025

/assign

return resp.Response.GetAssertions().GetContext().GetPod(), nil
})

for i := 0; i < 10; i++ {
Contributor

Do we have a retries constant defined? If so, could we use that instead?

Contributor Author

Sure, good point
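
For illustration only, here is a minimal sketch of what replacing the literal `10` with a named constant could look like. `maxDistributionAttempts`, `errOutOfTolerance`, and `retryCheck` are hypothetical names chosen for this example; they are assumptions, not identifiers from the conformance suite.

```go
package main

import (
	"errors"
	"fmt"
)

// maxDistributionAttempts is a hypothetical constant standing in for the
// literal 10 in the loop under review.
const maxDistributionAttempts = 10

// errOutOfTolerance is a hypothetical sentinel a check could return when the
// observed traffic split does not match the configured weights.
var errOutOfTolerance = errors.New("distribution out of tolerance")

// retryCheck runs check up to attempts times and returns nil on the first
// success. If every attempt fails, the last error is wrapped and returned.
func retryCheck(attempts int, check func() error) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = check(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}
```

A caller would pass `maxDistributionAttempts` as the first argument, so the retry budget is defined in one place.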

Comment on lines 117 to 153
// Handle status code comparison based on protocol
expectedCode := fmt.Sprint(wantResp.StatusCode)
if strings.ToLower(exp.Request.Protocol) == "grpc" {
    // For gRPC, HTTP 200 maps to gRPC status 0 (OK), but the echo client
    // seems to report HTTP status codes even for gRPC requests
    if wantResp.StatusCode == 200 && resp.Code == "200" {
        // Both expect and got HTTP 200, which is fine for gRPC success
        expectedCode = "200"
    } else if wantResp.StatusCode == 200 {
        expectedCode = "0"
    }
}
if expectedCode != resp.Code {
    return fmt.Errorf("wanted status code %v, got %v", expectedCode, resp.Code)
}
Contributor

I'm not sure this logic tests what we want for gRPC requests. Specifically, I believe the "// Both expect and got HTTP 200, which is fine for gRPC success" branch is insufficient: it only checks the "outer" initial HTTP connection and not the "inner" gRPC request/response?

/cc @LiorLieberman
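
To make the concern concrete, here is a minimal sketch that restates the quoted comparison as a pure function. `checkStatusCode` and its parameter names are hypothetical and not part of the conformance suite; the function only mirrors the branch logic in the snippet under review.

```go
package main

import (
	"fmt"
	"strings"
)

// checkStatusCode is a hypothetical restatement of the reviewed logic.
// protocol is the request protocol, wantHTTPStatus is the expected HTTP
// status, and gotCode is the code string reported by the client.
func checkStatusCode(protocol string, wantHTTPStatus int, gotCode string) error {
	expected := fmt.Sprint(wantHTTPStatus)
	// For gRPC with an expected 200, the reviewed code accepts "200" as-is
	// and otherwise falls back to expecting gRPC status "0" (OK).
	if strings.EqualFold(protocol, "grpc") && wantHTTPStatus == 200 && gotCode != "200" {
		expected = "0"
	}
	if expected != gotCode {
		return fmt.Errorf("wanted status code %v, got %v", expected, gotCode)
	}
	return nil
}
```

Written this way, it is easier to see that a gRPC request expecting 200 passes when the client reports either "200" or "0", so the check on its own cannot tell the outer HTTP status apart from the inner gRPC status.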

Contributor Author

@sarthyparty sarthyparty Aug 22, 2025

Yeah, that's a good point. I don't know if it's possible to get the gRPC status code with the Istio client; we may need to set up a different gRPC calling mechanism. Although, according to Lior's earlier comment, it seems the client will error if there's a gRPC error, so perhaps 200 is good enough?

Contributor

I do think it would be preferable if we can affirmatively test the actual gRPC response code (which does sound like it would require changes to the client).

Contributor Author

@sarthyparty sarthyparty Aug 22, 2025

I don't think I can change the Istio client code itself (currently the test works by exec'ing into the echo pod and running client grpc://url). Do you have any ideas on the best way to make the gRPC request from the pod without using the client?

Contributor

Ah I see we're just using the retagged Istio image - @LiorLieberman @howardjohn do y'all know where the source for this is?

I'm guessing that modifying the source would be disruptive for Istio - unsure if this is maybe a sufficient justification to fork and take ownership of this util ourselves for our tests.

Member

I don't think it would be disruptive for Istio; John is open to changes/improvements. I also think it makes sense, but maybe it's not a MUST for now.

@mikemorris
Contributor

mikemorris commented Aug 22, 2025

I ran this test locally with the snippet @LiorLieberman provided in #3962 (comment) and while it at least did run to completion (with the caveat in my prior comment about checking HTTP success and not the inner gRPC response), it appears as if Istio might not support this weighting as is currently written in the test - is this expected? A quick glance at the implementation looks like the GRPCRoute builder is attempting to include weighting... but the logic might be transforming the 0.7 and 0.3 values both to either 1 or 0?

According to the spec, the weight field is expected to be an integer, so I believe this is likely an issue with the test (and missing CEL validation perhaps?) and not the Istio implementation.

    grpcroute-weight.go:75: Traffic distribution test failed (10/10): backend "echo-v1" weighted traffic of 0.512 not within tolerance 0.7 (+/-0.050000)
        backend "echo-v2" weighted traffic of 0.488 not within tolerance 0.3 (+/-0.050000)
    grpcroute-weight.go:80: Weighted distribution tests failed

EDIT: integer values were provided in the YAML, so something else is off here...

  - backendRefs:
    - name: echo-v1
      port: 7070
      weight: 70
    - name: echo-v2
      port: 7070
      weight: 30

/cc @LiorLieberman @howardjohn

Comment on lines +13 to +19
  - backendRefs:
    - name: echo-v1
      port: 7070
      weight: 70
    - name: echo-v2
      port: 7070
      weight: 30
Contributor

Suggested change
-  - backendRefs:
-    - name: echo-v1
-      port: 7070
-      weight: 70
-    - name: echo-v2
-      port: 7070
-      weight: 30
+  - backendRefs:
+    - name: echo-v1
+      port: 7070
+      weight: 70
+    - name: echo-v2
+      port: 7070
+      weight: 30
+    - name: echo-v3
+      port: 7070
+      weight: 0

I think we should probably include a third, zero-weight backend for parity with the N/S test?

Contributor Author

Sure that sounds good to me

@sarthyparty
Contributor Author

sarthyparty commented Aug 22, 2025

I ran this test locally with the snippet @LiorLieberman provided in #3962 (comment) and while it at least did run to completion (with the caveat in my prior comment about checking HTTP success and not the inner gRPC response), it appears as if Istio might not support this weighting as is currently written in the test - is this expected?

When I ran the test locally on Istio, it passed. I'm not completely sure why it seems to fail for others. I did have cleanup on; not sure if that makes a difference.

Member

@LiorLieberman LiorLieberman left a comment

How do you run it? Can you paste the command and the output you get (make sure it's verbose)?

@sarthyparty
Contributor Author

sarthyparty commented Aug 22, 2025

I ran

go test -v ./conformance --gateway-class istio --run TestConformance/MeshGRPCRouteWeight  --supported-features=Mesh,GRPCRoute -namespace-labels istio-injection=enabled > test-output.txt

Here is test-output.txt
Also pasted into google doc https://docs.google.com/document/d/1DPeDGf6CdnLl4vkGeILcI6y9FreRtoFqNnOuIUyq1-A/edit?usp=sharing

Here's just the bottom:

=== NAME  TestConformance/MeshGRPCRouteWeight
    apply.go:283: 2025-08-22T12:38:06.659768-07:00: Deleting mesh-grpc-weighted-backends GRPCRoute
=== NAME  TestConformance
    apply.go:283: 2025-08-22T12:38:06.666448-07:00: Deleting echo-v1 Deployment
    apply.go:283: 2025-08-22T12:38:06.670961-07:00: Deleting gateway-conformance-mesh-consumer Namespace
    apply.go:283: 2025-08-22T12:38:06.675795-07:00: Deleting echo Service
    apply.go:283: 2025-08-22T12:38:06.696662-07:00: Deleting echo-v2 Service
    apply.go:283: 2025-08-22T12:38:06.713987-07:00: Deleting echo-v2 Deployment
    apply.go:283: 2025-08-22T12:38:06.71996-07:00: Deleting echo-v1 Service
    apply.go:283: 2025-08-22T12:38:06.737305-07:00: Deleting echo-v1 Deployment
    apply.go:283: 2025-08-22T12:38:06.744919-07:00: Deleting gateway-conformance-mesh Namespace
--- PASS: TestConformance (26.41s)
    --- PASS: TestConformance/MeshGRPCRouteWeight (21.11s)
        --- PASS: TestConformance/MeshGRPCRouteWeight/Requests_should_have_a_distribution_that_matches_the_weight (21.01s)
PASS
ok  	sigs.k8s.io/gateway-api/conformance	26.999s

EDIT: Pasted the output into a Google Doc so it's not a suspicious link.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 26, 2025
@LiorLieberman
Member

Thanks for all the hard work and navigating through the back-and-forth @sarthyparty !

I am bumping this in the group chat for a final set of eyes, but I think I am good to go with this.

@LiorLieberman
Member

I ran this test locally with the snippet @LiorLieberman provided in #3962 (comment) and while it at least did run to completion (with the caveat in my prior comment about checking HTTP success and not the inner gRPC response), it appears as if Istio might not support this weighting as is currently written in the test - is this expected? A quick glance at the implementation looks like the GRPCRoute builder is attempting to include weighting... but the logic might be transforming the 0.7 and 0.3 values both to either 1 or 0?

According to the spec, the weight field is expected to be an integer, so I believe this is likely an issue with the test (and missing CEL validation perhaps?) and not the Istio implementation.

    grpcroute-weight.go:75: Traffic distribution test failed (10/10): backend "echo-v1" weighted traffic of 0.512 not within tolerance 0.7 (+/-0.050000)
        backend "echo-v2" weighted traffic of 0.488 not within tolerance 0.3 (+/-0.050000)
    grpcroute-weight.go:80: Weighted distribution tests failed

EDIT: integer values were provided in the YAML, so something else is off here...

  - backendRefs:
    - name: echo-v1
      port: 7070
      weight: 70
    - name: echo-v2
      port: 7070
      weight: 30

/cc @LiorLieberman @howardjohn

What's the status here, @mikemorris?

@mikemorris
Contributor

I apparently ran the conformance test without first installing Istio on the cluster (and, on rerunning, forgot to restart the conformance test pods to ensure sidecars were injected), which completely explains why traffic was split equally rather than weighted as expected.
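
For readers following the failure message quoted earlier, here is a minimal sketch of the kind of tolerance check that message implies: each backend's observed share of requests is compared with its weight divided by the total weight. `checkDistribution` is a hypothetical helper written for this illustration and is not the conformance suite's actual implementation. Treating any traffic to a zero-weight backend as a failure is also an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// checkDistribution is a hypothetical helper. counts maps backend name to
// the number of responses observed from it, weights maps backend name to
// its configured backendRef weight, and tolerance is the allowed absolute
// difference between the observed and expected traffic fractions.
func checkDistribution(counts, weights map[string]int, tolerance float64) error {
	totalRequests, totalWeight := 0, 0
	for _, c := range counts {
		totalRequests += c
	}
	for _, w := range weights {
		totalWeight += w
	}
	if totalRequests == 0 || totalWeight == 0 {
		return fmt.Errorf("need at least one request and one positive weight")
	}
	for backend, w := range weights {
		// Assumption: a zero-weight backend must receive no traffic at all.
		if w == 0 && counts[backend] > 0 {
			return fmt.Errorf("backend %q has weight 0 but received %d requests", backend, counts[backend])
		}
		want := float64(w) / float64(totalWeight)
		got := float64(counts[backend]) / float64(totalRequests)
		if math.Abs(got-want) > tolerance {
			return fmt.Errorf("backend %q weighted traffic of %.3f not within tolerance %v (+/-%f)",
				backend, got, want, tolerance)
		}
	}
	return nil
}
```

With weights of 70 and 30 and a tolerance of 0.05, an even split such as 512 and 488 responses out of 1000 fails this check, which is consistent with what an unmeshed cluster would produce.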

After ensuring Istio was installed and prior runs were cleaned up before running the conformance tests, this now passes fine for me in sidecar mode. I additionally tested on Istio's ambient mode, which fails by default (expected with only ztunnel, which doesn't handle L7) when run with:

istioctl install --set profile=ambient
go test -v ./conformance --gateway-class istio --run TestConformance/MeshGRPCRouteWeight  \
  --supported-features=Mesh,GRPCRoute -namespace-labels istio.io/dataplane-mode=ambient

and passes after deploying a waypoint and running the test with the requisite namespace labels!

istioctl waypoint apply -n gateway-conformance-mesh
go test -v ./conformance --gateway-class istio --run TestConformance/MeshGRPCRouteWeight  \
  --supported-features=Mesh,GRPCRoute -namespace-labels istio.io/dataplane-mode=ambient,istio.io/use-waypoint=waypoint

After we add the Mesh resource in #4030 it might be nice to run a precheck to confirm the expected mesh is present and has a Ready status as part of the work for reading supported features from that resource, but we don't really have a viable alternative for this today, as we can't expect a GatewayClass will be present for E/W implementations.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2025
@LiorLieberman
Member

Thanks @mikemorris !

@sarthyparty - can you fix the verify?

/retest

@sarthyparty sarthyparty force-pushed the grpcroute-weight-test branch from 0709f9d to 21dd149 Compare August 28, 2025 17:40
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 28, 2025
@sarthyparty
Contributor Author

@LiorLieberman Fixed the verify

@LiorLieberman
Member

Thank you for all the hard work on this @sarthyparty

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LiorLieberman, sarthyparty

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2025
@k8s-ci-robot k8s-ci-robot merged commit 56cc1c4 into kubernetes-sigs:main Aug 28, 2025
8 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/test lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add conformance test for GRPCRoute traffic splitting