@Miciah
Contributor

@Miciah Miciah commented Mar 24, 2025

Add a new test to verify that Istio is configured not to allow manual deployment.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 24, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 24, 2025

@Miciah: This pull request references NE-1994 which is a valid jira issue.

In response to this:

Based on #1152. Only the newest commit is specific to this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from alebedev87 and gcs278 March 24, 2025 20:49
@Miciah Miciah force-pushed the NE-1994-test-e2e-new-test-for-Istio-manual-deployment branch from 567bd5e to 1f0444b Compare March 25, 2025 17:33
@Miciah
Contributor Author

Miciah commented Mar 25, 2025

@Miciah
Contributor Author

Miciah commented Mar 25, 2025

/label acknowledge-critical-fixes-only

Per TRT, "Feature gated code is free to use this label."

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Mar 25, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 25, 2025

@Miciah: This pull request references NE-1994 which is a valid jira issue.

In response to this:

Add a new test to verify that Istio is configured not to allow manual deployment.


@candita
Contributor

candita commented Mar 26, 2025

/assign
/assign @alebedev87

Comment on lines 199 to 200
// enabled. When manual deployment is enabled, then Istio allows a gateway use
// to another gateway's service by specifying that gateway's service in
Contributor

@candita candita Mar 27, 2025


nit:

Suggested change
// enabled. When manual deployment is enabled, then Istio allows a gateway use
// to another gateway's service by specifying that gateway's service in
// enabled. When manual deployment is enabled, then Istio allows a gateway to use
// another gateway's service by specifying that gateway's service in

Contributor Author

// Istio to provision a service for this specific gateway, even if it specifies
// spec.addresses.
func testGatewayAPIManualDeployment(t *testing.T) {
gatewayClass, err := createGatewayClass("openshift-default", "openshift.io/gateway-controller")
Contributor

Suggested change
gatewayClass, err := createGatewayClass("openshift-default", "openshift.io/gateway-controller")
gatewayClass, err := createGatewayClass("openshift-default", "openshift.io/gateway-controller/v1")

Contributor

Very odd - I don't understand how the test works without the correct controllerName.

Contributor Author

I haven't rerun the tests since #1202 merged.

Contributor Author

t.Logf("Polling for up to %v to verify that the gateway is accepted...", timeout)
if err := wait.PollUntilContextTimeout(context.Background(), interval, timeout, false, func(context context.Context) (bool, error) {
if err := kclient.Get(context, gatewayName, &gateway); err != nil {
t.Logf("Failed to get gateway %s: %v", gatewayName, err)
Contributor

nit, it can be helpful to see "retrying..." in the log:

Suggested change
t.Logf("Failed to get gateway %s: %v", gatewayName, err)
t.Logf("Failed to get gateway %s: %v, retrying...", gatewayName, err)

Contributor Author

It seems redundant (and possibly inaccurate for the last iteration when we reach the timeout), but I can add it.

Contributor Author

}
}
}

Contributor

nit:

Suggested change
t.Logf("Gateway %s not yet Accepted, retrying...", gatewayName)

Contributor Author


return false, nil
}); err != nil {
t.Errorf("Failed to observe the expected condition: %v", err)
Contributor

Suggested change
t.Errorf("Failed to observe the expected condition: %v", err)
t.Errorf("Failed to find gateway %s at the expected condition: %v", gatewayName, err)

Contributor Author

That seems redundant; do we need to include the gateway name in every log message in the test? I can add it though.

t.Logf("Polling for up to %v to verify that service %q is created...", timeout, serviceName)
if err := wait.PollUntilContextTimeout(context.Background(), interval, timeout, false, func(context context.Context) (bool, error) {
if err := kclient.Get(context, serviceName, &service); err != nil {
t.Logf("Failed to get service %s: %v", serviceName, err)
Contributor

Suggested change
t.Logf("Failed to get service %s: %v", serviceName, err)
t.Logf("Failed to get service %s: %v, retrying...", serviceName, err)

Contributor Author


return true, nil
}); err != nil {
t.Errorf("Failed to observe the expected condition: %v", err)
Contributor

Suggested change
t.Errorf("Failed to observe the expected condition: %v", err)
t.Errorf("Istio failed to automatically provision service %q: %v", serviceName, err)

Contributor Author

Isn't this extrapolating, and does it add any useful information? I think it's better to keep the error message to what we actually observed and know. The preceding log lines will provide the necessary context to understand why the expected condition was not observed.

Contributor Author

@Miciah Miciah force-pushed the NE-1994-test-e2e-new-test-for-Istio-manual-deployment branch from 1f0444b to 6734479 Compare March 27, 2025 17:37
@Miciah
Contributor Author

Miciah commented Mar 27, 2025

https://github.com/openshift/cluster-ingress-operator/compare/1f0444b996dfff19d6118aadb7309bc8e1293fa8..6734479ebfc5bbb83897bc51f7d7ca580f783e2b rebases, updates the controller name per #1202, and updates some comments and strings to address review comments.

}
}

// testGatewayAPIManualDeployment verifies that Istio's manual deployment is not
Contributor

I'm struggling to understand what "Istio manual deployment" means. Is it an installation of the servicemeshoperator3 OLM operator made by a third party (not CIO)?

Contributor

Now that you mention it, I think this test might not be complete.

I think of manual deployment as the opposite of auto-deployment. Auto-deployment means that Istio must create a service (for a Gateway?). Manual deployment means it must NOT create the service automatically. Manual deployment is apparently configured by adding a service's address in the Gateway's spec.Addresses. We don't want manual deployment, so here we're checking that it created the service automatically even though a service is specified in the spec.Addresses.

I think to be complete, the test needs to check that the created service is not the same service specified in the spec.Addresses. But also, maybe the service that is added to spec.Addresses needs to really exist, and the test needs to make sure the service that is created is not the service added in spec.Addresses.
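The check being discussed (the service was provisioned automatically, and it is not the service named in spec.addresses) can be sketched as a small predicate. This is a stdlib-only illustration, not code from the PR: `expectedAutoServiceName` and `automatedDeploymentObserved` are hypothetical helpers, the gateway name "manual-deployment" in the usage is made up, and the `<gateway>-<gatewayclass>` service naming is an assumption based on Istio's automated-deployment documentation.

```go
package main

import "fmt"

// expectedAutoServiceName returns the service name that Istio's automated
// deployment is ASSUMED to use: "<gateway name>-<gatewayclass name>".
func expectedAutoServiceName(gatewayName, gatewayClassName string) string {
	return fmt.Sprintf("%s-%s", gatewayName, gatewayClassName)
}

// automatedDeploymentObserved reports whether the automatically provisioned
// service exists among the given service names and is distinct from the
// service referenced in spec.addresses.
func automatedDeploymentObserved(existing []string, gatewayName, gatewayClassName, referencedService string) bool {
	want := expectedAutoServiceName(gatewayName, gatewayClassName)
	if want == referencedService {
		// If the names coincide, the two cases cannot be told apart.
		return false
	}
	for _, name := range existing {
		if name == want {
			return true
		}
	}
	return false
}
```

For example, with a gateway "manual-deployment" of class "openshift-default" that references "router-internal-default", the predicate is true only if "manual-deployment-openshift-default" also exists.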

Contributor

I think of manual deployment as the opposite of auto-deployment. Auto-deployment means that Istio must create a service (for a Gateway?). Manual deployment means it must NOT create the service automatically. Manual deployment is apparently configured by adding a service's address in the Gateway's spec.Addresses. We don't want manual deployment, so here we're checking that it created the service automatically even though a service is specified in the spec.Addresses.

Got it, thank you! The test starts to make sense for me now.

I think to be complete, the test needs to check that the created service is not the same service specified in the spec.Addresses. But also, maybe the service that is added to spec.Addresses needs to really exist, and the test needs to make sure the service that is created is not the service added in spec.Addresses.

Yes, that may complete the test indeed. We can create a service and set its FQDN as Gateway.Spec.Address (type Hostname): Istio doc's example using service FQDN.

Contributor Author

I think of manual deployment as the opposite of auto-deployment. Auto-deployment means that Istio must create a service (for a Gateway?). Manual deployment means it must NOT create the service automatically. Manual deployment is apparently configured by adding a service's address in the Gateway's spec.Addresses. We don't want manual deployment, so here we're checking that it created the service automatically even though a service is specified in the spec.Addresses.

Right. If it helps, I can add a link to https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#manual-deployment in the code comment.

I think to be complete, the test needs to check that the created service is not the same service specified in the spec.Addresses. But also, maybe the service that is added to spec.Addresses needs to really exist, and the test needs to make sure the service that is created is not the service added in spec.Addresses.

I'm not sure I understand. If a service got created, then that implies that Istio did automated deployment. We know what service the test specifies, and it is not a .svc.cluster.local host name:

Addresses: []gatewayapiv1.GatewayAddress{{
Type: ptr.To(gatewayapiv1.HostnameAddressType),
Value: "lb.example.com",
}},

Yes, that may complete the test indeed. We can create a service and set its FQDN as Gateway.Spec.Address (type Hostname): Istio doc's example using service FQDN.

I could change the test to specify spec.addresses[].value: router-internal-default.openshift-ingress.svc.cluster.local. I had that during one iteration of the PR, thought it was unnecessary and potentially confusing, and changed the value to "lb.example.com". I can change it back if that would be less confusing.
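Pointing spec.addresses at an existing service amounts to using that service's cluster-local DNS name as a Hostname address. A sketch, assuming the default `cluster.local` cluster domain; `gatewayAddress` is a hypothetical stand-in for `gatewayapiv1.GatewayAddress`, and `serviceAddress`/`parseServiceFQDN` are illustrative helpers, not functions from the test:

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayAddress is a hypothetical stand-in for gatewayapiv1.GatewayAddress,
// keeping only the two fields the test sets.
type gatewayAddress struct {
	Type  string // "Hostname" for a DNS name
	Value string
}

// serviceAddress returns a Hostname address pointing at the service's
// cluster-local FQDN, assuming the default "cluster.local" cluster domain.
func serviceAddress(name, namespace string) gatewayAddress {
	return gatewayAddress{
		Type:  "Hostname",
		Value: fmt.Sprintf("%s.%s.svc.cluster.local", name, namespace),
	}
}

// parseServiceFQDN splits "<name>.<namespace>.svc.cluster.local" into its
// parts. ok is false for a host such as "lb.example.com" that does not name
// an in-cluster service.
func parseServiceFQDN(host string) (name, namespace string, ok bool) {
	const suffix = ".svc.cluster.local"
	if !strings.HasSuffix(host, suffix) {
		return "", "", false
	}
	parts := strings.Split(strings.TrimSuffix(host, suffix), ".")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}
```

This also makes the difference between the two candidate values concrete: "router-internal-default.openshift-ingress.svc.cluster.local" names a real service, while "lb.example.com" does not.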

Contributor Author

References to "manual deployment", "automated deployment", and gateway listener "merging" added here:

// testGatewayAPIManualDeployment verifies that Istio's "manual deployment"
// feature is not enabled (see
// <https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#manual-deployment>).
// We only want Istio to allow "automated deployment" (see
// <https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#automated-deployment>).
//
// When manual deployment is enabled, then Istio allows a gateway to use an
// existing service (for example, another gateway's service) by specifying that
// service in spec.addresses. When a gateway using manual deployment specifies
// another gateway's service, the resulting behavior is effectively the same
// behavior as Gateway API's concept of gateway listener "merging" (see
// <https://github.com/kubernetes-sigs/gateway-api/blob/v1.2.1/apis/v1/gateway_types.go#L181-L182>).
//
// Gateway listener merging is underspecified in Gateway API and is not
// consistently implemented among Gateway API implementations, and so we do not
// want to allow it or any similar behavior (such as Istio's "manual
// deployment") until such a time as it is well defined, standard behavior.
// Instead, for the time being, we expect Istio to provision a service for a
// gateway ("automated deployment"), even if the gateway specifies some existing
// service in spec.addresses.

Changed to use an existing service here:

// Use the router's internal service in order to ensure that the
// referent exists. Using an existing service isn't strictly necessary
// in order to verify that Istio does not use manual deployment; if
// manual deployment *is* enabled, Istio rejects the gateway if it
// points to a non-existent referent. However, using an existing
// service more closely reflects the way that manual deployment *would*
// be used if it were allowed.

I struggled with describing why the test needs to use an existing service. I think what I wrote will make sense to future me. Let me know what you think!

GatewayClassName: gatewayapiv1.ObjectName(gatewayClass.Name),
Addresses: []gatewayapiv1.GatewayAddress{{
Type: ptr.To(gatewayapiv1.HostnameAddressType),
Value: "lb.example.com",
Contributor

Following up on #1204 (comment), shouldn't the spec.Address point to an existing service? Otherwise, how do we know that the manual deployment didn't just fail because the service didn't exist?

Contributor Author

Following up on #1204 (comment), shouldn't the spec.Address point to an existing service? Otherwise, how do we know that the manual deployment didn't just fail because the service didn't exist?

We know that because the gateway was accepted and Istio created a service for it.
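The reasoning in this reply combines two observations, and neither alone is treated as sufficient. A minimal sketch of that predicate, assuming a hypothetical local `condition` type in place of the real `metav1.Condition`; `conditionTrue` and `manualDeploymentRuledOut` are illustrative names:

```go
package main

// condition is a hypothetical stand-in for the Type and Status fields of
// metav1.Condition as they appear in a Gateway's status.conditions.
type condition struct {
	Type   string
	Status string
}

// conditionTrue reports whether conds contains condType with status "True".
func conditionTrue(conds []condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// manualDeploymentRuledOut requires both signals: the gateway is Accepted,
// and the service that automated deployment provisions exists.
func manualDeploymentRuledOut(conds []condition, autoServiceExists bool) bool {
	return conditionTrue(conds, "Accepted") && autoServiceExists
}
```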

var service corev1.Service
t.Logf("Polling for up to %v to verify that service %q is created...", timeout, serviceName)
if err := wait.PollUntilContextTimeout(context.Background(), interval, timeout, false, func(context context.Context) (bool, error) {
if err := kclient.Get(context, serviceName, &service); err != nil {
Contributor

Hm, how do we know it didn't create a service of another name?

Contributor Author

I suppose we could list all services and then look for unexpected services, but is there a reason why this test should do that?
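If the test were extended as floated here, the check could be a set difference between the service names listed before and after the gateway is created. A minimal sketch; `unexpectedServices` is a hypothetical helper that only illustrates the idea raised in this comment:

```go
package main

import "sort"

// unexpectedServices returns the names present in after but not in before,
// excluding the one service the test expects Istio to provision. The result
// is sorted so that failure messages are deterministic.
func unexpectedServices(before, after []string, expected string) []string {
	seen := make(map[string]bool, len(before))
	for _, name := range before {
		seen[name] = true
	}
	var extra []string
	for _, name := range after {
		if !seen[name] && name != expected {
			extra = append(extra, name)
		}
	}
	sort.Strings(extra)
	return extra
}
```

A caveat with this approach is that unrelated services created concurrently in the same namespace would show up as false positives.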

This commit resolves NE-1994.

https://issues.redhat.com/browse/NE-1994

* test/e2e/gateway_api_test.go (TestGatewayAPI): Run the new test,
testGatewayAPIManualDeployment.  Update a log message.
(testGatewayAPIManualDeployment): New test.  Verify that Istio is configured not
to allow manual deployment.
@Miciah Miciah force-pushed the NE-1994-test-e2e-new-test-for-Istio-manual-deployment branch from 6734479 to e2eb4f0 Compare March 27, 2025 23:44
@Miciah
Contributor Author

Miciah commented Mar 27, 2025

https://github.com/openshift/cluster-ingress-operator/compare/6734479ebfc5bbb83897bc51f7d7ca580f783e2b..e2eb4f098027f87d7761bea11198f9aacfae41e1 elaborates on some comments to explain the concepts of Istio's "manual deployment" and "automated deployment" as well as Gateway API's gateway listener "merging". It also changes the test gateway to point to the router's internal service so that the gateway's referent is an existing service, which more closely resembles how someone would use manual deployment in practice.

@candita
Contributor

candita commented Mar 28, 2025

Cluster install failure:

CatalogdClusterCatalogOpenshiftRedhatMarketplaceDegraded: Internal error occurred: failed calling webhook "inject-metadata-name.olm.operatorframework.io": failed to call webhook: Post "https://catalogd-service.openshift-catalogd.svc:9443/mutate-olm-operatorframework-io-v1-clustercatalog?timeout=10s": no endpoints available for service "catalogd-service"
CatalogdClusterCatalogOpenshiftRedhatOperatorsDegraded: Internal error occurred: failed calling webhook "inject-metadata-name.olm.operatorframework.io": failed to call webhook: Post "https://catalogd-service.openshift-catalogd.svc:9443/mutate-olm-operatorframework-io-v1-clustercatalog?timeout=10s": no endpoints available for service "catalogd-service"

/test e2e-aws-operator

@candita
Contributor

candita commented Mar 28, 2025

Multus CNI error for single node:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_simpletest-rc-to-be-deleted-wskfn_e2e-gc-5189_bda753e8-edc3-471b-a1cc-dd404222fc05_0(7ebb5667a9c8c1e09418ea9377b06a50cf153a5042fd56930080bef4fd6e7aae): error adding pod e2e-gc-5189_simpletest-rc-to-be-deleted-wskfn to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: ...
ERRORED: error configuring pod [e2e-gc-5189/simpletest-rc-to-be-deleted-wskfn] networking: Multus: [e2e-gc-5189/simpletest-rc-to-be-deleted-wskfn/bda753e8-edc3-471b-a1cc-dd404222fc05]: error waiting for pod: Get "https://api-int.ci-op-cw14gdlz-b5a36.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/e2e-gc-5189/pods/simpletest-rc-to-be-deleted-wskfn?timeout=1m0s": context deadline exceeded

/test e2e-aws-ovn-single-node

@openshift-ci
Contributor

openshift-ci bot commented Mar 28, 2025

@Miciah: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@candita
Contributor

candita commented Mar 28, 2025

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 28, 2025
@openshift-ci
Contributor

openshift-ci bot commented Mar 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 28, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 7f0fd6d into openshift:master Mar 28, 2025
19 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-ingress-operator
This PR has been included in build ose-cluster-ingress-operator-container-v4.20.0-202503282312.p0.g7f0fd6d.assembly.stream.el9.
All builds following this will include this PR.


Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

5 participants