-
Notifications
You must be signed in to change notification settings - Fork 218
OCPBUGS-53424: Wait for install plans to enter the "Requires Approval" phase before approving them #1203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-53424: Wait for install plans to enter the "Requires Approval" phase before approving them #1203
Conversation
|
/assign @alebedev87 |
|
test failures appear unrelated. |
ede8707 to
1bdc5af
Compare
|
The gateway api suite passed, but I'm going to run it a few more times to check for flakes. |
1bdc5af to
928c8ed
Compare
alebedev87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more in favor of approving all install plans of the expected subscription. But I guess approving the latest one (knowing that we watch for installplans) may be enough. Let's see more e2e-operator-techpreview runs.
test/e2e/gateway_api_test.go
Outdated
| // - If the Istio and Subscription CRs are deleted, they are recreated | ||
| // automatically. | ||
| func testGatewayAPIIstioInstallation(t *testing.T) { | ||
| gatewayClassName := "openshift-test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be the third subtest which will create a gatewayclass with the opesnhift gatewayapi controller. I think it's redundant in the current setup where we have a single test which runs them all. But I can understand the idea of being able to run each test separately. In that context I'm wondering why not calling this gatewayclass openshift-default as we do in 2 other subtests?
test/e2e/gateway_api_test.go
Outdated
|
|
||
| t.Run("testGatewayAPIResources", testGatewayAPIResources) | ||
| if gatewayAPIControllerEnabled { | ||
| t.Run("testGatewayAPIIstioInstallation", testGatewayAPIIstioInstallation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that testGatewayAPIIstioInstallation deletes the subscription at the end (to see it recreated by the operator). Which means that all the checks (OSSM operator, istiod control plane, etc.) which could be useful to detect problems in the next step testGatewayAPIObjects (where we wait for the acceptance of GatewayClass) won't help. Because they are made for the previous subscription.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I think it's still useful to run testGatewayAPIIstioInstallation first. That way, hopefully any errors that would stop testGatewayAPIObjects will also appear in testGatewayAPIIstioInstallation, so even though you'll still get the "waiting for gateway class to be accepted" message, the first failure in the suite will hopefully be the more detailed one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way, hopefully any errors that would stop testGatewayAPIObjects will also appear in testGatewayAPIIstioInstallation, so even though you'll still get the "waiting for gateway class to be accepted" message, the first failure in the suite will hopefully be the more detailed one.
Well, yes, "hopefully". However it may be not the case and the newly re-created Subscription may result into different errors. Also, if testGatewayAPIIstioInstallation does not delete the subscription the check for the acceptance of GatewayClass from testGatewayAPIObjects will have more time to be successful and show any timeout problems. We can a) move the subscription recreation to a dedicated subtest which is run somewhere after testGatewayAPIIstioInstallation and testGatewayAPIObjects or b) we can separate the change of the order of the subtests into a different PR and discuss it further (check of CSV can be kept in this PR).
test/e2e/util_gatewayapi_test.go
Outdated
|
|
||
| func assertClusterServiceVersionSucceeded(t *testing.T) error { | ||
| t.Helper() | ||
| expectedVersion, err := expectedOSSMVersion(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the starting CSV from the subscription do the same thing? We can return the starting CSV from assertSubscription function to avoid going down to the ingress operator's deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the value from the ingress operator deployment because it's the ultimate source of truth for what the operator is looking for, but you are right, the starting CSV reflects the same value, so it's equally valid.
|
@rfredette: This pull request references Jira Issue OCPBUGS-53424, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@rfredette: This pull request references Jira Issue OCPBUGS-53424, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
928c8ed to
6217df9
Compare
|
/retest /test e2e-aws-gatewayapi |
|
Pre-merge-verified on
Subscription uses the latest installPlan hence marking as verified |
|
|
|
/test e2e-aws-gatewayapi |
|
hit the issue in origin e2e test: openshift/origin#29638 (comment) |
|
I could not find a reason for the missing load balancer status for gateway's service in must-gather. I think that we can verify the Also, I'm still wondering whether it's a good idea to remove (expecting recreation) |
8799e18 to
19ae0a4
Compare
|
/test e2e-aws-gatewayapi |
alebedev87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good to me + it passed the e2e tests twice with no problems. Just the last remark about the commit message: I think that we need to highlight the fact that we approve only the install plans which require approval. Because I think that that was the main workaround, not the fact that we approve the most recent one.
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-aws-gatewayapi Never enough of e2e runs ;) |
…approving them. Waiting until that point guarantees that there won't be simultaneous updates to the install plan, which can cause the install process to fail. Also, if multiple install plans exist for the same OSSM version, approve the most recently created one. This is part of the fix for OCPBUGS-53424
19ae0a4 to
b9d2420
Compare
Agreed. I've updated the commit message. |
|
The last Is that the issue #1212 is trying to address? |
No, 1212 focuses on augmenting the logs and improving the e2e to detect the missing |
|
Thank you for changing the commit message! /retitle OCPBUGS-53424: Wait for install plans to enter the "Requires Approval" phase before approving them |
|
ovn-serial timed out. Otherwise, no specific test failures /retest |
|
/test e2e-aws-ovn-serial |
1 similar comment
|
/test e2e-aws-ovn-serial |
|
e2e-aws-ovn-serial failed because of an infrastructure issue: I think that CI sometimes leaks a DNS record and then runs into this error when it re-uses the base domain. Shane saw something similar on #1206. He worked around the issue by amending a commit and pushing the change, which caused CI to use a different base domain for the cluster. (He changed a code comment, but even just changing the commit message should suffice.) Let's see whether just re-running the test solves the issue. /test e2e-aws-ovn-serial |
|
(If you do need to push an artificial change, my suggestion is to reword the commit message to keep the title line within 50 characters.) |
|
@rfredette: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@rfredette: Jira Issue OCPBUGS-53424: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-53424 has been moved to the MODIFIED state. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-ingress-operator |
No description provided.