-
Notifications
You must be signed in to change notification settings - Fork 74
Add timeout to metrics query and cleanup job failure logs in e2e test #3378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout to metrics query and cleanup job failure logs in e2e test #3378
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sebsoto The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retitle Add timeout to metrics query and cleanup job failure logs in e2e test |
|
changes lgtm, will cherry-pick this and wait for test results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, PTAL at the comments
test/e2e/metrics_test.go
Outdated
| TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, | ||
| } | ||
| client := &http.Client{Transport: tr} | ||
| client := &http.Client{Transport: tr, Timeout: 30 * time.Second} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
30s may be too much for this request, consider 5s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather wait a bit longer than needed than too little to help flakes.
I know vSphere has historically had a very slow network
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea in the client is to fail fast, it will get retried in the wrapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a compromise i'll do 15 seconds 👍
| } | ||
| req.Header.Set("Authorization", "Bearer "+token) | ||
| tr := &http.Transport{ | ||
| // InsecureSkipVerify is required to avoid errors due to bad certificate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider usign context.WithTimeout to control the timeout per request in ef239e2#diff-41354c8d32cc5449ea0e5de5e5f28928103f9d2642b034f4b6e8474d487d656fL261
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know if that really helps here, theres only one request at the moment, and I can't think of a reason of we would need to tweak individual timeouts for different requests in the case we need to add another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, disregard.
Prevents client.Do from blocking indefinitely.
When a job fails a wall of text is dumped in the test job logs, including the full job spec as well as all events associated with it. This hasn't proven to be useful for debugging, and makes the errors harder to read. All of this information can be found elsewhere in the artifact directory.
d8ecbcf to
92ec5dc
Compare
|
/lgtm |
|
/hold Revision 92ec5dc was retested 3 times: holding |
|
/test gcp-e2e-operator |
|
/hold cancel |
|
/retest-required |
|
/override ci/prow/nutanix-e2e-operator |
|
@mansikulkarni96: Overrode contexts on behalf of mansikulkarni96: ci/prow/nutanix-e2e-operator In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@sebsoto: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/cherry-pick release-4.20 |
|
@mansikulkarni96: new pull request created: #3389 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick release-4.19 release-4.18 release-4.17 |
|
@mansikulkarni96: new pull request created: #3390 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick release-4.18 |
|
@mansikulkarni96: #3378 failed to apply on top of branch "release-4.17": In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Addresses a missing timeout and improves job error output.