Retry timeout tests for aggs #122031
The aggs timeout test waits for the agg to return and then double-checks, using the tasks API, that the agg has stopped. We're seeing some failures where the tasks API reports that the agg is still running. I can't reproduce them locally, because computers. This adds two things:

1. Logs the hot_threads output so we can see whether the query is indeed still running.
2. Retries the `_tasks` API for up to a minute. If the task goes away soon after the `_search` returns, that's *fine*. If it sticks around for more than a few seconds, then the cancellation isn't working. We wait a full minute because CI can't be trusted to do anything quickly.

Closes #121993
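A minimal sketch of the "retry for a minute" idea in point 2, assuming a hypothetical `RetryUntil.retryUntilPasses` helper (this is not the PR's code; Elasticsearch's test framework has its own `assertBusy` helper for this pattern). The check keeps throwing while the task is still listed, and the helper only fails once the deadline has passed, rethrowing the last failure so the message says what was still running.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not from the PR): re-run a check until it stops
 * throwing, or rethrow its last failure once maxWait has elapsed.
 */
final class RetryUntil {
    private RetryUntil() {}

    /** A check that throws (for example an AssertionError) while the condition does not hold yet. */
    @FunctionalInterface
    interface Check {
        void run() throws Exception;
    }

    static void retryUntilPasses(Check check, Duration maxWait) throws Exception {
        long deadline = System.nanoTime() + maxWait.toNanos();
        long sleepMillis = 10;
        while (true) {
            try {
                check.run();
                return; // condition holds, e.g. the search task is no longer listed
            } catch (Exception | AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last failure, e.g. "task still running"
                }
            }
            // Exponential backoff capped at one second, so a task that stops quickly
            // is noticed quickly while a slow CI machine still gets the full maxWait.
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 1_000);
        }
    }
}
```

In the real test the check would list the search tasks via the `_tasks` API and assert that none remain, with `maxWait` set to one minute; how that query is issued depends on the test's client, so it is left out of this sketch.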
Pinging @elastic/es-analytical-engine (Team:Analytics)
not-napoleon left a comment:
LGTM
 * it still does. So long as it stops <strong>eventually</strong> that's
 * still indicative of the interrupt code working.
 * </p>
 */
++ good clarification
Right. I had this method and thought, "In three weeks I'm not going to know why it's OK to wait here."