Add Retryable Test annotation #5999
Merged
Motivation and Context
The two modified tests fail intermittently in our integration test CodeBuild jobs.
The tests in this file use constant delay timeouts to signal an error response and then check whether the call succeeds or times out. This creates a race between the stubbed error response signaling termination and the standard operation, because both operate on the same completable future.
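To make the race concrete, here is a minimal sketch of two paths completing the same `CompletableFuture`. It is not the SDK's or the test's actual code: the class name, the `outcome` helper, and the two `Runnable`s are hypothetical, and the ordering is forced by a flag instead of real timers so the result is reproducible.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

public class SharedFutureRace {

    // Hypothetical helper: the timeout path and the stubbed error response both
    // try to complete the SAME future. Only the first completion takes effect.
    static String outcome(boolean timeoutFiresFirst) {
        CompletableFuture<String> future = new CompletableFuture<>();

        Runnable timeout =
            () -> future.completeExceptionally(new TimeoutException("attempt timed out"));
        Runnable stubbedError =
            () -> future.complete("500 response");

        if (timeoutFiresFirst) {
            timeout.run();
            stubbedError.run(); // no effect: the future is already completed
        } else {
            stubbedError.run();
            timeout.run();      // no effect: the future is already completed
        }

        if (future.isCompletedExceptionally()) {
            return "timeout";
        }
        return future.join();
    }

    public static void main(String[] args) {
        System.out.println(outcome(true));
        System.out.println(outcome(false));
    }
}
```

In the real tests the order is decided by scheduler timing, not by a flag, which is why a fixed delay can occasionally let the unexpected path win.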
A failure might look like this:
We could further tweak those constant timeout delay values, but I don't think that would solve the problem, or at least not definitively. Since the existing setup relies on a non-deterministic timeout delay, I believe that retrying the test when it fails with `@RetryableTest(maxRetries = 3)` will likely reduce failures significantly in the upstream CodeBuild jobs, since the failure rate is already relatively low.
Testing
When I run these two tests locally as a `@RepeatedTest` suite set to 1000 repetitions, all repetitions pass (100% success rate).
Approach to simulating the error:
For `nonstreamingOperation500_notFinishedWithinTime_shouldTimeout`, I tweaked `DELAY_AFTER_API_CALL_ATTEMPT_TIMEOUT` from 500 ms to 250 ms and saw a 12% failure rate. Adding the retryable annotation brings it down to a 0.2% failure rate.
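As a rough sanity check on those numbers, the sketch below estimates how often every attempt fails, assuming each attempt fails independently with the same 12% probability. That independence is an assumption, not something measured here, and the class and method names are hypothetical. How many total attempts `maxRetries = 3` allows is also not stated above, so the loop prints several attempt counts.

```java
public class RetryFailureEstimate {

    // Probability that `attempts` independent attempts all fail,
    // given the same failure rate for each attempt.
    static double allAttemptsFail(double perAttemptFailureRate, int attempts) {
        return Math.pow(perAttemptFailureRate, attempts);
    }

    public static void main(String[] args) {
        double perAttemptFailureRate = 0.12; // observed with the 250 ms delay
        for (int attempts = 1; attempts <= 4; attempts++) {
            double percent = 100 * allAttemptsFail(perAttemptFailureRate, attempts);
            System.out.printf("%d attempt(s): %.4f%%%n", attempts, percent);
        }
    }
}
```

Under this assumption, three attempts give 0.12³ ≈ 0.17%, which is in the same range as the observed 0.2%.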