Skip to content

Conversation

RanVaknin
Copy link
Contributor

Motivation and Context

The two modified tests are intermittently failing in our integ test code build jobs.
Since the tests in this file are using constant delay timeouts to signal an error response, and test whether they succeed / timeout, we see a race condition between the stubbed error response signaling the termination and the standard operation - both are operating on the same completable future.

A failure might look like this:

Expecting a throwable with cause being an instance of:
  software.amazon.awssdk.core.exception.ApiCallAttemptTimeoutException
but was an instance of:
  software.amazon.awssdk.services.protocolrestjson.model.ProtocolRestJsonException: Service returned HTTP status code 500 (Service: ProtocolRestJson, Status Code: 500, Request ID: null)

We could further tweak those constant timeout delay values but I don’t think it will actually solve the problem or at least not definitively. Since the existing setup is using a non-deterministic timeout delay, I believe that retrying the test when it fails with @RetryableTest(maxRetries = 3) will likely reduce failures significantly in the upstream codebuild jobs since the failure rate is already relatively low.

Testing

When I run these two locally in a @RepeatedTest suite set to 1000 repetitions, I see all of them passing with 100% success rates.

Approach to simulating the error:

For nonstreamingOperation500_notFinishedWithinTime_shouldTimeout, I tweaked the DELAY_AFTER_API_CALL_ATTEMPT_TIMEOUT from 500 to 250ms, and saw a 12% failure rate.
Adding a retryable annotation brings it down to 0.2% failure rate.

@RanVaknin RanVaknin requested a review from a team as a code owner March 31, 2025 20:14
@@ -19,13 +19,14 @@

import java.time.Duration;
import org.junit.jupiter.api.Test;
import software.amazon.awssdk.testutils.retry.RetryableTest;
import software.amazon.awssdk.utils.Pair;

public abstract class BaseApiCallAttemptTimeoutTest extends BaseTimeoutTest {

protected static final Duration API_CALL_ATTEMPT_TIMEOUT = Duration.ofMillis(100);
protected static final Duration DELAY_BEFORE_API_CALL_ATTEMPT_TIMEOUT = Duration.ofMillis(50);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not able to reproduce the test pertaining to this timeout since the value here is already low. I don't think lowering it under 50ms is going to solve anything.

Copy link

@RanVaknin RanVaknin added this pull request to the merge queue Mar 31, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 1, 2025
@RanVaknin RanVaknin added this pull request to the merge queue Apr 2, 2025
Merged via the queue into master with commit fbc3b88 Apr 2, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants