Skip to content

feat: add automatic S3 request retry with exponential backoff#1701

Open
allanrogerr wants to merge 18 commits intominio:masterfrom
allanrogerr:feature/retry-mechanism
Open

feat: add automatic S3 request retry with exponential backoff#1701
allanrogerr wants to merge 18 commits intominio:masterfrom
allanrogerr:feature/retry-mechanism

Conversation

@allanrogerr
Copy link
Copy Markdown

@allanrogerr allanrogerr commented May 4, 2026

Description

Adds SDK-level automatic retry on transient HTTP failures via a single OkHttp interceptor (Http.RetryInterceptor) installed on the SDK's default OkHttpClient (built by Http.newDefaultClient() / Http.newDefaultClient(IntSupplier)). The interceptor retries on retryable HTTP status codes and retryable transport-level IOExceptions, then returns the final response (or the terminal exception). The retry budget is exposed as Builder.maxRetries(int) / setMaxRetries(int) and defaults to 10 attempts.

Caller-supplied clients are used verbatim. A custom OkHttpClient passed via Builder.httpClient(...) is not modified: no interceptor injection, no retryOnConnectionFailure override, no rebuild via newBuilder(). To opt into the SDK's retry policy on a custom client, register Http.RetryInterceptor yourself and pair it with retryOnConnectionFailure(false). Builder.maxRetries(...) and MinioAsyncClient.setMaxRetries(...) are no-ops on the caller-supplied path unless the caller has wired the interceptor to read from there.

On the SDK-default path, OkHttpClient.retryOnConnectionFailure(false) is set so the SDK's interceptor is the single source of retry policy — no double-retry layered on top. The interceptor's attempt budget is bound to the per-instance BaseS3Client.maxRetries field via an IntSupplier so setMaxRetries(int) takes effect on the very next request.

Retryable HTTP status codes (8): 408, 429, 499, 500, 502, 503, 504, 520.
Retryable transport errors: connection reset, EOF, socket timeout, idle-connection close, etc. SSL handshake / unknown-CA / cert-path errors and the HTTP-on-HTTPS protocol mismatch are explicitly not retried.
Backoff: full-jitter exponential, base 200 ms, per-attempt cap 1 s, exponent capped at 30 to avoid bit-shift overflow. Result range per attempt is [1, min(cap, base · 2ⁿ)].
Cancellation: chain.call().isCanceled() is consulted before each attempt and after every IOException so cancelled calls do not burn the retry budget.
Region/HEAD redirect: untouched — executeHeadAsync region rediscovery still works.
Public API: Retry (helper class) is package-private; the public surface is Http.RetryInterceptor, Http.newDefaultClient(...), and the builder/setter methods.

Side-effect-free Checksum.update

Checksum.update's RandomAccessFile branch was rewritten to use positional FileChannel.read(ByteBuffer, long position) so the method no longer mutates the file pointer. As a result, BaseS3Client.createBody and the two cousin patterns inside MinioAsyncClient.putObject no longer need to bracket their checksum reads with getFilePointer() / seek(). Net effect: ~25 lines deleted across three call sites, callers simpler, behaviour identical end-to-end.

minio-go alignment

The retry contract is intentionally aligned with minio-go/retry.go:

Item minio-go Java
MaxRetry 10 10
DefaultRetryUnit 200 ms 200 ms
DefaultRetryCap 1 s 1 s
MaxJitter 1.0 1.0
Retryable HTTP codes 8 (above) 8 (above)
IOException classifier TLS handshake / unknown-CA / HTTPS-on-HTTP excluded same
Backoff formula min(cap, unit·2ⁿ) − rand·… identical

Two deliberate divergences:

  1. No S3 error-code body probing. Per reviewer feedback, retry decisions are HTTP-status-only. Every body-level retryable code AWS S3 / MinIO emits in practice already rides on a retryable HTTP status (SlowDown/InternalError/Throttling → 503/500), so observable coverage is unchanged in real-world workloads. Removed: 5 MiB body peek, XML regex parse, the 12-element S3-code set.
  2. No request re-signing per attempt. minio-go rebuilds and re-signs the request per attempt; this Java port reuses the signed okhttp3.Request because retries run inside the OkHttp interceptor layer. For the default attempt budget the worst-case wall-clock retry window is well under the 15-minute S3 signing window. Documented inline on Http.RetryInterceptor.

Motivation and context

The SDK had no retry capability for transient HTTP failures. Throttling (503), gateway errors (502/504), request timeouts (408), and Cloudflare 520s propagated directly to callers, requiring every consumer to implement its own retry loop.

This PR went through several iterations driven by reviewer feedback:

  1. Initial design: CompletableFuture-based retry loop with a dedicated scheduler thread.
  2. Per @balamurugana review 4243115858: stripped to a simple OkHttp interceptor; scheduler dropped.
  3. Per review 4248622939 and review 4250022733 (multi-thread batches):
    • Caller-supplied clients are used verbatim (r3205524097, r3206746125).
    • BaseS3Client.httpClient is volatile for safe publication across setTimeout/ignoreCertCheck reassignments (r3205557447).
    • Checksum.update made side-effect-free on the file pointer; createBody save/restore (and two cousins in MinioAsyncClient.putObject) dropped (r3205580978).
    • S3 error-code body probing removed; HTTP-status-only retry (r3205604391).
    • Reuse BaseS3Client.RANDOM for fan-out object-name suffix (r3206792852).

Does not address #1700: error code BucketNotEmpty (HTTP 409) is not natively retryable in minio-go and stays out of the retryable set here.

How to test this PR

  1. Unit + integration + lint (Java 17):

    ./gradlew :api:check
    

    28 RetryTest cases: each Retry predicate, the backoff formula, and MockWebServer integration tests covering retry per status code, no-retry on 4xx, retry exhaustion (both InvalidResponseException and ErrorResponseException paths), maxRetries(1) disables retry, setMaxRetries post-construction, builder validation (0 and -1), connection-drop IOException retry, caller-supplied client contracts (testUserSuppliedClientWithoutInterceptorDoesNotRetry and testUserSuppliedClientWithInterceptorRetries), cert-path SSLException not retryable, and a 10 s timeout-guarded test that fires Call.cancel() mid-retry-loop and asserts the loop short-circuits in <5 s.

  2. Functional tests against a real server:

    ./gradlew :functional:runFunctionalTest \
      -Pendpoint=http://localhost:9000 \
      -PaccessKey=minioadmin -PsecretKey=minioadmin -Pregion=us-east-1
    
  3. Mint suite via Docker:

    docker run --rm --network host \
      -e SERVER_ENDPOINT="127.0.0.1:9000" \
      -e ACCESS_KEY="minioadmin" -e SECRET_KEY="minioadmin" \
      -e ENABLE_HTTPS=0 -e MINT_MODE="full" \
      minio/mint:local minio-java
    
  4. Verify the caller-supplied-client contract (no interceptor injection, retryOnConnectionFailure left alone):

    OkHttpClient custom = new OkHttpClient.Builder()
        .retryOnConnectionFailure(true)  // intentionally true; preserved verbatim
        .build();
    MinioClient client = MinioClient.builder()
        .endpoint("https://…").credentials(…)
        .httpClient(custom)
        .maxRetries(3)                   // no-op on caller-supplied path
        .build();

    To opt into SDK retry on a custom client, register the interceptor explicitly:

    OkHttpClient custom = new OkHttpClient.Builder()
        .retryOnConnectionFailure(false)
        .addInterceptor(new Http.RetryInterceptor())
        .build();

    Both behaviours are locked in by RetryTest.

Validation performed

Check Result
:api:check (compile + spotless + spotbugs + 28 RetryTest cases, Java 17) ✅ green
:api:javadoc ✅ green
./gradlew build (api, adminapi, examples, functional) ✅ green
:functional:runFunctionalTest against play.min.io ✅ runs through bucket / object / encryption / lifecycle / policy coverage (a pre-existing reference-equality bug in TestMinioClient.testBucketCors on play.min.io is unrelated to this PR — CORSConfiguration lacks an equals override, and the branch does not touch CORS code)
Mint suite (erasure mode, JARs from this branch overlayed into the registry.min.dev/public/mint:aistor image) ✅ minio-java tests run successfully against a 4-disk MinIO erasure cluster behind nginx-TLS
End-to-end repro for the side-effect-free Checksum.update change: client.putObject(PutObjectAPIArgs) with a RandomAccessFile body and no pre-set checksum headers ✅ upload succeeds; bytes match (the prior "remove the lines" world fails with EOFException after 0 bytes — confirms the fix is load-bearing on this code path)

@allanrogerr allanrogerr added the test-framework Tests and test framework changes label May 4, 2026
@allanrogerr allanrogerr self-assigned this May 4, 2026
Implements SDK-level automatic retry matching the minio-go retry-mechanism
spec. All requests through BaseS3Client.executeAsync() now retry on
retryable S3 error codes (InternalError, SlowDown, RequestTimeout, etc.),
retryable HTTP status codes (408, 429, 499, 500, 502, 503, 504, 520), and
transient IO errors (connection reset, EOF, server closed idle connection).

Non-seekable request bodies (raw okhttp3.RequestBody) get a single attempt;
seekable bodies (byte[], ByteBuffer, RandomAccessFile) retry up to maxRetries
(default 10). Backoff is full-jitter exponential: rand(0, min(1s, 200ms*2^n))
before the nth retry, matching minio-go's DefaultRetryUnit/DefaultRetryCap.
The "RetryHead" S3 code is explicitly excluded so executeHeadAsync region
redirect logic is unaffected. maxRetries is configurable via builder and
setMaxRetries() post-construction.

Closes minio#1700.
@allanrogerr allanrogerr force-pushed the feature/retry-mechanism branch from 49fda52 to 9524c1f Compare May 4, 2026 11:50
@allanrogerr allanrogerr marked this pull request as ready for review May 4, 2026 13:37
@allanrogerr allanrogerr requested review from Copilot and rraulinio May 4, 2026 13:37
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces SDK-level automatic retry for S3 requests with full-jitter exponential backoff, aligning the Java SDK’s behavior with minio-go’s retry policy and making transient S3/HTTP/IO failures less likely to leak to callers.

Changes:

  • Added a new package-private Retry helper for retryability classification (S3 codes, HTTP status codes, IO exceptions) and jittered exponential backoff.
  • Wired a retry loop into BaseS3Client.executeAsync() with delayed retries via a daemon ScheduledExecutorService, plus a seekability-based single-attempt gate for non-replayable bodies.
  • Exposed configuration via maxRetries(...) on builders and setMaxRetries(...), and added an InvalidResponseException.responseCode() accessor plus a comprehensive new RetryTest suite.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
api/src/main/java/io/minio/BaseS3Client.java Adds execute-with-retry loop, backoff scheduling, and max-retry configuration.
api/src/main/java/io/minio/Retry.java Defines retryable error sets and backoff computation.
api/src/main/java/io/minio/MinioAsyncClient.java Adds maxRetries(...) builder option and applies it to the built client.
api/src/main/java/io/minio/MinioClient.java Exposes setMaxRetries(...) and maxRetries(...) on the synchronous client/builder.
api/src/main/java/io/minio/Http.java Adds Http.S3Request.body() accessor used for retry seekability gating.
api/src/main/java/io/minio/errors/InvalidResponseException.java Stores/returns the triggering HTTP response code for retry classification.
api/src/test/java/io/minio/RetryTest.java Adds unit + MockWebServer integration tests covering retry classification and behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/test/java/io/minio/RetryTest.java
Comment thread api/src/main/java/io/minio/Retry.java Outdated
Comment thread api/src/main/java/io/minio/MinioClient.java
Comment thread api/src/main/java/io/minio/MinioAsyncClient.java
- Http.Body now captures the initial file pointer (fileOffset) when
  created for RandomAccessFile bodies; toRequestBody() seeks back to
  that offset before each attempt, fixing both first-attempt and retry
  payload corruption when checksum computation has advanced the pointer
- BaseS3Client.createBody() explicitly restores the file pointer after
  computing checksums so Http.Body captures the correct starting offset
- Retry scheduler now dispatches actual retry work (credential fetch +
  request build) to ForkJoinPool.commonPool(), freeing the single
  scheduler thread from head-of-line blocking by slow providers
- Replace package-private {@link Retry#MAX_RETRY} Javadoc references in
  MinioClient and MinioAsyncClient with {@code 10} to satisfy doclint
- Fix RETRYABLE_HTTP_CODES closing paren indentation to match codebase
  style (ImmutableSet.of() in Http.java and Signer.java)
- Add testRandomAccessFileBodyRetried integration test: verifies that
  both the failed and retried PUT requests carry identical body content
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/Http.java Outdated
Comment thread api/src/test/java/io/minio/RetryTest.java
Comment thread api/src/test/java/io/minio/RetryTest.java Outdated
Comment thread api/src/test/java/io/minio/RetryTest.java
…he.remove

The no-retry tests for 404/403 were asserting only assertNotNull, which
would pass for any exception including NPEs.  Stronger assertions now
verify ErrorResponseException type and S3 error code/HTTP status, and
the retry-exhausted test verifies InvalidResponseException with status
500.

While adding those assertions, the tests revealed a pre-existing NPE:
regionCache.remove(s3request.bucket()) was called unconditionally when
the S3 error code was NoSuchBucket/RetryHead, but bucket is null for
operations like listBuckets().  Added a null guard so the cache
invalidation is skipped when there is no bucket in the request.
@allanrogerr allanrogerr requested a review from Copilot May 5, 2026 19:12
@allanrogerr allanrogerr review requested due to automatic review settings May 5, 2026 19:13
…gs JUA

SpotBugs JUA flags Assert.assertTrue(e instanceof T) because the message
hides what type was actually thrown on failure.  Using a typed catch block
is stricter: any unexpected exception type propagates as a test error with
the full stack trace, which is more informative than a failed boolean
assertion.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/test/java/io/minio/RetryTest.java
Comment thread api/src/test/java/io/minio/RetryTest.java
…strengthen test assertions

Replace the shared static Random instance with ThreadLocalRandom.current()
at the backoff call site and for the fan-out name generator in
MinioAsyncClient — eliminating shared-state CAS contention under concurrent
retry load.

Disable OkHttp's retryOnConnectionFailure in doExecuteAsync so the SDK's
executeWithRetry is the sole retry policy; previously OkHttp could silently
add an extra attempt on stale-connection IOExceptions, causing more total
attempts than maxRetries.

Strengthen the two remaining assertNotNull catch blocks in RetryTest
(testMaxRetriesOneDisablesRetry, testSetMaxRetriesPostConstruction) to typed
InvalidResponseException catches with responseCode()==500 assertions,
matching the pattern applied to the other tests in the previous commit.
@allanrogerr allanrogerr requested a review from Copilot May 5, 2026 20:27
@allanrogerr allanrogerr review requested due to automatic review settings May 5, 2026 20:32
Per @balamurugana's review on PR minio#1701, replace the
CompletableFuture/scheduler-based retry orchestration with an OkHttp
interceptor (Http.RetryInterceptor) that retries on retryable HTTP
status codes (408/429/499/500/502/503/504/520) and IOExceptions with
exponential backoff and full jitter.

- Drop S3-error-code retry classification; well-behaved S3/MinIO
  servers return retryable conditions with non-2xx HTTP status, so
  RETRYABLE_S3_CODES is redundant.
- Drop RETRY_SCHEDULER and the executeWithRetry recursive chain.
- Non-replayable bodies (raw okhttp3.RequestBody) are sent once via
  Http.RequestBody.isReplayable(); byte[]/ByteBuffer/RandomAccessFile
  bodies replay safely because writeTo rewinds the source.
- Wire interceptor at construction (BaseS3Client.wrapWithRetry) so
  setMaxRetries() takes effect on subsequent requests.
Replace `throws Exception` with the actually-thrown checked exceptions
(`IOException, MinioException`, plus `InterruptedException` for tests
calling `MockWebServer.takeRequest()`). Drops `throws` entirely on
`testSetMaxRetriesValidation` since it throws only an unchecked
`IllegalArgumentException`.

Fixes spotbugsTest THROWS_METHOD_THROWS_CLAUSE_BASIC_EXCEPTION
findings on Java 8/11/17/21/25.
@allanrogerr allanrogerr marked this pull request as draft May 7, 2026 17:10
Per @balamurugana's review on PR minio#1701, replace the
CompletableFuture/scheduler retry orchestration with a self-contained
OkHttp interceptor that retries on retryable HTTP status codes with
exponential backoff and full jitter.

- Http.RetryInterceptor: hard-coded MAX_RETRIES=5, BASE_DELAY_MS=200,
  MAX_DELAY_MS=30s, RETRYABLE_STATUS_CODES = 408/429/499/500/502/503/
  504/520. Bit-shift exponent capped to avoid overflow at high attempt
  counts; jitter bound guarded with Math.max(1, cap).
- Wired into Http.newDefaultClient() with a single addInterceptor call.
- Drop public maxRetries API on MinioClient/MinioAsyncClient/BaseS3Client
  and Builder.maxRetries; the interceptor's params are constants.
- Drop Retry.java entirely — RETRYABLE_HTTP_CODES inlined; the S3-error-
  code retry set, isRetryable(Throwable), isRetryableIOException, and
  computeBackoffMs helpers are no longer needed.
- IOException retry now relies on OkHttp's default
  retryOnConnectionFailure(true); only the interceptor handles status
  codes. No double-retry, no overlap.
- Drop Http.RequestBody.isReplayable() and the body-replayability gate;
  the raw okhttp3 RequestBody body path is unused in production code,
  and byte[]/ByteBuffer/RandomAccessFile bodies replay safely via
  writeTo().
- Drop InvalidResponseException.responseCode() accessor (added solely
  for the deleted retry classifier).
- RetryTest: trim from 539 lines to two integration smoke tests
  (retry on 503->200, no retry on 404).

Validated locally:
- :api:check (unit + spotbugs + spotless) green on Java 17
- :functional:runFunctionalTest passes through CORS test (modulo a
  pre-existing CORSConfiguration.equals() bug unrelated to this PR)
- mint suite (overlayed JARs in mint container, erasure mode):
  "All tests ran successfully", PASS in 42.8s
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread api/src/main/java/io/minio/Http.java
Comment thread api/src/main/java/io/minio/Http.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/test/java/io/minio/RetryTest.java
Reinstate the retry triggers, classifiers, and configurability that the
strip-back trimmed away, while keeping the orchestration in a single
self-contained OkHttp interceptor.

- Retry.java: status-code/S3-code sets, IOException filter, and the
  full-jitter exponential-backoff formula. Defaults: MAX_RETRY=10,
  DEFAULT_RETRY_UNIT_MS=200, DEFAULT_RETRY_CAP_MS=1000, MAX_JITTER=1.0.
- Http.RetryInterceptor: retries on retryable IOException, retryable
  HTTP status, and retryable S3 <Code> in non-2xx response bodies (peek
  capped at 5 MiB). Per-instance attempt budget via IntSupplier.
- BaseS3Client.wrapWithRetry installs the interceptor on every supplied
  client (default or user-passed) and disables OkHttp's
  retryOnConnectionFailure to avoid double retry.
- Restore the public maxRetries API: MinioClient.setMaxRetries,
  MinioClient.Builder.maxRetries, MinioAsyncClient.Builder.maxRetries,
  BaseS3Client.setMaxRetries. Pass 1 to disable retries.
- Drop the dead Http.S3Request.body() getter that was orphaned by the
  earlier strip-back.

RetryTest expanded from 2 to 22 tests covering the helpers, the
interceptor, every retryable status code, S3-code retry on a non-2xx
body, retry exhaustion, mid-sequence retry, post-construction setter,
and builder validation.

Validated locally on Java 17:
- :api:check (compile + spotless + spotbugs + tests) PASS
- ./gradlew build (api, adminapi, examples, functional) PASS
- mint suite, erasure mode, JARs from this commit: PASS in 44.3s
…status assertions

Address Copilot review feedback (PR minio#1701, review 4246873092):

- RetryInterceptor Javadoc now declares it as part of the SDK's supported
  public API and documents how BaseS3Client wraps every supplied
  OkHttpClient. Drops {@link} references to package-private Retry helpers
  so the public Javadoc no longer points at internal symbols. Adds a
  Threading note about Thread.sleep blocking the OkHttp dispatcher slot
  during backoff and points high-concurrency callers at Dispatcher
  pool sizing.
- BaseS3Client.executeAsync Javadoc reworded to reflect the wrapWithRetry
  behaviour (interceptor installed on every supplied client, default or
  caller-provided) and to note the maxRetries supplier is read per request.
- RetryTest: strengthen testRetryExhaustedReturnsLastResponse to assert
  the InvalidResponseException reflects HTTP 500 rather than allowing any
  exception type to pass. Add testRetryExhaustedSurfacesXmlErrorResponse
  for the XML/ErrorResponseException path, asserting both response.code()
  and the parsed S3 code after retries are exhausted.

Validated locally on Java 17:
- :api:check (compile + spotless + spotbugs + tests) PASS
- :api:javadoc PASS
… tests

Address findings from the deep-review pass on the retry feature.

Behaviour
- RetryInterceptor.intercept now consults chain.call().isCanceled() at the
  top of each attempt and after every IOException from chain.proceed.
  Without this a cancelled call would surface IOException("Canceled"),
  fall through Retry.isRequestErrorRetryable as "retryable", and burn the
  full attempt budget in sleep+retry — a regression in real OkHttp
  cancellation flows. Mirrors minio-go's `errors.Is(err, context.Canceled)`
  short-circuit.

API hardening
- BaseS3Client.maxRetries narrowed from protected to package-private; the
  only out-of-class writer is MinioAsyncClient.Builder (same package),
  which now goes through the validating setMaxRetries() instead of a raw
  field assignment.
- Http.Body(okhttp3.RequestBody) now validates non-null and documents the
  retry-replay caveat for one-shot bodies.

Documentation
- RetryInterceptor Javadoc adds explicit notes on cancellation handling,
  request-body replayability (with a warning about caller-supplied
  isOneShot bodies), and the deliberate divergence from minio-go around
  per-attempt request rebuild / re-signing / region & STS refresh.
- BaseS3Client.wrapWithRetry Javadoc states that retryOnConnectionFailure
  is forced to false on every supplied client.
- BaseS3Client.traceOn Javadoc notes per-retry attempts are not surfaced
  to the trace stream and points at OkHttp's HttpLoggingInterceptor.
- MinioAsyncClient.Builder.httpClient overloads now document the wrap
  behaviour.
- Retry.java header references minio-go's retry.go as the contract source.
- docs/API.md builder-methods table gains a maxRetries() row.

Tests (RetryTest.java)
- All 13 integration tests now close the MinioClient via a closeQuietly
  helper, eliminating the per-test dispatcher leak.
- New: testCancelDuringRetryStopsLoop — wraps a real OkHttpClient with the
  interceptor, fires Call.cancel() ~150ms in, asserts the loop short-
  circuits in <5s (without the fix it would run all 10 attempts).
- New: testRetryOnConnectionDropThenSuccess — exercises the IOException
  retry path end-to-end via MockWebServer SocketPolicy.DISCONNECT_AT_START.
- New: testUserSuppliedClientStillRetries — caller-supplied OkHttpClient
  with retryOnConnectionFailure(true) still gets the SDK interceptor.
- New: testIsRequestErrorRetryable_certPathErrorsNotRetryable — covers
  the SSLException-with-cert-path-cause and SSLPeerUnverifiedException
  branches that were previously untested.
- New: testMaxRetriesBuilderValidation_negative,
  testSetMaxRetriesValidation_negative — extends validation coverage
  beyond the 0-only case.

Style
- Replaced fully-qualified java.util.regex.Pattern/Matcher with imports
  on Http.java.
- Bumped 2025-only copyright headers on touched files to 2025-2026 (and
  2022 -> 2022-2026, 2015-2021 -> 2015-2026).

Validated locally on Java 17:
- :api:check (compile + spotless + spotbugs + tests) PASS
- :api:javadoc PASS
- ./gradlew build (api, adminapi, examples, functional) PASS

minio-go contract: cancellation handling is *added*, bringing parity with
minio-go's errors.Is(context.Canceled) check. No retryable status, S3 code,
backoff formula, or jitter constant was changed.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/MinioAsyncClient.java Outdated
Comment thread api/src/main/java/io/minio/MinioAsyncClient.java Outdated
Comment thread docs/API.md
Comment thread api/src/main/java/io/minio/Retry.java Outdated
… prior interceptor from network chain too; correct backoff range docs

Address Copilot review (PR minio#1701, review 4247997916):

- BaseS3Client.setMaxRetries and MinioAsyncClient.Builder.maxRetries
  Javadocs no longer reference {@link Retry#MAX_RETRY}. The Retry class
  is package-private, so the link target was unresolvable from generated
  public docs. Replaced with the literal default {@code 10}, which is
  what callers actually see.
- BaseS3Client.wrapWithRetry now also strips any prior RetryInterceptor
  from networkInterceptors(), not only the application-interceptor list.
  Previously the Javadoc claimed every prior RetryInterceptor was
  replaced, but only application interceptors were. A caller that had
  (mis)registered RetryInterceptor as a network interceptor would have
  ended up with two retry layers. Defends against that and brings the
  Javadoc claim into line with the code.
- Retry.exponentialBackoffMs Javadoc previously stated the result range
  as [0, sleep]. The actual range is [1, sleep] because nextDouble() is
  in [0.0, 1.0) and the (long) cast truncates rand * sleep to at most
  sleep - 1, so the subtraction never reaches sleep itself. minio-go
  has the same off-by-one (Float64() is also [0.0, 1.0)); this is
  documented now rather than papered over, so callers reading the
  formula see the same bounds the implementation produces.

Validated locally on Java 17:
- :api:check (compile + spotless + spotbugs + tests) PASS
- :api:javadoc PASS
@allanrogerr allanrogerr marked this pull request as ready for review May 7, 2026 22:26
Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java
Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/Http.java Outdated
Comment thread api/src/main/java/io/minio/Http.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java
Comment thread api/src/main/java/io/minio/Retry.java Outdated
@@ -0,0 +1,136 @@
/*
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry.java is a technically correct, direct port from minio-go. Removing it would disrupt that alignment.

This comment was marked as duplicate.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking none of this is needed. I am seeking to maintain alignment with minio-go as much as possible. It is a single source of truth, with pure-function testability, policy versus mechanism separation, as well as parity with minio-go/retry.go

Perhaps more substantive: it is the SDK's only retry policy: with an HTTP-status set, IOException classifier, and jitter formula - none of which OkHttp provides. They are kept in one package-private class so the synch path and the interceptor can't drift, and so the predicates are unit-testable as pure functions.

As an alternative I could inline the four policy pieces as private statics inside Http.RetryInterceptor. What do you prefer @balamurugana ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove it. We should not copy/convert go code. We need to use each language features.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke to Bala - I am removing this entire file and refactoring to suit.

Comment thread docs/API.md
Five concessions from bala's review batch — each verified against
minio-go's contract or the file's own pre-PR conventions:

BaseS3Client.java
- Restore the `protected static final Random RANDOM` field (with its
  java.security.SecureRandom + java.util.Random imports) that
  60aeaed dropped when switching backoff jitter to
  ThreadLocalRandom. Per bala's r3205617158: do not remove
  protected/public API surface. The field is part of the public
  subclassing contract on BaseS3Client and is the Java analogue of
  minio-go's per-client `c.random` source (api.go:307); SDK-internal
  code paths still use ThreadLocalRandom.current() for the retry
  backoff hot path so jitter is unchanged.
- Drop the null-guard around `regionCache.remove(s3request.bucket())`
  introduced in beead6e. The guard was only load-bearing for the
  contrived testNoRetryOn404 (listBuckets + 404 NoSuchBucket — a wire
  shape no spec-compliant S3 server emits since NoSuchBucket only
  applies to bucket-scoped ops); the test is rewritten in this commit
  to a realistic scenario, removing the dependency on the guard.

Http.java
- Drop the two-line `// Our RetryInterceptor handles transient
  failures with full-jitter backoff; defer entirely to it instead of
  layering OkHttp's connection-level retry on top.` comment from
  newDefaultClient() per bala's r3205589355: the
  retryOnConnectionFailure(false) call paired with addInterceptor(new
  RetryInterceptor()) is self-explanatory and the prose duplicated the
  RetryInterceptor Javadoc.
- Drop the `Utils.validateNotNull(requestBody, "request body")` guard
  from the Body(okhttp3.RequestBody) constructor per bala's
  r3205613399; the three other Body constructors do not validate
  non-null and let the next-line dereference fail with the JVM's
  helpful-NPE message — restoring that local convention.
- Drop the `private long fileOffset` field, its capture in the
  Body(RandomAccessFile,...) constructor, and the seek in
  toRequestBody() per bala's r3205614300. Empirical truth-table probe
  (4 trials toggling F = Body.fileOffset against C+R = createBody
  save/restore + RequestBody.position+seek) confirmed F is redundant
  when C+R are present: removing F alone keeps both first-attempt and
  retry-attempt body content correct (Trial 3), since the
  pre-Body-construction seek in createBody already places the file
  pointer where RequestBody.position will capture it on construction
  and writeTo() will seek back to on each network attempt.

RetryTest.java
- Rewrite testNoRetryOn404 to call removeBucket on a non-existent
  bucket. NoSuchBucket only applies to bucket-scoped operations on
  the S3 wire; pairing it with the bucket-less listBuckets() is a
  contrived scenario AWS S3 / MinIO never produce. The rewrite
  preserves coverage (404 does not retry, ErrorResponseException
  carries code "NoSuchBucket" and status 404, request count == 1)
  while exercising a wire shape the SDK actually has to handle, and
  uses a non-null bucket so the regionCache invalidation path runs
  with a real ConcurrentHashMap.remove(String) call.

Validated locally on Java 17:
- :api:check (compile + spotless + spotbugs + tests) PASS
- :api:javadoc PASS
@allanrogerr allanrogerr requested a review from balamurugana May 8, 2026 05:38
Comment thread api/src/main/java/io/minio/Retry.java Outdated
@@ -0,0 +1,136 @@
/*

This comment was marked as duplicate.

Comment thread api/src/main/java/io/minio/BaseS3Client.java Outdated
Comment thread api/src/main/java/io/minio/BaseS3Client.java
} catch (IOException e) {
throw new MinioException(e);
}
}

This comment was marked as duplicate.

Comment thread api/src/main/java/io/minio/Http.java Outdated
Comment thread api/src/main/java/io/minio/MinioAsyncClient.java Outdated
allanrogerr added a commit to allanrogerr/minio-java that referenced this pull request May 8, 2026
Five reviewer threads addressed plus a WHY-only comment trim across the
touched files. Each thread is mapped to the reviewer's discussion ID and
verified locally against :api:check / :api:javadoc / a production-cluster
repro on the local mint-erasure compose.

r3205524097 / r3206746125 ("we should not wrap")
- BaseS3Client.java: drop wrapWithRetry() and the matching call sites in
  both ctors. Caller-supplied OkHttpClient is now used verbatim — no
  newBuilder(), no interceptor strip-and-add, no retryOnConnectionFailure
  override — per bala's principle that the user's client is the user's
  client. Field/Javadoc updates to setMaxRetries, executeAsync, the
  copy-ctor, and the maxRetries field reflect the new contract.
- Http.java: add newDefaultClient(IntSupplier) overload so the SDK-default
  client can have its retry budget bound to BaseS3Client.maxRetries via a
  supplier. The no-arg form delegates with () -> Retry.MAX_RETRY for
  backward compatibility (MinioAdminClient, functional/TestArgs).
- MinioAsyncClient.java: Builder.build() constructs the client first, then
  assigns httpClient via the supplier-bound newDefaultClient when the
  caller didn't supply one. Builder.httpClient(...) and Builder.maxRetries
  Javadocs spell out the new caller-supplied-client contract.
- RetryTest.java: replace testUserSuppliedClientStillRetries (which
  exercised the now-gone wrap behaviour) with two contract tests:
  testUserSuppliedClientWithoutInterceptorDoesNotRetry and
  testUserSuppliedClientWithInterceptorRetries.

r3205557447 (thread-safety on this.httpClient reassignment)
- BaseS3Client.java:114: protected OkHttpClient httpClient ->
  protected volatile OkHttpClient httpClient. Matches the precedent of
  volatile int maxRetries on the same class. Provides JMM happens-before
  between writers (setTimeout / ignoreCertCheck reassigning the field)
  and readers (executeAsync). The local-snapshot pattern bala originally
  proposed does not solve the visibility problem; the field qualifier
  does. The "no two timeouts mid-call" property bala raised separately
  is already guaranteed by OkHttp Call binding (a Call captures its
  OkHttpClient at newCall time and the chain reuses it for retries).

r3205580978 ("createBody save/restore not needed")
- Checksum.java: rewrite the RandomAccessFile branch of update() to use
  positional FileChannel.read(ByteBuffer, long) so the method no longer
  mutates the file pointer. Side-effect-free helper now; callers don't
  need to bracket with getFilePointer / seek.
- BaseS3Client.java: drop the createBody fileStartPos capture and the
  matching seek-back. Empirically verified on a live MinIO erasure
  cluster: client.putObject(PutObjectAPIArgs) with a RandomAccessFile
  body and no pre-set checksum headers now completes successfully (the
  prior incorrect "remove these lines" world fails with EOFException
  because Http.RequestBody captures the post-checksum EOF position).
- MinioAsyncClient.java: drop the two cousin save/restore patterns
  around Checksum.update at lines ~2010 and ~3560 — both are now
  redundant for the same reason.

r3205604391 ("don't probe S3 error codes")
- Retry.java: drop RETRYABLE_S3_CODES (12-element set) and
  isS3CodeRetryable(String). Class is now narrower: HTTP status set,
  IOException classifier, and full-jitter backoff formula. minio-go-
  parity comment removed since the SDK no longer mirrors that list.
- Http.java: drop MAX_PEEK_BYTES, S3_ERROR_CODE_PATTERN, the
  peekS3ErrorCode helper, and the body-probing branch in
  RetryInterceptor.intercept. RetryInterceptor Javadoc updated to drop
  the S3-code bullet from "Retries on:". Unused
  java.util.regex.Pattern import removed (Matcher is still used
  elsewhere in the file).
- RetryTest.java: drop testIsS3CodeRetryable_retryable,
  testIsS3CodeRetryable_notRetryable, and the integration test
  testRetryOnRetryableS3CodeIn400Body. 28 RetryTest cases remain,
  all PASS.

r3206792852 ("we already have RANDOM")
- MinioAsyncClient.java:96, 3382: drop the
  java.util.concurrent.ThreadLocalRandom import and swap
  ThreadLocalRandom.current() for the inherited
  BaseS3Client.RANDOM at the fan-out object-name suffix site. The
  ThreadLocalRandom usage in Retry.exponentialBackoffMs is a
  separate, hot-path concern (added in 60aeaed to address Copilot
  review 3191258329 about RANDOM contention); bala did not flag it
  and it remains.

Comment trim
- WHY-only comments across the six files: removed restatements of
  what the code does, kept only non-obvious rationale (order
  constraints, cross-class contracts, gotchas). Javadocs trimmed to
  the user-facing contract; mechanics moved out.

Validated on Java 17:
- :api:check (compile + spotless + spotbugs + 28 RetryTest cases) PASS
- :api:javadoc PASS
- Live mint-erasure compose cluster + ReproUploadObject driving the
  no-pre-set-checksum-headers PUT path: PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@allanrogerr allanrogerr requested a review from balamurugana May 8, 2026 19:17
Five reviewer threads addressed plus a WHY-only comment trim across the
touched files. Each thread is mapped to the reviewer's discussion ID and
verified locally against :api:check / :api:javadoc / a production-cluster
repro on the local mint-erasure compose.

r3205524097 / r3206746125 ("we should not wrap")
- BaseS3Client.java: drop wrapWithRetry() and the matching call sites in
  both ctors. Caller-supplied OkHttpClient is now used verbatim — no
  newBuilder(), no interceptor strip-and-add, no retryOnConnectionFailure
  override — per bala's principle that the user's client is the user's
  client. Field/Javadoc updates to setMaxRetries, executeAsync, the
  copy-ctor, and the maxRetries field reflect the new contract.
- Http.java: add newDefaultClient(IntSupplier) overload so the SDK-default
  client can have its retry budget bound to BaseS3Client.maxRetries via a
  supplier. The no-arg form delegates with () -> Retry.MAX_RETRY for
  backward compatibility (MinioAdminClient, functional/TestArgs).
- MinioAsyncClient.java: Builder.build() constructs the client first, then
  assigns httpClient via the supplier-bound newDefaultClient when the
  caller didn't supply one. Builder.httpClient(...) and Builder.maxRetries
  Javadocs spell out the new caller-supplied-client contract.
- RetryTest.java: replace testUserSuppliedClientStillRetries (which
  exercised the now-gone wrap behaviour) with two contract tests:
  testUserSuppliedClientWithoutInterceptorDoesNotRetry and
  testUserSuppliedClientWithInterceptorRetries.

r3205557447 (thread-safety on this.httpClient reassignment)
- BaseS3Client.java:114: protected OkHttpClient httpClient ->
  protected volatile OkHttpClient httpClient. Matches the precedent of
  volatile int maxRetries on the same class. Provides JMM happens-before
  between writers (setTimeout / ignoreCertCheck reassigning the field)
  and readers (executeAsync). The local-snapshot pattern bala originally
  proposed does not solve the visibility problem; the field qualifier
  does. The "no two timeouts mid-call" property bala raised separately
  is already guaranteed by OkHttp Call binding (a Call captures its
  OkHttpClient at newCall time and the chain reuses it for retries).

r3205580978 ("createBody save/restore not needed")
- Checksum.java: rewrite the RandomAccessFile branch of update() to use
  positional FileChannel.read(ByteBuffer, long) so the method no longer
  mutates the file pointer. Side-effect-free helper now; callers don't
  need to bracket with getFilePointer / seek.
- BaseS3Client.java: drop the createBody fileStartPos capture and the
  matching seek-back. Empirically verified on a live MinIO erasure
  cluster: client.putObject(PutObjectAPIArgs) with a RandomAccessFile
  body and no pre-set checksum headers now completes successfully (the
  prior incorrect "remove these lines" world fails with EOFException
  because Http.RequestBody captures the post-checksum EOF position).
- MinioAsyncClient.java: drop the two cousin save/restore patterns
  around Checksum.update at lines ~2010 and ~3560 — both are now
  redundant for the same reason.

r3205604391 ("don't probe S3 error codes")
- Retry.java: drop RETRYABLE_S3_CODES (12-element set) and
  isS3CodeRetryable(String). Class is now narrower: HTTP status set,
  IOException classifier, and full-jitter backoff formula. minio-go-
  parity comment removed since the SDK no longer mirrors that list.
- Http.java: drop MAX_PEEK_BYTES, S3_ERROR_CODE_PATTERN, the
  peekS3ErrorCode helper, and the body-probing branch in
  RetryInterceptor.intercept. RetryInterceptor Javadoc updated to drop
  the S3-code bullet from "Retries on:". Unused
  java.util.regex.Pattern import removed (Matcher is still used
  elsewhere in the file).
- RetryTest.java: drop testIsS3CodeRetryable_retryable,
  testIsS3CodeRetryable_notRetryable, and the integration test
  testRetryOnRetryableS3CodeIn400Body. 28 RetryTest cases remain,
  all PASS.

r3206792852 ("we already have RANDOM")
- MinioAsyncClient.java:96, 3382: drop the
  java.util.concurrent.ThreadLocalRandom import and swap
  ThreadLocalRandom.current() for the inherited
  BaseS3Client.RANDOM at the fan-out object-name suffix site. The
  ThreadLocalRandom usage in Retry.exponentialBackoffMs is a
  separate, hot-path concern (added in 60aeaed to address Copilot
  review 3191258329 about RANDOM contention); bala did not flag it
  and it remains.

Comment trim
- WHY-only comments across the six files: removed restatements of
  what the code does, kept only non-obvious rationale (order
  constraints, cross-class contracts, gotchas). Javadocs trimmed to
  the user-facing contract; mechanics moved out.

Validated on Java 17:
- :api:check (compile + spotless + spotbugs + 28 RetryTest cases) PASS
- :api:javadoc PASS
- Live mint-erasure compose cluster + ReproUploadObject driving the
  no-pre-set-checksum-headers PUT path: PASS
@allanrogerr allanrogerr force-pushed the feature/retry-mechanism branch from 14bcc29 to 93e4f47 Compare May 8, 2026 19:19
@allanrogerr
Copy link
Copy Markdown
Author

All comments addressed

Three changes:

(1) Retry.java is folded into Http.RetryInterceptor as package-private
static members (MAX_RETRY, DEFAULT_RETRY_UNIT_MS, DEFAULT_RETRY_CAP_MS,
MAX_JITTER, RETRYABLE_HTTP_STATUS_CODES, isHttpStatusRetryable,
isRequestErrorRetryable, exponentialBackoffMs). The class was a one-use
container for the interceptor's helpers; folding shrinks the package
surface. RetryTest call sites updated to Http.RetryInterceptor.X.

(2) BaseS3Client.createBody save/restore bracket dropped (r3206778779,
"It is not needed for Java"). For the file-PUT paths that go through
MinioAsyncClient, the upstream brackets at L1995/L3540 leave the file
pointer at start-of-payload, and the pre-computed Checksum.makeHeaders
headers short-circuit every Checksum.update guard inside createBody --
so the pointer never moves there. Http.RequestBody.writeTo seeks to
the captured start-of-payload on every attempt, so the retry
interceptor's repeated writeTo calls replay the same bytes. createBody
is now byte-identical to master.

(3) Checksum.java reverted to master form. The positional
FileChannel.read(ByteBuffer, long) variant added in 93e4f47 exists
only to make the bracket in createBody redundant; once the bracket is
gone, the variant is no longer needed and Checksum.update returns to
the master file.read(buf, off, len) form. MinioAsyncClient L1995/L3540
brackets remain (they match master verbatim).

Empirical validation -- two new tests in RetryTest:
- testRetryWithFileBody_replaysSameBytesOnEveryAttempt: 8 KiB random
  payload, MockWebServer queue 503/503/503/200; asserts 4 PUT requests
  recorded and each body is byte-identical to the original payload.
- testRetryWithFileBody_transportFailureReplaysFromStart: 4 KiB
  payload, DISCONNECT_AT_START followed by 200; asserts the retry
  attempt's body matches the original payload after seek-back.

:api:check (compile + spotless + spotbugs + 30 RetryTest cases) PASS
:api:javadoc PASS
@allanrogerr allanrogerr force-pushed the feature/retry-mechanism branch from c2224b1 to 0ae488b Compare May 9, 2026 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test-framework Tests and test framework changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants