Contributor

@schase-es schase-es commented Sep 27, 2025

The MinIO repository analyze test is failing in builds sometimes because of
a timeout. This change increases it from 2 minutes to 5 minutes.

Closes: #134853

@elasticsearchmachine added the needs:triage and v9.2.0 labels Sep 27, 2025
@DiannaHohensee added the :Distributed Coordination/Snapshot/Restore, Team:Distributed Coordination, and >test labels and removed the needs:triage label Sep 29, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@DiannaHohensee
Contributor

@schase-es could you link the test failure(s)?

@schase-es
Contributor Author

Yes -- #134853

Contributor

@DaveCTurner DaveCTurner left a comment

Hmm, I suspect this test failure might represent a continuation of the saga in MinIO around concurrent UploadPart and AbortMultipartUpload calls:

The change that closed minio/minio#21456 might not actually have fixed the issue; it might have just made it so that retrying the AbortMultipartUpload call eventually cleans things up. I'd rather we ruled out such issues in MinIO before just blindly increasing the timeout. 2 minutes should already be ample time to complete this test.
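
A minimal sketch of the suspicion above, for illustration only. `FakeStore` and `abort_until_clean` are hypothetical names, and the store is an in-memory stand-in whose "stale abort" behavior is an assumption, not MinIO's actual implementation. It only shows how a retry loop can eventually clean up after aborts that silently leave an upload behind:

```python
class FakeStore:
    """Hypothetical in-memory stand-in for an object store (not real MinIO behavior)."""

    def __init__(self, stale_aborts):
        # Assumption: this many abort calls return normally but leave the
        # upload listed, modelling an abort that raced with an UploadPart.
        self.stale_aborts = stale_aborts
        self.uploads = {"upload-1"}

    def abort_multipart_upload(self, upload_id):
        if self.stale_aborts > 0:
            self.stale_aborts -= 1  # abort "succeeds" but the upload survives
            return
        self.uploads.discard(upload_id)

    def list_multipart_uploads(self):
        return sorted(self.uploads)


def abort_until_clean(store, max_attempts):
    """Abort every listed upload, retrying until none remain.

    Returns the number of rounds needed, or raises TimeoutError if uploads
    are still listed after max_attempts rounds.
    """
    for attempt in range(1, max_attempts + 1):
        for upload_id in store.list_multipart_uploads():
            store.abort_multipart_upload(upload_id)
        if not store.list_multipart_uploads():
            return attempt
    raise TimeoutError(f"uploads still present after {max_attempts} attempts")
```

In this toy model a caller with a larger retry budget always finishes, only more slowly, which is why a longer test timeout could hide this kind of problem rather than rule it out.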

@schase-es
Contributor Author

schase-es commented Oct 7, 2025

@DaveCTurner thanks for the added context -- I'm realizing I need to do more searching around and git-blaming when looking at things like this. I also needed to get Docker set up before I could test this locally, so I was working very blind.

The Docker image used for testing this is the 9/07 release. Most of the tests take around 20 seconds, but the reproduction command runs for about 3'20":

./gradlew ":x-pack:plugin:snapshot-repo-test-kit:qa:minio:javaRestTest" --tests "org.elasticsearch.repositories.blobstore.testkit.analyze.MinioRepositoryAnalysisRestIT.testRepositoryAnalysis" -Dtests.seed=43BF8E5368D66EDC -Dtests.locale=asa-TZ -Dtests.timezone=America/Thunder_Bay -Druntime.java=24

I tried:

  • adding back in the settings from the PR you linked: time_to_live and anti_contention_delay
  • increasing settings like throttled_delete_retry.delay_increment and throttled_delete_retry.delay_increment
  • lowering the cooldown_period to one minute (three minutes and 20 would be perfect for hitting this)
  • in S3BlobContainer::get_register, temporarily removing the noBackoff policy for the analysis pathway

I'm still seeing 3'20" with various combinations of these.

Do you have any ideas about what this could be hitting?

Labels
:Distributed Coordination/Snapshot/Restore, Team:Distributed Coordination, >test, v9.3.0
Development

Successfully merging this pull request may close these issues.

[CI] MinioRepositoryAnalysisRestIT testRepositoryAnalysis failing
4 participants