fixed concurrent translog download failover #20906
ThyTran1402 wants to merge 3 commits into opensearch-project:main
❌ Gradle check result for de9745c: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
❌ Gradle check result for f5f393a: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
During failover and primary relocation, OpenSearch downloads all translog generations from the remote store before opening the engine. Previously this was done sequentially, one generation at a time, in a blocking for-loop inside RemoteFsTranslog.downloadOnce(). Since translog generations are independent of each other, this was a pure serial I/O bottleneck that scaled linearly with the number of retained generations.
This change parallelises translog generation downloads using the existing TRANSLOG_TRANSFER threadpool (already used for uploads), following the same CountDownLatch + worker-queue pattern established for concurrent segment downloads in #10519.
Related Issues
Resolves #10826
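For reviewers unfamiliar with the pattern, the CountDownLatch-based parallel download named in the Description can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name, downloadAll, and the fetch callback are hypothetical, and a plain fixed-size ExecutorService stands in for the TRANSLOG_TRANSFER threadpool.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongFunction;

// Hypothetical sketch: names here are illustrative and do not come from RemoteFsTranslog.
class ParallelGenerationDownload {

    // Submits one task per generation, waits for all of them, and rethrows the first failure.
    static Map<Long, String> downloadAll(List<Long> generations,
                                         LongFunction<String> fetch,
                                         ExecutorService pool) throws Exception {
        CountDownLatch latch = new CountDownLatch(generations.size());
        AtomicReference<Exception> failure = new AtomicReference<>();
        Map<Long, String> results = new ConcurrentHashMap<>();

        for (long gen : generations) {
            pool.execute(() -> {
                try {
                    // Skip remaining work once another task has already failed.
                    if (failure.get() == null) {
                        results.put(gen, fetch.apply(gen));
                    }
                } catch (Exception e) {
                    // Keep only the first failure; later ones are dropped.
                    failure.compareAndSet(null, e);
                } finally {
                    // Always count down so the caller is never blocked forever.
                    latch.countDown();
                }
            });
        }

        latch.await(); // Block until every generation has been attempted.
        if (failure.get() != null) {
            throw failure.get();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stand-in for a shared threadpool
        try {
            // The lambda simulates a remote fetch; a real download would do blocking I/O here.
            Map<Long, String> out = downloadAll(List.of(1L, 2L, 3L), gen -> "gen-" + gen, pool);
            System.out.println("downloaded " + out.size() + " generations");
        } finally {
            pool.shutdown();
        }
    }
}
```

Counting down in a finally block is the important design point: a failed download must still release the latch, otherwise the recovering shard would hang instead of surfacing the error.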
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.