[CELEBORN-2226][CIP-14] Support RetryFetchChunk functionality for Cel…#3605
Draft
afterincomparableyum wants to merge 1 commit into apache:main
I will rebase off of main and then open this PR once #3583 gets merged.
…ebornInputStream in CppClient
Implement chunk-fetch retry logic in CelebornInputStream::getNextChunk(), matching the Java CelebornInputStream behavior. When a chunk fetch fails, the retry loop excludes the failed worker, switches to the peer replica (if available), and sleeps between retry rounds before creating a new reader.
- Add getLocation() to PartitionReader interface and WorkerPartitionReader
- Replace the stub getNextChunk() with full retry logic: excluded worker
checks, peer switching, configurable retry count, sleep between retries
- Update moveToNextChunk() and moveToNextReader() to handle nullable
returns from getNextChunk()
- Add unit test for WorkerPartitionReader::getLocation()
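
The retry behavior described above (exclude the failed worker, switch to the peer, sleep between rounds) can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: `FetchFn`, `RetryOptions`, `ExcludedWorkers`, `isExcluded`, and `fetchChunkWithRetry` are hypothetical names, the exclusion expiry and the `std::nullopt` result on exhaustion are assumptions of the sketch, and workers are modeled as plain strings.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical callback: fetches one chunk from the named worker and
// throws a std::exception on failure.
using FetchFn = std::function<std::string(const std::string& worker)>;

// Hypothetical knobs; names and defaults are assumptions, not taken from the PR.
struct RetryOptions {
  int maxRetries = 3;                             // retries after the first attempt
  std::chrono::milliseconds retryWait{5};         // pause between retry rounds
  std::chrono::milliseconds excludeExpiry{1000};  // how long an exclusion lasts
};

// Maps a worker to the time it was last excluded.
using ExcludedWorkers = std::map<std::string, Clock::time_point>;

bool isExcluded(const ExcludedWorkers& excluded, const std::string& worker,
                std::chrono::milliseconds expiry) {
  auto it = excluded.find(worker);
  return it != excluded.end() && Clock::now() - it->second < expiry;
}

// replicas[0] is the primary worker; replicas[1], if present, is its peer.
// Returns std::nullopt once the retry budget is spent (an assumption of this
// sketch; the real code may signal exhaustion differently).
std::optional<std::string> fetchChunkWithRetry(
    const std::vector<std::string>& replicas, const FetchFn& fetch,
    ExcludedWorkers& excluded, const RetryOptions& opts = RetryOptions{}) {
  assert(opts.maxRetries >= 0);
  if (replicas.empty()) return std::nullopt;

  std::size_t current = 0;
  for (int attempt = 0; attempt <= opts.maxRetries; ++attempt) {
    const std::string& worker = replicas[current];
    if (!isExcluded(excluded, worker, opts.excludeExpiry)) {
      try {
        return fetch(worker);
      } catch (const std::exception&) {
        excluded[worker] = Clock::now();  // exclude the worker that just failed
      }
    }
    // Switch to the peer replica (a no-op when there is only one replica).
    current = (current + 1) % replicas.size();
    // Wrapping back to the primary ends a round: sleep before the next one.
    if (current == 0 && attempt < opts.maxRetries) {
      std::this_thread::sleep_for(opts.retryWait);
    }
  }
  return std::nullopt;
}
```

In this sketch a worker that is still inside its exclusion window is skipped without a fetch, but the skipped attempt still counts against the retry budget, so a call always terminates after `maxRetries + 1` iterations.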
bf7acef to d3b5389
Implement chunk-fetch retry logic in CelebornInputStream::getNextChunk(), matching the Java CelebornInputStream behavior. When a chunk fetch fails, the retry loop excludes the failed worker, switches to the peer replica (if available), and sleeps between retry rounds before creating a new reader.
- Added getLocation() to PartitionReader interface and WorkerPartitionReader
- Replaced the stub getNextChunk() with full retry logic: excluded worker checks, peer switching, configurable retry count, sleep between retries
- Updated moveToNextChunk() and moveToNextReader() to handle nullable returns from getNextChunk()
- Added unit test for WorkerPartitionReader::getLocation()
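
To illustrate the last two points, here is a rough sketch of a reader interface exposing getLocation() and a caller that tolerates a null chunk. The types are simplified, hypothetical stand-ins (`Location`, `Chunk`, `InMemoryReader`, `ChunkCursor` are not the real Celeborn classes), and treating a null chunk as "this reader is drained, try the next one" is an assumption of the sketch rather than the PR's exact semantics.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; the real Celeborn C++ types differ.
struct Location {
  std::string hostAndPort;
};
using Chunk = std::string;

class PartitionReader {
 public:
  virtual ~PartitionReader() = default;
  // Returns the next chunk, or nullptr when nothing is left (assumed meaning).
  virtual std::unique_ptr<Chunk> next() = 0;
  // Lets callers see which worker this reader is bound to.
  virtual const Location& getLocation() const = 0;
};

// Test double that serves chunks from memory.
class InMemoryReader : public PartitionReader {
 public:
  InMemoryReader(Location loc, std::vector<Chunk> chunks)
      : loc_(std::move(loc)), chunks_(std::move(chunks)) {}

  std::unique_ptr<Chunk> next() override {
    if (pos_ >= chunks_.size()) return nullptr;
    return std::make_unique<Chunk>(chunks_[pos_++]);
  }

  const Location& getLocation() const override { return loc_; }

 private:
  Location loc_;
  std::vector<Chunk> chunks_;
  std::size_t pos_ = 0;
};

// Walks a list of readers and checks every chunk for null before using it.
class ChunkCursor {
 public:
  explicit ChunkCursor(std::vector<std::unique_ptr<PartitionReader>> readers)
      : readers_(std::move(readers)) {}

  // Advances to the next available chunk; returns false when all readers
  // are drained.
  bool moveToNextChunk() {
    while (index_ < readers_.size()) {
      current_ = readers_[index_]->next();  // may be null
      if (current_) return true;
      ++index_;  // null chunk: move on to the next reader
    }
    current_.reset();
    return false;
  }

  const Chunk* current() const { return current_.get(); }

  // Location of the reader that produced the current chunk, or nullptr.
  const Location* currentLocation() const {
    return index_ < readers_.size() ? &readers_[index_]->getLocation()
                                    : nullptr;
  }

 private:
  std::vector<std::unique_ptr<PartitionReader>> readers_;
  std::size_t index_ = 0;
  std::unique_ptr<Chunk> current_;
};
```

The point of the sketch is only that every call site checks the pointer before dereferencing it, and that getLocation() gives the caller enough information to decide which worker to exclude or which peer to try next.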
The C++ build currently fails to compile; it will pass once #3583 is merged.