fix: add erasure reDecoder for evicted chunks #5097
Conversation
```go
if ok && d != nil {
	return d
}
d = getter.New(addrs, shardCnt, g.fetcher, g.putter, decoderCallback, g.config)
```
I think a deadlock is possible here: at this point the same goroutine is holding the lock from line 96 and tries to acquire it again in decoderCallback.
decoderCallback is called only in the prefetch function, which is executed by a different goroutine.
martinconic left a comment:
Is there a way to manually test this?
The erasure data recovery should be triggered by leaving some chunks out of the upload of an erasure-coded chunk tree, similarly to what beekeeper does. Then attempt to retrieve the data and compare what happens on a Bee node without the changes versus one that has them. The Bee instances should be started with lowered cache-size parameters.
Checklist
Description
During long-lived joiner operations, reconstructed chunks can be evicted from the local cache due to memory pressure. When these chunks are needed again, the current implementation doesn't attempt to recover them a second time. Instead, it falls back to direct network fetching, which fails with ErrNotFound if the chunk isn't available in the network.
This creates a reliability issue where successfully recovered chunks become inaccessible once evicted from cache, potentially breaking long-running operations like downloads or uploads of large files.
This PR implements an erasure redecoder: when a previously reconstructed chunk has been evicted from the cache, recovery is attempted again instead of failing with ErrNotFound.
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
Screenshots (if appropriate):