fix: add erasure reDecoder for evicted chunks by nugaon · Pull Request #5097 · ethersphere/bee

nugaon · 2025-05-13T15:37:21Z

Checklist

I have read the coding guide.
My change requires a documentation update, and I have done it.
I have added tests to cover my changes.
I have filled out the description and linked the related issues.

Description

During long-lived joiner operations, reconstructed chunks can be evicted from the local cache due to memory pressure. When these chunks are needed again, the current implementation doesn't attempt to recover them a second time. Instead, it falls back to direct network fetching, which fails with ErrNotFound if the chunk isn't available in the network.

This creates a reliability issue where successfully recovered chunks become inaccessible once evicted from cache, potentially breaking long-running operations like downloads or uploads of large files.

This PR implements an erasure redecoder with the following components:

ReDecoder: A new wrapper that attempts network fetch first, and only falls back to erasure recovery if the network fetch fails with ErrNotFound.
Lazy Decoder Instantiation: Recovery decoders are only created on-demand when network fetch fails, saving resources.
Memory-Efficient Caching: Maintains the existing memory optimization of nulling decoders after successful recovery while keeping track of successful recoveries.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

Screenshots (if appropriate):

martinconic · 2025-09-09T14:07:31Z

pkg/file/joiner/joiner.go

+				if ok && d != nil {
+					return d
+				}
+				d = getter.New(addrs, shardCnt, g.fetcher, g.putter, decoderCallback, g.config)


I think here it is possible to have a deadlock. At this point, the same goroutine is holding the lock from line 96 and trying to acquire it again here in decoderCallback.

nugaon · 2025-09-12

decoderCallback is only called in the prefetch function, which is executed by a different goroutine.
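
The general Go behavior behind this exchange can be illustrated with a small sketch. It is not taken from joiner.go: `decoderCache`, `getOrCreate`, and `reentrantWouldBlock` are hypothetical names, and the sketch assumes the lock has been released by the time the callback runs on the other goroutine. `sync.Mutex` is not reentrant, so a callback that takes the same lock is only safe when it does not run while the creating goroutine still holds it.

```go
package main

import (
	"fmt"
	"sync"
)

// decoderCache is a hypothetical stand-in for a mutex-guarded map of decoders.
type decoderCache struct {
	mu       sync.Mutex
	decoders map[string]string
}

// getOrCreate holds mu while looking up or creating an entry. The returned
// callback also takes mu, so it must not be invoked synchronously from
// inside getOrCreate.
func (c *decoderCache) getOrCreate(key string) (string, func()) {
	c.mu.Lock()
	defer c.mu.Unlock()

	callback := func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		delete(c.decoders, key) // drop the decoder once it is no longer needed
	}
	if d, ok := c.decoders[key]; ok {
		return d, callback
	}
	d := "decoder-" + key
	c.decoders[key] = d
	return d, callback
}

// reentrantWouldBlock reports whether a second Lock on an already held mutex
// would block. TryLock (Go 1.18+) is used so the demonstration cannot hang.
func reentrantWouldBlock(mu *sync.Mutex) bool {
	mu.Lock()
	acquired := mu.TryLock() // false while the mutex is held
	mu.Unlock()
	return !acquired
}

func main() {
	c := &decoderCache{decoders: map[string]string{}}
	d, callback := c.getOrCreate("root")
	fmt.Println("created:", d)

	// Run the callback on another goroutine, after getOrCreate has returned
	// and released the lock.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		callback()
	}()
	wg.Wait()

	fmt.Println("entries left:", len(c.decoders))
	fmt.Println("second Lock would block:", reentrantWouldBlock(&c.mu))
}
```

Note that running the callback on a different goroutine is not sufficient on its own: if the goroutine holding the lock waited for the callback to finish before unlocking, the two would still block each other.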

martinconic

Is there a way to manually test this?

pkg/file/redundancy/getter/redecoder.go

nugaon · 2025-09-17T07:35:20Z

> Is there a way to manually test this?

Erasure data recovery can be triggered, for example, by leaving out some chunks when uploading an erasure-coded chunk tree, similar to what beekeeper does. Then attempt to retrieve the data and compare the behavior of a Bee node that does not include these changes with one that does. The Bee instances should be started with a reduced cache size, e.g. --cache-capacity=1.

nugaon requested a review from zelig May 13, 2025 15:37

nugaon changed the title ~~fix: call recovery after cache eviction~~ fix: add erasure reDecoder for evicted chunks May 13, 2025

nugaon force-pushed the fix/rs-cache-evict branch from f6a3543 to e4dc36e Compare June 2, 2025 11:21

nugaon and others added 9 commits August 7, 2025 15:26

fix: call recovery after cache eviction

3c322a0

fix: lint

c41ec35

refactor: remove factory naming

4f369c5

fix: get redundancy getter from cache

433294d

fix: return cached value if not null

92efdbe

fix: only attempt recovery if in storage not found

94cddeb

test: redecorder

acd23d7

fix: add rsbuf nil check

6de3a56

test: fix testdata generation

1de8ca1

nugaon force-pushed the fix/rs-cache-evict branch from cb11d8b to 1de8ca1 Compare August 7, 2025 13:27

martinconic reviewed Sep 9, 2025

martinconic approved these changes Sep 16, 2025

martinconic reviewed Sep 16, 2025

pkg/file/redundancy/getter/redecoder.go (outdated, resolved)

chore: change year in file comment header

3ba1eea

gacevicljubisa self-requested a review September 22, 2025 12:33

gacevicljubisa approved these changes Sep 22, 2025

nugaon merged commit 7f997bd into master Sep 23, 2025
15 checks passed

nugaon deleted the fix/rs-cache-evict branch September 23, 2025 08:20

bcsorvasi added this to the v2.7.0 milestone Oct 8, 2025

v1rtl pushed a commit to v1rtl/bee that referenced this pull request Jan 22, 2026

fix: add erasure reDecoder for evicted chunks (ethersphere#5097)

102297e
