Skip to content

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Nov 27, 2025

🚧 WIP - early version

Overview

Resolves #2609

The basic idea is to store an additional DAHightHint field within the p2p store.

As SignedHeader and Data are used in other places, too. I added an an DAHeightHintContainer type to wrap the tuple for the store only.
The DAHeight Hint is added by the da_submitter and read in the syncer to fetch the missing DA header/data for the most recent block as required.

There are still some todo with the tests and types as well as smarter block fetching to not pull duplicates or hammer the DA API.

@claude
Copy link
Contributor

claude bot commented Nov 27, 2025

Claude finished @alpe's task —— View job


Comprehensive Code Review

I've completed a thorough review of PR #2891 (feat: DA hints in p2p). Here are my findings organized by category:


1. Architecture & Design ⚠️

Strengths

  • Clean abstraction: The DAHeightHintContainer[H] generic wrapper elegantly adds DA hints without modifying core types
  • Separation of concerns: DA hint logic is properly encapsulated in the pkg/sync package
  • Async design: The AsyncDARetriever worker pool pattern is appropriate for concurrent DA fetching

Concerns

Critical: Type Wrapping Complexity (pkg/sync/da_hint_container.go)

  • The DAHeightHintContainer implements the header.Header[H] interface by delegating to the wrapped Entry. This creates a recursive type constraint: DAHeightHintContainer[V] where V itself is header.Header[V]
  • This means the store is actually storing *DAHeightHintContainer[*types.SignedHeader] instead of just *types.SignedHeader
  • Implications:
    • Breaking change to P2P wire format and storage format
    • Backward compatibility concerns - old nodes won't understand the wrapped format
    • Migration strategy needed for existing p2p stores

Recommendation: Document the migration path and consider versioning the p2p protocol.

Design Question: DA Hint Array Logic (block/internal/syncing/syncer.go:442-482)

// Complex logic to check which DA heights to fetch
case event.DaHeightHints == [2]uint64{0, 0}:
    // empty, nothing to do
case event.DaHeightHints[0] == 0:
    // check only data
    if _, exists := s.cache.GetDataDAIncluded(event.Data.Hash().String()); !exists {
        daHeightHints = []uint64{event.DaHeightHints[1]}
    }
  • The [2]uint64 array uses index 0 for header hint and index 1 for data hint
  • Issue: This is fragile and not self-documenting. A struct with named fields would be clearer:
    type DAHeightHints struct {
        HeaderDAHeight uint64
        DataDAHeight   uint64
    }

Missing: Deduplication Strategy

  • The PR mentions "smarter block fetching to not pull duplicates" as a TODO
  • Current implementation in AsyncDARetriever.RequestRetrieval() prevents in-flight duplicates but doesn't prevent fetching already-cached DA heights
  • Recommendation: Check cache before enqueueing work:
    // Before requesting retrieval, check if already cached
    if _, exists := s.cache.GetHeaderDAIncluded(...); exists {
        return // skip
    }

2. Error Handling & Resilience ⚠️

Issues Found

Silent Error Swallowing (block/internal/submitting/da_submitter.go:204-207, 253-256)

if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
    s.logger.Error().Err(err).Msg("failed to append da height hint in header p2p store")
    // ignoring error here, since we don't want to block the block submission'
}
  • Problem: Errors are logged but ignored. If hint appending fails consistently, the feature silently degrades
  • Impact: Nodes won't get DA hints, falling back to sequential DA scanning without any alert
  • Recommendation:
    • Add metrics counter for append failures
    • Consider circuit breaker pattern if failures exceed threshold
    • Log at WARN level with periodic rate limiting

Missing Context Propagation (block/internal/syncing/async_da_retriever.go:98)

events, err := r.retriever.RetrieveFromDA(r.ctx, height)
  • Uses component's root context r.ctx instead of the method's context
  • Issue: Worker shutdown might not be immediate if retrievals are slow
  • Fix: Use the context passed to processRetrieval or create a timeout context

No Backpressure on Result Channel (block/internal/syncing/async_da_retriever.go:105-110)

for _, event := range events {
    select {
    case r.resultCh <- event:
    case <-r.ctx.Done():
        return
    }
}
  • If resultCh is full, this blocks the worker indefinitely
  • Recommendation: Add timeout or drop policy:
    select {
    case r.resultCh <- event:
    case <-time.After(5*time.Second):
        r.logger.Warn().Msg("result channel full, dropping event")
    case <-r.ctx.Done():
        return
    }

3. Concurrency & Race Conditions ⚠️

Issues Found

Potential Race in InFlight Map (block/internal/syncing/async_da_retriever.go:61-76)

func (r *AsyncDARetriever) RequestRetrieval(height uint64) {
    r.mu.Lock()
    defer r.mu.Unlock()

    if _, exists := r.inFlight[height]; exists {
        return
    }

    select {
    case r.workCh <- height:
        r.inFlight[height] = struct{}{}
    default:
        r.logger.Debug().Msg("DA retrieval worker pool full, dropping request")
    }
}
  • Correctness: The lock is held while sending to channel, which is good
  • Performance: Holding mutex while potentially blocking on channel send could cause contention
  • Minor improvement: Consider buffered channel size tuning (currently 100)

Worker Pool Size Hardcoded (block/internal/syncing/async_da_retriever.go:42-46)

// Start 5 workers
for i := 0; i < 5; i++ {
    r.wg.Add(1)
    go r.worker()
}
  • Issue: Magic number 5 is hardcoded
  • Recommendation: Make configurable or adjust based on DA layer characteristics

4. Security 🟢

Good Practices

  • ✅ No credential/secret handling in this code
  • ✅ Input validation in P2PHandler.ProcessHeight checks proposer address
  • ✅ DA hints are checked against cache before use (prevents malicious hints)

Minor Concerns

  • The DA hints are not cryptographically signed or verified
  • A malicious peer could provide wrong hints, causing unnecessary DA fetches
  • Mitigation: This is acceptable as hints are optimization only - verification still happens

5. Performance & Resource Management ⚠️

Issues Found

Unbounded Memory Growth Risk (block/internal/syncing/async_da_retriever.go:16)

inFlight  map[uint64]struct{}
  • The inFlight map tracks all in-flight requests
  • Issue: If workers get stuck or DA is very slow, this map could grow unbounded
  • Recommendation: Add periodic cleanup of stale entries or TTL

Buffer Sizes (block/internal/syncing/async_da_retriever.go:33)

workCh:    make(chan uint64, 100), // Buffer size 100
  • Buffer size 100 seems arbitrary
  • Recommendation: Document rationale or make configurable

Syncer Height Channel (block/internal/syncing/syncer.go:104)

heightInCh:        make(chan common.DAHeightEvent, 1_000),
  • Buffer of 1000 events could consume significant memory (each event contains full header + data)
  • Estimate: If each event is ~100KB, this is 100MB of buffered data
  • Recommendation: Consider reducing buffer or using a queue with size limits

6. Test Coverage 🔴

Critical Gaps (per Codecov)

block/internal/syncing/syncer.go - 38 missing lines + 1 partial

  • The DA hint logic in processHeightEvent (lines 442-482) appears untested
  • Critical path: targeted DA retrieval via async retriever
  • Required tests:
    • Test P2P event with both hints populated
    • Test P2P event with only header hint
    • Test P2P event with only data hint
    • Test deduplication (don't fetch if already in cache)

pkg/sync/sync_service.go - 21 missing lines + 1 partial

  • AppendDAHint method (lines 184-198) lacks comprehensive tests
  • Required tests:
    • Test successful append
    • Test ErrNotFound handling
    • Test partial failure (some hashes not found)

pkg/sync/da_hint_container.go - 10 missing lines + 2 partials

  • Marshaling/unmarshaling edge cases not tested
  • Required tests:
    • Test UnmarshalBinary with invalid length (<8 bytes)
    • Test round-trip with various DA heights
    • Test IsZero method

Test Quality Issues

Async DA Retriever Tests (block/internal/syncing/async_da_retriever_test.go)

  • ✅ Good: Tests deduplication and worker pool limits
  • ⚠️ Missing: Error handling tests (what happens when RetrieveFromDA fails?)
  • ⚠️ Missing: Context cancellation tests
  • ⚠️ Missing: Result channel backpressure tests

7. Documentation 🔴

Missing Documentation

No Package-Level Documentation

  • pkg/sync/da_hint_container.go lacks package comment explaining the DA hint feature
  • Required: Add comprehensive package documentation explaining:
    • What DA hints are and why they're needed
    • How the hint flow works (DA submission → p2p storage → syncer retrieval)
    • Migration/compatibility considerations

Undocumented Type Constraints (pkg/sync/da_hint_container.go:15-18)

type DAHeightHintContainer[H header.Header[H]] struct {
    Entry        H
    DAHeightHint uint64
}
  • No explanation of why this implements header.Header interface
  • No explanation of the recursive type constraint
  • Add: Type-level comments explaining the design

Magic Numbers throughout codebase:

  • Worker pool size: 5
  • Work channel buffer: 100
  • Height channel buffer: 1000
  • Fix: Add constants with explanatory comments

8. Code Quality & Style 🟡

Good Practices

  • ✅ Proper use of atomic operations for processedHeight, daRetrieverHeight
  • ✅ Structured logging with context
  • ✅ Proper use of errgroup for coordinated goroutines

Issues

Comment Typo (block/internal/submitting/da_submitter.go:206)

// ignoring error here, since we don't want to block the block submission'
  • Extra apostrophe at end
  • Should be: "block the block submission"

Inconsistent Error Messages

  • Some say "failed to X", others say "error while X"
  • Recommendation: Standardize on "failed to X"

SetProcessedHeight Loop (block/internal/syncing/p2p_handler.go:63-71)

for range 1_000 {
    current := h.processedHeight.Load()
    if height <= current {
        return
    }
    if h.processedHeight.CompareAndSwap(current, height) {
        return
    }
}
  • Issue: Loops 1000 times on CAS failure, then silently fails
  • Extremely unlikely to hit 1000 iterations, but if it does, failure is silent
  • Recommendation: Log error if loop exhausts without success

Priority Recommendations

🔴 Must Fix Before Merge

  1. Add comprehensive tests for the DA hint retrieval path in syncer.go (lines 442-482)
  2. Test AppendDAHint failure scenarios and document behavior
  3. Document migration strategy for the p2p store format change
  4. Add metrics for hint append failures

🟡 Should Fix Before Merge

  1. Replace [2]uint64 with named struct for DA height hints
  2. Add timeout/drop policy for AsyncDARetriever result channel
  3. Make worker pool size configurable
  4. Add deduplication check before enqueueing DA retrieval work

🟢 Nice to Have

  1. Add circuit breaker for persistent hint append failures
  2. Add benchmarks for DAHeightHintContainer marshaling
  3. Add integration test for end-to-end DA hint flow
  4. Document buffer size choices

Test Coverage Analysis

Current patch coverage: 43.79% (77 lines missing)

Target: Aim for >80% coverage by adding:

  • Unit tests for DA hint logic in syncer
  • Unit tests for AppendDAHint error paths
  • Integration tests for async DA retriever with failures
  • Edge case tests for marshaling/unmarshaling

Summary

This PR introduces a valuable optimization for DA block retrieval. The core design is sound, but there are gaps in error handling, testing, and documentation that should be addressed.

Overall Assessment: The implementation is functional but needs polish before production readiness. The async DA retriever pattern is well-designed, but error handling needs improvement. Test coverage is insufficient for a feature that modifies critical sync paths.

Estimated Effort to Address:

  • Must-fix items: 4-6 hours
  • Should-fix items: 3-4 hours

@github-actions
Copy link
Contributor

github-actions bot commented Nov 27, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedNov 28, 2025, 4:20 PM

@alpe alpe changed the title Alex/hints feat: DA hints in p2p Nov 28, 2025
@codecov
Copy link

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 75.97765% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.09%. Comparing base (830ae45) to head (aaacbde).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
block/internal/syncing/syncer.go 58.62% 12 Missing ⚠️
pkg/sync/sync_service.go 63.63% 9 Missing and 3 partials ⚠️
pkg/sync/da_hint_container.go 77.77% 6 Missing and 2 partials ⚠️
block/internal/syncing/async_da_retriever.go 86.53% 6 Missing and 1 partial ⚠️
block/internal/submitting/da_submitter.go 80.95% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2891      +/-   ##
==========================================
+ Coverage   64.76%   65.09%   +0.33%     
==========================================
  Files          81       83       +2     
  Lines        7350     7492     +142     
==========================================
+ Hits         4760     4877     +117     
- Misses       2049     2068      +19     
- Partials      541      547       +6     
Flag Coverage Δ
combined 65.09% <75.97%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sync: P2P should provide da inclusion hints

2 participants