Conversation

@randygrok (Contributor)

Overview

- Introduced a new `fetchBlobs` function to streamline the process of retrieving blobs from the DA client.
- Replaced direct calls to `client.DA.GetIDs` and `client.DA.Get` with the new `fetchBlobs` function to handle errors more gracefully.
- Added custom error types for better clarity when blobs are not found or when a height is in the future.
- Updated blob processing logic to work with the new batch structure, improving efficiency and readability.
- Modified tests to reflect changes in ID creation and splitting, ensuring compatibility with the new blob handling logic.

github-actions bot commented Nov 28, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Nov 28, 2025, 5:17 PM |

claude bot commented Nov 28, 2025

Claude finished @randygrok's task


Code Review Complete

I've completed a comprehensive review of this large refactoring PR that migrates from the DA interface to the Blob API. This is a significant architectural change that removes ~5000 lines while adding ~2200 new lines across 81 files.


Summary

This PR successfully refactors the codebase to use a streamlined Blob API instead of the previous DA interface. The migration centralizes blob operations into pkg/blob/, removes the core/da/ abstraction, and updates all consumers to use the new patterns. Overall, the refactoring improves code clarity and reduces complexity.

Strengths:

  • Clean separation of concerns with new pkg/blob/ package
  • Removal of redundant abstractions in core/da/
  • Improved error handling with explicit status codes
  • Good test coverage for new blob functionality
  • Proper handling of namespace validation

Areas of Concern:

  • Test coverage dropped from 64.81% to 57.97% (6.84% decrease)
  • 285 lines in patch missing coverage
  • Several security and concurrency issues identified below
  • Missing input validation in critical paths
  • Potential goroutine leaks

Critical Issues

1. Concurrency Safety in da_submitter.go (block/internal/submitting/da_submitter.go:534-581)

The marshalItems function spawns goroutines without proper cleanup on early exit:

for i, item := range items {
    go func(idx int, itm T) {
        sem <- struct{}{}
        defer func() { <-sem }()
        // ...
    }(i, item)
}

Issue: If context is canceled or an error occurs, goroutines may not complete before the function returns, causing:

  • Goroutine leaks
  • Writes to closed resultCh
  • Race conditions on marshaled slice

Recommendation: Use a proper worker pool pattern with context cancellation or wait for all goroutines to complete even on error.
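A minimal sketch of that shape using golang.org/x/sync/errgroup; the names here are stand-ins for the function's real generics, not the actual da_submitter.go code:

```go
func marshalItems[T any](ctx context.Context, items []T, marshal func(T) ([]byte, error)) ([][]byte, error) {
	g, gctx := errgroup.WithContext(ctx)
	g.SetLimit(32) // bounded concurrency replaces the manual semaphore

	marshaled := make([][]byte, len(items))
	for i, item := range items {
		i, item := i, item // capture for Go < 1.22
		g.Go(func() error {
			if err := gctx.Err(); err != nil {
				return err // stop promptly on cancellation or an earlier failure
			}
			out, err := marshal(item)
			if err != nil {
				return err
			}
			marshaled[i] = out // each goroutine writes a distinct index: no race
			return nil
		})
	}
	// Wait returns only after every started goroutine has exited,
	// so nothing leaks and nothing writes after the function returns.
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return marshaled, nil
}
```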


2. Missing Input Validation (block/internal/da/client.go:78-114)

The Submit method doesn't validate data before processing:

func (c *client) Submit(ctx context.Context, data [][]byte, namespace []byte, options []byte) datypes.ResultSubmit {
    var blobSize uint64
    for _, b := range data {
        blobSize += uint64(len(b))  // No check for nil or empty
    }

Issues:

  • No validation for nil data array
  • No check for individual nil elements
  • Integer overflow possible on blobSize accumulation

Recommendation:

if data == nil {
    return datypes.ResultSubmit{
        BaseResult: datypes.BaseResult{
            Code: datypes.StatusError,
            Message: "data cannot be nil",
        },
    }
}
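The per-element case can follow the same pattern (a sketch reusing the types above):

```go
for i, b := range data {
	if len(b) == 0 { // covers both nil and zero-length blobs
		return datypes.ResultSubmit{
			BaseResult: datypes.BaseResult{
				Code:    datypes.StatusError,
				Message: fmt.Sprintf("blob at index %d is nil or empty", i),
			},
		}
	}
}
```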

3. Error String Matching is Fragile (block/internal/da/client.go:131-144)

Error classification uses substring matching:

switch {
case strings.Contains(err.Error(), datypes.ErrTxTimedOut.Error()):
    code = datypes.StatusNotIncludedInBlock
case strings.Contains(err.Error(), datypes.ErrTxAlreadyInMempool.Error()):
    code = datypes.StatusAlreadyInMempool

Issues:

  • Breaks if error messages change
  • Won't work if errors are wrapped differently
  • Locale-dependent error messages could break this

Recommendation: Use errors.Is() or errors.As() with sentinel errors, or implement a custom error type with status codes.
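A sketch of the errors.Is variant, assuming the error chain is preserved with %w below the RPC boundary:

```go
var code datypes.StatusCode
switch {
case errors.Is(err, datypes.ErrTxTimedOut):
	code = datypes.StatusNotIncludedInBlock
case errors.Is(err, datypes.ErrTxAlreadyInMempool):
	code = datypes.StatusAlreadyInMempool
default:
	code = datypes.StatusError
}
```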


4. Unbounded Memory Growth (da/cmd/local-da/local.go:20-44)

The LocalDA implementation stores all blobs in memory indefinitely:

type LocalDA struct {
    mu          *sync.RWMutex
    blobs       map[uint64][]*blob.Blob
    timestamps  map[uint64]time.Time
    // ...
}

Issues:

  • No pruning mechanism
  • No maximum storage limit
  • Can cause OOM in long-running tests or development environments

Recommendation: Add configurable pruning (e.g., keep last N heights or blobs from last X hours).
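One possible shape, assuming the fields shown above; maxHeights is a hypothetical config knob and the sort import is required:

```go
// prune keeps only the newest maxHeights heights; illustrative only.
func (d *LocalDA) prune(maxHeights int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if len(d.blobs) <= maxHeights {
		return
	}
	heights := make([]uint64, 0, len(d.blobs))
	for h := range d.blobs {
		heights = append(heights, h)
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	for _, h := range heights[:len(heights)-maxHeights] {
		delete(d.blobs, h)
		delete(d.timestamps, h)
	}
}
```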


5. Potential Panic in Block Submission (block/internal/submitting/da_submitter.go:433-444)

switch res.Code {
case datypes.StatusSuccess:
    submitted := items[:res.SubmittedCount]  // Potential panic if SubmittedCount > len(items)

Issue: If the DA layer returns SubmittedCount greater than the number of items sent, this will panic.

Recommendation: Add bounds checking:

if res.SubmittedCount > uint64(len(items)) {
    s.logger.Error().Msg("DA returned invalid submitted count")
    return fmt.Errorf("invalid submitted count: %d > %d", res.SubmittedCount, len(items))
}

Security Concerns

1. Missing Namespace Validation (pkg/namespace/namespace.go:46-54)

NewNamespaceV0 rejects overlong input but silently zero-pads short input:

func NewNamespaceV0(data []byte) (*Namespace, error) {
    if len(data) > NamespaceVersionZeroDataSize {
        return nil, fmt.Errorf("data too long...")
    }
    ns := &Namespace{Version: NamespaceVersionZero}
    copy(ns.ID[NamespaceVersionZeroPrefixSize:], data)  // Silent zero-padding if data is short
    return ns, nil
}

Issue: Short input data is silently accepted and zero-padded, which could lead to namespace collisions.

Recommendation: Explicitly document this behavior or require exact length matching.
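If exact length is the chosen fix, a sketch of the guard (note this changes the current contract, so callers relying on padding would need updating):

```go
if len(data) != NamespaceVersionZeroDataSize {
	return nil, fmt.Errorf("namespace data must be exactly %d bytes, got %d",
		NamespaceVersionZeroDataSize, len(data))
}
```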


2. JSON Unmarshalling Without Size Limits (block/internal/da/client.go:116-126)

var submitOpts blob.SubmitOptions
if len(options) > 0 {
    if err := json.Unmarshal(options, &submitOpts); err != nil {
        // ...
    }
}

Issue: No maximum size check on options before unmarshalling. A malicious caller could provide enormous JSON causing DoS.

Recommendation: Add size validation (e.g., max 1KB for options).
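A sketch of the guard, with maxOptionsSize as a hypothetical constant:

```go
const maxOptionsSize = 1 << 10 // 1 KiB; illustrative limit

if len(options) > maxOptionsSize {
	return datypes.ResultSubmit{
		BaseResult: datypes.BaseResult{
			Code:    datypes.StatusError,
			Message: fmt.Sprintf("options too large: %d bytes (max %d)", len(options), maxOptionsSize),
		},
	}
}
```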


3. Address Selector Security (block/internal/submitting/da_submitter.go:317-353)

The mergeSubmitOptions function handles signing addresses but doesn't validate them:

optionsMap["signer_address"] = signingAddress  // No validation

Issue: Invalid addresses could be silently accepted, leading to failed transactions or potential security issues.

Recommendation: Validate address format before merging.


Performance & Resource Management

1. Inefficient Blob Cloning (da/cmd/local-da/local.go:147-154)

func cloneBlob(b *blob.Blob) *blob.Blob {
    if b == nil {
        return nil
    }
    // Creates new blob with deep copy

Every GetAll operation clones all blobs, which is expensive for large blobs.

Recommendation: Consider returning read-only views or implementing copy-on-write semantics.


2. Context Timeout Management (block/internal/da/client.go:202-203)

getIDsCtx, cancel := context.WithTimeout(ctx, c.defaultTimeout)
defer cancel()

Issue: context.WithTimeout never extends a parent's deadline (the derived context expires at the earlier of the two), but layering a fixed defaultTimeout over a caller-supplied deadline is redundant and obscures which timeout is actually in effect.

Recommendation: Apply defaultTimeout only when it would tighten the caller's deadline.
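A sketch of that check:

```go
// Only add a timeout when it is tighter than the caller's deadline.
getIDsCtx := ctx
if deadline, ok := ctx.Deadline(); !ok || time.Until(deadline) > c.defaultTimeout {
	var cancel context.CancelFunc
	getIDsCtx, cancel = context.WithTimeout(ctx, c.defaultTimeout)
	defer cancel()
}
```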


3. Concurrent Marshaling (block/internal/submitting/da_submitter.go:548)

Semaphore limits to 32 workers but doesn't consider actual resource usage:

sem := make(chan struct{}, 32)

Issue: For very large marshaling operations, 32 concurrent goroutines might consume excessive CPU/memory.

Recommendation: Make worker count configurable or base it on runtime.NumCPU().
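A sketch of the CPU-based variant:

```go
workers := runtime.NumCPU()
if workers > 32 {
	workers = 32 // retain the current value as an upper bound
}
sem := make(chan struct{}, workers)
```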


Error Handling & Resilience

1. Insufficient Context Cancellation Checks (block/internal/da/client.go:78-186)

The Submit method doesn't check for context cancellation before expensive operations:

func (c *client) Submit(ctx context.Context, data [][]byte, ...) datypes.ResultSubmit {
    // No ctx.Err() check before processing
    for _, raw := range data {
        blobs[i], err = blob.NewBlobV0(ns, raw)  // Expensive operation

Recommendation: Add select { case <-ctx.Done(): ... } checks in loops.
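A sketch of the loop with a cancellation check, reusing the surrounding Submit types (ns and blobs are assumed from the enclosing body):

```go
for i, raw := range data {
	if err := ctx.Err(); err != nil { // canceled or deadline exceeded
		return datypes.ResultSubmit{
			BaseResult: datypes.BaseResult{Code: datypes.StatusError, Message: err.Error()},
		}
	}
	b, err := blob.NewBlobV0(ns, raw)
	if err != nil {
		return datypes.ResultSubmit{
			BaseResult: datypes.BaseResult{Code: datypes.StatusError, Message: err.Error()},
		}
	}
	blobs[i] = b
}
```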


2. Inconsistent Error Wrapping (block/internal/syncing/da_retriever.go)

Some errors are wrapped, others aren't:

return fmt.Errorf("DA retrieval failed: %s", res.Message)  // String formatting
return fmt.Errorf("%w: height from future", datypes.ErrHeightFromFuture)  // Proper wrapping

Recommendation: Consistently use %w for error wrapping to preserve error chains.


3. Silent Failures (block/internal/syncing/da_retriever.go:154-164)

Invalid headers/data are silently discarded:

if header := r.tryDecodeHeader(bz, daHeight); header != nil {
    // ...
} else {
    // Silently continue - no logging of decode failure
}

Recommendation: Add debug logging for decode failures to aid troubleshooting.
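A sketch, assuming the retriever carries a zerolog logger as elsewhere in the codebase:

```go
if header := r.tryDecodeHeader(bz, daHeight); header != nil {
	// ...
} else {
	r.logger.Debug().
		Uint64("da_height", daHeight).
		Int("blob_bytes", len(bz)).
		Msg("skipping blob: failed to decode as header")
}
```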


Code Quality & Architecture

1. Duplicate Namespace Conversion Logic

Namespace conversion appears in multiple places with slight variations:

  • pkg/namespace/namespace.go
  • block/internal/da/client.go
  • sequencers/single/sequencer.go

Recommendation: Consolidate into shared utility functions.


2. Missing Documentation (pkg/blob/blob.go)

Critical functions lack detailed documentation:

// MakeID constructs a blob ID by prefixing the commitment with the height (little endian).
func MakeID(height uint64, commitment Commitment) []byte {

Issue: Doesn't explain the ID format, when to use it, or how it's used across the system.

Recommendation: Add comprehensive package-level documentation explaining the blob lifecycle and ID scheme.
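A sketch of what fuller documentation could look like (the wording is illustrative, not the package's actual contract):

```go
// MakeID constructs a blob ID as an 8-byte little-endian height followed
// by the blob commitment. The result is what ID queries return and what
// SplitID parses back into (height, commitment); callers should treat it
// as opaque and never assume a particular commitment length.
func MakeID(height uint64, commitment Commitment) []byte {
```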


3. Magic Numbers (block/internal/submitting/da_submitter.go)

submissionTimeout = 60 * time.Second
initialBackoff    = 100 * time.Millisecond
sem := make(chan struct{}, 32)  // Why 32?

Recommendation: Document rationale or make configurable.


Testing Concerns

1. Test Coverage Regression

Coverage dropped 6.84% (64.81% → 57.97%). Key areas with low coverage:

| File | Coverage | Missing Lines |
|---|---|---|
| block/internal/da/client.go | 54.47% | 56 lines |
| tools/da-debug/main.go | 3.57% | 54 lines |
| block/internal/da/local_blob_api.go | 0% | 35 lines |
| pkg/blob/submit_options.go | 0% | 26 lines |
| da/jsonrpc/server.go | 0% | 20 lines |


Recommendation: Add integration tests covering the full blob submission and retrieval flow.


2. Missing Negative Test Cases

Many tests only cover happy paths. Missing tests for:

  • Blob size exactly at limit
  • Concurrent submissions to same namespace
  • Network failures mid-submission
  • Proof verification edge cases

3. Mock Overuse (block/internal/submitting/da_submitter_retry_test.go)

Tests use mocks extensively, which may not catch real integration issues:

api.On("Submit", mock.Anything, mock.Anything, mock.AnythingOfType("*blob.SubmitOptions"))

Recommendation: Balance unit tests with integration tests using real blob API implementations.


Response to Reviewer's Question

@tac0turtle's question (da/jsonrpc/client.go:16): "are we able to use subscribe here?"

Answer: The current implementation uses synchronous RPC calls for blob operations (Submit, GetAll, GetProof, Included). Adding subscription support would require:

  1. Protocol Support: The go-jsonrpc library already supports subscriptions over websockets, but celestia-node's blob namespace doesn't expose subscription methods in its current RPC API.

  2. Use Case Analysis: Subscriptions would be valuable for:

    • Real-time blob inclusion notifications (reduce polling in VerifyBatch)
    • DA height updates (notify when new heights are available)
    • Namespace-specific blob streams
  3. Implementation Considerations:

    • Would need to add subscription methods to the BlobAPI interface
    • Require websocket connection management (reconnection, cleanup)
    • Add complexity to the client lifecycle

Recommendation: Unless there's a specific performance issue with polling or a need for real-time notifications, the current synchronous approach is simpler and more reliable. If subscriptions are added to celestia-node's blob RPC, we could add them in a future PR with proper connection lifecycle management.


Minor Issues

1. Unused Field (da/jsonrpc/client.go:17-18)

Logger      zerolog.Logger  // Not used in Client struct
MaxBlobSize uint64          // Not validated in client methods

2. Inconsistent Error Messages

  • Some errors include height context, others don't
  • Mix of sentence case and lowercase error messages

3. HTTP Server Security (da/jsonrpc/server.go:73-76)

srv: &http.Server{
    Addr:              address + ":" + port,
    ReadHeaderTimeout: 2 * time.Second,  // Good!
    // Missing: ReadTimeout, WriteTimeout, IdleTimeout, MaxHeaderBytes
}

Recommendation: Add comprehensive timeouts to prevent resource exhaustion attacks.
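A sketch with fuller coverage; the values are illustrative defaults, not tested recommendations:

```go
srv: &http.Server{
	Addr:              address + ":" + port,
	ReadHeaderTimeout: 2 * time.Second,
	ReadTimeout:       10 * time.Second,
	WriteTimeout:      30 * time.Second,
	IdleTimeout:       120 * time.Second,
	MaxHeaderBytes:    1 << 20, // 1 MiB
},
```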

4. Nolint Comment Without Explanation (da/jsonrpc/server.go:105)

//nolint:errcheck
go s.srv.Serve(listener)

Recommendation: Document why error can be safely ignored.


Positive Observations

  1. Good Error Status Codes: The new datypes.StatusCode system is well-designed and comprehensive.

  2. Proper Use of Context: Most functions properly accept and propagate context.Context.

  3. Clean Namespace Abstraction: The pkg/namespace/ package provides a clear interface for namespace operations.

  4. Retry Logic: The DA submitter's retry and backoff logic is well-implemented with proper exponential backoff.

  5. Test Helpers: Good use of test helpers like newTestDARetriever to reduce boilerplate.

  6. Blob ID Scheme: The MakeID/SplitID pattern using height+commitment is elegant and efficient.


Recommendations Summary

High Priority

  1. Fix goroutine leak in marshalItems
  2. Add input validation in Submit
  3. Replace string-based error matching with errors.Is()
  4. Add bounds checking for SubmittedCount
  5. Improve test coverage to previous levels

Medium Priority

  1. Add size limits for JSON unmarshalling
  2. Implement pruning for LocalDA
  3. Validate signing addresses before use
  4. Add comprehensive HTTP server timeouts
  5. Document magic numbers and rationale

Low Priority

  1. Consolidate namespace conversion logic
  2. Add package-level documentation for pkg/blob/
  3. Add negative test cases
  4. Consider making worker pool size configurable
  5. Add debug logging for silent failures

Overall Assessment: This is a solid refactoring that simplifies the architecture and improves code clarity. The main concerns are around production robustness (concurrency safety, input validation, error handling) and test coverage regression. With the critical issues addressed, this would be ready to merge.


Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Understand the new Blob API architecture
  • Review core migration in block/internal/da/client.go
  • Review DA submission and retrieval logic
  • Review RPC client/server changes
  • Review test migrations and coverage
  • Check error handling and resilience
  • Check security implications
  • Check performance and resource management
  • Address reviewer's question about subscribe functionality
  • Compile comprehensive feedback

github-actions bot commented Nov 28, 2025

PR Preview Action v1.6.3

🚀 View preview at https://evstack.github.io/docs-preview/pr-2893/

Built to branch main at 2025-11-28 17:18 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@randygrok changed the title from "reafactor: migrate to blob api and remove da interface (approach 2)" to "refactor: migrate to blob api and remove da interface (approach 2)" Nov 28, 2025

Full nodes syncing from DA were not persisting the DAIncludedHeightKey
metadata, causing the E2E test to fail when querying this value.

The submitter (sequencer) already persists this key when verifying DA
inclusion. This change adds the same logic to the syncer so full nodes
also track which blocks have been DA included.

The LocalDA's GetAll method was missing the height-from-future check
that existed in the old GetIDs method. This caused the syncer to
iterate indefinitely instead of backing off when caught up with DA.

Also simplified IsHeightDAIncluded by removing an unused variable.

@tac0turtle commented on da/jsonrpc/client.go:16:

// API defines the jsonrpc service module API
// API exposes the blob RPC methods used by the node.
type API struct {

are we able to use subscribe here?

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 41.83673% with 285 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.88%. Comparing base (bab058d) to head (3eb1c92).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| block/internal/da/client.go | 54.47% | 50 Missing and 6 partials ⚠️ |
| tools/da-debug/main.go | 3.57% | 53 Missing and 1 partial ⚠️ |
| block/internal/da/local_blob_api.go | 0.00% | 35 Missing ⚠️ |
| pkg/blob/submit_options.go | 0.00% | 26 Missing ⚠️ |
| da/jsonrpc/server.go | 0.00% | 20 Missing ⚠️ |
| block/public.go | 0.00% | 18 Missing ⚠️ |
| pkg/blob/blob.go | 73.52% | 11 Missing and 7 partials ⚠️ |
| pkg/rpc/server/da_visualization.go | 0.00% | 17 Missing ⚠️ |
| da/jsonrpc/client.go | 0.00% | 16 Missing ⚠️ |
| block/internal/submitting/da_submitter.go | 50.00% | 7 Missing and 1 partial ⚠️ |

... and 7 more

Additional details and impacted files
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2893      +/-   ##
==========================================
- Coverage   64.81%   57.88%   -6.94%     
==========================================
  Files          81       81              
  Lines        7347     7303      -44     
==========================================
- Hits         4762     4227     -535     
- Misses       2043     2558     +515     
+ Partials      542      518      -24     
| Flag | Coverage Δ |
|---|---|
| combined | 57.88% <41.83%> (-6.94%) ⬇️ |

Flags with carried forward coverage won't be shown.


Fix broken markdown links by correcting the relative path depth
from ../../../ to ../../ for linking to execution/grpc and
sequencers/single READMEs.