Skip to content

Conversation

randygrok
Copy link
Contributor

@randygrok randygrok commented Oct 3, 2025

Overview

Closes #2654

Copy link
Contributor

github-actions bot commented Oct 3, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedOct 6, 2025, 8:37 AM

Copy link
Contributor

Summary of Changes

Hello @randygrok, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the caching mechanism by implementing a generic iterator pattern for pending data and headers. This change allows for more memory-efficient and scalable processing of pending items by providing a streaming interface, eliminating the need to materialize entire collections. It also includes a minor refactoring for improved clarity and is backed by new benchmarks and thorough testing to ensure robustness and performance.

Highlights

  • New Generic Iterator Implementation: A new generic Iterator[T] struct has been introduced, providing a standardized and efficient way to traverse pending items (data or headers) one by one, rather than loading all into memory.
  • Pending Base Integration: The core pendingBase struct now includes an iterator() method, which leverages the new generic Iterator to enable streaming access to pending items based on their last submitted height.
  • Improved Clarity and Naming: The lastHeight field within pendingBase has been renamed to lastSubmittedHeight to more accurately reflect its purpose and improve code readability.
  • Comprehensive Testing and Benchmarking: Extensive unit tests have been added to validate the iterator's functionality, including order of emission and error handling. New benchmarks were also introduced to measure the performance of the iterator for both pending data and headers.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an iterator-based approach for accessing pending items in pendingBase, which is a significant improvement over the previous method of loading all items into memory. This change will reduce memory consumption, especially when dealing with a large number of pending items. The implementation is clean, includes comprehensive unit tests and new benchmarks, and improves code clarity by renaming lastHeight to lastSubmittedHeight. Overall, this is a high-quality contribution. I have one minor suggestion for improving code consistency.

Comment on lines +115 to +116
return Iterator[T]{}, fmt.Errorf("height of last submitted "+
"item (%d) is greater than height of last item (%d)", lastSubmitted, height)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with a similar error message in getPending (line 52) and for improved readability, it's better to have this error string on a single line.

return Iterator[T]{}, fmt.Errorf("height of last submitted item (%d) is greater than height of last item (%d)", lastSubmitted, height)

Copy link
Contributor

claude bot commented Oct 3, 2025

Code Review for PR #2728: Iterator Pending Base

Overview

This PR introduces an iterator pattern for pending data and headers in the cache layer, aimed at improving performance when retrieving pending items. The implementation is clean and follows good Go practices.


✅ Strengths

1. Good Design Decisions

  • Generic iterator implementation (Iterator[T]) properly leverages Go generics
  • Consistent error handling with io.EOF convention
  • Thread-safe atomic operations for height tracking
  • Well-structured tests with good coverage of edge cases

2. Code Quality

  • Clear separation of concerns with pendingBase[T] abstraction
  • Comprehensive test coverage including error cases, ordering, and both concrete types
  • Benchmarks included to validate performance improvements (addresses issue Improve cache performace with iterators #2654)
  • Follows existing code patterns in the repository

3. Naming Improvements

  • lastHeightlastSubmittedHeight rename improves clarity (block/internal/cache/pending_base.go:24)

⚠️ Issues & Concerns

1. Missing Resource Cleanup (Memory Leak Risk)

Severity: Medium

The Iterator[T] stores a context but has no Close() method. While the current implementation doesn't hold resources that need cleanup, this creates a fragile API:

  • If future changes add resource acquisition (e.g., database cursors, locks), users won't have a cleanup mechanism
  • Go iterator patterns typically implement Close() for consistency with io.Closer
  • The stored ctx in the iterator is never canceled/checked after creation

Recommendation:

type Iterator[T any] struct {
    base      *pendingBase[T]
    ctx       context.Context
    cursor    uint64
    limit     uint64
    exhausted bool
}

func (it *Iterator[T]) Close() error {
    // Currently no-op, but provides future-proofing
    return nil
}

Update tests to include defer iter.Close() pattern.

2. Context Cancellation Not Respected During Iteration

Severity: Medium

The iterator stores ctx at creation (iterator.go:10, pending_base.go:121) but never checks if it's been canceled during iteration. A long-running iteration over many items could continue even after the context is canceled.

Current behavior:

func (it *Iterator[T]) Next() (T, bool, error) {
    // ... no ctx.Done() check
    val, err := it.base.fetch(it.ctx, it.base.store, it.cursor)
    // ...
}

Recommendation:

func (it *Iterator[T]) Next() (T, bool, error) {
    var zero T
    
    // Check context cancellation
    select {
    case <-it.ctx.Done():
        it.exhausted = true
        return zero, false, it.ctx.Err()
    default:
    }
    
    // ... rest of implementation
}

3. Race Condition: Iterator Snapshot Inconsistency

Severity: Low-Medium

The iterator captures limit at creation time (pending_base.go:123), but lastSubmittedHeight can be updated concurrently via SetLastSubmittedHeaderHeight() or SetLastSubmittedDataHeight(). This could cause:

  1. Iterator created with range [5, 10]
  2. SetLastSubmittedHeight(7) called during iteration
  3. Iterator returns items 5-10, but items 5-7 are no longer "pending"

This may be acceptable depending on use case, but should be documented. If strict consistency is needed, consider:

  • Documenting the snapshot behavior explicitly
  • OR holding a read lock during iteration
  • OR validating items are still pending in Next()

4. Nil Iterator Has Confusing Behavior

Severity: Low

func (pb *pendingBase[T]) iterator(ctx context.Context) (Iterator[T], error) {
    // ...
    if lastSubmitted == height {
        return Iterator[T]{}, nil  // Returns zero-value iterator
    }

This returns an iterator with base == nil, which is checked in Next() (iterator.go:19). However, this is confusing API design:

Recommendation:

  • Return nil, nil for empty case and update signature to *Iterator[T]
  • OR document that zero-value iterators are valid and immediately exhausted

5. Missing Edge Case: Height Overflow

Severity: Low

While unlikely with uint64, the iterator increment (iterator.go:36) doesn't guard against overflow:

it.cursor++  // Could theoretically overflow if cursor == math.MaxUint64

Consider adding a check or documenting that heights are bounded.

6. Removed init() Methods Without Documentation

The PR removes public init() methods from PendingData and PendingHeaders (pending_data.go, pending_headers.go). If these were part of the public API, this is a breaking change. Verify no external callers exist.


🔒 Security Considerations

No critical security issues identified.

Minor notes:

  • Context handling improvements (issue Peer discovery #2) would help prevent resource exhaustion
  • The atomic operations for height tracking are appropriate for concurrent access

⚡ Performance Considerations

Positive:

  • Iterator pattern avoids allocating full slice upfront (good for memory)
  • Benchmarks show the iterator is properly instrumented
  • Atomic operations are efficient for the use case

Suggestions:

  1. The iterator still fetches items one-by-one via fetch(). Consider if batching (e.g., fetch 10 at a time) would improve performance for the underlying store
  2. Benchmark comparison: Add a benchmark comparing Iterator() vs GetPending() to quantify the improvement
  3. The b.Loop() syntax in benchmarks (bench_test.go:113, 160) should be b.N - this appears to be using a newer testing API that may not be available in all Go versions

📝 Test Coverage

Excellent coverage overall:

  • ✅ Ordering verification (pending_base_test.go:86-130)
  • ✅ Error handling during iteration (pending_base_test.go:132-168)
  • ✅ Both concrete types tested (PendingHeaders, PendingData)
  • ✅ Benchmarks included

Missing test cases:

  • ❌ Context cancellation during iteration
  • ❌ Concurrent modification (height changes while iterating)
  • ❌ Empty iterator behavior (when lastSubmitted == height)
  • ❌ Iterator reuse (calling Next() after exhaustion multiple times)

📋 Recommendations Summary

Must Fix:

  1. Add context cancellation checks in Iterator.Next() (issue Peer discovery #2)
  2. Add Close() method to Iterator for resource safety (issue ABCI interface #1)

Should Fix:
3. Document iterator snapshot behavior vs concurrent height updates (issue #3)
4. Clarify nil/empty iterator handling (issue #4)
5. Add missing test cases for edge conditions

Nice to Have:
6. Consider batching fetches for better performance
7. Add comparative benchmarks (Iterator vs GetPending)
8. Verify b.Loop() compatibility with target Go version


✅ Final Verdict

Overall: Good implementation with some important refinements needed.

The code is well-structured and addresses the performance issue described in #2654. The main concerns are around resource cleanup and context handling, which are important for production robustness. Once the "Must Fix" items are addressed, this will be a solid addition.

Recommendation: Request changes for context handling and Close() method before merging.

Copy link

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.31%. Comparing base (298f62e) to head (2c55114).

Files with missing lines Patch % Lines
block/internal/cache/pending_base.go 58.33% 7 Missing and 3 partials ⚠️
block/internal/cache/iterator.go 69.56% 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2728      +/-   ##
==========================================
+ Coverage   62.28%   62.31%   +0.03%     
==========================================
  Files          79       80       +1     
  Lines        8490     8532      +42     
==========================================
+ Hits         5288     5317      +29     
- Misses       2711     2719       +8     
- Partials      491      496       +5     
Flag Coverage Δ
combined 62.31% <66.66%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@randygrok randygrok changed the title feat: iterator pending base feat: iterator pending base WIP Oct 3, 2025
@randygrok randygrok marked this pull request as draft October 3, 2025 11:23
@randygrok
Copy link
Contributor Author

Unfortunately we can't do at the curremt implementation the usa of Datastore.Query from the badger lib because we encode keys as


func getHeaderKey(height uint64) string {
    return GenerateKey([]string{headerPrefix, strconv.FormatUint(height, 10)})
}

that does not guarantee the ordering of the keys by height, so we would need to load all and then order keys, so we lose all the benefits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Improve cache performace with iterators
1 participant