Fix Poller Behavior #279
Conversation
Walkthrough

Updates span the orchestrator polling architecture, storage TTL and maintenance intervals, and observability/logging. The Poller is redesigned for concurrent range processing with lookahead and an explicit lifecycle. The Orchestrator removes WorkModeMonitor. The ChainTracker poll interval is reduced to 1 minute. Storage adds TTL-backed staging and configurable maintenance tickers. Several logs and metrics are refined.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Orchestrator
    participant Poller
    participant "Worker Goroutines" as Workers
    participant RPC
    participant Storage
    Orchestrator->>Poller: Start(ctx)
    activate Poller
    note right of Poller: init ctx, tasks chan, processingRanges<br/>spawn parallel workers
    Poller->>Workers: start workerLoop()
    activate Workers
    loop request or lookahead
        Poller->>Poller: mark range processing
        Poller->>Workers: enqueue block range (tasks)
        Workers->>RPC: fetch blocks
        RPC-->>Workers: block data
        Workers->>Storage: stageResults(block data) (With TTL)
        Workers->>Poller: updateLastPolledBlock
        Poller->>Poller: unmark range processing
    end
    Orchestrator-->>Poller: ctx canceled
    Poller->>Workers: close(tasks) & wait wg
    deactivate Workers
    Poller-->>Orchestrator: shutdown complete
    deactivate Poller
    note over Poller,Workers: concurrent range processing + lookahead (no ticker)
```
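Read alongside the diagram, the new lifecycle is essentially a bounded worker pool fed by a task channel, with a set guarding in-flight ranges. The following is a minimal, hypothetical sketch of that pattern; the type and function names (`pollerSketch`, `fetchRange`, the range-key format) are assumptions for illustration, not the repository's actual code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type rangeTask struct{ start, end int64 }

type pollerSketch struct {
	workers  int
	tasks    chan rangeTask
	wg       sync.WaitGroup
	mu       sync.Mutex
	inFlight map[string]bool
}

func newPollerSketch(workers int) *pollerSketch {
	return &pollerSketch{
		workers:  workers,
		tasks:    make(chan rangeTask, workers),
		inFlight: make(map[string]bool),
	}
}

// Start launches the worker pool; cancelling ctx or closing tasks stops it.
func (p *pollerSketch) Start(ctx context.Context) {
	for i := 0; i < p.workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case t, ok := <-p.tasks:
					if !ok {
						return
					}
					key := fmt.Sprintf("%d-%d", t.start, t.end)
					if !p.tryMark(key) {
						continue // another worker already owns this range
					}
					fetchRange(t) // assumed stand-in for the RPC fetch + staging write
					p.unmark(key)
				}
			}
		}()
	}
}

// Shutdown closes the task channel and waits for workers to drain.
func (p *pollerSketch) Shutdown() {
	close(p.tasks)
	p.wg.Wait()
}

func (p *pollerSketch) tryMark(key string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.inFlight[key] {
		return false
	}
	p.inFlight[key] = true
	return true
}

func (p *pollerSketch) unmark(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.inFlight, key)
}

func fetchRange(t rangeTask) {
	fmt.Printf("polled blocks %d-%d\n", t.start, t.end)
}

func main() {
	p := newPollerSketch(4)
	p.Start(context.Background())
	p.tasks <- rangeTask{start: 1, end: 100}
	p.tasks <- rangeTask{start: 101, end: 200}
	p.Shutdown()
}
```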
```mermaid
sequenceDiagram
    autonumber
    actor Orchestrator
    participant Poller
    participant Committer
    participant ReorgHandler
    participant ChainTracker
    Orchestrator->>Orchestrator: initializeWorkerAndPoller()
    Orchestrator->>Poller: Start(ctx)
    par
        Orchestrator->>Committer: Start(ctx)
    and
        opt if enabled
            Orchestrator->>ReorgHandler: Start(ctx)
        end
    and
        Orchestrator->>ChainTracker: Start(ctx)
    end
    note right of Orchestrator: WorkModeMonitor creation/start removed
```
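For orientation, the parallel startup shown above can be expressed as goroutines joined by a WaitGroup; the `component` interface and `reorgEnabled` flag below are illustrative assumptions rather than the project's real types.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// component is a hypothetical stand-in for Poller, Committer, ReorgHandler, ChainTracker.
type component interface {
	Start(ctx context.Context)
}

type named struct{ name string }

func (n named) Start(ctx context.Context) { fmt.Println(n.name, "started") }

func startAll(ctx context.Context, reorgEnabled bool) {
	components := []component{named{"poller"}, named{"committer"}, named{"chain-tracker"}}
	if reorgEnabled { // mirrors the "opt if enabled" branch in the diagram
		components = append(components, named{"reorg-handler"})
	}

	var wg sync.WaitGroup
	for _, c := range components {
		wg.Add(1)
		go func(c component) {
			defer wg.Done()
			c.Start(ctx) // in the real system each Start blocks until ctx is cancelled
		}(c)
	}
	wg.Wait()
}

func main() {
	startAll(context.Background(), true)
}
```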
```mermaid
sequenceDiagram
    autonumber
    participant BadgerConnector as Storage
    participant BadgerDB as Badger
    participant GC as GC Ticker
    participant Cache as Cache Refresher
    Storage->>BadgerDB: InsertStagingData(SetEntry.WithTTL(stagingDataTTL))
    GC->>Storage: every gcInterval
    Storage->>BadgerDB: run value log GC
    Cache->>Storage: every cacheRefreshInterval
    Storage->>Storage: refresh range cache
    Cache->>Storage: staleness check (cacheStalenessTimeout)
    Storage->>Storage: prune stale ranges
```
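For context, a TTL-backed staging write plus a periodic value-log GC looks roughly like this with Badger; the key format, TTL, and GC interval below are illustrative assumptions, not the values this PR uses.

```go
package main

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/staging-example"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// TTL-backed staging write: the entry self-evicts after the TTL elapses.
	err = db.Update(func(txn *badger.Txn) error {
		entry := badger.NewEntry([]byte("staging:1:12345"), []byte("block payload")).
			WithTTL(2 * time.Hour) // assumed TTL; should exceed worst-case commit/publish lag
		return txn.SetEntry(entry)
	})
	if err != nil {
		log.Fatal(err)
	}

	// Periodic value-log GC, as in the "GC Ticker" lane of the diagram.
	ticker := time.NewTicker(5 * time.Minute) // assumed interval
	defer ticker.Stop()
	go func() {
		for range ticker.C {
			// ErrNoRewrite just means there was nothing worth reclaiming.
			if err := db.RunValueLogGC(0.5); err != nil && err != badger.ErrNoRewrite {
				log.Printf("badger GC: %v", err)
			}
		}
	}()

	time.Sleep(100 * time.Millisecond) // keep the example short-lived
}
```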
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/orchestrator/chain_tracker.go (1)
51-53: Compile-time bug: big.Int has no Float64(). Convert via big.Float (or cast Int64 safely) before setting the metric.
```diff
- latestBlockNumberFloat, _ := latestBlockNumber.Float64()
- metrics.ChainHead.Set(latestBlockNumberFloat)
+ f := new(big.Float).SetInt(latestBlockNumber)
+ v, _ := f.Float64()
+ metrics.ChainHead.Set(v)
```
internal/orchestrator/committer.go (1)
293-297: Nil deref risk before error check.
`latestCommittedBlockNumber.String()` is called before the error check; on a non-nil error the value can be nil, which panics.
```diff
- latestCommittedBlockNumber, err := c.storage.MainStorage.GetMaxBlockNumber(c.rpc.GetChainID())
- log.Debug().Msgf("Committer found this max block number in main storage: %s", latestCommittedBlockNumber.String())
- if err != nil {
-     return nil, err
- }
+ latestCommittedBlockNumber, err := c.storage.MainStorage.GetMaxBlockNumber(c.rpc.GetChainID())
+ if err != nil {
+     return nil, err
+ }
+ if latestCommittedBlockNumber != nil {
+     log.Debug().Msgf("Committer found this max block number in main storage: %s", latestCommittedBlockNumber.String())
+ }
```
🧹 Nitpick comments (8)
internal/orchestrator/chain_tracker.go (1)
12-12: Nit: fix comment grammar. "1 minutes" → "1 minute".
```diff
-const DEFAULT_CHAIN_TRACKER_POLL_INTERVAL = 60 * 1000 // 1 minutes
+const DEFAULT_CHAIN_TRACKER_POLL_INTERVAL = 60 * 1000 // 1 minute
```
internal/storage/badger.go (2)
82-92: Make TTL/intervals configurable from BadgerConfig. Expose `stagingDataTTL`, `gcInterval`, `cacheRefreshInterval`, and `cacheStalenessTimeout` via config and only fall back to these defaults when they are zero.
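A minimal sketch of the zero-value fallback this comment asks for, assuming hypothetical `BadgerConfig` fields and default values that may not match the repository's actual config:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical config fields; the real BadgerConfig in this repo may differ.
type BadgerConfig struct {
	StagingDataTTL        time.Duration
	GCInterval            time.Duration
	CacheRefreshInterval  time.Duration
	CacheStalenessTimeout time.Duration
}

// Assumed defaults standing in for the hard-coded values the review wants replaced.
const (
	defaultStagingDataTTL        = 2 * time.Hour
	defaultGCInterval            = 5 * time.Minute
	defaultCacheRefreshInterval  = 30 * time.Second
	defaultCacheStalenessTimeout = 10 * time.Minute
)

// withDefaults falls back to a default only when the configured value is zero.
func withDefaults(cfg BadgerConfig) BadgerConfig {
	if cfg.StagingDataTTL == 0 {
		cfg.StagingDataTTL = defaultStagingDataTTL
	}
	if cfg.GCInterval == 0 {
		cfg.GCInterval = defaultGCInterval
	}
	if cfg.CacheRefreshInterval == 0 {
		cfg.CacheRefreshInterval = defaultCacheRefreshInterval
	}
	if cfg.CacheStalenessTimeout == 0 {
		cfg.CacheStalenessTimeout = defaultCacheStalenessTimeout
	}
	return cfg
}

func main() {
	cfg := withDefaults(BadgerConfig{GCInterval: time.Minute}) // only GC interval overridden
	fmt.Println(cfg.StagingDataTTL, cfg.GCInterval)            // 2h0m0s 1m0s
}
```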
653-667: Micro: avoid string allocations when parsing keys. Using byte ops (e.g., `bytes.IndexByte`/`bytes.SplitN`) avoids converting keys to strings in tight loops.
Also applies to: 659-667
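A hedged illustration of the byte-level parsing this nit suggests, assuming a `prefix:chainID:blockNumber` key layout purely for the example:

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseBlockNumberFromKey extracts the trailing block number from a key shaped
// like "prefix:chainID:blockNumber" without converting the whole key to a string.
// The key layout here is an assumption for illustration.
func parseBlockNumberFromKey(key []byte) (uint64, bool) {
	i := bytes.LastIndexByte(key, ':')
	if i < 0 || i+1 >= len(key) {
		return 0, false
	}
	// Only the short numeric suffix is converted, not the whole key,
	// which keeps allocations small in tight iteration loops.
	n, err := strconv.ParseUint(string(key[i+1:]), 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	n, ok := parseBlockNumberFromKey([]byte("staging:1:12345"))
	fmt.Println(n, ok) // 12345 true
}
```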
internal/orchestrator/poller.go (5)
19-22: Expose lookahead and parallelism via config.
`DEFAULT_LOOKAHEAD_BATCHES` is fixed; consider reading a `Cfg.Poller.LookaheadBatches` with a sane default.
58-74: Initialize lookahead from config (not the constant). Minor parity gap vs. `ParallelPollers`.
```diff
- lookaheadBatches := DEFAULT_LOOKAHEAD_BATCHES
+ lookaheadBatches := config.Cfg.Poller.LookaheadBatches
+ if lookaheadBatches == 0 {
+     lookaheadBatches = DEFAULT_LOOKAHEAD_BATCHES
+ }
```
Also applies to: 64-65
276-326: Reduce RPC fan-out in lookahead. Fetch the latest block once per lookahead invocation and reuse it inside the loop.
```diff
-func (p *Poller) triggerLookahead(currentEndBlock *big.Int, batchSize int64) {
-	// Use configurable lookahead batches
-	for i := 0; i < p.lookaheadBatches; i++ {
+func (p *Poller) triggerLookahead(currentEndBlock *big.Int, batchSize int64) {
+	latestBlock, err := p.rpc.GetLatestBlockNumber(p.ctx)
+	if err != nil {
+		log.Error().Err(err).Msg("Failed to get latest block")
+		return
+	}
+	for i := 0; i < p.lookaheadBatches; i++ {
 		startBlock := new(big.Int).Add(currentEndBlock, big.NewInt(int64(i)*batchSize+1))
 		endBlock := new(big.Int).Add(startBlock, big.NewInt(batchSize-1))
@@
-		// Get latest block to ensure we don't exceed chain head
-		latestBlock, err := p.rpc.GetLatestBlockNumber(p.ctx)
-		if err != nil {
-			log.Error().Err(err).Msg("Failed to get latest block")
-			break
-		}
```
Also applies to: 300-306
89-89: Remove unused error sentinel.
`ErrNoNewBlocks` is declared but never used.
344-350: Unused helper.
`updateLastPolledBlock` isn't referenced; either call it in `poll` or remove it.
```diff
- p.lastPolledBlock = new(big.Int).Set(highestBlockNumber)
+ p.updateLastPolledBlock(highestBlockNumber)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- internal/orchestrator/chain_tracker.go (1 hunks)
- internal/orchestrator/committer.go (3 hunks)
- internal/orchestrator/orchestrator.go (0 hunks)
- internal/orchestrator/poller.go (5 hunks)
- internal/storage/badger.go (6 hunks)
- internal/worker/worker.go (1 hunks)
💤 Files with no reviewable changes (1)
- internal/orchestrator/orchestrator.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/orchestrator/poller.go (3)
internal/rpc/rpc.go (1)
- IRPCClient (42-55)
internal/storage/connector.go (1)
- IStorage (69-73)
internal/common/block.go (1)
- BlockData (61-66)
🔇 Additional comments (13)
internal/orchestrator/committer.go (3)
277-283: Good: richer staging cleanup log with context + metric. Structured fields make on-call triage easier.
474-478: Good: corrected error string for the publish path. Improves the accuracy of surfaced errors.
516-517: Good: range summary in the commit log. Range logging simplifies correlating batches end-to-end.
internal/storage/badger.go (7)
53-54: Sensible default path. LGTM; the temp dir fallback is fine for local/dev.
95-101: GC ticker wiring looks good. It runs on the configured interval; downgrading GC errors to debug is appropriate.
120-135: Background range-cache refresher: LGTM. Clear separation of explicit updates vs. periodic refresh.
200-203: Staleness gate is straightforward. Timeout-based invalidation keeps the cache bounded.
407-410: TTL-backed staging writes: solid. Matches the PR's intent to let stale staging self-evict. Ensure the TTL comfortably exceeds worst-case commit/publish lag to avoid data churn.
Would you like a quick repo scan to surface current TTL vs. commit/publish interval configs to confirm safe headroom?
641-712: Older-than delete with cache maintenance: LGTM. Cache min adjustment and background resync keep range metadata accurate.
716-745: Range query with cache-miss fallback: LGTM. Defensive cache population on miss is a good UX for callers.
internal/worker/worker.go (1)
434-436: Good: log first/last from actual results. Prevents misleading ranges when inputs differ from outputs.
internal/orchestrator/poller.go (2)
91-103: Startup lifecycle: LGTM. Worker pool + graceful shutdown via the parent ctx is clean.
146-177: Ignore stale cache path concern. The Poller's worker is constructed with the provided staging source via `NewWorkerWithSources(o.rpc, s3, staging)`, so `pollBlockData` correctly reads from staging rather than re-fetching over RPC. Likely an incorrect or invalid review comment.
```
// Check if already processing
p.processingRangesMutex.RLock()
isProcessing := p.processingRanges[rangeKey]
p.processingRangesMutex.RUnlock()

for i := 0; i < p.parallelPollers; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-pollCtx.Done():
				return
			case _, ok := <-tasks:
				if !ok {
					return
				}

				blockNumbers, err := p.getNextBlockRange(pollCtx)

				if err != nil {
					if err != ErrNoNewBlocks {
						log.Error().Err(err).Msg("Failed to get block range to poll")
					}
					continue
				}

				if pollCtx.Err() != nil {
					return
				}

				lastPolledBlock, err := p.poll(pollCtx, blockNumbers)
				if err != nil {
					log.Error().Err(err).Msg("Failed to poll blocks")
					continue
				}

				if p.reachedPollLimit(lastPolledBlock) {
					log.Info().Msgf("Reached poll limit at block %s, completing poller", lastPolledBlock.String())
					return
				}
			}
		}
	}()

if isProcessing {
	return nil, fmt.Errorf("range %s already being processed", rangeKey)
}

	for {
		select {
		case <-ctx.Done():
			p.shutdown(cancel, tasks, &wg)
			return
		case <-ticker.C:
			select {
			case tasks <- struct{}{}:
			default:
				// Channel full, skip this tick
			}
		}
	}
}

p.markRangeAsProcessing(rangeKey)
defer p.unmarkRangeAsProcessing(rangeKey)
```
🛠️ Refactor suggestion
Eliminate check-then-mark race for processing ranges.
Two-step read then write can double-queue under contention.
```diff
- // Check if already processing
- p.processingRangesMutex.RLock()
- isProcessing := p.processingRanges[rangeKey]
- p.processingRangesMutex.RUnlock()
-
- if isProcessing {
-     return nil, fmt.Errorf("range %s already being processed", rangeKey)
- }
-
- p.markRangeAsProcessing(rangeKey)
+ // Atomically try to mark as processing
+ if !p.tryMarkRange(rangeKey) {
+     return nil, fmt.Errorf("range %s already being processed", rangeKey)
+ }
  defer p.unmarkRangeAsProcessing(rangeKey)
```
Add helper:
```diff
+func (p *Poller) tryMarkRange(rangeKey string) bool {
+	p.processingRangesMutex.Lock()
+	defer p.processingRangesMutex.Unlock()
+	if p.processingRanges[rangeKey] {
+		return false
+	}
+	p.processingRanges[rangeKey] = true
+	return true
+}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
// Check if already processing
p.processingRangesMutex.RLock()
isProcessing := p.processingRanges[rangeKey]
p.processingRangesMutex.RUnlock()
for i := 0; i < p.parallelPollers; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-pollCtx.Done():
				return
			case _, ok := <-tasks:
				if !ok {
					return
				}
				blockNumbers, err := p.getNextBlockRange(pollCtx)
				if err != nil {
					if err != ErrNoNewBlocks {
						log.Error().Err(err).Msg("Failed to get block range to poll")
					}
					continue
				}
				if pollCtx.Err() != nil {
					return
				}
				lastPolledBlock, err := p.poll(pollCtx, blockNumbers)
				if err != nil {
					log.Error().Err(err).Msg("Failed to poll blocks")
					continue
				}
				if p.reachedPollLimit(lastPolledBlock) {
					log.Info().Msgf("Reached poll limit at block %s, completing poller", lastPolledBlock.String())
					return
				}
			}
		}
	}()
if isProcessing {
	return nil, fmt.Errorf("range %s already being processed", rangeKey)
}
for {
	select {
	case <-ctx.Done():
		p.shutdown(cancel, tasks, &wg)
		return
	case <-ticker.C:
		select {
		case tasks <- struct{}{}:
		default:
			// Channel full, skip this tick
		}
	}
}
}
p.markRangeAsProcessing(rangeKey)
defer p.unmarkRangeAsProcessing(rangeKey)
// Atomically try to mark as processing
if !p.tryMarkRange(rangeKey) {
	return nil, fmt.Errorf("range %s already being processed", rangeKey)
}
defer p.unmarkRangeAsProcessing(rangeKey)
```
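As a design note on the suggestion above: if the team prefers to avoid an explicit mutex, `sync.Map`'s `LoadOrStore` provides the same atomic check-and-mark in one call. A minimal sketch of that alternative (not the code CodeRabbit proposed):

```go
package main

import (
	"fmt"
	"sync"
)

type rangeTracker struct {
	processing sync.Map // rangeKey -> struct{}
}

// tryMark atomically claims a range; it returns false if another worker already holds it.
func (t *rangeTracker) tryMark(rangeKey string) bool {
	_, loaded := t.processing.LoadOrStore(rangeKey, struct{}{})
	return !loaded
}

func (t *rangeTracker) unmark(rangeKey string) {
	t.processing.Delete(rangeKey)
}

func main() {
	var t rangeTracker
	fmt.Println(t.tryMark("100-199")) // true: first claim wins
	fmt.Println(t.tryMark("100-199")) // false: already being processed
	t.unmark("100-199")
	fmt.Println(t.tryMark("100-199")) // true again after unmark
}
```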
Summary by CodeRabbit
New Features
Chores