
feat(sync): multi-endpoint round-robin + batched writes + monotonic cursor #37

Merged
satyakwok merged 5 commits into main from feat/multi-endpoint-batch-writes on May 14, 2026

Conversation

@satyakwok (Member) commented on May 14, 2026

Summary

Backfill throughput went from 38 to 268 b/s (7x) on live mainnet catch-up (verified during an ongoing 1.78M-block walk).

Four changes that compound:

  1. Multi-endpoint RPC pool — ChainProvider + RestClient accept comma-separated URLs and round-robin requests via an atomic counter. Lets the indexer hit fullnodes directly (bypassing the public Caddy edge and its per-IP rate limit) while spreading load across N nodes.

  2. REST_URL env split — Direct fullnode access needs /rpc for JSON-RPC but the root path for native REST. The Caddy edge papered over this with a path rewrite; bypassing the edge requires splitting the URLs. REST_URL falls back to RPC_URL when unset, so single-endpoint deployments are unchanged.

  3. Batched writes — Backfill buffers up to INDEXER_BACKFILL_BATCH (default 100) bundles and flushes them as one transaction with multi-row INSERTs (insert_batch helpers added to blocks/transactions/logs). One commit per batch instead of one per block.

  4. Monotonic cursor — The tail loop's ingest_one path advances the cursor per-block. Running in parallel with the batched backfill, its slower commits clobbered the batched writer's higher cursor values — observed live as a 100k+ block regression in seconds. write_cursor now uses GREATEST(_meta.value::int8, EXCLUDED.value::int8) at the SQL level, so the cursor is monotonic regardless of commit order.
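The round-robin selection in point 1 can be sketched as below. This is an illustrative reconstruction, not the PR's actual code: `EndpointPool` and its field names are hypothetical stand-ins for the pattern ChainProvider/RestClient embed (comma-separated URL parsing plus an atomic counter).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Minimal round-robin selector over a fixed set of base URLs.
struct EndpointPool {
    bases: Vec<String>,
    next: AtomicUsize,
}

impl EndpointPool {
    /// Parse a comma-separated URL list: trim whitespace, normalize
    /// trailing slashes, drop empty entries.
    fn new(raw: &str) -> Self {
        let bases = raw
            .split(',')
            .map(|s| s.trim().trim_end_matches('/').to_string())
            .filter(|s| !s.is_empty())
            .collect();
        Self { bases, next: AtomicUsize::new(0) }
    }

    /// Atomically pick the next base; a relaxed wrapping add is enough
    /// because only rotation fairness matters, not ordering with other data.
    fn next_base(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        &self.bases[i % self.bases.len()]
    }
}

fn main() {
    let pool = EndpointPool::new("http://a/, http://b");
    assert_eq!(pool.next_base(), "http://a");
    assert_eq!(pool.next_base(), "http://b");
    assert_eq!(pool.next_base(), "http://a");
    println!("rotation ok");
}
```

A shared `Arc<EndpointPool>` can then be cloned into every concurrent fetch task without any locking.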
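Point 4's race is easiest to see with a toy in-memory simulation of two writers committing out of order. The `write_*` helpers below are illustrative only; the real fix is the SQL upsert quoted in the comment, not application-side logic.

```rust
// Last-write-wins: whatever commits last becomes the cursor,
// even if it is a stale, lower height.
fn write_last_wins(cursor: &mut i64, v: i64) {
    *cursor = v;
}

// Monotonic: keep the max, mirroring the SQL-side fix:
//   ON CONFLICT ... DO UPDATE
//     SET value = GREATEST(_meta.value::int8, EXCLUDED.value::int8)
fn write_monotonic(cursor: &mut i64, v: i64) {
    *cursor = (*cursor).max(v);
}

fn main() {
    // Batched backfill commits height 200_000 first; the slower
    // per-block tail writer then lands a stale commit of 100_500.
    let mut naive = 0i64;
    write_last_wins(&mut naive, 200_000);
    write_last_wins(&mut naive, 100_500);
    assert_eq!(naive, 100_500); // ~100k regression, as observed live

    let mut fixed = 0i64;
    write_monotonic(&mut fixed, 200_000);
    write_monotonic(&mut fixed, 100_500);
    assert_eq!(fixed, 200_000); // cursor never moves backwards
    println!("monotonic ok");
}
```

Doing the max inside the upsert (rather than read-then-write in Rust) keeps the guarantee atomic under concurrent transactions.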

Throughput history

| Config | Rate | Catch-up ETA |
| --- | --- | --- |
| Public edge + concurrency 50 | 38 b/s | 8 h |
| Direct fullnode (single) + IP allowlist | 125 b/s | 2.4 h |
| Multi-endpoint x2 (no batch) | 131 b/s | 2.0 h |
| Multi + batched writes + monotonic cursor | 268 b/s | 57 min |

Env

  • RPC_URL — comma-separated JSON-RPC endpoints (existing, now multi)
  • REST_URL — optional, comma-separated REST base URLs; falls back to RPC_URL
  • INDEXER_BACKFILL_BATCH — 1..1000, default 100 (new)
  • INDEXER_BACKFILL_CONCURRENCY — 1..500, default 50 (existing)
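The REST_URL fallback described above amounts to one small resolution step, sketched here with a hypothetical free function (the PR wires this through IndexerConfig rather than a helper like this):

```rust
/// Resolve the REST base list: use REST_URL when set and non-empty,
/// otherwise fall back to RPC_URL so single-endpoint deployments
/// keep working with no config change.
fn resolve_rest_urls(rest_url: Option<&str>, rpc_url: &str) -> String {
    match rest_url {
        Some(s) if !s.trim().is_empty() => s.to_string(),
        _ => rpc_url.to_string(),
    }
}

fn main() {
    // No REST_URL set: REST traffic follows the RPC endpoints.
    assert_eq!(
        resolve_rest_urls(None, "http://node-a/rpc,http://node-b/rpc"),
        "http://node-a/rpc,http://node-b/rpc"
    );
    // REST_URL set: REST hits the fullnode roots directly.
    assert_eq!(
        resolve_rest_urls(Some("http://node-a,http://node-b"), "http://node-a/rpc"),
        "http://node-a,http://node-b"
    );
    println!("fallback ok");
}
```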

Test plan

  • cargo check --release with -D warnings: clean
  • Live deploy on mainnet — cursor advances monotonically (sampled 5x, all increasing)
  • Rate sustained at 268 b/s with concurrency=200, batch=100, no errors
  • CI clippy / test

Summary by CodeRabbit

  • New Features

    • Optional REST base override via env and round‑robin routing across multiple REST/RPC endpoints.
    • Bulk insert helpers for blocks, transactions, and logs to accelerate large imports.
  • Refactor

    • Backfill pipeline buffers and commits data in configurable batched transactions.
    • Batched block writer performs multi-table commits and best‑effort analytics pushes after commit.
  • Bug Fixes

    • Cursor persistence now enforces monotonic writes under concurrent writers.
  • Tests

    • Added unit and smoke checks for URL parsing/rotation and observability/metrics.

Review Change Stack

satyakwok added 2 commits May 14, 2026 20:51
…ursor

Backfill went 38 → 268 b/s (7x) on the live mainnet catch-up by combining
four changes:

1. Multi-endpoint RPC pool. ChainProvider + RestClient now accept comma-
   separated URLs and round-robin requests via an atomic counter. Lets the
   indexer hit fullnodes directly (bypassing the public Caddy edge + its
   per-IP rate limit) while spreading load across N nodes.

2. REST_URL env split. Direct fullnode access needs `/rpc` for JSON-RPC
   but root for native REST. The Caddy edge papered over this with a path
   rewrite; the public-facing single URL doesn't work once we bypass it.
   REST_URL falls back to RPC_URL when unset (no behaviour change for
   single-endpoint deployments).

3. Batched writes. Backfill now buffers up to INDEXER_BACKFILL_BATCH (100)
   bundles and flushes them as one transaction with multi-row INSERTs
   (`insert_batch` helpers added to blocks/transactions/logs). One commit
   per batch instead of one per block.

4. Monotonic cursor. The tail loop's `ingest_one` path advances the cursor
   per-block; running in parallel with the batched backfill, its slower
   commits clobbered the batched writer's higher cursor values, causing
   visible regression of 100k+ blocks in seconds. write_cursor now uses
   `GREATEST(_meta.value::int8, EXCLUDED.value::int8)` at the SQL level so
   the on-disk cursor is monotonic regardless of which writer commits last.

Also: INDEXER_BACKFILL_BATCH env (1..1000, default 100) for tuning.
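One way to picture the insert_batch helpers from point 3: build a single parameterized multi-row INSERT and bind every row's values in one statement, so the database sees one statement per chunk instead of one per row. The sketch below only generates the SQL text with Postgres-style placeholders; the table and column names are hypothetical, not the PR's actual schema.

```rust
/// Build a multi-row INSERT for `n_rows` rows over `cols`, numbering
/// placeholders $1..$N left-to-right, row by row.
fn multi_row_insert(table: &str, cols: &[&str], n_rows: usize) -> String {
    let mut sql = format!("INSERT INTO {} ({}) VALUES ", table, cols.join(", "));
    let mut p = 1;
    let rows: Vec<String> = (0..n_rows)
        .map(|_| {
            // One "($k, $k+1, ...)" group per row.
            let ph: Vec<String> = cols
                .iter()
                .map(|_| {
                    let s = format!("${p}");
                    p += 1;
                    s
                })
                .collect();
            format!("({})", ph.join(", "))
        })
        .collect();
    sql.push_str(&rows.join(", "));
    sql
}

fn main() {
    let sql = multi_row_insert("blocks", &["height", "hash"], 2);
    assert_eq!(sql, "INSERT INTO blocks (height, hash) VALUES ($1, $2), ($3, $4)");
    println!("{sql}");
}
```

In practice the batch still needs chunking (as the walkthrough notes) because Postgres caps bind parameters per statement at 65535, so a 100-block flush with wide rows may span several such statements inside one transaction.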
@coderabbitai

coderabbitai Bot commented May 14, 2026

📝 Walkthrough

The PR adds round‑robin selection for JSON‑RPC and REST endpoints, an optional IndexerConfig rest_url override, batch DB insert helpers for blocks/transactions/logs, a transactional batch_write_blocks that flattens/sorts/chunks inserts and advances the cursor post‑commit, and a buffered backfill pipeline that flushes configurable batches (default 100, max 1000) with cancellation and 404‑gap handling. write_cursor now upserts using GREATEST to ensure monotonic cursor advancement.

```mermaid
sequenceDiagram
  participant Backfill
  participant ChainProvider
  participant RestClient
  participant batch_write_blocks
  participant PgPool
  participant Analytics
  Backfill->>ChainProvider: fetch blocks via JSON‑RPC (round‑robin)
  Backfill->>RestClient: fetch native block/tx via HTTP (round‑robin)
  Backfill->>batch_write_blocks: submit Vec<BlockBundle>
  batch_write_blocks->>PgPool: transactional multi-row inserts (blocks, txs, logs)
  PgPool-->>batch_write_blocks: commit
  batch_write_blocks->>Analytics: best‑effort per‑tx pushes
  batch_write_blocks-->>Backfill: flush complete
```

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Backfill
  participant Fetcher
  participant Buffer
  participant batch_write_blocks
  participant PgPool
  participant Cursor
  Backfill->>Fetcher: concurrently fetch BlockBundle stream
  Fetcher->>Buffer: push BlockBundle
  Buffer->>Buffer: when buf.len() >= batch_size -> prepare batch
  Buffer->>batch_write_blocks: submit batch Vec<BlockBundle>
  batch_write_blocks->>PgPool: begin tx, insert chunks (blocks, txs, logs)
  PgPool-->>batch_write_blocks: commit
  batch_write_blocks->>Cursor: write_cursor(max_height, ts)
  batch_write_blocks-->>Backfill: ack flushed
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks — ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title precisely summarizes the main changes: multi-endpoint round-robin routing, batched writes, and monotonic cursor advancement — the core features enabling the 7x throughput improvement. |
| Description check | ✅ Passed | The description comprehensively covers all required template sections: summary, scope, checks, and deploy impact are all clearly addressed with specific context about the changes and testing methodology. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |




@coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/chain/src/provider.rs`:
- Around line 17-18: The file fails rustfmt checks; run `cargo fmt --all` (or
`cargo fmt` in the crate) to reformat crates/chain/src/provider.rs and commit
the formatting-only changes so the import block and surrounding regions
(including the AtomicUsize/Ordering and Arc use lines and nearby code around the
earlier reported regions) match rustfmt output; ensure no logic changes are
introduced—only whitespace/formatting fixes—to satisfy `cargo fmt --check`.

In `@crates/chain/src/rest.rs`:
- Around line 64-68: The current new() construction only trims and splits the
raw endpoints into bases, but does not validate them; update new() to parse and
validate each configured endpoint (the raw string entries and the resulting
bases Vec<String>) during construction and return an Err (or panic) if any entry
is malformed so the program fails fast; use a URL parser (e.g., url::Url::parse
or reqwest::Url::parse) on each trimmed base and ensure trailing slashes are
normalized only after successful parse, referencing the raw variable, the bases
collection, and the new() constructor to locate where to add the validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b29b317b-2609-4f2a-8702-8f15666d4bd6

📥 Commits

Reviewing files that changed from the base of the PR and between 3017eca and e67451f.

📒 Files selected for processing (9)
  • bin/indexer.rs
  • crates/chain/src/provider.rs
  • crates/chain/src/rest.rs
  • crates/db/src/blocks.rs
  • crates/db/src/logs.rs
  • crates/db/src/transactions.rs
  • crates/sync/src/backfill.rs
  • crates/sync/src/block_writer.rs
  • crates/sync/src/cursor.rs

Comment thread crates/chain/src/provider.rs Outdated
Comment thread crates/chain/src/rest.rs Outdated

@coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/chain/src/rest.rs`:
- Around line 59-62: Add unit tests in crates/chain/src/rest.rs that exercise
the pub fn new(base_url: impl Into<String>) -> ChainResult<Self>: one test
should pass a comma-separated string like "http://a,http://b" to new and assert
the resulting client normalizes/splits into the two distinct base URLs; another
test should instantiate the client with at least two bases and then call the
public operation that selects/uses the base (the method that triggers
round‑robin/base selection) multiple times to assert deterministic rotation
between the bases (e.g., first call uses base A, second uses base B, third uses
base A). Ensure the tests cover trimming/normalization of whitespace and
deterministic ordering across repeated calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f8d62489-5685-439a-8f3a-44495f5680f1

📥 Commits

Reviewing files that changed from the base of the PR and between e67451f and d73446b.

📒 Files selected for processing (2)
  • crates/chain/src/provider.rs
  • crates/chain/src/rest.rs

Comment thread crates/chain/src/rest.rs

@coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/chain/src/rest.rs (1)

103-116: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add per-request failover across bases.

Round-robin alone means one bad endpoint causes every Nth request to fail, even when other bases are healthy. Please retry on the next base(s) for transport errors and 5xx before returning an error.

Proposed direction
```diff
+    async fn get_with_failover(&self, path: &str) -> ChainResult<reqwest::Response> {
+        let mut last_err: Option<ChainError> = None;
+        for _ in 0..self.bases.len() {
+            let url = format!("{}/{}", self.next_base(), path.trim_start_matches('/'));
+            match self.http.get(&url).send().await {
+                Ok(resp) if !resp.status().is_server_error() => return Ok(resp),
+                Ok(resp) => {
+                    let status = resp.status();
+                    let body = resp.text().await.unwrap_or_default();
+                    last_err = Some(ChainError::Rpc(format!("native rest {status}: {body}")));
+                }
+                Err(e) => last_err = Some(e.into()),
+            }
+        }
+        Err(last_err.unwrap_or_else(|| ChainError::Rpc("all rest endpoints failed".to_string())))
+    }
+
     pub async fn tx(&self, hash: &Hash) -> ChainResult<Option<NativeTxResponse>> {
-        let url = format!("{}/tx/{}", self.next_base(), hash);
-        let resp = self.http.get(&url).send().await?;
+        let resp = self.get_with_failover(&format!("tx/{hash}")).await?;
         if resp.status() == reqwest::StatusCode::NOT_FOUND {
             return Ok(None);
         }
         ...
     }
```

Also applies to: 124-134, 139-152

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/chain/src/rest.rs` around lines 103 - 116, The tx handler currently
hits a single base returned by next_base() and fails immediately on transport
errors or 5xx; change it to attempt the request across all configured bases
(round-robin-starting-from-current) and only return an error after exhausting
bases: for each base build the url (use the same url formation code with
self.next_base()/or a base-at-index helper), call
self.http.get(...).send().await and if send() returns Err (transport error) or
resp.status().is_server_error() then log/continue to the next base and try the
next URL; keep the existing behavior for 404 (return Ok(None) if you get
NOT_FOUND from a healthy response) and for non-success non-5xx responses return
the parsed error immediately; when a request succeeds parse bytes ->
serde_json::from_slice and return Ok(Some(parsed)). Apply the same
retry-across-bases pattern to the other similar methods referenced in the
comment (the blocks at lines 124-134 and 139-152) so all native REST requests
retry on transport errors and 5xx before failing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7014dc59-f018-45ea-8291-1d001372a6d5

📥 Commits

Reviewing files that changed from the base of the PR and between e09314f and fe9c3e2.

📒 Files selected for processing (2)
  • crates/chain/src/rest.rs
  • scripts/smoke.sh
