
Conversation

Contributor

@cjonas9 cjonas9 commented Dec 15, 2025

What

This PR adds synchronous history backfilling of ledgers through the new configuration parameter backfill. It fills the local SQL database with the most recent HISTORY_RETENTION_WINDOW ledgers, fetched from CDP. Usage: ./stellar-rpc --backfill; the option can also be set as a config parameter in the TOML file or as an environment variable.

Notes:

  • After verifying the local DB state, this will backfill the specified number of ledgers from CDP backwards/"leftward", then backfill forwards/"rightward" up to the new most current ledger (a sketch of this two-phase flow follows this list). On termination, the database is guaranteed to contain the most recent HISTORY_RETENTION_WINDOW ledgers.
    • Leftward/backwards backfill: fills the range [oldest ledger we want in our DB <- oldest ledger in the local DB - 1] into the local DB. This phase is skipped if the DB is empty, and it is generally slower than the forwards backfill.
    • Rightward/forwards backfill: fills the range (max(newest ledger currently in the local DB, oldest ledger we want in our DB) -> current tip of the datastore] into the local DB. This phase runs twice if the DB is empty (on the second run, it just refreshes the current DB tip).
  • After completion, it starts RPC, which begins filling in ledgers live through captive core. See my design document for details on how this is done.
  • This design guarantees that, regardless of faults, the database is never left with gaps. The client may supply a partially filled database, but backfill will error if that database initially has gaps. Note that (because the BufferedStorageBackend prevents backwards iteration) supplying a partially filled DB may be slower than the empty-DB case if it is (1) minimally filled and/or (2) close to the datastore's current tip ledger (i.e. the larger the backwards backfill run, the slower this can be).
  • The backfilled ledgers are ingested from CDP through the BufferedStorageBackend, which is datastore-agnostic and has been tested on GCS and S3.
  • Using the (rate-limited) public S3 datastore in the k8s soroban-rpc-pubnet-dev deployment, ingesting one week of ledgers (120,960 ledgers, ~150 GB) takes about 3 hours.
    • Across runs backfilling from an empty local DB, we observed the following timings: 3h6m, 3h7m, 2h41m, 2h48m. I suspect that if any ledgers are already in the DB (so a backwards phase runs), runtimes would be longer.
    • In a truly pessimistic case (e.g. a maximal backwards backfill phase), the runtime is around 3h45m.
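
For illustration, a minimal sketch of that two-phase flow is below. BackfillMeta, backfillBounds, backfillChunks, and runFrontfill are names from this PR, but the fields, signatures, and bodies here are simplified stand-ins, not the real implementation:

```go
// Sketch only: illustrates the ordering of the two phases described above.
package backfill

import (
	"context"
	"fmt"
)

type backfillBounds struct {
	lBound, rBound uint32 // oldest and newest ledger we want in the local DB
	dbWasEmpty     bool   // hypothetical flag: true if the local DB started empty
}

type BackfillMeta struct{} // the real struct holds the logger, datastore config, DB handle, ...

func (b *BackfillMeta) backfillChunks(ctx context.Context, bounds backfillBounds) error { return nil }
func (b *BackfillMeta) runFrontfill(ctx context.Context, bounds backfillBounds) error   { return nil }

// run orders the two phases: backwards first (skipped when the DB is empty), then
// forwards up to the refreshed datastore tip.
func (b *BackfillMeta) run(ctx context.Context, bounds backfillBounds) error {
	if !bounds.dbWasEmpty {
		// Phase 1: leftward fill of [lBound, oldest-ledger-in-DB - 1].
		if err := b.backfillChunks(ctx, bounds); err != nil {
			return fmt.Errorf("backwards backfill failed: %w", err)
		}
	}
	// Phase 2: rightward fill up to the datastore tip (run twice when the DB was empty,
	// with the second pass just refreshing the tip).
	return b.runFrontfill(ctx, bounds)
}
```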

Why

There is currently no way to backfill RPC's local database with historical ledgers, let alone a time-efficient way to do so. This PR provides one.
See issues/discussions on this: #203, 1718

Known limitations

[N/A]

@cjonas9 cjonas9 linked an issue Dec 15, 2025 that may be closed by this pull request
Contributor

@Shaptic Shaptic left a comment

another pass

info, err := i.getCoreInfo()
return err == nil && info.Info.Ledger.Num >= ledger
},
90*time.Second,
Contributor

Should probably be scaled based on the ledger count and close time, otherwise this will time out for large enough values w/o much explanation
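
For instance, something roughly like this (a sketch, not this PR's code; the ~6s close time, the helper name, and its placement are all assumptions):

```go
package backfilltest // hypothetical placement for this sketch

import "time"

// waitTimeout sketches scaling the wait deadline with the number of ledgers core still
// has to apply, assuming roughly 6 seconds per ledger, while keeping the existing 90s floor.
func waitTimeout(ledgersToApply uint32) time.Duration {
	t := time.Duration(ledgersToApply*6) * time.Second
	if t < 90*time.Second {
		t = 90 * time.Second
	}
	return t
}
```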

lChunkBound, rChunkBound, 100*(rBound-lChunkBound)/max(rBound-lBound, 1))

if err := tempBackend.Close(); err != nil {
backfill.logger.Warnf("error closing temporary backend: %v", err)
Contributor

no return here? is the implication that we can just keep going? fine with that, just want to make that clear

Contributor

wait this should probably be a defer (they're scoped) cuz otherwise you're not closing it in any error case
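
e.g. something shaped like this (a sketch; only the Close call and log message mirror the snippet above, everything else is a stand-in):

```go
package backfill // sketch only; names other than Close and the log message are stand-ins

import (
	"context"
	"log"
)

type tempBackend interface{ Close() error }

// withTempBackend shows the scoped-defer shape being suggested: the temporary backend is
// closed on every exit path, and a failed Close is logged but treated as recoverable.
func withTempBackend(ctx context.Context,
	open func(context.Context) (tempBackend, error),
	ingest func(context.Context, tempBackend) error,
) error {
	backend, err := open(ctx)
	if err != nil {
		return err
	}
	defer func() {
		if cerr := backend.Close(); cerr != nil {
			log.Printf("error closing temporary backend: %v", cerr)
		}
	}()
	return ingest(ctx, backend)
}
```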

Contributor Author

no return was intentional, I saw this as recoverable. will scope!

Comment on lines 234 to 235
return fmt.Errorf("post-backfill verification failed: expected at least %d ledgers, "+
"got %d ledgers (exceeds acceptable threshold of %d missing ledgers)", nBackfill, count, ledgerThreshold)
Contributor

We should give a hint to the operator about what to actually do in this case, even if that's just "Try again". Otherwise they need support and we'd ideally want them to be self-sufficient as much as possible. In fact it might be worth doing a pass and adding info like this to any other "recoverable"-ish error.

Contributor Author

@cjonas9 cjonas9 Jan 23, 2026

absolutely agreed! this threshold is arbitrary and I feel like making this a warnf/"you may want to try again to avoid a longer catch up" is totally defensible
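
for example, roughly (a sketch; the interface and wording are illustrative, and whether this stays an error or becomes a warning is exactly the open question here):

```go
package backfill // sketch only

// warnLogger is a stand-in for the PR's logger; only a Warnf method is assumed.
type warnLogger interface {
	Warnf(format string, args ...interface{})
}

// warnShortBackfill sketches an operator-facing hint: say what happened, why it is
// survivable, and what the operator can do next.
func warnShortBackfill(logger warnLogger, count, nBackfill, ledgerThreshold uint32) {
	logger.Warnf("post-backfill verification found %d ledgers, expected at least %d "+
		"(more than %d missing); RPC will catch up live via captive core, but you may "+
		"want to re-run with --backfill to avoid a longer catch-up",
		count, nBackfill, ledgerThreshold)
}
```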

// Backfills the local DB with older ledgers from oldest to newest within the retention window
func (backfill *BackfillMeta) runFrontfill(ctx context.Context, bounds *backfillBounds) error {
numIterations := 1
// If we skipped backfilling, do a second forwards push to a refreshed current tip
Contributor

I feel like I know what you mean but this comment could use some reclarifying. Something like,

If we skipped backfilling, we want to fill forwards twice because the latest ledger may be significantly further in the future after the first fill completes and fills are faster than catch-up.

Contributor Author

much more clear!


// Backfills the local DB with ledgers in [lBound, rBound] from the cloud datastore
// Used to fill local DB backwards towards older ledgers (starting from newest)
func (backfill *BackfillMeta) backfillChunks(ctx context.Context, bounds *backfillBounds) error {
Contributor

You should just pass bounds by value so there's no risk of a nil pointer; it's a small structure.

Contributor Author

@cjonas9 cjonas9 Jan 23, 2026

got it, this is because bounds are updated in backfill for later checking. i can just have it be returned here if it's really preferable, but I liked making the fn signatures match across back/frontfill (of course that's not very deep, but I thought it emphasized that these functions do the same thing in opposite directions). also, a nil pointer would alert to something being borked; if that happens then (imo) it shouldn't pretend everything is ok or that no-op here is normal

Contributor

Ah I see, I missed that nuance. Yeah generally modifying parameters by reference is a bit of an anti-pattern, let's do return/reassigns everywhere, instead. nil is 2 spooky 👻
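
Sketched out, that might look like (signatures and fields are illustrative; only the names mirror the snippets quoted in this review):

```go
package backfill // sketch only

import "context"

type backfillBounds struct {
	lBound, rBound uint32
}

type BackfillMeta struct{}

// backfillChunks takes bounds by value and returns the (possibly adjusted) bounds rather
// than mutating them through a pointer.
func (b *BackfillMeta) backfillChunks(ctx context.Context, bounds backfillBounds) (backfillBounds, error) {
	// ... ingest [bounds.lBound, bounds.rBound] chunk by chunk, adjusting bounds as needed ...
	return bounds, nil
}

// Caller side: reassign rather than relying on mutation through a shared pointer.
func (b *BackfillMeta) run(ctx context.Context, bounds backfillBounds) error {
	bounds, err := b.backfillChunks(ctx, bounds)
	if err != nil {
		return err
	}
	_ = bounds // later verification would read the updated bounds
	return nil
}
```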

```

### Added
- Added `--backfill` configuration parameter providing synchronous backfilling of `HISTORY_RETENTION_WINDOW` ledgers to the local DB prior to RPC starting ([#571](https://github.com/stellar/stellar-rpc/pull/571)).
Contributor

We should add some details about resource consumption and timing here, and also note that you can only use this if you have set up a datastore (which also enables getLedger).

Validate: func(_ *Option) error {
// Ensure config is valid for backfill
if cfg.Backfill && !cfg.ServeLedgersFromDatastore {
return errors.New("backfill requires serving ledgers from datastore to be enabled")
Contributor

a ref to the actual flag so someone can grok the --help afterwards would be 💯 here
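
e.g. something like (a sketch of just the reworded message; the exact datastore flag spelling is deliberately left to --help rather than guessed here):

```go
Validate: func(_ *Option) error {
	// Ensure config is valid for backfill
	if cfg.Backfill && !cfg.ServeLedgersFromDatastore {
		return errors.New("--backfill requires serving ledgers from the datastore to be " +
			"enabled; see the datastore-related options in --help")
	}
	return nil
},
```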

return err
}
for _, event := range allLedgerEvents {
query, err = insertEvents(query, lcm, event)
Contributor

this is fine but it'd be a smaller diff if you kept building the insert statement in the loop (this has no db effect) and only moved the exec part outside of it. no strong preference but just wanted to note that
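
i.e. roughly this shape (a sketch; see the comments for which names are stand-ins):

```go
// Sketch of the smaller-diff shape: keep building the insert statement inside the existing
// loop (building touches no DB), and move only the execution outside of it. execInsert and
// events are hypothetical stand-ins; query, insertEvents, and lcm are from the snippet above.
for _, event := range events {
	if query, err = insertEvents(query, lcm, event); err != nil {
		return err
	}
}
if err = execInsert(ctx, query); err != nil {
	return err
}
```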

//
var beforeIndex, afterIndex uint32
// Accumulate all ledger events to insert
var allLedgerEvents []dbEvent
Contributor

recommend pre-sizing this to something reasonable based on transaction count, obviously can't be as precise as before but still nice
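
for example (a sketch; the transaction-count source and the per-transaction multiplier are both illustrative):

```go
// Sketch: give the slice a starting capacity derived from the ledger's transaction count.
// txCount stands in for however the caller already knows that number, and the 4-per-tx
// multiplier is an arbitrary illustrative guess; the point is to avoid repeated regrowth.
allLedgerEvents := make([]dbEvent, 0, txCount*4)
```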


Development

Successfully merging this pull request may close these issues.

Implement synchronous history backfilling

3 participants