-
Notifications
You must be signed in to change notification settings - Fork 49
Synchronous history backfilling #571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…tart post-backfill
…stellar/stellar-rpc into synchronous-history-backfilling
Shaptic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another pass
| info, err := i.getCoreInfo() | ||
| return err == nil && info.Info.Ledger.Num >= ledger | ||
| }, | ||
| 90*time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be scaled based on the ledger count and close time, otherwise this will time out for large enough values w/o much explanation
| lChunkBound, rChunkBound, 100*(rBound-lChunkBound)/max(rBound-lBound, 1)) | ||
|
|
||
| if err := tempBackend.Close(); err != nil { | ||
| backfill.logger.Warnf("error closing temporary backend: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no return here? is the implication that we can just keep going? fine with that just want to make that clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait this should probably be a defer (they're scoped) cuz otherwise you're not closing it in any error case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no return was intentional, I saw this as recoverable. will scope!
| return fmt.Errorf("post-backfill verification failed: expected at least %d ledgers, "+ | ||
| "got %d ledgers (exceeds acceptable threshold of %d missing ledgers)", nBackfill, count, ledgerThreshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should give a hint to the operator about what to actually do in this case, even if that's just "Try again". Otherwise they need support and we'd ideally want them to be self-sufficient as much as possible. In fact in might be worth doing a pass and adding info like this to any other "recoverable"ish error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absolutely agreed! this threshold is arbitrary and I feel like making this a warnf/"you may want to try again to avoid a longer catch up" is totally defensible
| // Backfills the local DB with older ledgers from oldest to newest within the retention window | ||
| func (backfill *BackfillMeta) runFrontfill(ctx context.Context, bounds *backfillBounds) error { | ||
| numIterations := 1 | ||
| // If we skipped backfilling, do a second forwards push to a refreshed current tip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I know what you mean but this comment could use some reclarifying. Something like,
If we skipped backfilling, we want to fill forwards twice because the latest ledger may be significantly further in the future after the first fill completes and fills are faster than catch-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much more clear!
|
|
||
| // Backfills the local DB with ledgers in [lBound, rBound] from the cloud datastore | ||
| // Used to fill local DB backwards towards older ledgers (starting from newest) | ||
| func (backfill *BackfillMeta) backfillChunks(ctx context.Context, bounds *backfillBounds) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should just pass bounds by value so there's no risk of a nil pointer; it's a small structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, this is because bounds are updated in backfill for later checking. i can just have it be returned here if it's really preferable, but I liked making the fn signatures match across back/frontfill (of course that's not very deep, but I thought it emphasized that these functions do the same thing in opposite directions). also, a nil pointer would alert to something being borked; if that happens then (imo) it shouldn't pretend everything is ok or that no-op here is normal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, I missed that nuance. Yeah generally modifying parameters by reference is a bit of an anti-pattern, let's do return/reassigns everywhere, instead. nil is 2 spooky 👻
| ``` | ||
|
|
||
| ### Added | ||
| - Added `--backfill` configuration parameter providing synchronous backfilling of `HISTORY_RETENTION_WINDOW` ledgers to the local DB prior to RPC starting ([#571](https://github.com/stellar/stellar-rpc/pull/571)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add some details about resource consumption and timing here, also the fact that you can only use this if you set up a datastore which also enables getLedger
| Validate: func(_ *Option) error { | ||
| // Ensure config is valid for backfill | ||
| if cfg.Backfill && !cfg.ServeLedgersFromDatastore { | ||
| return errors.New("backfill requires serving ledgers from datastore to be enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a ref to the actual flag so someone can grok the --help afterwards would be 💯 here
| return err | ||
| } | ||
| for _, event := range allLedgerEvents { | ||
| query, err = insertEvents(query, lcm, event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine but it'd be a smaller diff if you kept building the insert statement in the loop (this has no db effect) and only moved the exec part outside of it. no strong preference but just wanted to note that
| // | ||
| var beforeIndex, afterIndex uint32 | ||
| // Accumulate all ledger events to insert | ||
| var allLedgerEvents []dbEvent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommend pre-sizing this to something reasonable based on transaction count, obviously can't be as precise as before but still nice
What
This PR adds synchronous history backfilling of ledgers through the new configuration parameter
backfill. This will fill the local SQL database with the most recentHISTORY_RETENTION_WINDOWledgers, fetched from CDP. Usage:./stellar-rpc --backfill, or one can specify this as a config parameter in the TOML or as an environmental variable.Notes:
HISTORY_RETENTION_WINDOWledgers.bufferedStorageBackend. It's datastore agnostic and has been tested on GCS and S3.soroban-rpc-pubnet-devdeployment, to ingest one week of ledgers (120,960 ledgers, ~150Gb), this takes about 3 hours.Why
There is no way to backfill RPC's local database with historical ledgers, let alone a time-conscious method for this. This PR provides one.
See issues/discussions on this: #203, 1718
Known limitations
[N/A]