-
Notifications
You must be signed in to change notification settings - Fork 4.1k
backup: add ListRestorableBackup helper #160047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backup: add ListRestorableBackup helper #160047
Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
0aebfd4 to
385422a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a new ListRestorableBackups helper function that reads through the backup index to return all restorable backups within a specified time range, along with their backup IDs. To support this functionality, the PR refactors the ExternalStorage.List method signature across all cloud storage implementations to use a structured ListOptions parameter instead of individual delimiter and prefix parameters, adding support for AfterKey filtering.
- Refactored all
Listmethod signatures to accept aListOptionsstruct instead of separatedelimiterandprefixparameters - Added
ListRestorableBackupsfunction that returns restorable backups with IDs within a time range, with logic to elide compacted duplicates - Implemented
AfterKeyfiltering support in all cloud storage providers (S3, GCS, Azure, nodelocal, userfile) with client-side filtering for consistency
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/cloud/external_storage.go | Defines new ListOptions struct with Delimiter and AfterKey fields, along with CanonicalAfterKey helper method |
| pkg/backup/backupinfo/backup_index.go | Adds ListRestorableBackups, listIndexesWithinRange, and related helper functions for parsing index paths and encoding backup IDs |
| pkg/backup/backupinfo/backup_index_test.go | Adds comprehensive test coverage for ListRestorableBackups with various backup chain scenarios including compacted and revision-history backups |
| pkg/cloud/gcp/gcs_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering |
| pkg/cloud/azure/azure_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering for both blob prefixes and items |
| pkg/cloud/amazon/s3_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering |
| pkg/cloud/nodelocal/nodelocal_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering with delimiter grouping support |
| pkg/cloud/userfile/file_table_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering with delimiter grouping support |
| pkg/cloud/httpsink/http_storage.go | Updates List method signature to accept ListOptions (no-op implementation) |
| pkg/cloud/nullsink/nullsink_storage.go | Updates List method signature to accept ListOptions (no-op implementation) |
| pkg/cloud/impl_registry.go | Updates esWrapper.List to pass through ListOptions to wrapped storage |
| pkg/cloud/cloudtestutils/cloud_test_helpers.go | Adds comprehensive test cases for AfterKey filtering behavior with various prefix and delimiter combinations |
| pkg/sql/importer/*.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/sql/bulkutil/*.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/backup/backupinfo/manifest_handling.go | Updates List calls to use cloud.ListOptions{Delimiter: ...} |
| pkg/backup/backupdest/*.go | Updates all List call sites to use cloud.ListOptions{} with appropriate delimiter settings |
| pkg/backup/backupencryption/encryption.go | Updates List call to use cloud.ListOptions{Delimiter: ...} |
| pkg/backup/backup_job.go | Updates List call to use cloud.ListOptions{} |
| pkg/backup/backup_test.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/storage/shared_storage.go | Updates List call to use cloud.ListOptions{Delimiter: delimiter} |
| pkg/roachprod/blobfixture/registry.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/cmd/roachtest/tests/cdc_helper.go | Updates List call to use cloud.ListOptions{} |
| pkg/ccl/workloadccl/fixture.go | Updates List call to use cloud.ListOptions{Delimiter: "/"} |
| pkg/ccl/changefeedccl/sink_cloudstorage_test.go | Updates mock List method signature to accept ListOptions |
| pkg/cli/userfile.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/cloud/cloudtestutils/cloud_nemesis.go | Updates List call to use cloud.ListOptions{} |
| pkg/cloud/externalconn/utils/connection_utils.go | Updates List call to use cloud.ListOptions{} |
| pkg/backup/backupinfo/BUILD.bazel | Adds dependency on //pkg/util/besteffort package |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
385422a to
e3348b8
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
692c005 to
6930f26
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
a5176c5 to
87fcf98
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
| // | ||
| // NB: Backups with duplicate end times (e.g. compacted backups) are elided | ||
| // and only one is returned. In the case of revision-history backups, the | ||
| // backups will be marked as containing revision-history, despite the fact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm slightly confused by this comment: is an invariant of this function that no backup ids associated with compacted backups will be returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well since backup IDs are an encoding of the full end time and the backup end time, then every backup with the same end time within a chain have the same ID. This is why I didn't specify "no compacted backup IDs will be returned" since there is no such thing as a compacted backup ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, reading it back though, I do think that in an effort to be specific about my wording, I just made it more confusing. I'll reword it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got tripped up by the following docs line which suggested to me that compacted backups are returned in the list.
despite the fact that the compacted backups specifically do not contain revision history
maybe remove this snippet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree, I think that's too much implementation-specific detail in the docstring.
| end: end, | ||
| } | ||
| // Maintain descending end time order. May need to swap with the last | ||
| // index added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to deal with full backups, yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, specifically the case where you have incrementals from an older chain having an newer end time than the full backup in the next chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice state that explicitly in the docstring above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it to the comment — don't think it fits in the docstring since its an implementation detail. The consumers of the function just need to know that everything is returned in descending end time order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to nitpick, I would contend this is a notable implementation detail, as this sorting approach is only correct due to backup index ordering invariants. In general, swapping the last two elements of list as you append to it would not lead to a sorted list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I do think that the mention of the invariants that allow this is important to be included (added it), but I still don't think it belongs in the docstring. If someone is reading the function's body itself, then it's important for them to know this context. But for someone using the function, knowing why swapping the last two elements works (or even the fact that is what we do) doesn't really give them valuable information.
e5d0b54 to
c5658f8
Compare
msbutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nearly there!
| // ten milliseconds (the maximum granularity of the timestamp encoding) to | ||
| // ensure an inclusive start. | ||
| maxEndTime := before.Add(10 * time.Millisecond) | ||
| startPoint, err := endTimeToIndexSubdir(maxEndTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uber nit: s/startPoint/maxEndTimeSubdir/r
| end: end, | ||
| } | ||
| // Maintain descending end time order. May need to swap with the last | ||
| // index added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice state that explicitly in the docstring above.
| return hlc.Timestamp{WallTime: int64(t)*1e9 + int64(t)} | ||
| } | ||
|
|
||
| // fakeBackupCollection represents a collection of backup chains. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice cleanup!
| require.NoError(t, WriteBackupIndexMetadata( | ||
| ctx, execCfg, username.RootUserName(), storageFactory, details, hlc.Timestamp{}, | ||
| )) | ||
| simpleChain := fakeBackupChain{{0, 2, false}, {2, 4, false}, {4, 6, false}, {6, 8, false}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could further prettify these cases via some constructors:
func Chain(backup....) fakeBackupChain
func b(start,end) (fakeBackupSpec)
func bRH(start,end) (fakeBackupSpec) // with revision history
simpleChain := Chain(b(0,2),b(2,4),...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh just tried that out, looks nice
| { | ||
| // Chain with compacted backup and last backup intersects next chain. | ||
| {0, 10, false}, {10, 14, false}, {14, 18, false}, {10, 22, false}, | ||
| {18, 22, false}, {22, 26, false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think there are few more cases around prev full with incs that overlap with next chain to write tests for:
- inc on prev full has matching end time to next full
- multiple incs of prev full before next full
- compacted backup while next full is running
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added those tests! Also realized that one more that needed to be tested is a query on overlapping chains that doesn't include the full itself.
| { | ||
| "simple chain/full chain inclusive", | ||
| 1, 6, | ||
| []output{{end: 6}, {end: 4}, {end: 2}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could be prettified with a constructor for all the non-rev history backups:
func o(endtimes ...) []output
// example:
o(6,4,2)
| { | ||
| "revision history/ignore compacted", | ||
| 51, 58, | ||
| []output{{end: 56, rev: true}, {end: 54, rev: true}, {end: 52, rev: true}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert RevisionHistoryStartTime too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test itself sets rev to true if the output has a non-zero revision history start time, which I think is sufficient? The fakes just sets the revision start time to the start time of the fake (or in the case of a full, half of the end time). I think changing the logic to assert the actual value of the revision start time worsens the test readability without much benefit since we'd mostly just be testing the fake's value. Knowing that it's not zero tells us that we are reading the value from the index, which I think gives us the coverage we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg
de02866 to
87a43f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
87a43f2 to
efba41f
Compare
msbutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work and good discussion!
This commit adds the `ListRestorableBackup` helper, which reads through the index and returns all restorable times along with their associated backup IDs. Epic: CRDB-57536 Informs: cockroachdb#159647 Release note: None
efba41f to
79cbaac
Compare
|
TFTR! bors r=msbutler |
This commit adds the
ListRestorableBackuphelper, which reads through the index and returns all restorable times along with their associated backup IDs.Epic: CRDB-57536
Informs: #159647
Release note: None