Contributor

@kev-cao kev-cao commented Dec 23, 2025

This commit adds the ListRestorableBackups helper, which reads through the index and returns all restorable times along with their associated backup IDs.

Epic: CRDB-57536

Informs: #159647

Release note: None

@kev-cao kev-cao requested review from a team as code owners December 23, 2025 01:43
@kev-cao kev-cao requested review from golgeek, srosenberg and xxmplus and removed request for a team December 23, 2025 01:43

blathers-crl bot commented Dec 23, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@kev-cao kev-cao requested review from ZhouXing19, msbutler and rharding6373 and removed request for a team, ZhouXing19, golgeek, rharding6373, srosenberg and xxmplus December 23, 2025 01:43
@cockroach-teamcity
Member

This change is Reviewable

Copilot AI left a comment

Pull request overview

This PR introduces a new ListRestorableBackups helper function that reads through the backup index to return all restorable backups within a specified time range, along with their backup IDs. To support this functionality, the PR refactors the ExternalStorage.List method signature across all cloud storage implementations to use a structured ListOptions parameter instead of individual delimiter and prefix parameters, adding support for AfterKey filtering.

  • Refactored all List method signatures to accept a ListOptions struct instead of separate delimiter and prefix parameters
  • Added ListRestorableBackups function that returns restorable backups with IDs within a time range, with logic to elide compacted duplicates
  • Implemented AfterKey filtering support in all cloud storage providers (S3, GCS, Azure, nodelocal, userfile) with client-side filtering for consistency
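
To make the ListOptions change concrete, here is a minimal sketch of client-side AfterKey filtering with delimiter grouping. Only the names `ListOptions`, `Delimiter`, and `AfterKey` come from the PR; the `listKeys` helper, the in-memory key slice, and the choice to treat `AfterKey` as an exclusive bound compared against the full key are assumptions made for illustration and may differ from the real implementations in pkg/cloud.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ListOptions mirrors the two fields named in the PR summary. This is a
// stand-in for illustration, not the struct defined in pkg/cloud.
type ListOptions struct {
	Delimiter string // groups keys that share a prefix up to the delimiter
	AfterKey  string // assumed here: only keys strictly greater are returned
}

// listKeys is a hypothetical in-memory lister. It walks the keys in sorted
// order, drops keys outside the prefix or at/before AfterKey, and collapses
// keys at the first delimiter found after the prefix.
func listKeys(keys []string, prefix string, opts ListOptions) []string {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)

	var out []string
	seen := map[string]bool{}
	for _, k := range sorted {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		// Client-side filter: skip everything up to and including AfterKey.
		if opts.AfterKey != "" && k <= opts.AfterKey {
			continue
		}
		rel := strings.TrimPrefix(k, prefix)
		if opts.Delimiter != "" {
			if i := strings.Index(rel, opts.Delimiter); i >= 0 {
				rel = rel[:i+len(opts.Delimiter)]
			}
		}
		if !seen[rel] {
			seen[rel] = true
			out = append(out, rel)
		}
	}
	return out
}

func main() {
	keys := []string{"a/1", "a/2", "b/1", "c"}
	fmt.Println(listKeys(keys, "", ListOptions{AfterKey: "a/1"}))
	fmt.Println(listKeys(keys, "", ListOptions{Delimiter: "/", AfterKey: "a/2"}))
}
```

Filtering before grouping means a grouped prefix such as `b/` only appears if at least one key under it sorts after `AfterKey`, which is one plausible reading of "client-side filtering for consistency".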

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pkg/cloud/external_storage.go | Defines new ListOptions struct with Delimiter and AfterKey fields, along with CanonicalAfterKey helper method |
| pkg/backup/backupinfo/backup_index.go | Adds ListRestorableBackups, listIndexesWithinRange, and related helper functions for parsing index paths and encoding backup IDs |
| pkg/backup/backupinfo/backup_index_test.go | Adds comprehensive test coverage for ListRestorableBackups with various backup chain scenarios including compacted and revision-history backups |
| pkg/cloud/gcp/gcs_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering |
| pkg/cloud/azure/azure_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering for both blob prefixes and items |
| pkg/cloud/amazon/s3_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering |
| pkg/cloud/nodelocal/nodelocal_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering with delimiter grouping support |
| pkg/cloud/userfile/file_table_storage.go | Updates List to use ListOptions and implements client-side AfterKey filtering with delimiter grouping support |
| pkg/cloud/httpsink/http_storage.go | Updates List method signature to accept ListOptions (no-op implementation) |
| pkg/cloud/nullsink/nullsink_storage.go | Updates List method signature to accept ListOptions (no-op implementation) |
| pkg/cloud/impl_registry.go | Updates esWrapper.List to pass through ListOptions to wrapped storage |
| pkg/cloud/cloudtestutils/cloud_test_helpers.go | Adds comprehensive test cases for AfterKey filtering behavior with various prefix and delimiter combinations |
| pkg/sql/importer/*.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/sql/bulkutil/*.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/backup/backupinfo/manifest_handling.go | Updates List calls to use cloud.ListOptions{Delimiter: ...} |
| pkg/backup/backupdest/*.go | Updates all List call sites to use cloud.ListOptions{} with appropriate delimiter settings |
| pkg/backup/backupencryption/encryption.go | Updates List call to use cloud.ListOptions{Delimiter: ...} |
| pkg/backup/backup_job.go | Updates List call to use cloud.ListOptions{} |
| pkg/backup/backup_test.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/storage/shared_storage.go | Updates List call to use cloud.ListOptions{Delimiter: delimiter} |
| pkg/roachprod/blobfixture/registry.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/cmd/roachtest/tests/cdc_helper.go | Updates List call to use cloud.ListOptions{} |
| pkg/ccl/workloadccl/fixture.go | Updates List call to use cloud.ListOptions{Delimiter: "/"} |
| pkg/ccl/changefeedccl/sink_cloudstorage_test.go | Updates mock List method signature to accept ListOptions |
| pkg/cli/userfile.go | Updates all List call sites to use cloud.ListOptions{} |
| pkg/cloud/cloudtestutils/cloud_nemesis.go | Updates List call to use cloud.ListOptions{} |
| pkg/cloud/externalconn/utils/connection_utils.go | Updates List call to use cloud.ListOptions{} |
| pkg/backup/backupinfo/BUILD.bazel | Adds dependency on //pkg/util/besteffort package |

@kev-cao kev-cao force-pushed the backup/list-restorable-backups branch from 385422a to e3348b8 Compare December 24, 2025 20:14
@github-actions

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Dec 24, 2025
@kev-cao kev-cao force-pushed the backup/list-restorable-backups branch from 692c005 to 6930f26 Compare December 26, 2025 19:26
@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Dec 26, 2025
@kev-cao kev-cao removed the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Dec 26, 2025
@kev-cao kev-cao force-pushed the backup/list-restorable-backups branch 4 times, most recently from a5176c5 to 87fcf98 Compare December 30, 2025 16:32
@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Dec 30, 2025
@kev-cao kev-cao added O-No-AI-Review Prevents AI Review from running and removed o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. labels Dec 30, 2025
//
// NB: Backups with duplicate end times (e.g. compacted backups) are elided
// and only one is returned. In the case of revision-history backups, the
// backups will be marked as containing revision-history, despite the fact
Collaborator

i'm slightly confused by this comment: is an invariant of this function that no backup ids associated with compacted backups will be returned?

Contributor Author

Well, since a backup ID is an encoding of the full end time and the backup end time, every backup with the same end time within a chain has the same ID. This is why I didn't specify "no compacted backup IDs will be returned": there is no such thing as a compacted backup ID.

Contributor Author

Hmm, reading it back though, I do think that in an effort to be specific about my wording, I just made it more confusing. I'll reword it.

Collaborator

I got tripped up by the following docs line which suggested to me that compacted backups are returned in the list.

despite the fact that the compacted backups specifically do not contain revision history

maybe remove this snippet?

Contributor Author

Yeah I agree, I think that's too much implementation-specific detail in the docstring.
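
The point in this thread about backup IDs can be illustrated with a small sketch. The thread states only that an ID encodes the full end time and the backup end time; the `backup` struct, the integer timestamps, and the `"%d-%d"` string format below are hypothetical stand-ins, not the actual encoding used in backup_index.go.

```go
package main

import "fmt"

// backup is a hypothetical description of one backup in a chain, using
// small integers as timestamps in the same spirit as the test fakes.
type backup struct {
	fullEnd int // end time of the chain's full backup
	start   int // start time of this backup
	end     int // end time of this backup
}

// backupID is an assumed encoding: it depends only on the full end time and
// the backup's end time, so the start time never affects the ID.
func backupID(b backup) string {
	return fmt.Sprintf("%d-%d", b.fullEnd, b.end)
}

func main() {
	incremental := backup{fullEnd: 10, start: 18, end: 22}
	compacted := backup{fullEnd: 10, start: 10, end: 22}

	// Both backups end at 22 within the same chain, so their IDs collide.
	fmt.Println(backupID(incremental) == backupID(compacted))
}
```

Under this assumption, a compacted backup and the incremental it overlaps are indistinguishable by ID, which is why eliding duplicates by end time loses nothing for callers.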

end: end,
}
// Maintain descending end time order. May need to swap with the last
// index added.
Collaborator

this is to deal with full backups, yeah?

Contributor Author

Yes, specifically the case where you have incrementals from an older chain having a newer end time than the full backup in the next chain.

Collaborator

it would be nice to state that explicitly in the docstring above.

Contributor Author

Added it to the comment; don't think it fits in the docstring since it's an implementation detail. The consumers of the function just need to know that everything is returned in descending end time order.

Collaborator

just to nitpick, I would contend this is a notable implementation detail, as this sorting approach is only correct due to backup index ordering invariants. In general, swapping the last two elements of a list as you append to it would not lead to a sorted list.

Contributor Author

Hmm, I do think that the mention of the invariants that allow this is important to be included (added it), but I still don't think it belongs in the docstring. If someone is reading the function's body itself, then it's important for them to know this context. But for someone using the function, knowing why swapping the last two elements works (or even the fact that is what we do) doesn't really give them valuable information.
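
The swap discussed in this thread can be sketched as follows. This is an illustration under a stated assumption, not the PR's code: it assumes the index is read in an order where each new entry is out of place by at most one position (for example, an incremental from an older chain that ends just after the next chain's full backup). The `entry` type and `appendDescending` name are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for an index entry; only the end time
// matters for ordering here.
type entry struct{ end int }

// appendDescending appends e and, if e ends later than the element that was
// previously last, swaps the two. This keeps the slice in descending end
// time order only when the input is at most one position out of order,
// which is the invariant assumed in this sketch.
func appendDescending(entries []entry, e entry) []entry {
	entries = append(entries, e)
	if n := len(entries); n >= 2 && entries[n-1].end > entries[n-2].end {
		entries[n-1], entries[n-2] = entries[n-2], entries[n-1]
	}
	return entries
}

func main() {
	// 10 is the full backup of the newer chain; 12 is an incremental from
	// the older chain that finished after that full backup.
	var entries []entry
	for _, end := range []int{26, 22, 10, 12, 8, 6} {
		entries = appendDescending(entries, entry{end: end})
	}
	fmt.Println(entries)
}
```

As the reviewer notes, a single swap is not a general sorting technique: an input such as 10, 12, 14 would end up as 12, 14, 10, so the approach is only sound when the ordering invariant holds.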

@kev-cao kev-cao force-pushed the backup/list-restorable-backups branch 2 times, most recently from e5d0b54 to c5658f8 Compare January 2, 2026 19:30
Collaborator

@msbutler msbutler left a comment

nearly there!

// ten milliseconds (the maximum granularity of the timestamp encoding) to
// ensure an inclusive start.
maxEndTime := before.Add(10 * time.Millisecond)
startPoint, err := endTimeToIndexSubdir(maxEndTime)
Collaborator

uber nit: s/startPoint/maxEndTimeSubdir/r

return hlc.Timestamp{WallTime: int64(t)*1e9 + int64(t)}
}

// fakeBackupCollection represents a collection of backup chains.
Collaborator

nice cleanup!

require.NoError(t, WriteBackupIndexMetadata(
ctx, execCfg, username.RootUserName(), storageFactory, details, hlc.Timestamp{},
))
simpleChain := fakeBackupChain{{0, 2, false}, {2, 4, false}, {4, 6, false}, {6, 8, false}}
Collaborator

nit: you could further prettify these cases via some constructors:

func Chain(backup....) fakeBackupChain

func b(start,end) (fakeBackupSpec)

func bRH(start,end) (fakeBackupSpec) // with revision history

simpleChain := Chain(b(0,2),b(2,4),...)

Contributor Author

Ooh just tried that out, looks nice
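
The constructors suggested above could look roughly like this. The names `Chain`, `b`, `bRH`, `fakeBackupChain`, and `fakeBackupSpec` come from the review comment; the field names and integer timestamps are assumptions inferred from the `{0, 2, false}` literals, and the version that landed in the PR may differ.

```go
package main

import "fmt"

// fakeBackupSpec describes one fake backup. The field names are assumed;
// the original literals are positional: {start, end, revision history}.
type fakeBackupSpec struct {
	start, end int
	revHistory bool
}

// fakeBackupChain is an ordered list of fake backups in a single chain.
type fakeBackupChain []fakeBackupSpec

// b builds a backup spec without revision history.
func b(start, end int) fakeBackupSpec {
	return fakeBackupSpec{start: start, end: end}
}

// bRH builds a backup spec with revision history.
func bRH(start, end int) fakeBackupSpec {
	return fakeBackupSpec{start: start, end: end, revHistory: true}
}

// Chain groups backup specs into a chain.
func Chain(backups ...fakeBackupSpec) fakeBackupChain {
	return fakeBackupChain(backups)
}

func main() {
	simpleChain := Chain(b(0, 2), b(2, 4), b(4, 6), b(6, 8))
	revChain := Chain(bRH(50, 52), bRH(52, 54))
	fmt.Println(simpleChain)
	fmt.Println(revChain)
}
```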

{
// Chain with compacted backup and last backup intersects next chain.
{0, 10, false}, {10, 14, false}, {14, 18, false}, {10, 22, false},
{18, 22, false}, {22, 26, false},
Collaborator

i think there are a few more cases around prev full with incs that overlap with next chain to write tests for:

  • inc on prev full has matching end time to next full
  • multiple incs of prev full before next full
  • compacted backup while next full is running

Contributor Author

Added those tests! Also realized that one more that needed to be tested is a query on overlapping chains that doesn't include the full itself.

{
"simple chain/full chain inclusive",
1, 6,
[]output{{end: 6}, {end: 4}, {end: 2}},
Collaborator

nit: this could be prettified with a constructor for all the non-rev history backups:

func o(endtimes ...) []output

// example:
o(6,4,2)
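
A possible shape for the suggested `o` helper is sketched below. The `output` type and its `end` and `rev` fields appear in the test cases quoted in this review; the integer field type and the body of `o` are assumptions.

```go
package main

import "fmt"

// output is the expected-result shape used in the test table. The field
// types are assumed for this sketch.
type output struct {
	end int
	rev bool
}

// o builds expected outputs for backups without revision history, one per
// end time, preserving the order given.
func o(endTimes ...int) []output {
	out := make([]output, 0, len(endTimes))
	for _, end := range endTimes {
		out = append(out, output{end: end})
	}
	return out
}

func main() {
	// Equivalent to []output{{end: 6}, {end: 4}, {end: 2}}.
	fmt.Println(o(6, 4, 2))
}
```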

{
"revision history/ignore compacted",
51, 58,
[]output{{end: 56, rev: true}, {end: 54, rev: true}, {end: 52, rev: true}},
Collaborator

assert RevisionHistoryStartTime too?

Contributor Author

The test itself sets rev to true if the output has a non-zero revision history start time, which I think is sufficient? The fakes just set the revision start time to the start time of the fake (or in the case of a full, half of the end time). I think changing the logic to assert the actual value of the revision start time worsens the test readability without much benefit since we'd mostly just be testing the fake's value. Knowing that it's not zero tells us that we are reading the value from the index, which I think gives us the coverage we need.

Collaborator

sg

@kev-cao kev-cao force-pushed the backup/list-restorable-backups branch 2 times, most recently from de02866 to 87a43f2 Compare January 5, 2026 16:57
@kev-cao kev-cao requested a review from Copilot January 5, 2026 21:01
@kev-cao kev-cao requested a review from msbutler January 5, 2026 21:02
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

@kev-cao kev-cao force-pushed the backup/list-restorable-backups branch from 87a43f2 to efba41f Compare January 6, 2026 16:40
Collaborator

@msbutler msbutler left a comment

Awesome work and good discussion!

@kev-cao kev-cao force-pushed the backup/list-restorable-backups branch from efba41f to 79cbaac Compare January 6, 2026 19:06
Contributor Author

kev-cao commented Jan 6, 2026

TFTR!

bors r=msbutler

Contributor

craig bot commented Jan 6, 2026

@craig craig bot merged commit befcf4b into cockroachdb:master Jan 6, 2026
26 checks passed
Labels

  • O-AI-Review-Not-Helpful: AI reviewer produced result which was incorrect or unhelpful
  • O-No-AI-Review: Prevents AI Review from running
  • target-release-26.2.0

3 participants