backup: add support for locality aware backup compaction #160558
base: master
Conversation
force-pushed from 846cefe to af26346
force-pushed from 63d1625 to ff6b533
}

var executionLocality roachpb.Locality
if options.ExecutionLocality != nil {
Oh this slipped by in a prior review, but we should probably check options.ExecutionLocality.NonEmpty here (in general we should prefer length checks over nil checks for slices).
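(Side note for readers: length checks are preferred because a nil slice has length zero in Go, so one check covers both the nil and the non-nil-but-empty case. A minimal standalone sketch, separate from this PR's code:)

package main

import "fmt"

func main() {
	var nilTiers []string    // nil slice
	emptyTiers := []string{} // non-nil but empty

	// A nil check distinguishes two cases that usually shouldn't differ:
	fmt.Println(nilTiers == nil, emptyTiers == nil) // true false

	// A length check treats both the same, which is almost always what we want:
	fmt.Println(len(nilTiers) == 0, len(emptyTiers) == 0) // true true
}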
I'm not seeing options.ExecutionLocality.NonEmpty; this is a tree.Expr that we expect to be a *tree.StrVal. I might be missing something, though.
Oh whoops that was mb — I was thinking it was a roachpb.Locality. We're good then
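(For context now that the thread is resolved: the string expression presumably gets parsed into a roachpb.Locality downstream. A hedged sketch of that parse; the helper below is hypothetical, though roachpb.Locality does accept the "key=value,key2=value2" form via its Set parser:)

// parseExecutionLocality is a hypothetical helper illustrating the parse.
// (imports: github.com/cockroachdb/cockroach/pkg/roachpb)
func parseExecutionLocality(s string) (roachpb.Locality, error) {
	var loc roachpb.Locality
	if err := loc.Set(s); err != nil {
		return roachpb.Locality{}, err
	}
	return loc, nil
}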
pkg/backup/compaction_processor.go
Outdated
var destLocalityKV string
if len(p.spec.URIsByLocalityKV) > 0 {
	var localitySinkURI string
	// When matching, more specific KVs in the node locality take precedence
	// over less specific ones so search back to front.
	for i := len(p.FlowCtx.EvalCtx.Locality.Tiers) - 1; i >= 0; i-- {
		tier := p.FlowCtx.EvalCtx.Locality.Tiers[i].String()
		if dest, ok := p.spec.URIsByLocalityKV[tier]; ok {
			localitySinkURI = dest
			destLocalityKV = tier
			break
		}
	}
	if localitySinkURI != "" {
		destURI = localitySinkURI
	} else if p.spec.StrictLocality {
		// This shouldn't happen unless there was a bug in distsql planning.
		return errors.Errorf(
			"sql processor locality %s does not match any of the backup localities %v: cannot proceed with strict locality",
			p.FlowCtx.EvalCtx.Locality, p.spec.URIsByLocalityKV,
		)
	}
}
defaultConf, err := cloud.ExternalStorageConfFromURI(destURI, user)
nit: I'd like to move this into its own selectLocalityMatchingStorageConf (or whatever name suits your fancy) helper function. Most of the other code in this function has been delegated into separate functions, so I think it'd be good to keep the pattern.
Oh and the nice thing is, since both regular backups and compacted backups share this logic, this gives us a chance to do some code deduping. Less code, best code
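To make that concrete, here's a rough sketch of how the extracted helper might look, lifted straight from the diff above; the name comes from the suggestion and the exact signature is just a guess:

// selectLocalityMatchingStorageConf picks the destination URI whose
// locality KV best matches the given locality. More specific tiers take
// precedence, so the search runs back to front.
// (imports: github.com/cockroachdb/cockroach/pkg/roachpb, github.com/cockroachdb/errors)
func selectLocalityMatchingStorageConf(
	locality roachpb.Locality,
	urisByLocalityKV map[string]string,
	defaultURI string,
	strictLocality bool,
) (destURI string, destLocalityKV string, _ error) {
	for i := len(locality.Tiers) - 1; i >= 0; i-- {
		tier := locality.Tiers[i].String()
		if dest, ok := urisByLocalityKV[tier]; ok {
			return dest, tier, nil
		}
	}
	if strictLocality {
		// Shouldn't happen unless there was a bug in distsql planning.
		return "", "", errors.AssertionFailedf(
			"locality %s does not match any of the backup localities %v",
			locality, urisByLocalityKV,
		)
	}
	return defaultURI, "", nil
}

Both the regular backup path and the compaction processor could then share this.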
pkg/backup/compaction_processor.go
Outdated
| return errors.Errorf( | ||
| "sql processor locality %s does not match any of the backup localities %v: cannot proceed with strict locality", | ||
| p.FlowCtx.EvalCtx.Locality, p.spec.URIsByLocalityKV, | ||
| ) |
For errors like these, where we're largely doing a sanity check, we can use errors.AssertionFailedf.
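Concretely, that would look something like:

return errors.AssertionFailedf(
	"sql processor locality %s does not match any of the backup localities %v: cannot proceed with strict locality",
	p.FlowCtx.EvalCtx.Locality, p.spec.URIsByLocalityKV,
)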
) (*SSTSinkKeyWriter, cloud.ExternalStorage) {
	conf, store := sinkTestSetup(t, settings, elideMode)
-	sink, err := MakeSSTSinkKeyWriter(conf, store)
+	sink, err := MakeSSTSinkKeyWriter(conf, store, "")
nit: one convention we have here is that when we pass an r-value to a function, like an empty string, boolean, or int, we denote the parameter name in a comment after the value, like "" /* localityKV */.
Granted, most IDEs these days can add parameter hints themselves, so I don't think it's necessarily enforced everywhere.
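Applied to the call in this diff, that would read:

sink, err := MakeSSTSinkKeyWriter(conf, store, "" /* localityKV */)

(The parameter name is taken from the example in the comment above; worth double-checking against the actual signature.)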
if len(instanceIDs) == 0 {
	err = errors.Newf("no healthy instances found matching filters: %+v", localityFilters)
}
nit: This feels a little out of scope for this, but I do like the change. Maybe we could move this into its own PR?
How'd you run into this?
So setupAllNodesPlanningSystem didn't previously filter for localities correctly; the gateway node would always be included. This was fine for regular backups because they don't use the instance IDs returned from this function and instead rely on PartitionSpans to do the filtering. Compaction doesn't use PartitionSpans, so this seemed like the right spot to get filtered instances.
With that fix, it became possible to get an empty list when no instances matched our filters. We never check for that before dividing by the length, so this seemed like the right spot to throw in a check. Happy to move any of that logic around, though.
Ah nice catch — looks like we also divide by length in restore too. If it's a bug that's actively blocking this PR, then we can leave it in. Maybe could move it into its own commit, but I don't think it matters too much.
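(For anyone following along: the hazard is the round-robin modulo over the instance list. A minimal standalone sketch of the failure mode, with illustrative names:)

package main

import "fmt"

func main() {
	var instanceIDs []int // nothing matched the locality filters
	if len(instanceIDs) == 0 {
		fmt.Println("error: no healthy instances found matching filters")
		return
	}
	// Without the guard above, this modulo panics with
	// "runtime error: integer divide by zero".
	for i := 0; i < 5; i++ {
		fmt.Println("span", i, "-> instance", instanceIDs[i%len(instanceIDs)])
	}
}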
pkg/backup/compaction_test.go
Outdated
compactedPath := filepath.Join(
	"incrementals",
	fullBackupPath,
	strings.TrimPrefix(end.GoTime().Format(backupbase.DateBasedIncFolderName), "/"),
) + "-" + start.GoTime().Format(backupbase.DateBasedIncFolderNameSuffix)
Ughh... I know the reason for this, but it doesn't make me hate this any less :(
This could be made a bit more resilient with:
backupbase.DefaultIncrementalsSubdir
backupinfo.ConstructDateBasedIncrementalFolderName
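Sketched against those two helpers (I'm assuming ConstructDateBasedIncrementalFolderName takes the start and end times; treat the signature as a guess):

// Hypothetical rewrite of the path construction above using the named
// helpers instead of hand-formatting the date-based folder name.
compactedPath := filepath.Join(
	backupbase.DefaultIncrementalsSubdir,
	fullBackupPath,
	backupinfo.ConstructDateBasedIncrementalFolderName(start.GoTime(), end.GoTime()),
)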
There's also perhaps a route here where instead of constructing these paths ourselves, we can use SHOW BACKUP FILES and call it individually on each locality aware backup.
With some fancy SQL queries, we could determine the exact number of unique data directories and correlate that against the number of expected backups.
Here's a PR of mine where I did that query to find unique data directories:
https://github.com/cockroachdb/cockroach/pull/160027/changes
force-pushed from ff6b533 to 7e83650
pkg/backup/compaction_test.go
Outdated
tempDir,
fmt.Sprintf("foo/TestBackupCompactionLocAware/partition-by-unique-key/%d", i),
I think this is better than what we had before, but not quite what I had in mind.
I played around with SHOW BACKUP FILES on a locality aware backup to see what we could tease out of it, and the output looks something like this:
path | backup_type | start_pretty | end_pretty | start_key | end_key | size_bytes | rows | locality | file_bytes
--------------------------------------------------------------------------------------------------------+-------------+---------------------------+---------------------------+----------------------+----------------------+------------+---------+----------------+-------------
/2026/01/09-170405.74/data/1140024429631832067.sst | full | /Table/4/1 | /Table/4/3 | \x8c89 | \x8c8b | 163 | 2 | region=us-west | -1
/2026/01/09-170405.74/data/1140024429632290818.sst | full | /Table/5/1 | /Table/5/2 | \x8d89 | \x8d8a | 380 | 15 | region=us-east | -1
/2026/01/09-170405.74/data/1140024429561839617.sst | full | /Table/6/1 | /Table/6/2 | \x8e89 | \x8e8a | 307 | 5 | default | -1
/2026/01/09-170405.74/data/1140024429763133443.sst | full | /Table/21/1 | /Table/21/2 | \x9d89 | \x9d8a | 261 | 5 | region=us-west | -1
/2026/01/09-170405.74/data/1140024429561937921.sst | full | /Table/23/1 | /Table/23/7 | \x9f89 | \x9f8f | 217 | 1 | default | -1
/2026/01/09-170405.74/data/1140024429562429441.sst | full | /Table/37/1 | /Table/37/3 | \xad89 | \xad8b | 560 | 2 | default | -1
/2026/01/09-170405.74/data/1140024429920059394.sst | full | /Table/48/1 | /Table/48/2 | \xb889 | \xb88a | 11 | 1 | region=us-east | -1
/2026/01/09-170405.74/data/1140024429845348355.sst | full | /Table/106/1 | /Table/106/2 | \xf289 | \xf28a | 619 | 10 | region=us-west | -1
/2026/01/09-170405.74/data/1140024429846036482.sst | full | /Table/107/1 | /Table/107/2 | \xf389 | \xf38a | 11029 | 100 | region=us-east | -1
/2026/01/09-170405.74/data/1140024429682196483.sst | full | /Table/108/1 | /Table/108/1/3/9/1684 | \xf489 | \xf4898b91f70694 | 67109864 | 115687 | region=us-west | -1
/2026/01/09-170405.74/data/1140024429682196483.sst | full | /Table/108/1/3/9/1684 | /Table/108/1/7/8/396 | \xf4898b91f70694 | \xf4898f90f7018c | 67110399 | 115716 | region=us-west | -1
/2026/01/09-170405.74/data/1140024429682196483.sst | full | /Table/108/1/7/8/396 | /Table/108/2 | \xf4898f90f7018c | \xf48a | 39779999 | 68607 | region=us-west | -1
/2026/01/09-170405.74/data/1140024429858291713.sst | full | /Table/108/2 | /Table/108/3 | \xf48a | \xf48b | 15554430 | 0 | default | -1
/2026/01/09-170405.74/data/1140024429969965058.sst | full | /Table/109/1 | /Table/109/2 | \xf589 | \xf58a | 25255259 | 300001 | region=us-east | -1
/2026/01/09-170405.74/data/1140024433051074562.sst | full | /Table/110/1 | /Table/110/2 | \xf66e89 | \xf66e8a | 11488300 | 300000 | region=us-east | -1
/2026/01/09-170405.74/data/1140024429983662082.sst | full | /Table/110/2 | /Table/110/3 | \xf66e8a | \xf66e8b | 10258200 | 0 | region=us-east | -1
/2026/01/09-170405.74/data/1140024429630750721.sst | full | /Table/111/1 | /Table/111/2 | \xf66f89 | \xf66f8a | 2070000 | 90000 | default | -1
/2026/01/09-170405.74/data/1140024432041394178.sst | full | /Table/112/1 | /Table/112/2 | \xf67089 | \xf6708a | 9150783 | 100000 | region=us-east | -1
/2026/01/09-170405.74/data/1140024437997699075.sst | full | /Table/113/1 | /Table/113/1/2/2904 | \xf67189 | \xf671898af70b58 | 67109335 | 202907 | region=us-west | -1
/2026/01/09-170405.74/data/1140024437997699075.sst | full | /Table/113/1/2/2904 | /Table/113/1/4/5796 | \xf671898af70b58 | \xf671898cf716a4 | 67109306 | 202896 | region=us-west | -1
/2026/01/09-170405.74/data/1140024437997699075.sst | full | /Table/113/1/4/5796 | /Table/113/1/6/8715 | \xf671898cf716a4 | \xf671898ef7220b | 67109261 | 202923 | region=us-west | -1
/2026/01/09-170405.74/data/1140024437997699075.sst | full | /Table/113/1/6/8715 | /Table/113/1/8/11619 | \xf671898ef7220b | \xf6718990f72d63 | 67109324 | 202908 | region=us-west | -1
/2026/01/09-170405.74/data/1140024437997699075.sst | full | /Table/113/1/8/11619 | /Table/113/2 | \xf6718990f72d63 | \xf6718a | 62309502 | 188385 | region=us-west | -1
/2026/01/09-170405.74/data/1140024430389035009.sst | full | /Table/114/1 | /Table/114/1/3/4/-2915/5 | \xf67289 | \xf672898b8c86f49d8d | 67108977 | 992586 | default | -1
/2026/01/09-170405.74/data/1140024430389035009.sst | full | /Table/114/1/3/4/-2915/5 | /Table/114/1/6/7/-2748/10 | \xf672898b8c86f49d8d | \xf672898e8f86f54492 | 67108962 | 992600 | default | -1
/2026/01/09-170405.74/data/1140024430389035009.sst | full | /Table/114/1/6/7/-2748/10 | /Table/114/2 | \xf672898e8f86f54492 | \xf6728a | 68809986 | 1017648 | default | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566716493825.sst | incremental | /Table/106/1 | /Table/106/2 | \xf289 | \xf28a | 529 | 10 | default | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566723211267.sst | incremental | /Table/107/1 | /Table/107/2 | \xf389 | \xf38a | 3666 | 36 | region=us-west | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566726946818.sst | incremental | /Table/108/1 | /Table/108/2 | \xf489 | \xf48a | 42258 | 74 | region=us-east | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566873714689.sst | incremental | /Table/109/1 | /Table/109/2 | \xf589 | \xf58a | 3047 | 46 | default | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566723506179.sst | incremental | /Table/110/1 | /Table/110/2 | \xf66e89 | \xf66e8a | 897 | 30 | region=us-west | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566716952577.sst | incremental | /Table/110/2 | /Table/110/3 | \xf66e8a | \xf66e8b | 771 | 0 | default | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566725308418.sst | incremental | /Table/111/1 | /Table/111/2 | \xf66f89 | \xf66f8a | 270 | 30 | region=us-east | -1
/incrementals/2026/01/09-170405.74/20260109/170952.72-20260109-170851.77/data/1140025566933254146.sst | incremental | /Table/114/1 | /Table/114/2 | \xf67289 | \xf6728a | 19370 | 308 | region=us-east | -1
I'm thinking we could first compute the data directory paths, and then count the number of unique data paths per locality. This tells us how many backups per locality had SSTs in them. So if we took 3 backups and one compacted backup, and locality aware backups executed correctly, we should see 4 backups per locality. If any one of the locality aware backups was missing an SST, we'd see fewer than 4 backups for that locality.
I was thinking a query like this:
WITH with_dir AS (
  SELECT *, regexp_replace(path, '/data/[^/]+\.sst$', '') AS dir
  FROM [SHOW BACKUP FILES FROM LATEST IN (<collection_uris>)]
)
SELECT count(DISTINCT dir), locality
FROM with_dir
GROUP BY locality;

With this, you wouldn't even need the findSSTs.
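On the scenario above (3 backups plus one compaction), a correct run would produce output along these lines, with illustrative counts:

count |    locality
-------+-----------------
     4 | default
     4 | region=us-east
     4 | region=us-west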
force-pushed from 7e83650 to 4b390fb
This patch teaches backup compaction about locality aware backups, allowing compactions to compact locality aware backups and respect the passed locality filters.
Fixes: #149593
Release note: None