CNDB-14577: Compact all SSTables of a level shard if their number reaches a limit (#1873) #1939

Open: 1 commit to merge into base branch main-5.0


driftx commented Aug 1, 2025

CNDB-14577: UCS by default does not compact many small non-overlapping sstables with very few rows

This PR limits the number of SSTables for a given compaction level shard by executing a major compaction of the shard instead of the regular compaction of overlapping SSTables if the number of SSTables reaches a threshold.

The threshold is controlled by the max_sstables_per_shard_factor setting:

  `max_sstables_per_shard_factor` limits the number of SSTables per shard. If the number of SSTables in a shard
  exceeds this factor times the shard compaction threshold, a major compaction of the shard is triggered.
  Some conditions, such as slow writes, can produce very small SSTables that never overlap with enough other
  SSTables to be compacted.
  This setting prevents the number of SSTables in a shard from growing too large, which can cause
  problems due to the per-SSTable overhead. These small SSTables may also still overlap while staying under the
  compaction threshold (e.g. due to write replicas), and never compacting them wastes storage space.
  The default value is 10.
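
For readers skimming the description, the trigger condition can be sketched roughly as below. This is only an illustration of the rule as worded above, not the code in this PR: the class name, method names, and the example threshold of 4 are all hypothetical, and the strict "exceeds" comparison is taken from the setting description (the PR title says "reaches", so the real boundary behavior may differ).

```java
/**
 * Hypothetical sketch of the per-shard SSTable limit described above.
 * None of these names come from the PR; they exist only for illustration.
 */
public final class ShardLimitSketch {
    /** Default quoted in the setting description. */
    public static final int DEFAULT_MAX_SSTABLES_PER_SHARD_FACTOR = 10;

    private ShardLimitSketch() {}

    /** Limit = factor times the shard compaction threshold. */
    public static long maxSSTablesPerShard(int compactionThreshold, int factor) {
        // Widen to long so a large factor cannot overflow the product.
        return (long) compactionThreshold * factor;
    }

    /**
     * Returns true when the shard holds more SSTables than the limit, i.e. when
     * a major compaction of the whole shard would replace the regular
     * compaction of overlapping SSTables. Assumes a strict "exceeds" check.
     */
    public static boolean shouldCompactWholeShard(int sstablesInShard,
                                                  int compactionThreshold,
                                                  int factor) {
        return sstablesInShard > maxSSTablesPerShard(compactionThreshold, factor);
    }

    public static void main(String[] args) {
        int threshold = 4; // assumed example value, not taken from the PR
        int factor = DEFAULT_MAX_SSTABLES_PER_SHARD_FACTOR;
        // Limit is 4 * 10 = 40 SSTables for this shard.
        System.out.println(shouldCompactWholeShard(40, threshold, factor)); // false
        System.out.println(shouldCompactWholeShard(41, threshold, factor)); // true
    }
}
```

With the default factor of 10 and an assumed threshold of 4, a shard would tolerate up to 40 SSTables before the whole-shard compaction kicks in.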

Co-authored-by: Branimir Lambov <[email protected]>
github-actions bot commented Aug 1, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

driftx commented Aug 1, 2025

Relocated to CassandraRelevantProperties, UUID -> TimeUUID, gcBefore is a long, Clock.Global changes, import adjustments.

The UCS test times out after this patch, I believe on the tests using 16 shards. I can make them pass with fewer shards, but I think we should open a ticket if we're going to modify the test. I also tried mocking out ShardManager, since it was previously mocked, but that didn't help.

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1939 rejected by Butler


3 regressions found
See build details here


Found 3 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.db.compaction.UnifiedCompactionStrategyTest.testMaximalSelection[useDiskBoundaries true] () | NEW | 🔴 0 / 19 | |
| o.a.c.distributed.test.repair.ForceRepairTest.forceWithDifference () | NEW | 🔴 4 / 19 | |
| o.a.c.metrics.TrieMemtableMetricsTest.testContentionMetrics (compression) | REGRESSION | 🔴 6 / 19 | |

Found 9 known test failures
