compact: add point tombstone density compaction heuristic
This change adds a heuristic to compact point tombstones based on
their density across the LSM. We add a new table property called
`TombstoneDenseBlocksRatio` and a corresponding field in `TableStats` that
tracks the ratio of data blocks in each table that are considered
tombstone-dense. This value is calculated on the fly while tables are being
written, so no extra I/O is required later on to compute it.
A data block is considered tombstone-dense if it fulfills either of the
following criteria:
1. The block contains at least `options.Experimental.NumDeletionsThreshold`
point tombstones. The default value is `100`.
2. The ratio of the uncompressed size of point tombstones to the uncompressed
size of the block is at least `options.Experimental.DeletionSizeRatioThreshold`.
For example, with the default value of `0.5`, a data block of size 4KB
would be considered tombstone-dense if it contains at least 2KB of point
tombstones.
The intuition for these criteria is best described in
[this discussion](#918 (comment)),
which highlights that dense clusters are bad because they a) waste CPU when
skipping over tombstones, and b) waste I/O because we end up loading more
blocks per live key. The two criteria above are meant to tackle these two
issues respectively; the count-based threshold prevents CPU waste,
and the size-based threshold prevents I/O waste.
A table is considered eligible for the new tombstone compaction type if
its ratio of tombstone-dense blocks is at least `options.Experimental.MinTombstoneDenseRatio`.
The default value is `0.05`. We use an Annotator in a similar way to
elision-only compactions in order to prioritize compacting the table with
the most tombstone-dense blocks if there are multiple eligible tables.
The default here was chosen through experimentation on CockroachDB KV
workloads: with a lower value we were compacting too aggressively,
leading to very high write amplification, while higher values led to
very few noticeable performance improvements.
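
To illustrate the selection step, here is a hedged Go sketch; the `tableStats` type and `pickTombstoneDenseTable` function are hypothetical, and in Pebble the Annotator aggregates this over the LSM's file metadata rather than a flat slice:

```go
package sketch

// tableStats is an illustrative stand-in for the per-table statistics
// Pebble maintains; only the field relevant to this heuristic is shown.
type tableStats struct {
	TombstoneDenseBlocksRatio float64 // fraction of data blocks that are tombstone-dense
}

// pickTombstoneDenseTable returns the table with the highest ratio of
// tombstone-dense blocks among those at or above the eligibility
// threshold (options.Experimental.MinTombstoneDenseRatio, default 0.05),
// or nil if no table qualifies.
func pickTombstoneDenseTable(tables []*tableStats, minRatio float64) *tableStats {
	var best *tableStats
	for _, t := range tables {
		if t.TombstoneDenseBlocksRatio < minRatio {
			continue // not eligible for a tombstone-density compaction
		}
		// Mirroring the Annotator's role: prefer the table with the most
		// tombstone-dense blocks among all eligible tables.
		if best == nil || t.TombstoneDenseBlocksRatio > best.TombstoneDenseBlocksRatio {
			best = t
		}
	}
	return best
}
```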
Fixes: #918