analyze: add HLLSketch for per-region singleton NDV estimation#412

Closed
0xPoe wants to merge 2 commits into
pingcap:masterfrom
0xPoe:issue-67449-ndv-rate-hll

0xPoe commented May 25, 2026

What

Adds an HLLSketch message and two repeated fields, hll_ndv_sketch and hll_singleton_sketch, to RowSampleCollector.

These let TiDB estimate the global singleton (f1) count for ANALYZE NDV sub-sampling with a per-region leave-one-out over fixed-size HyperLogLog sketches, instead of retaining large per-region FM sketches. Retaining those FM sketches takes O(regions) memory and spiked to ~20 GiB at ~6700 regions. The FM sketch path is unchanged and still backs the global/stored NDV.
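The description does not spell out the estimator, so the Go sketch below is only one plausible reading of the leave-one-out idea. It assumes hll_ndv_sketch covers every distinct value a region saw and hll_singleton_sketch covers the values seen exactly once inside that region. For each region it merges the NDV sketches of all other regions, then measures how much the estimate grows once the region's singleton sketch is merged in. The HLL type, the splitmix64 hash, and every function name here are hypothetical, not the TiDB or TiKV code.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"math/bits"
)

// HLL is a hypothetical mirror of HLLSketch: one byte per register, 1<<Precision registers.
type HLL struct {
	Precision uint8
	Registers []uint8
}

func NewHLL(p uint8) *HLL { return &HLL{Precision: p, Registers: make([]uint8, 1<<p)} }

// mix is the splitmix64 finalizer, a stand-in for whatever hash the real producer uses.
func mix(x uint64) uint64 {
	x += 0x9e3779b97f4a7c15
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
	x = (x ^ (x >> 27)) * 0x94d049bb133111eb
	return x ^ (x >> 31)
}

// Add picks a register from the top bits; the sentinel bit caps the rank at 64-Precision+1.
func (h *HLL) Add(v uint64) {
	x := mix(v)
	idx := x >> (64 - h.Precision)
	rank := uint8(bits.LeadingZeros64(x<<h.Precision|1<<(h.Precision-1))) + 1
	if rank > h.Registers[idx] {
		h.Registers[idx] = rank
	}
}

// Merge folds o into h by taking the register-wise maximum.
func (h *HLL) Merge(o *HLL) error {
	if h.Precision != o.Precision || len(h.Registers) != len(o.Registers) {
		return errors.New("hll: sketches differ in precision or length")
	}
	for i, r := range o.Registers {
		if r > h.Registers[i] {
			h.Registers[i] = r
		}
	}
	return nil
}

// Estimate is the classic HLL estimator with linear counting; alpha assumes >= 128 registers.
func (h *HLL) Estimate() float64 {
	m, sum, zeros := float64(len(h.Registers)), 0.0, 0.0
	for _, r := range h.Registers {
		sum += math.Ldexp(1, -int(r))
		if r == 0 {
			zeros++
		}
	}
	if e := 0.7213 / (1 + 1.079/m) * m * m / sum; e > 2.5*m || zeros == 0 {
		return e
	}
	return m * math.Log(m/zeros)
}

// globalSingletons sums, per region i, the growth |rest ∪ single[i]| - |rest|, where rest
// is the union of every other region's NDV sketch. ndv[i] and single[i] must match up.
func globalSingletons(ndv, single []*HLL) (float64, error) {
	total := 0.0
	for i := range single {
		rest := NewHLL(single[i].Precision)
		for j, s := range ndv {
			if j != i {
				if err := rest.Merge(s); err != nil {
					return 0, err
				}
			}
		}
		before := rest.Estimate()
		if err := rest.Merge(single[i]); err != nil {
			return 0, err
		}
		if d := rest.Estimate() - before; d > 0 {
			total += d
		}
	}
	return total, nil
}

func main() {
	var ndv, single []*HLL
	for _, r := range [][2]uint64{{0, 1000}, {500, 1500}, {2000, 3000}} {
		s := NewHLL(14)
		for v := r[0]; v < r[1]; v++ {
			s.Add(v)
		}
		// Values are unique within a region here, so one sketch serves as both inputs.
		ndv, single = append(ndv, s), append(single, s)
	}
	fmt.Println(globalSingletons(ndv, single))
}
```

In the toy input, values 500 to 999 occur in two regions, so 2000 of the 2500 distinct values are true global singletons and the printed estimate should land near that. Each sketch stays at 1<<precision bytes no matter how many rows a region holds. This version performs a quadratic number of merges for clarity; prefix and suffix unions would cut that to a linear number.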

Notes

  • Draft. Stacked on the "proto: add NDV rate to analyze request" commit (also in this branch).
  • Consumed by the matching TiKV (producer) and TiDB (consumer) PRs, which will pin this branch.
  • HLL registers are one byte each; len(registers) == 1<<precision. Rust (TiKV) and Go (TiDB) use an identical register layout so a TiKV-built sketch reads correctly in TiDB.
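
The layout invariant in the last note can be checked on the consumer side before a sketch is merged. The following Go sketch is a minimal illustration; validateSketch is a hypothetical helper, and both the accepted precision range (4 to 18) and the 64-bit hash width behind the rank bound are assumptions that the notes above do not state.

```go
package main

import "fmt"

// validateSketch checks len(registers) == 1<<precision, plus two assumed bounds.
func validateSketch(precision uint32, registers []byte) error {
	// Assumed sane range; the allowed precisions are not stated in the PR.
	if precision < 4 || precision > 18 {
		return fmt.Errorf("hll: precision %d outside assumed range [4, 18]", precision)
	}
	if want := 1 << precision; len(registers) != want {
		return fmt.Errorf("hll: got %d registers, want %d", len(registers), want)
	}
	// Assuming 64-bit hashes, no rank can exceed 64-precision+1.
	maxRank := byte(64 - precision + 1)
	for i, r := range registers {
		if r > maxRank {
			return fmt.Errorf("hll: register %d holds %d, above max rank %d", i, r, maxRank)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateSketch(14, make([]byte, 1<<14))) // well-formed, empty sketch
	fmt.Println(validateSketch(14, make([]byte, 100)))   // wrong register count
}
```

Rejecting a malformed sketch up front keeps a register-wise merge from indexing past the end of a short buffer or silently mixing precisions.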

0xPoe added 2 commits May 11, 2026 12:39
Add a HLLSketch message and hll_ndv_sketch/hll_singleton_sketch fields to
RowSampleCollector. These let TiDB estimate the global singleton (f1) count
via a per-region leave-one-out using fixed-size HyperLogLog sketches, instead
of retaining large per-region FM sketches. The FM sketch path is unchanged for
the global/stored NDV.

Signed-off-by: 0xPoe <techregister@pm.me>