Add include_blob_files option to GetApproximateSizes #14501
xingbowang wants to merge 5 commits into facebook:main from
Add a new boolean flag include_blob_files (default: false) to SizeApproximationOptions and a corresponding INCLUDE_BLOB_FILES enum value to SizeApproximationFlags. When set to true, the returned size includes the total size of blob files referenced by SST files that overlap the queried key range.

Implementation:
- VersionSet::ApproximateBlobSize() collects SST file numbers overlapping the range via GetOverlappingInputs, then sums blob file sizes for any blob file with linked SSTs in that set.
- Validation updated to allow include_blob_files=true as a standalone option (without include_files or include_memtables).

Test: new DBBlobBasicTest.GetApproximateSizesIncludingBlobFiles
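The linked-SST approach described above can be sketched roughly as follows. This is a standalone model, not RocksDB code: BlobFileInfo, its fields, and ApproximateBlobSizeLinked are hypothetical stand-ins for the blob file metadata and for VersionSet::ApproximateBlobSize(), and the set of overlapping SST file numbers is assumed to have been collected already (the PR does this via GetOverlappingInputs).

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for blob file metadata: total size on disk plus the
// numbers of the SST files that reference this blob file.
struct BlobFileInfo {
  uint64_t size_bytes;
  std::vector<uint64_t> linked_ssts;
};

// Sums the size of every blob file that has at least one linked SST in
// overlapping_ssts. Each blob file is counted at most once, and in full.
uint64_t ApproximateBlobSizeLinked(
    const std::vector<BlobFileInfo>& blob_files,
    const std::unordered_set<uint64_t>& overlapping_ssts) {
  uint64_t total = 0;
  for (const BlobFileInfo& blob : blob_files) {
    for (uint64_t sst : blob.linked_ssts) {
      if (overlapping_ssts.count(sst) > 0) {
        total += blob.size_bytes;
        break;  // count this blob file only once
      }
    }
  }
  return total;
}
```

Note that in this model a blob file contributes its entire size as soon as one linked SST overlaps the range, however small the overlap is.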
| Check | Count |
|---|---|
| performance-unnecessary-copy-initialization | 2 |
| Total | 2 |
Details
db/blob/db_blob_basic_test.cc (2 warning(s))
db/blob/db_blob_basic_test.cc:2575:17: warning: local copy 'r1_end' of the variable 'mid' is never modified; consider avoiding the copy [performance-unnecessary-copy-initialization]
db/blob/db_blob_basic_test.cc:2576:17: warning: local copy 'r2_start' of the variable 'mid' is never modified; consider avoiding the copy [performance-unnecessary-copy-initialization]
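A common way to satisfy this check is to bind a const reference instead of making a copy. The snippet below is a minimal standalone sketch, not the test code itself: only the names mid, r1_end, and r2_start come from the warnings, while the MakeMid helper and its value are hypothetical.

```cpp
#include <string>

// Hypothetical helper standing in for however the test builds the split key.
std::string MakeMid() { return "key000050"; }

// Returns true when both aliases refer to the same object as `mid`,
// i.e. no additional std::string was constructed.
bool AliasesShareStorage() {
  const std::string mid = MakeMid();
  // Flagged form: std::string r1_end = mid;  (a copy that is never modified)
  const std::string& r1_end = mid;    // end of the first sub-range
  const std::string& r2_start = mid;  // start of the second sub-range
  return &r1_end == &mid && &r2_start == &mid;
}
```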
- Remove .claude/ and stray .md files from previous work
- Change blob size approximation from linked_ssts approach to proportional: blob_size_in_range = total_blob_size * (sst_size_in_range / total_sst_size)
- Update test to verify proportional behavior (partial range returns less than full range)
- Update API comment to describe the proportional approach
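The proportional formula can be written as a small helper. This is a sketch under stated assumptions: ProratedBlobSize is a hypothetical name, and the early return for an empty SST set is an assumed guard against division by zero, not something the commit message specifies.

```cpp
#include <cstdint>

// Prorates the total blob size by the share of SST bytes that fall inside the
// queried range. Returns 0 when there is no SST data to prorate against
// (assumed behavior).
uint64_t ProratedBlobSize(uint64_t total_blob_size, uint64_t sst_size_in_range,
                          uint64_t total_sst_size) {
  if (total_sst_size == 0) {
    return 0;
  }
  const double fraction = static_cast<double>(sst_size_in_range) /
                          static_cast<double>(total_sst_size);
  return static_cast<uint64_t>(static_cast<double>(total_blob_size) * fraction);
}
```

For example, with 1000 bytes of blob data and a range covering 25 of 100 SST bytes, the helper attributes 250 blob bytes to the range, which is less than the full-range result of 1000.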
eb07828 to f304aea
Move the SST range size computation (ApproximateSize call) to the caller in DBImpl::GetApproximateSizes so it is computed once and shared between include_files and include_blob_files. Simplify ApproximateBlobSize to take pre-computed sst_size_in_range instead of recomputing it internally.
|
The CI failure in |
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D97984211. |
✅ Claude Code Review
Auto-triggered after CI passed; reviewing commit e03b58b

Summary
A well-structured PR that adds blob file size approximation to

Issues Found

🔴 Critical
1. Validation allows
2.

🟡 Suggestion
3. Stress test can produce
4. Blob size approximation doesn't account for blob files linked to specific SSTs in different levels
5.
6. Test uses

🟢 Nitpick
7. Inconsistent formatting change in
8. Java magic number

Cross-Component Analysis
Positive Observations
|
- Remove VersionSet::ApproximateBlobSize(); inline the blob-to-SST ratio computation in DBImpl::GetApproximateSizes, hoisted out of the per-range loop so total_sst_size and total_blob_size are computed once per call.
- Update doc comments for include_memtables/include_files to reflect include_blob_files as a third valid option.
- Add INCLUDE_BLOB_FILES to C API enum (c.h) and Java JNI (SizeApproximationFlag.java, rocksjni.cc).
- Add include_blob_files to stress test random flag selection.
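The hoisting described in the first bullet can be illustrated with a standalone model. Everything here is hypothetical scaffolding rather than the DBImpl code: SizeOptions mirrors only two of the flags (memtables are left out), and sst_size_per_range stands in for the per-range ApproximateSize results.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mirror of the two file-related flags discussed in this PR.
struct SizeOptions {
  bool include_files = false;
  bool include_blob_files = false;
};

// sst_size_per_range[i] stands in for the per-range ApproximateSize result.
std::vector<uint64_t> ApproximateSizesModel(
    const SizeOptions& opts, const std::vector<uint64_t>& sst_size_per_range,
    uint64_t total_sst_size, uint64_t total_blob_size) {
  // Computed once per call, outside the per-range loop.
  double blob_to_sst_ratio = 0.0;
  if (opts.include_blob_files && total_sst_size > 0) {
    blob_to_sst_ratio = static_cast<double>(total_blob_size) /
                        static_cast<double>(total_sst_size);
  }

  std::vector<uint64_t> sizes;
  sizes.reserve(sst_size_per_range.size());
  for (uint64_t sst_size : sst_size_per_range) {
    uint64_t size = 0;
    // The same per-range SST size feeds both contributions.
    if (opts.include_files) {
      size += sst_size;
    }
    if (opts.include_blob_files) {
      size += static_cast<uint64_t>(static_cast<double>(sst_size) *
                                    blob_to_sst_ratio);
    }
    sizes.push_back(size);
  }
  return sizes;
}
```

With this shape, the cost of gathering the two totals does not grow with the number of ranges; only the per-range SST estimate is repeated.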
|
Addressed the following from the Claude Code Review:
- #1 (Critical): Updated doc comments for
- #3 (C API / Java JNI): Added
- #4 (Stress test): Added
- #5 (Performance): Hoisted
- #7 (const): Moot since the method was removed entirely; the inlined code only reads from the

Intentionally not addressed:
|
// Query the full range - all keys are covered.
std::string start = Key(0);
std::string end = Key(kNumKeys);
Range r(start, end);
Seems like both the stress test and the unit test only cover a single range; it may be worth tweaking either of them to cover at least two ranges.
Done — added a multi-range test case that queries two non-overlapping sub-ranges and verifies both return positive sizes, and that their sum is approximately equal to the full-range result (within 10%).
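The check described in this reply amounts to a relative-tolerance comparison. The helper below is a standalone sketch with a hypothetical name and signature; the actual test presumably uses gtest assertions, which are not reproduced here.

```cpp
#include <cstdint>

// Returns true when both sub-range sizes are positive and their sum is within
// `tolerance` (a fraction such as 0.10) of the full-range size.
bool SubRangesMatchFull(uint64_t part1, uint64_t part2, uint64_t full,
                        double tolerance) {
  if (part1 == 0 || part2 == 0) {
    return false;
  }
  const double sum = static_cast<double>(part1) + static_cast<double>(part2);
  const double full_d = static_cast<double>(full);
  const double diff = sum > full_d ? sum - full_d : full_d - sum;
  return diff <= tolerance * full_d;
}
```

A tolerance is needed because the sizes are approximations, so the two halves are not expected to add up exactly.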
|
You probably need to reimport so that I can stamp internally |
Two non-overlapping sub-ranges should sum to approximately the full-range result. Verifies the n>1 ranges code path.
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D97984211. |
|
@xingbowang merged this pull request in 1a4b1e4. |
Summary:
Add a new boolean flag include_blob_files (default: false) to SizeApproximationOptions and a corresponding INCLUDE_BLOB_FILES enum value to SizeApproximationFlags. When set to true, the returned size includes an approximation of blob file data in the queried key range.

Algorithm:
The blob file size contribution is prorated using the SST size ratio:

blob_size_in_range = total_blob_size * (sst_size_in_range / total_sst_size)

The blob-to-SST ratio (total_blob_size / total_sst_size) is computed once before the per-range loop, so iterating levels and blob files only happens once per GetApproximateSizes call regardless of how many ranges are queried. The per-range SST size (ApproximateSize) is computed once and shared between include_files and include_blob_files.

Limitations:
Changes:
- include/rocksdb/options.h: New include_blob_files field in SizeApproximationOptions; updated doc comments for include_memtables/include_files
- include/rocksdb/db.h: New INCLUDE_BLOB_FILES in SizeApproximationFlags enum, updated flags-to-options mapping
- include/rocksdb/c.h: New rocksdb_size_approximation_flags_include_blob_files C API enum value
- java/: Added INCLUDE_BLOB_FILES to SizeApproximationFlag.java and JNI flag mapping in rocksjni.cc
- db/db_impl/db_impl.cc: Blob-to-SST ratio computed once before loop, SST range size computed once per range and shared
- db_stress_tool/db_stress_test_base.cc: Randomized include_blob_files in stress test

Test Plan:
DBBlobBasicTest.GetApproximateSizesIncludingBlobFiles verifies:
- SizeApproximationFlags API works
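The flags-to-options mapping and the relaxed validation can be modelled as below. This is a standalone sketch, not the RocksDB headers: SketchFlags and SketchOptions are hypothetical mirrors of SizeApproximationFlags and SizeApproximationOptions, the bit value chosen for the blob flag is an assumption, and IsValid only captures the rule stated in this PR that at least one of the three options must be set.

```cpp
#include <cstdint>

// Hypothetical mirror of the flag enum. The bit assigned to
// kIncludeBlobFiles is an assumption, not taken from the PR.
enum SketchFlags : uint8_t {
  kNone = 0,
  kIncludeMemtables = 1 << 0,
  kIncludeFiles = 1 << 1,
  kIncludeBlobFiles = 1 << 2,
};

// Hypothetical mirror of the options struct.
struct SketchOptions {
  bool include_memtables = false;
  bool include_files = false;
  bool include_blob_files = false;
};

// Translates a bitmask of flags into the equivalent options struct.
SketchOptions OptionsFromFlags(uint8_t flags) {
  SketchOptions options;
  options.include_memtables = (flags & kIncludeMemtables) != 0;
  options.include_files = (flags & kIncludeFiles) != 0;
  options.include_blob_files = (flags & kIncludeBlobFiles) != 0;
  return options;
}

// A request is valid when at least one source of size data is selected;
// include_blob_files alone is sufficient.
bool IsValid(const SketchOptions& options) {
  return options.include_memtables || options.include_files ||
         options.include_blob_files;
}
```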