You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add include_blob_files option to GetApproximateSizes (#14501)
Summary:
**Summary:**
Add a new boolean flag `include_blob_files` (default: `false`) to `SizeApproximationOptions` and a corresponding `INCLUDE_BLOB_FILES` enum value to `SizeApproximationFlags`. When set to `true`, the returned size includes an approximation of blob file data in the queried key range.
**Algorithm:**
The blob file size contribution is prorated using the SST size ratio:
```
blob_size_in_range ≈ total_blob_size * (sst_size_in_range / total_sst_size)
```
The blob-to-SST ratio (`total_blob_size / total_sst_size`) is computed once before the per-range loop, so iterating levels and blob files only happens once per `GetApproximateSizes` call regardless of how many ranges are queried. The per-range SST size (`ApproximateSize`) is computed once and shared between `include_files` and `include_blob_files`.
**Limitations:**
- Assumes blob data is distributed proportionally to SST data across the key space. May be inaccurate if blob value sizes vary significantly across different key ranges (e.g., one range has large blobs while another has small ones).
- If there are no SST files (all data in memtables), the blob size contribution will be 0 even if blob files exist on disk.
**Changes:**
- `include/rocksdb/options.h`: New `include_blob_files` field in `SizeApproximationOptions`; updated doc comments for `include_memtables`/`include_files`
- `include/rocksdb/db.h`: New `INCLUDE_BLOB_FILES` in `SizeApproximationFlags` enum, updated flags-to-options mapping
- `include/rocksdb/c.h`: New `rocksdb_size_approximation_flags_include_blob_files` C API enum value
- `java/`: Added `INCLUDE_BLOB_FILES` to `SizeApproximationFlag.java` and JNI flag mapping in `rocksjni.cc`
- `db/db_impl/db_impl.cc`: Blob-to-SST ratio computed once before loop, SST range size computed once per range and shared
- `db_stress_tool/db_stress_test_base.cc`: Randomized `include_blob_files` in stress test
Pull Request resolved: #14501
Test Plan:
- New `DBBlobBasicTest.GetApproximateSizesIncludingBlobFiles` — verifies:
- Size with blobs > without (full range)
- Non-overlapping range returns 0
- Partial range returns proportionally less than full range
- `SizeApproximationFlags` API works
- Multi-range query: two sub-ranges sum approximately to the full-range result
- Stress test now exercises the new option randomly
Reviewed By: hx235
Differential Revision: D97984211
Pulled By: xingbowang
fbshipit-source-id: e9127eac3308687fd4f0b17a771fd61fba6a8380
0 commit comments