Skip to content

Add include_blob_files option to GetApproximateSizes#14501

Closed
xingbowang wants to merge 5 commits intofacebook:mainfrom
xingbowang:2026_03_23_GetAppoximateSizes
Closed

Add include_blob_files option to GetApproximateSizes#14501
xingbowang wants to merge 5 commits intofacebook:mainfrom
xingbowang:2026_03_23_GetAppoximateSizes

Conversation

@xingbowang
Copy link
Copy Markdown
Contributor

@xingbowang xingbowang commented Mar 24, 2026

Summary:
Add a new boolean flag include_blob_files (default: false) to SizeApproximationOptions and a corresponding INCLUDE_BLOB_FILES enum value to SizeApproximationFlags. When set to true, the returned size includes an approximation of blob file data in the queried key range.

Algorithm:
The blob file size contribution is prorated using the SST size ratio:

blob_size_in_range ≈ total_blob_size * (sst_size_in_range / total_sst_size)

The blob-to-SST ratio (total_blob_size / total_sst_size) is computed once before the per-range loop, so iterating levels and blob files only happens once per GetApproximateSizes call regardless of how many ranges are queried. The per-range SST size (ApproximateSize) is computed once and shared between include_files and include_blob_files.

Limitations:

  • Assumes blob data is distributed proportionally to SST data across the key space. May be inaccurate if blob value sizes vary significantly across different key ranges (e.g., one range has large blobs while another has small ones).
  • If there are no SST files (all data in memtables), the blob size contribution will be 0 even if blob files exist on disk.

Changes:

  • include/rocksdb/options.h: New include_blob_files field in SizeApproximationOptions; updated doc comments for include_memtables/include_files
  • include/rocksdb/db.h: New INCLUDE_BLOB_FILES in SizeApproximationFlags enum, updated flags-to-options mapping
  • include/rocksdb/c.h: New rocksdb_size_approximation_flags_include_blob_files C API enum value
  • java/: Added INCLUDE_BLOB_FILES to SizeApproximationFlag.java and JNI flag mapping in rocksjni.cc
  • db/db_impl/db_impl.cc: Blob-to-SST ratio computed once before loop, SST range size computed once per range and shared
  • db_stress_tool/db_stress_test_base.cc: Randomized include_blob_files in stress test

Test Plan:

  • New DBBlobBasicTest.GetApproximateSizesIncludingBlobFiles — verifies:
    • Size with blobs > without (full range)
    • Non-overlapping range returns 0
    • Partial range returns proportionally less than full range
    • SizeApproximationFlags API works
    • Multi-range query: two sub-ranges sum approximately to the full-range result
  • Stress test now exercises the new option randomly

Add a new boolean flag include_blob_files (default: false) to
SizeApproximationOptions and a corresponding INCLUDE_BLOB_FILES enum
value to SizeApproximationFlags. When set to true, the returned size
includes the total size of blob files referenced by SST files that
overlap the queried key range.

Implementation:
- VersionSet::ApproximateBlobSize() collects SST file numbers
  overlapping the range via GetOverlappingInputs, then sums blob
  file sizes for any blob file with linked SSTs in that set.
- Validation updated to allow include_blob_files=true as a
  standalone option (without include_files or include_memtables).

Test: new DBBlobBasicTest.GetApproximateSizesIncludingBlobFiles
@meta-cla meta-cla bot added the CLA Signed label Mar 24, 2026
@xingbowang xingbowang marked this pull request as draft March 24, 2026 00:19
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 24, 2026

⚠️ clang-tidy: 2 warning(s) on changed lines

Completed in 1063.5s.

Summary by check

Check Count
performance-unnecessary-copy-initialization 2
Total 2

Details

db/blob/db_blob_basic_test.cc (2 warning(s))
db/blob/db_blob_basic_test.cc:2575:17: warning: local copy 'r1_end' of the variable 'mid' is never modified; consider avoiding the copy [performance-unnecessary-copy-initialization]
db/blob/db_blob_basic_test.cc:2576:17: warning: local copy 'r2_start' of the variable 'mid' is never modified; consider avoiding the copy [performance-unnecessary-copy-initialization]

- Remove .claude/ and stray .md files from previous work
- Change blob size approximation from linked_ssts approach to
  proportional: blob_size_in_range = total_blob_size *
  (sst_size_in_range / total_sst_size)
- Update test to verify proportional behavior (partial range
  returns less than full range)
- Update API comment to describe the proportional approach
@xingbowang xingbowang force-pushed the 2026_03_23_GetAppoximateSizes branch from eb07828 to f304aea Compare March 24, 2026 16:40
Move the SST range size computation (ApproximateSize call) to the
caller in DBImpl::GetApproximateSizes so it is computed once and
shared between include_files and include_blob_files. Simplify
ApproximateBlobSize to take pre-computed sst_size_in_range instead
of recomputing it internally.
@xingbowang
Copy link
Copy Markdown
Contributor Author

The CI failure in build-linux-cmake-with-folly-coroutines is the flaky EventListenerTest.BackgroundJobPressure test, unrelated to this PR. It was introduced in #14474 and "fixed" in #14476, but is still intermittently failing. Our previous push (commit b8dcb3b) passed this same job; the failure only appeared on the latest push with the same test code.

@xingbowang xingbowang marked this pull request as ready for review March 24, 2026 17:52
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Mar 24, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D97984211.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 24, 2026

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit e03b58b


Summary

A well-structured PR that adds blob file size approximation to GetApproximateSizes using a prorated SST ratio. The algorithm is efficient (one-time precomputation), the API additions are consistent, and test coverage is reasonable. However, there are several correctness and accuracy issues worth addressing.


Issues Found

🔴 Critical

1. Validation allows include_blob_files=true with both include_memtables=false and include_files=false, but the result is always 0

  • File: db/db_impl/db_impl.cc:4955
  • Perspective: Correctness & Edge Cases
  • The updated validation !options.include_memtables && !options.include_files && !options.include_blob_files means {include_blob_files=true, include_files=false, include_memtables=false} passes validation. However, in this case sst_size_in_range is only computed when options.include_files || options.include_blob_files, which is true — so ApproximateSize runs. But then blob_to_sst_ratio * sst_size_in_range is returned. The result is NOT zero; it's the blob approximation based on SST range overlap. This is actually fine behavior — but the comment in options.h says "at least one of include_files or include_blob_files must be true", which is misleading. The real invariant is that at least one of the three flags must be true.
  • Suggested fix: Update options.h comments to say: "If all three options are false, GetApproximateSizes returns InvalidArgument." Keep it simple — avoid specifying complex pairwise constraints.

2. GetBlobFileSize() includes garbage data, inflating the blob estimate

  • File: db/db_impl/db_impl.cc:4986
  • Perspective: Cross-Component Analysis
  • GetBlobFileSize() returns BlobLogHeader::kSize + total_blob_bytes_ + BlobLogFooter::kSize — the full physical file size including headers/footers and garbage (data from deleted/overwritten keys). After GC compaction, blob files may contain significant garbage. The proration will overestimate actual blob data in-range.
  • Suggested fix: Consider using GetTotalBlobBytes() - GetGarbageBlobBytes() to approximate only live blob data. Similarly, SST file sizes (from NumLevelBytes) include tombstones and overwritten values, so there's a conceptual mismatch — but the garbage in blob files can be much larger proportionally. At minimum, document this limitation in the options.h comment.

🟡 Suggestion

3. Stress test can produce include_blob_files=true, include_memtables=false, include_files=false

  • File: db_stress_tool/db_stress_test_base.cc:2783-2786
  • Perspective: Cross-Component Analysis
  • The original logic only randomized include_files when include_memtables was true (to avoid the all-false case). The new code sets include_blob_files independently, then gates include_files randomization on sao.include_memtables || sao.include_blob_files. This means the combination {include_memtables=false, include_files=false, include_blob_files=true} is possible. While this passes validation (due to the updated check), it's an unusual mode that only returns prorated blob size. This is fine for stress testing, but verify this is the intended stress test coverage.

4. Blob size approximation doesn't account for blob files linked to specific SSTs in different levels

  • File: db/db_impl/db_impl.cc:4978-4989
  • Perspective: Cross-Component & Correctness
  • BlobFileMetaData has a GetLinkedSsts() method that tracks which SST files reference each blob file. A more accurate estimation could prorate each blob file's size based on whether its linked SSTs overlap the queried range. The current global ratio approach is simpler but less accurate for non-uniform distributions.
  • Suggested fix: Consider mentioning this potential improvement in a TODO comment, or document the tradeoff in the PR description. The current approach is reasonable for a first version.

5. files_size_error_margin interaction with blob estimation

  • File: db/db_impl/db_impl.cc:5004-5008
  • Perspective: Correctness
  • The files_size_error_margin in SizeApproximationOptions allows ApproximateSize to take shortcuts for SST file size estimation. When include_blob_files is true, the blob size inherits this error through the prorated sst_size_in_range value. This means a loose margin (e.g., 0.1) compounds: 10% error on SST size → 10% error on blob estimate. This is acceptable but worth documenting.

6. Test uses env_->GetChildren + string suffix check to verify blob files

  • File: db/blob/db_blob_basic_test.cc:2479-2487
  • Perspective: Testing
  • The blob file check f.substr(f.size() - 5) == ".blob" is fragile. Consider using ROCKSDB_NAMESPACE::IsBlobFile() or a similar utility if available, or at least EndsWith.
  • Suggested fix: Use f.size() >= 5 && f.compare(f.size() - 5, 5, ".blob") == 0 or a utility function. The current f.size() > 5 should be f.size() >= 5 for correctness (though a 5-char blob filename is unlikely).

🟢 Nitpick

7. Inconsistent formatting change in c.h

  • File: include/rocksdb/c.h:2732-2740
  • Perspective: Code Quality
  • The diff includes a formatting change to rocksdb_slicetransform_create that is unrelated to this feature. This should ideally be in a separate commit.

8. Java magic number 4 instead of named constant

  • File: java/rocksjni/rocksjni.cc:2545
  • Perspective: Code Quality
  • if (jinclude_flags & 4) uses a magic number. The existing code uses 1 and 2 for the other flags, so this is consistent — but all three should ideally reference the enum values. At minimum, add a comment like // INCLUDE_BLOB_FILES.

Cross-Component Analysis

Context Assessment
ReadOnly DB / SecondaryInstance Safe — inherits DBImpl::GetApproximateSizes, blob files are read-only
StackableDB Safe — delegates to underlying DB
User-defined timestamps Safe — timestamp handling occurs before blob ratio computation
WritePreparedTxnDB Safe — uses same GetApproximateSizes path
FIFO / Universal compaction Safe — blob ratio is computed from current Version, agnostic to compaction style
BlobDB (legacy) N/A — this is the integrated blob support, not the legacy BlobDB wrapper
Concurrent writers Safe — SuperVersion is ref-counted, blob_to_sst_ratio computed from pinned Version
No SST files (memtable-only) Handled — total_sst_size == 0blob_to_sst_ratio = 0.0 → blob contribution is 0. Documented as limitation.

Positive Observations

  • Efficient algorithm: Computing blob_to_sst_ratio once before the per-range loop is a good design choice that avoids redundant iteration over levels and blob files.
  • Backward compatible: Default false for include_blob_files preserves existing behavior.
  • Comprehensive API coverage: C, C++, and Java APIs are all updated consistently.
  • Test covers multiple scenarios: Full range, empty range, partial range, and flags API are all tested.
  • Stress test integration: The new option is exercised in the stress test framework.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

- Remove VersionSet::ApproximateBlobSize(); inline the blob-to-SST
  ratio computation in DBImpl::GetApproximateSizes, hoisted out of
  the per-range loop so total_sst_size and total_blob_size are
  computed once per call.
- Update doc comments for include_memtables/include_files to reflect
  include_blob_files as a third valid option.
- Add INCLUDE_BLOB_FILES to C API enum (c.h) and Java JNI
  (SizeApproximationFlag.java, rocksjni.cc).
- Add include_blob_files to stress test random flag selection.
@xingbowang
Copy link
Copy Markdown
Contributor Author

Addressed the following from the Claude Code Review:

#1 (Critical) — Updated doc comments for include_memtables and include_files to reflect include_blob_files as a third valid option.

#3 (C API / Java JNI) — Added rocksdb_size_approximation_flags_include_blob_files to C API enum (c.h), INCLUDE_BLOB_FILES to Java SizeApproximationFlag.java, and JNI flag mapping in rocksjni.cc.

#4 (Stress test) — Added sao.include_blob_files = thread->rand.OneIn(2) to the stress test's GetApproximateSizes path.

#5 (Performance) — Hoisted total_sst_size and total_blob_size computation out of the per-range loop. The blob-to-SST ratio is now computed once per GetApproximateSizes call. Also removed the separate VersionSet::ApproximateBlobSize() method — the logic is inlined in DBImpl::GetApproximateSizes.

#7 (const) — Moot since the method was removed entirely; the inlined code only reads from the Version*.

Intentionally not addressed:

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Mar 26, 2026

@hx235 has imported this pull request. If you are a Meta employee, you can view this in D97984211.

// Query the full range - all keys are covered.
std::string start = Key(0);
std::string end = Key(kNumKeys);
Range r(start, end);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem like both stress test and unit test only cover one single range, may worth twisting either of them to cover two ranges at least.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — added a multi-range test case that queries two non-overlapping sub-ranges and verifies both return positive sizes, and that their sum is approximately equal to the full-range result (within 10%).

Copy link
Copy Markdown
Contributor

@hx235 hx235 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hx235
Copy link
Copy Markdown
Contributor

hx235 commented Mar 26, 2026

You probably need to reimport so that I can stamp internally

Two non-overlapping sub-ranges should sum to approximately the
full-range result. Verifies the n>1 ranges code path.
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Mar 27, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D97984211.

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Mar 27, 2026

@xingbowang merged this pull request in 1a4b1e4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants