fix(stats): perf regression because of memory-aware chunking logic error#3598
Merged
jqnatividad merged 5 commits intomasterfrom Mar 10, 2026
Merged
fix(stats): perf regression because of memory-aware chunking logic error#3598jqnatividad merged 5 commits intomasterfrom
jqnatividad merged 5 commits intomasterfrom
Conversation
`--infer-boolean` was forcing full statistics computation (sum, dist, online stats, string lengths, precision tracking) even when `--typesonly` should limit work to type inference only. Now `which_stats()` no longer enables sum/dist for typesonly, and `Stats::add()` has a targeted early return that only computes minmax + cardinality — the minimum needed for boolean inference — skipping ~60-70% of per-sample work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add debug_assert! before unsafe unwrap_unchecked on minmax in the typesonly+infer_boolean path to catch invariant violations in debug builds. Extract modes/cardinality update logic into update_modes() helper to eliminate duplication between typesonly and normal paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g not needed The outer chunking condition always evaluated true because max_chunk_memory_mb defaults to Some(0), forcing memory-aware chunking even for plain stats. This caused 1M single-record chunks on large CSVs (massive thread scheduling overhead). Three fixes: - Only enter memory-aware path when needs_memory_aware_chunking is true - Add needs_memory_aware_chunking guard in Some(0) branch for defense-in-depth - Prefer CPU-based chunking when memory-based chunk size is degenerate Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unreachable `else` (None) branch in `chunking_mode` match since `needs_memory_aware_chunking` already guarantees `max_chunk_memory_mb.is_some()` - Remove redundant `needs_memory_aware_chunking` guard in `Some(0)` branch of `calculate_memory_aware_chunk_size` since the caller already gates on this condition - Add clarifying comments for both simplifications Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a performance regression in stats chunk sizing by tightening when memory-aware chunking is used and adjusting chunk-size selection logic intended to balance memory constraints vs parallelism.
Changes:
- Switch
statsto only do sampling/memory-aware chunk sizing when non-streaming stats are enabled (avoids overhead for streaming-only stats). - Modify dynamic chunk-size selection logic in
util::calculate_dynamic_chunk_size. - Refactor mode/cardinality updates into a helper and adjust
typesonly + infer_booleanbehavior to compute only what boolean inference needs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/util.rs | Updates dynamic chunk-size selection criteria (memory-based vs CPU-based). |
| src/cmd/stats.rs | Limits memory-aware chunking to non-streaming stats; refactors mode updates; tweaks typesonly/infer_boolean stats selection. |
Comments suppressed due to low confidence (2)
src/util.rs:1204
- In calculate_dynamic_chunk_size, the new preference condition selects cpu_based_chunk_size whenever memory_based_chunk_size <= cpu_based_chunk_size. That’s the “memory constrained” case where we generally need the smaller, memory-based chunk size to avoid high peak memory for non-streaming stats. As written, this can negate memory-aware chunk sizing and potentially reintroduce OOM risk. Consider restoring logic that prefers memory_based_chunk_size when it is smaller, and only prefers cpu_based_chunk_size when memory_based_chunk_size is at least as large (i.e., memory permits CPU-based chunking) and CPU-based yields better parallelization.
// Prefer CPU-based chunking if:
// 1. Memory-based chunk size is smaller than CPU-based (degenerate/low memory), OR
// 2. It creates more chunks (better parallelization)
if memory_based_chunk_size <= cpu_based_chunk_size || cpu_based_chunks > memory_based_chunks
{
cpu_based_chunk_size
} else {
memory_based_chunk_size
}
src/cmd/stats.rs:3200
- The new update_modes() helper is inserted between the doc comment for add() and the add() method itself, so Rust will attach the long "Adds a sample value…" doc comment to update_modes() instead of add(). As a result, add() now only has the trailing bullet doc lines and its main documentation is misplaced. Move update_modes() above the doc comment or below add(), and give update_modes its own brief doc comment (or a regular // comment) so docs render correctly.
/// # Safety
///
/// * Uses unsafe code for performance-critical operations
/// Updates modes/cardinality trackers with a sample value.
/// Weighted modes and unweighted modes are mutually exclusive.
#[inline(always)]
fn update_modes(&mut self, sample: &[u8], weight: f64) {
if let Some(ref mut wm) = self.weighted_modes {
// Weighted modes: accumulate weights per value
// Use get_mut first to avoid heap-allocating sample.to_vec() when key already exists
if let Some(val) = wm.get_mut(sample) {
*val += weight;
} else {
wm.insert(sample.to_vec(), weight);
}
} else if let Some(v) = self.modes.as_mut() {
v.add_bytes(sample);
}
}
/// * Assumes valid UTF-8 input for string operations
/// * Bounds checking is avoided where safe
#[allow(clippy::inline_always)]
#[inline(always)]
fn add(
…safety comment Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.