
fix(stats): perf regression because of memory-aware chunking logic error#3598

Merged
jqnatividad merged 5 commits into master from stats-fix-perf-regression-typesonly
Mar 10, 2026

Conversation

@jqnatividad
Collaborator

No description provided.

jqnatividad and others added 4 commits March 9, 2026 23:07
`--infer-boolean` was forcing full statistics computation (sum, dist,
online stats, string lengths, precision tracking) even when `--typesonly`
should limit work to type inference only. Now `which_stats()` no longer
enables sum/dist for typesonly, and `Stats::add()` has a targeted early
return that only computes minmax + cardinality — the minimum needed for
boolean inference — skipping ~60-70% of per-sample work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
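The early-return idea in this commit can be sketched with a deliberately simplified struct (hypothetical `MiniStats`, not qsv's actual `Stats` type): under `--typesonly --infer-boolean`, only the minmax and cardinality trackers are updated, and everything else is skipped.

```rust
// Minimal sketch, assuming a toy stats accumulator. The real qsv Stats struct
// tracks far more (online stats, string lengths, precision, ...).
use std::collections::HashSet;

#[derive(Default)]
struct MiniStats {
    minmax: Option<(Vec<u8>, Vec<u8>)>, // (min, max) sample bytes
    cardinality: HashSet<Vec<u8>>,      // distinct values seen
    sum: f64,                           // stands in for the full-stats work
    full_updates: usize,                // counts how often the full path ran
}

impl MiniStats {
    fn add(&mut self, sample: &[u8], typesonly: bool, infer_boolean: bool) {
        // minmax + cardinality: the minimum needed for boolean inference
        self.cardinality.insert(sample.to_vec());
        let mm = self
            .minmax
            .get_or_insert_with(|| (sample.to_vec(), sample.to_vec()));
        if sample < mm.0.as_slice() {
            mm.0 = sample.to_vec();
        }
        if sample > mm.1.as_slice() {
            mm.1 = sample.to_vec();
        }
        if typesonly && infer_boolean {
            return; // targeted early return: skip the bulk of per-sample work
        }
        // full statistics path (sum, distribution, string lengths, ...)
        if let Ok(s) = std::str::from_utf8(sample) {
            if let Ok(n) = s.parse::<f64>() {
                self.sum += n;
            }
        }
        self.full_updates += 1;
    }
}

fn main() {
    let mut s = MiniStats::default();
    for v in [b"true".as_slice(), b"false".as_slice(), b"true".as_slice()] {
        s.add(v, true, true);
    }
    println!("distinct={} full_updates={}", s.cardinality.len(), s.full_updates);
}
```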
Add debug_assert! before unsafe unwrap_unchecked on minmax in the
typesonly+infer_boolean path to catch invariant violations in debug
builds. Extract modes/cardinality update logic into update_modes()
helper to eliminate duplication between typesonly and normal paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g not needed

The outer chunking condition always evaluated to true because
max_chunk_memory_mb defaults to Some(0), forcing memory-aware chunking
even for plain stats. This caused 1M single-record chunks on large CSVs
(massive thread scheduling overhead). Three fixes:
- Only enter memory-aware path when needs_memory_aware_chunking is true
- Add needs_memory_aware_chunking guard in Some(0) branch for defense-in-depth
- Prefer CPU-based chunking when memory-based chunk size is degenerate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
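The first of the three fixes can be sketched as a standalone function (hypothetical name and signature, mirroring the commit message rather than qsv's exact code): because `max_chunk_memory_mb` defaults to `Some(0)`, a bare `is_some()` check always took the memory-aware path, so the path must additionally be gated on whether memory-aware chunking is actually needed.

```rust
// Sketch under stated assumptions: needs_memory_aware_chunking is true only
// for non-streaming stats, and Some(0) is the default, not "no limit".
fn chunk_size(
    num_records: usize,
    num_cpus: usize,
    max_chunk_memory_mb: Option<u64>,  // defaults to Some(0)
    needs_memory_aware_chunking: bool, // true only for non-streaming stats
) -> usize {
    // Buggy guard was effectively `if max_chunk_memory_mb.is_some()`,
    // which is always true with the Some(0) default.
    if needs_memory_aware_chunking && max_chunk_memory_mb.is_some() {
        // Memory-aware path, collapsed here to the pathological outcome the
        // commit describes: degenerate single-record chunks.
        1
    } else {
        // CPU-based chunking: spread records evenly across cores.
        (num_records / num_cpus).max(1)
    }
}

fn main() {
    // Plain stats on a 1M-row CSV no longer fall into the memory-aware path.
    println!("{}", chunk_size(1_000_000, 8, Some(0), false));
}
```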
- Remove unreachable `else` (None) branch in `chunking_mode` match since
  `needs_memory_aware_chunking` already guarantees `max_chunk_memory_mb.is_some()`
- Remove redundant `needs_memory_aware_chunking` guard in `Some(0)` branch of
  `calculate_memory_aware_chunk_size` since the caller already gates on this condition
- Add clarifying comments for both simplifications

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Fixes a performance regression in stats chunk sizing by tightening when memory-aware chunking is used and adjusting chunk-size selection logic intended to balance memory constraints vs parallelism.

Changes:

  • Switch stats to only do sampling/memory-aware chunk sizing when non-streaming stats are enabled (avoids overhead for streaming-only stats).
  • Modify dynamic chunk-size selection logic in util::calculate_dynamic_chunk_size.
  • Refactor mode/cardinality updates into a helper and adjust typesonly + infer_boolean behavior to compute only what boolean inference needs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
src/util.rs | Updates dynamic chunk-size selection criteria (memory-based vs CPU-based).
src/cmd/stats.rs | Limits memory-aware chunking to non-streaming stats; refactors mode updates; tweaks typesonly/infer_boolean stats selection.
Comments suppressed due to low confidence (2)

src/util.rs:1204

  • In calculate_dynamic_chunk_size, the new preference condition selects cpu_based_chunk_size whenever memory_based_chunk_size <= cpu_based_chunk_size. That’s the “memory constrained” case where we generally need the smaller, memory-based chunk size to avoid high peak memory for non-streaming stats. As written, this can negate memory-aware chunk sizing and potentially reintroduce OOM risk. Consider restoring logic that prefers memory_based_chunk_size when it is smaller, and only prefers cpu_based_chunk_size when memory_based_chunk_size is at least as large (i.e., memory permits CPU-based chunking) and CPU-based yields better parallelization.
        // Prefer CPU-based chunking if:
        // 1. Memory-based chunk size is smaller than CPU-based (degenerate/low memory), OR
        // 2. It creates more chunks (better parallelization)
        if memory_based_chunk_size <= cpu_based_chunk_size || cpu_based_chunks > memory_based_chunks
        {
            cpu_based_chunk_size
        } else {
            memory_based_chunk_size
        }
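The ordering the reviewer suggests can be sketched as a free function (hypothetical name and inputs; the actual `util::calculate_dynamic_chunk_size` has a different signature): prefer the memory-based size whenever it is the smaller, memory-constrained one, and take the CPU-based size only when memory permits it and it yields more chunks.

```rust
// Sketch of the reviewer's suggested selection, not qsv's merged code.
fn pick_chunk_size(num_records: usize, memory_based: usize, cpu_based: usize) -> usize {
    let memory_chunks = num_records.div_ceil(memory_based.max(1));
    let cpu_chunks = num_records.div_ceil(cpu_based.max(1));
    if memory_based < cpu_based {
        // Memory-constrained case: the smaller size bounds peak memory.
        memory_based
    } else if cpu_chunks > memory_chunks {
        // Memory permits CPU-based chunking and it parallelizes better.
        cpu_based
    } else {
        memory_based
    }
}

fn main() {
    // Tight memory: keep the memory-based size even though it is tiny.
    println!("{}", pick_chunk_size(1_000_000, 100, 125_000));
}
```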

src/cmd/stats.rs:3200

  • The new update_modes() helper is inserted between the doc comment for add() and the add() method itself, so Rust will attach the long "Adds a sample value…" doc comment to update_modes() instead of add(). As a result, add() now only has the trailing bullet doc lines and its main documentation is misplaced. Move update_modes() above the doc comment or below add(), and give update_modes its own brief doc comment (or a regular // comment) so docs render correctly.
    /// # Safety
    ///
    /// * Uses unsafe code for performance-critical operations
    /// Updates modes/cardinality trackers with a sample value.
    /// Weighted modes and unweighted modes are mutually exclusive.
    #[inline(always)]
    fn update_modes(&mut self, sample: &[u8], weight: f64) {
        if let Some(ref mut wm) = self.weighted_modes {
            // Weighted modes: accumulate weights per value
            // Use get_mut first to avoid heap-allocating sample.to_vec() when key already exists
            if let Some(val) = wm.get_mut(sample) {
                *val += weight;
            } else {
                wm.insert(sample.to_vec(), weight);
            }
        } else if let Some(v) = self.modes.as_mut() {
            v.add_bytes(sample);
        }
    }

    /// * Assumes valid UTF-8 input for string operations
    /// * Bounds checking is avoided where safe
    #[allow(clippy::inline_always)]
    #[inline(always)]
    fn add(
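The rustdoc pitfall the reviewer describes can be shown with a tiny, hypothetical pair of items (not qsv's code): a `///` doc comment attaches to whichever item immediately follows it, so inserting a helper between a doc comment and its intended target silently re-targets the documentation.

```rust
/// Adds a sample value. (Written for `add`, but rustdoc attaches it to
/// `update_modes`, because that is the next item.)
fn update_modes() -> &'static str {
    "update_modes" // stands in for the real helper body
}

// `add` is now undocumented; moving `update_modes` below `add`, or giving it
// its own doc comment, puts the docs back on the right item.
fn add() -> &'static str {
    "add"
}

fn main() {
    println!("{} {}", update_modes(), add());
}
```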

…safety comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jqnatividad merged commit 6aeb627 into master Mar 10, 2026
13 checks passed
jqnatividad deleted the stats-fix-perf-regression-typesonly branch March 10, 2026 04:55
