feat(pragmastat): use stats cache to only process numeric/date/datetime columns #3593
Merged
jqnatividad merged 8 commits into master on Mar 9, 2026
Use the existing stats cache to opportunistically filter out non-numeric columns when running pragmastat without an explicit --select. Adds numeric_columns_from_cache() which reads a stats JSONL cache (if newer than the input file) and keeps only Integer/Float columns; it never triggers a stats run and only applies to path-based inputs. Integrates this filtering into read_columns and logs how many columns were skipped. Adds tests to verify cache filtering and that explicit --select bypasses the cache.
Use the stats cache to detect Date/DateTime columns and parse them as epoch milliseconds for pragmastat analysis. Point estimates (center, bounds) are formatted as RFC3339 dates; dispersion/shift values (spread, shift, bounds) are formatted as days with millisecond precision. Also expands the stats cache integration to include Date/DateTime columns alongside Integer/Float, filtering out only truly non-analysable types (String, Boolean, NULL).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
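The type filtering described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/cmd/pragmastat.rs`: the `ColKind` enum and the `classify` and `analysable_columns` helpers are hypothetical names, and the sketch assumes the cache exposes one type name per column using the strings mentioned in the commit message.

```rust
/// Hypothetical classification of a column, derived from the type name
/// recorded in the stats cache.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColKind {
    /// Integer or Float: analysed as plain numbers.
    Numeric,
    /// Date or DateTime: analysed as epoch milliseconds.
    Date,
    /// String, Boolean, NULL, or anything unrecognised: not analysed.
    Skip,
}

/// Map a cached type name to a column kind (assumed type-name spellings).
fn classify(type_name: &str) -> ColKind {
    match type_name {
        "Integer" | "Float" => ColKind::Numeric,
        "Date" | "DateTime" => ColKind::Date,
        _ => ColKind::Skip,
    }
}

/// Return (column index, kind) for every column worth analysing.
/// The position in `types` is taken to be the column position in the CSV.
fn analysable_columns(types: &[&str]) -> Vec<(usize, ColKind)> {
    types
        .iter()
        .enumerate()
        .map(|(idx, name)| (idx, classify(name)))
        .filter(|(_, kind)| *kind != ColKind::Skip)
        .collect()
}
```

Calling `analysable_columns(&["String", "Integer", "Date"])` keeps indices 1 and 2 and drops the String column, which is the behaviour the commit message describes for runs without an explicit --select.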
…race
Replace `get_stats_records` routing in `columns_from_cache` with direct JSONL file reading. The previous approach could trigger a full stats run under `QSV_STATSCACHE_MODE=auto` if the cache became stale between the manual freshness pre-check and the `get_stats_records` call. Reading the file directly truly guarantees no stats run, removes the dummy SchemaArgs, and adds a comment documenting the duplicate-column-name assumption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
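A rough sketch of the "read the cache file directly, only if it is fresh" idea from this commit, using only the standard library. The function names `is_newer` and `read_cache_if_fresh` are hypothetical, and the sketch returns raw JSONL lines instead of parsed records; the actual `columns_from_cache` may differ in all of these details.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// The cache counts as fresh only when it was modified strictly after the input.
fn is_newer(cache_mtime: SystemTime, input_mtime: SystemTime) -> bool {
    cache_mtime > input_mtime
}

/// Return the cache's lines if the cache exists and is newer than the input.
/// Nothing here can start a stats run: a missing or stale cache yields `None`,
/// and the caller falls back to processing every column.
fn read_cache_if_fresh(input: &Path, cache: &Path) -> io::Result<Option<Vec<String>>> {
    let input_mtime = fs::metadata(input)?.modified()?;
    let cache_mtime = match fs::metadata(cache) {
        Ok(meta) => meta.modified()?,
        // No cache file: not an error, just nothing to use.
        Err(_) => return Ok(None),
    };
    if !is_newer(cache_mtime, input_mtime) {
        return Ok(None);
    }
    // One JSONL record per line; parsing is left to the caller.
    let text = fs::read_to_string(cache)?;
    Ok(Some(text.lines().map(str::to_owned).collect()))
}
```

Because the freshness check and the read both operate on the file itself, a cache that goes stale in between results at worst in reading slightly old type information, never in a stats run.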
…estamps
- Skip mixed Date/Numeric column pairs in two-sample and compare2 with a
warning, as comparing epoch-ms values against plain numbers is nonsensical
- Return empty string instead of 1970-01-01 for out-of-range timestamps in
fmt_timestamp (e.g. confidence intervals that overshoot valid date ranges)
- Use chrono's format("%Y-%m-%d") instead of fragile string slicing for dates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
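The mixed-pair rule in this commit can be illustrated with a small sketch. The `Kind` enum, the tuple layout, and `filter_pairs` are hypothetical; how two-sample and compare2 actually pair columns is not shown in this PR page, so the pairing is simply taken as input here.

```rust
/// Hypothetical kind of an analysable column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Numeric,
    Date,
}

/// Split candidate column pairs into those that can be compared and
/// warnings for those that cannot. A pair is kept only when both sides
/// have the same kind, since epoch-ms values and plain numbers do not
/// share a meaningful scale.
fn filter_pairs<'a>(
    pairs: &[(&'a str, Kind, &'a str, Kind)],
) -> (Vec<(&'a str, &'a str)>, Vec<String>) {
    let mut kept = Vec::new();
    let mut warnings = Vec::new();
    for &(left, left_kind, right, right_kind) in pairs {
        if left_kind == right_kind {
            kept.push((left, right));
        } else {
            warnings.push(format!(
                "skipping {left} vs {right}: mixed Date/Numeric pair"
            ));
        }
    }
    (kept, warnings)
}
```

Returning the warnings instead of printing them keeps the sketch easy to test; a real command would more likely log them.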
…from_cache
The JSONL cache line index already corresponds directly to the column index in the CSV headers, so re-opening the input file to read headers and matching by name was unnecessary. This simplifies the code and eliminates a potential edge case with duplicate column names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR makes qsv pragmastat a “smart” command by leveraging the stats.csv.data.jsonl cache to (1) automatically exclude non-numeric columns when --select is not provided and (2) add Date/DateTime column support by converting parsed timestamps to epoch milliseconds for analysis and formatting results back as dates/datetimes (with spreads/shifts expressed in days).
Changes:
- Add opportunistic stats-cache loading in `pragmastat` to filter analyzable columns and detect Date/DateTime types.
- Implement Date/DateTime parsing + output formatting (timestamp formatting for center/bounds; day conversion for spread/shift).
- Add integration tests and update docs/README to reflect the new “smart” behavior; switch `pragmastat` crate to a patched git fork.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/cmd/pragmastat.rs` | Loads stats cache for column filtering/type detection; parses Date/DateTime to epoch-ms; formats outputs for dates and day-based spreads/shifts. |
| `tests/test_pragmastat.rs` | Adds integration tests for stats-cache filtering, --select override behavior, and Date/DateTime output expectations. |
| `docs/PERFORMANCE.md` | Documents pragmastat as a stats-cache “smart” command and summarizes its cache usage. |
| `README.md` | Updates pragmastat description to mention stats-cache filtering and Date/DateTime support. |
| `Cargo.toml` / `Cargo.lock` | Pins pragmastat to a patched git fork/branch and updates lockfile accordingly. |
… validation
- Increase DAY_DECIMAL_PLACES from 5 to 8 for actual millisecond precision
(1ms / 86_400_000 ms-per-day ≈ 1.16e-8)
- Round epoch-ms values before i64 cast in fmt_timestamp to avoid truncation
- Validate cache record count matches header count in columns_from_cache,
ignoring caches generated with --select
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
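The precision argument in this commit can be checked with a short sketch. `DAY_DECIMAL_PLACES` is the constant named in the commit message; `MS_PER_DAY`, `fmt_days`, and `round_epoch_ms` are hypothetical names used only for illustration, and the real `fmt_timestamp` goes on to format a date, which is omitted here.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: f64 = 86_400_000.0;
/// Decimal places used when printing a duration in days. One millisecond is
/// about 1.16e-8 days, so 5 places would print it as zero; 8 places do not.
const DAY_DECIMAL_PLACES: usize = 8;

/// Format a duration given in milliseconds as fractional days.
fn fmt_days(ms: f64) -> String {
    format!("{:.*}", DAY_DECIMAL_PLACES, ms / MS_PER_DAY)
}

/// Convert a floating-point epoch-ms estimate to an integer timestamp.
/// Rounding first avoids the truncation toward zero that a bare `as i64`
/// cast performs; non-finite or out-of-range values yield `None`.
fn round_epoch_ms(ms: f64) -> Option<i64> {
    if !ms.is_finite() {
        return None;
    }
    let rounded = ms.round();
    if rounded < i64::MIN as f64 || rounded >= i64::MAX as f64 {
        return None;
    }
    Some(rounded as i64)
}
```

With these definitions, `fmt_days(1.0)` yields `0.00000001`, so a one-millisecond spread stays visible, and `round_epoch_ms(1999.6)` yields `Some(2000)` where a direct cast would give 1999.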