Commit 6d57023
feat: new
* feat: add `scoresql` command to analyze SQL query performance before execution
Analyzes SQL queries against stats, moarstats, and frequency caches of input CSV files
to produce a performance score (0-100) with actionable optimization suggestions.
Supports both Polars (default) and DuckDB query plan analysis.
Scoring covers type optimization, join cardinality, filter selectivity, data distribution,
and query anti-pattern detection (SELECT *, ORDER BY without LIMIT, cartesian joins, etc.).
Caches are auto-generated when missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(scoresql): address review findings — feature gate, dead code, robustness
- Fix test feature gate to require both `polars` and `feature_capable`
(matching main.rs registration), preventing test failures on polars-only builds
- Remove unused SqlInfo fields (_has_group_by, _referenced_tables,
_select_columns) and the extract_select_columns function
- Improve subquery detection: check for SELECT inside parentheses instead of
counting all SELECT occurrences, avoiding false positives from string literals
- Fix DuckDB plan table name substitution: sort replacements longest-first
to prevent partial matches (e.g., "data" matching inside "data2")
- Surface user-visible warnings (wwarn!) when cache generation fails,
not just log::warn, so users know scoring may be inaccurate
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(scoresql): harden subquery detection and deduplicate replacements
- Track single-quote state in subquery detector to avoid false positives
from SELECT appearing inside string literals (e.g. WHERE col = '(SELECT ...')
- Add word-boundary check after SELECT to prevent matching identifiers
like SELECTIVITY
- Deduplicate table-name replacements to avoid redundant substitutions
when a table name happens to equal an alias
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(scoresql): add pre-SELECT word boundary check and document quote handling
Add a preceding word-boundary check so identifiers like PRESELECT are
not falsely detected as subqueries. Also add a comment explaining why
SQL '' escaped quotes work correctly via toggle symmetry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
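
The subquery-detection fixes above (parenthesis check, single-quote tracking, and word boundaries on both sides of SELECT) can be illustrated with a minimal sketch. This is not the code from the commit: `has_subquery` and `is_ident_byte` are hypothetical names, and the sketch assumes only single-quoted string literals and ASCII keywords.

```rust
/// Hypothetical helper: true for bytes that can be part of an identifier.
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Sketch: report a subquery when the standalone keyword SELECT appears
/// inside parentheses and outside single-quoted string literals.
fn has_subquery(sql: &str) -> bool {
    let bytes = sql.as_bytes();
    let mut in_quote = false;
    let mut depth = 0usize;
    for i in 0..bytes.len() {
        match bytes[i] {
            // An escaped quote ('') toggles the state twice, so it ends up
            // where it started: the "toggle symmetry" the commit mentions.
            b'\'' => in_quote = !in_quote,
            // Everything inside a string literal is ignored.
            _ if in_quote => {}
            b'(' => depth += 1,
            b')' => depth = depth.saturating_sub(1),
            _ if depth > 0 => {
                let end = i + 6;
                if end <= bytes.len() && bytes[i..end].eq_ignore_ascii_case(b"select") {
                    // Word boundaries reject PRESELECT and SELECTIVITY.
                    let before_ok = i == 0 || !is_ident_byte(bytes[i - 1]);
                    let after_ok = end == bytes.len() || !is_ident_byte(bytes[end]);
                    if before_ok && after_ok {
                        return true;
                    }
                }
            }
            _ => {}
        }
    }
    false
}
```

Under these assumptions, `WHERE id IN (SELECT id FROM u)` is flagged, while `WHERE col = '(SELECT ...'` and `COUNT(selectivity)` are not.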
* fix(scoresql): address Copilot review — stats cache, alias ordering, SQL parsing, regex
- Remove incorrect first-line skip in load_stats_cache (stats JSONL has no metadata header)
- Sort alias replacements by length descending to prevent partial matches (_t_1 inside _t_10)
- Add split_on_operators() to correctly parse column names from `col=value` patterns
- Use word-boundary regex in get_duckdb_plan instead of naive String::replace
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
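
To show why longest-first ordering and word boundaries both matter for alias substitution, here is a dependency-free sketch. The commit uses a word-boundary regex. This sketch swaps in a hand-rolled boundary check so it compiles without the `regex` crate. `replace_whole_word` and `substitute_names` are hypothetical names, and the sketch assumes replacement text never contains another name from the list.

```rust
/// Hypothetical helper: true for bytes that can be part of an identifier.
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Replace occurrences of `from` with `to`, but only where `from` is not
/// embedded in a longer identifier (a hand-rolled stand-in for `\bfrom\b`).
fn replace_whole_word(text: &str, from: &str, to: &str) -> String {
    if from.is_empty() {
        return text.to_string();
    }
    let bytes = text.as_bytes();
    let mut out = String::with_capacity(text.len());
    let mut last = 0;
    for (start, _) in text.match_indices(from) {
        let end = start + from.len();
        let before_ok = start == 0 || !is_ident_byte(bytes[start - 1]);
        let after_ok = end == bytes.len() || !is_ident_byte(bytes[end]);
        if before_ok && after_ok {
            out.push_str(&text[last..start]);
            out.push_str(to);
            last = end;
        }
    }
    out.push_str(&text[last..]);
    out
}

/// Apply all replacements, longest name first, with duplicates removed.
fn substitute_names(sql: &str, replacements: &[(&str, &str)]) -> String {
    let mut pairs: Vec<(&str, &str)> = replacements.to_vec();
    // Sort by name length descending; ties are ordered so that identical
    // pairs end up adjacent and `dedup` can drop them.
    pairs.sort_by(|a, b| b.0.len().cmp(&a.0.len()).then_with(|| a.cmp(b)));
    pairs.dedup();
    pairs
        .iter()
        .fold(sql.to_string(), |acc, (from, to)| replace_whole_word(&acc, from, to))
}
```

With the boundary check in place, `_t_1` no longer matches inside `_t_10`, and `data` no longer matches inside `metadata_col`. The longest-first sort remains a cheap second line of defense.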
* fix(scoresql): labeled break in join parsing, word-boundary regex in Polars plan
- Use labeled loop (`'on_clause`) in `extract_join_columns` so that
stop-keywords (WHERE, ORDER, etc.) break the outer token loop, not
just the inner operator-split loop.
- Apply word-boundary regex replacement in `get_polars_plan` (matching
the existing `get_duckdb_plan` strategy) to prevent alias partial
matches (e.g., alias "data" inside "metadata_col").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
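
The labeled-break fix is easiest to see in a reduced example. The following is a sketch, not the `extract_join_columns` from the commit: `on_clause_columns` and the `STOP_KEYWORDS` list are assumptions made for illustration, and the sketch does not filter out literals.

```rust
/// Hypothetical stop-keyword list; the real list in the commit may differ.
const STOP_KEYWORDS: [&str; 5] = ["WHERE", "GROUP", "ORDER", "LIMIT", "HAVING"];

/// Sketch: collect operands from the text following `ON`, stopping at the
/// first stop-keyword.
fn on_clause_columns(on_clause: &str) -> Vec<String> {
    let mut columns = Vec::new();
    'on_clause: for token in on_clause.split_whitespace() {
        // Inner loop: split tokens such as `a.id=b.id` on comparison operators.
        for part in token.split(|c: char| matches!(c, '=' | '<' | '>' | '!')) {
            if part.is_empty() {
                continue;
            }
            let upper = part.to_ascii_uppercase();
            if STOP_KEYWORDS.contains(&upper.as_str()) {
                // A plain `break` would only leave the inner loop, and the
                // outer loop would keep consuming tokens after WHERE.
                break 'on_clause;
            }
            if upper == "AND" || upper == "OR" {
                continue;
            }
            columns.push(part.to_string());
        }
    }
    columns
}
```

For the input `a.id=b.id AND a.k = b.k WHERE a.x>5`, the labeled break stops collection at `WHERE`, so `a.x` and `5` are never picked up.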
* fix(scoresql): address Copilot review round 2 — delimiter passthrough, cache freshness, dedup
- Use >= in is_cache_fresh to avoid churn on coarse-timestamp filesystems
- Deduplicate extracted join columns to prevent double-counting
- Pass --delimiter to qsv stats/frequency when generating caches
- Use canonical path for cache generation (matches cache lookup path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
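
The `>=` change in the freshness check can be sketched as follows. The function names here are hypothetical and the actual `is_cache_fresh` signature is not shown in this commit message. The point is that on filesystems with coarse timestamp resolution, a cache written right after its source can carry an identical modification time, and a strict `>` would treat it as stale on every run.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Sketch: a cache counts as fresh when it is at least as new as its source.
/// Using `>=` accepts equal timestamps instead of regenerating the cache.
fn mtime_is_fresh(source_mtime: SystemTime, cache_mtime: SystemTime) -> bool {
    cache_mtime >= source_mtime
}

/// Hypothetical wrapper that reads both modification times from disk.
fn cache_is_fresh_on_disk(source: &Path, cache: &Path) -> io::Result<bool> {
    let source_mtime = fs::metadata(source)?.modified()?;
    let cache_mtime = fs::metadata(cache)?.modified()?;
    Ok(mtime_is_fresh(source_mtime, cache_mtime))
}
```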
* fix(scoresql): make dedup truly case-insensitive and add ASCII delimiter assertions
- Use sort_unstable_by/dedup_by with case-insensitive comparison to match
the existing comment's claim of case-insensitive deduplication
- Add debug_assert!(delim.is_ascii()) guards in stats/freq cache generation
to document the ASCII delimiter assumption
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
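
A case-insensitive sort followed by `dedup_by` could look like the sketch below. `dedup_case_insensitive` is a hypothetical name, and ASCII-only case folding is an assumption. Because `Vec::dedup_by` only removes adjacent duplicates, the sort has to use the same case-insensitive ordering as the equality test.

```rust
/// Sketch: remove duplicates that differ only in ASCII case.
fn dedup_case_insensitive(mut cols: Vec<String>) -> Vec<String> {
    // Sort with a case-insensitive key so that "id" and "ID" become adjacent.
    cols.sort_unstable_by(|a, b| a.to_ascii_lowercase().cmp(&b.to_ascii_lowercase()));
    // `dedup_by` drops an element when the closure says it equals its neighbor.
    cols.dedup_by(|a, b| a.eq_ignore_ascii_case(b));
    cols
}
```

Since the sort is unstable, which spelling of a duplicated name survives is unspecified. Callers that care should normalize the case first.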
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
scoresql cmd (#3612)
1 parent 81d028c commit 6d57023
File tree
5 files changed, +1581 -0 lines changed
- src
  - cmd
- tests