Releases: dathere/qsv
16.1.0
[16.1.0] - 2026-02-15 "The Accelerated Civic Intelligence (ACI) Release"
Statistical analysis gets faster and more robust; User & Agent Experience (UAX) improvements keep the CLI parser, docs, shell completions, and MCP tool definitions in sync from a single source; and the qsv MCP Server gets leaner and smarter.
With a properly configured environment, a User can team up with several AI Agents for accelerated analysis of large, real-world, messy data (raw datasets, presentations, reports, spreadsheets, etc.) without uploading it all to the cloud or manually wrangling it into shape first. Analysis that would otherwise take a few days, if not a few weeks, to compile can be done in a few minutes.
Major Features
New pragmastat Command
A pragmatic statistical toolkit by @AndreyAkinshin. Computes robust, median-of-pairwise statistics with the Pragmastat library. Designed for messy, heavy-tailed, or outlier-prone data where mean/stddev can mislead. See pragmastat.dev for details on the underlying algorithms and design philosophy.
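To make "median-of-pairwise" concrete, here is a minimal Python sketch of two classic estimators of that family: the median of pairwise averages (Hodges-Lehmann) and the median of pairwise absolute differences (Shamos). It illustrates the general idea only. The function names are hypothetical, and the pairing conventions (`i <= j` for the center, `i < j` for the spread) are assumptions rather than a description of how the Pragmastat library or the pragmastat command computes its results.

```python
from itertools import combinations
from statistics import mean, median


def pairwise_center(xs):
    """Median of all pairwise averages (i <= j): the Hodges-Lehmann estimator."""
    n = len(xs)
    return median((xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n))


def pairwise_spread(xs):
    """Median of all pairwise absolute differences (i < j): the Shamos estimator."""
    return median(abs(a - b) for a, b in combinations(xs, 2))


# One extreme value drags the mean far from the bulk of the data,
# while the pairwise-median estimates stay close to it.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print("mean:           ", mean(data))
print("pairwise center:", pairwise_center(data))
print("pairwise spread:", pairwise_spread(data))
```

This brute-force version is quadratic in the number of values, so it is only meant for small illustrative inputs.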
Frequency Cache System
New --frequency-jsonl option for the frequency command creates a JSONL cache (analogous to stats --stats-jsonl) that accelerates repeated frequency analysis. Uses a hybrid strategy for high-cardinality columns with configurable thresholds.
Improved UAX: Unified Documentation & Shell Completions
A new docopt-based parsing system now generates markdown documentation, shell completions, and MCP tool definitions from the same USAGE text that powers qsv's CLI parsing. Everything stays in sync automatically, with no more drift between help text, docs, completions and AI tooling.
- --generate-help-md flag produces polished markdown docs with section navigation, emoji legends, clickable URLs, and argument/option tables that are both Human and Agent-friendly.
- Shell completions are now auto-generated, replacing 68 manually maintained completion files.
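The single-source idea can be sketched in a few lines of Python: parse the option lines of a docopt-style USAGE text once, then derive a markdown table, a completion word list, and a JSON-schema-like tool definition from the same parsed records. Everything here is hypothetical (the `mytool` USAGE text, the regex, the function names and the output shapes); it is a toy illustration of the approach, not qsv's actual generator or its output formats.

```python
import re

# Hypothetical USAGE text for an imaginary `mytool count` command.
USAGE = """\
Usage:
    mytool count [options] [<input>]

Options:
    -n, --no-headers       Treat the first row as data.
    -d, --delimiter <arg>  Field delimiter to use.
    -h, --help             Show this message.
"""

# Optional short flag, long flag, optional <arg>, then 2+ spaces and help text.
OPTION_RE = re.compile(
    r"^[ \t]+(?:(-\w), )?(--[\w-]+)(?: (<\w+>))?[ \t]{2,}(.+)$", re.MULTILINE
)


def parse_options(usage):
    """Extract one record per option line; this is the single source of truth."""
    return [
        {"short": s, "long": l, "arg": a, "help": h.strip()}
        for s, l, a, h in OPTION_RE.findall(usage)
    ]


def to_markdown(opts):
    """Render the options as a markdown table."""
    rows = ["| Option | Description |", "| --- | --- |"]
    for o in opts:
        flag = ", ".join(f for f in (o["short"], o["long"]) if f)
        if o["arg"]:
            flag += f" {o['arg']}"
        rows.append(f"| `{flag}` | {o['help']} |")
    return "\n".join(rows)


def to_completion_words(opts):
    """Word list a shell completion script could offer."""
    return " ".join(o["long"] for o in opts)


def to_tool_schema(opts):
    """JSON-schema-like parameter definition for an agent tool."""
    props = {
        o["long"].lstrip("-").replace("-", "_"): {
            "type": "string" if o["arg"] else "boolean",
            "description": o["help"],
        }
        for o in opts
    }
    return {"type": "object", "properties": props}


opts = parse_options(USAGE)
print(to_markdown(opts))
print(to_completion_words(opts))
print(to_tool_schema(opts))
```

Because all three outputs are derived from `opts`, adding an option to the USAGE text updates the docs, the completions, and the tool definition together.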
qsv MCP Server: Leaner Architecture
The qsv_pipeline tool has been removed in favor of direct sequential command execution. In practice, agents were already calling commands one at a time, and removing the pipeline abstraction made the server simpler, more predictable, and easier to debug. Additional MCP improvements include:
- Extended AI agent guidance to take advantage of frequency and stats caches
- Seamless support for Google Gemini CLI thanks to @kulnor's continuing contributions
- Major codebase refactoring: deduplicated helpers, extracted filesystem tools, fixed "any" types, and various bug fixes
Detailed MCP changes are documented in the MCP CHANGELOG.
Added
- feat: pragmastat command - pragmatic statistical toolkit with parallelism, progress bar, and memcheck (by @AndreyAkinshin)
- feat: frequency --frequency-jsonl - JSONL frequency cache with hybrid strategy for high-cardinality columns
- feat: --generate-help-md flag - auto-generate markdown docs from USAGE text with section navigation, emoji legends, and clickable URLs
- docs: add QSV_FREQ_HIGH_CARD_THRESHOLD and QSV_FREQ_HIGH_CARD_THRESHOLD_PCT env vars
Changed
- perf: stats - skip redundant modes tracking, reduce allocations, optimize cache line layout, deterministic antimode sorting
- perf: pragmastat - reduce redundant computations, add parallelism
- perf: frequency - use sort_unstable_by for faster sorting; parallel computation for high-cardinality columns
- refactor: shell completions auto-generated from USAGE text (removed 68 manual files)
- refactor: describegpt - disambiguate "Other" bucket from literal "Other" in Data Dictionary Examples column
- deps: bump anstream from 0.6.21 to 1.0.0
- deps: bump futures to 0.3.32
- deps: bump jsonschema from 0.41 to 0.42
- deps: bump libc from 0.2.180 to 0.2.181
- deps: bump memmap2 from 0.9.9 to 0.9.10
- deps: bump polars to latest upstream
- deps: bump pyo3 from 0.28.0 to 0.28.1
- deps: bump quickcheck from 1.0.3 to 1.1.0
- deps: bump rand from 0.9 to 0.10, rand_hc to 0.5, rand_xoshiro to 0.8
- deps: bump sysinfo from 0.37.2 to 0.38.2
- deps: bump tempfile from 3.24.0 to 3.25.0
- deps: bump toml from 0.9.12 to 1.0.1
- deps: bump uuid from 1.20.0 to 1.21.0
- deps: bump zmij from 1.0.20 to 1.0.21
- deps: update csv patched fork MSRV to 1.93
Fixed
- fix: frequency - normalize delimiter for cache compatibility; deterministic output with secondary sort key; hybrid cache for high-cardinality columns
- fix: stats - remove unsafe block; deterministic antimode sorting
- fix(help): section detection, acronym casing, and option word-wrap in markdown generation
Removed
- removed 68 manual shell completion files (now auto-generated from USAGE text)
Full Changelog: 16.0.0...16.1.0
16.0.0
[16.0.0] - 2026-02-08 🤖 "The AI-Native Release" 🤖
This release makes qsv deeply AI-native, from smarter date detection that flows through to Polars schemas to an MCP Plugin layer that lets AI agents wield qsv as a first-class data tool.
Claude Desktop, Code, and Cowork users can now use qsv's powerful data-wrangling capabilities directly within their AI workflows, with intelligent guidance and seamless integration. Google Gemini is now also supported thanks to @kulnor.
π Major Features
Smarter Date/DateTime Detection
qsv can now automatically detect date and datetime columns and carry that knowledge through the entire pipeline:
- stats --dates-whitelist sniff is now the default: qsv sniffs the first 1000 rows to identify date/datetime field candidates, which then undergo date/datetime type inferencing
- schema auto-detects Date/DateTime columns when generating Polars schemas (.pschema.json)
- DateTime type support in Polars schema parsing: temporal types are preserved through sqlp, joinp, and Parquet conversion
Hardened Stats Cache
The stats cache system that accelerates frequency, schema, tojsonl, sqlp, joinp, pivotp, diff, and sample is now more robust:
- Simplified API: Removed dataset_stats from get_stats_records(), streamlining all downstream consumers
- Safe fallback: Corrupted or unparsable cache files are gracefully handled instead of erroring out
- Auto-regeneration: Stats cache regenerates on parse error rather than failing
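The fallback and regeneration behavior can be pictured with a small Python sketch: try to read a JSONL cache, and if it is missing or cannot be parsed, recompute the records and rewrite the cache instead of failing. This is an illustration of the pattern only. `compute_stats`, `load_stats`, the record fields, and the cache file name are all hypothetical and do not reflect qsv's internal API or cache layout.

```python
import json
import os
import tempfile


def compute_stats(csv_path):
    """Hypothetical stand-in for the expensive pass: one record per column."""
    with open(csv_path, newline="") as f:
        header = f.readline().strip().split(",")
        rows = sum(1 for _ in f)
    return [{"field": name, "rowcount": rows} for name in header]


def load_stats(csv_path, cache_path):
    """Return cached records, regenerating the cache if it is missing or corrupted."""
    try:
        with open(cache_path) as f:
            return [json.loads(line) for line in f if line.strip()]
    except (OSError, ValueError):  # json.JSONDecodeError is a ValueError
        records = compute_stats(csv_path)
        with open(cache_path, "w") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")
        return records


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    cache_path = os.path.join(tmp, "data.stats.jsonl")  # hypothetical cache name
    with open(csv_path, "w") as f:
        f.write("a,b\n1,2\n3,4\n")
    with open(cache_path, "w") as f:
        f.write("{not valid json\n")  # simulate a corrupted cache file

    stats = load_stats(csv_path, cache_path)  # parse error, so regenerate
    again = load_stats(csv_path, cache_path)  # now served from the rewritten cache
    print(stats == again)
```

Treating a bad cache as a cache miss keeps downstream consumers working, at the cost of one extra recomputation.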
Enhanced MCP Server (16.0.0)
The qsv MCP Server receives its largest update yet. See the MCP CHANGELOG for full details.
Breaking Changes
- diff command: --force option removed
  - Was used for short-circuiting diffs based on dataset_stats
  - No longer needed after stats cache API simplification
- to command: parquet subcommand removed
  - Use dedicated qsv_to_parquet MCP tool or sqlp for Parquet output
Added
- feat: stats - add 'sniff' support for --dates-whitelist
- feat: schema - auto-detect Date/DateTime columns for Polars schema via sniff
- feat: Support DateTime type in Polars schema parsing
Changed
- refactor: stats - make --dates-whitelist sniff the default
- perf: Use foldhash HashMap/HashSet across codebase for faster hashing
  - Replaces std::collections with foldhash in 14 modules
  - foldhash is much faster than the default std hasher for non-cryptographic hashing
- refactor: stats - Remove dataset_stats from stats cache system
  - Simplified get_stats_records() API
  - Centralized rowcount handling in sample command
  - Adapted diff, pivotp, sample, and other commands to new API
- refactor: stats - Stats cache now regenerates on parse error (improved robustness)
- refactor: stats - Safe fallback on corrupted stats cache
- refactor: pivotp - use sparsity for suggestions and uniqueness_ratio for pivot heuristics
- refactor: sample - lazily compute row_count only for sampling methods that need it
- deps: bump async-compression to 0.4.39
- deps: bump bytes from 1.11.0 to 1.11.1
- deps: bump calamine to 0.33
- deps: bump csv-nose from 0.7.0 to 0.8.0
- deps: bump csvlens to latest upstream (PR merged)
- deps: bump geosuggest to latest upstream
- deps: bump flate2 from 1.1.8 to 1.1.9
- deps: bump jsonschema from 0.40.0 to 0.41 (latest upstream with unreleased perf improvements)
- deps: bump polars from 0.52.0 at py-1.38.1 tag to 0.53
- deps: bump pyo3 from 0.27.2 to 0.28.0
- deps: bump redis from 1.0.2 to 1.0.3
- deps: bump regex from 1.12.2 to 1.12.3
- deps: bump reqwest from 0.13.1 to 0.13.2
- deps: bump zerocopy from 0.8.35 to 0.8.36
- deps: bump zip from 6 to 7
- deps: bump zmij from 1.0.17 to 1.0.20
- deps: we now bundle Luau 0.708 from 0.706
- deps: bump @modelcontextprotocol/sdk (MCP)
- applied several clippy lint suggestions
- applied several GH Copilot and Claude review suggestions
Fixed
- fix: frequency - column selection when using --select option in different order
  - Now looks up cardinality by column name instead of index
  - Handles user-selected/reordered column subsets correctly
- fix: sample - handle missing min weight in stats cache
- fix: validate - adapt tests to jsonschema 0.40.2 error message format changes
- fix: joinp - switch pschema serialization to serde_json for compound type support
- fix: excel - adjust jsonl path usage caused by calamine 0.33 release
- fix: stats - return sentinel when sniff finds no date columns
- fix: config - QSV_NO_HEADERS environment variable being ignored; split no_headers into explicit setter and CLI flag method
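The frequency column-selection fix in the list above addresses a common class of bug: looking up per-column metadata by position when the user has selected or reordered columns. A minimal Python sketch of the failure mode and the name-keyed remedy follows. The data and variable names are made up for illustration and are not taken from qsv's code.

```python
# Cardinalities as they might come from a stats cache, in file column order.
header = ["id", "city", "state"]
cardinality_by_index = [1000, 42, 5]
cardinality_by_name = dict(zip(header, cardinality_by_index))

# The user selects a reordered subset of the columns.
selected = ["state", "id"]

# Buggy: assumes the i-th selected column is the i-th column of the file.
by_position = {name: cardinality_by_index[i] for i, name in enumerate(selected)}

# Fixed: key the lookup on the column name, independent of selection order.
by_name = {name: cardinality_by_name[name] for name in selected}

print("by position:", by_position)
print("by name:    ", by_name)
```

With the positional lookup, "state" silently inherits the cardinality of "id"; the name-keyed lookup returns the right value for any subset or ordering.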
Removed
- removed to parquet subcommand in favor of dedicated qsv_to_parquet MCP tool and sqlp Parquet output support
- removed cargo install instructions from README, as qsv is rarely cargo-installable: it uses patched forks on a regular basis and cargo install doesn't support git dependencies.
Full Changelog: 15.0.1...16.0.0
15.0.1
[15.0.1] - 2026-01-28
Oops, we celebrated color and the magika-powered revamped sniff but forgot to actually enable them in the release prebuilts! 🤦🏻‍♂️
This patch enables the new color command and turns on magika, along with several fixes and dependency bumps.
Changed
- deps: bump polars to latest upstream
- deps: bump csv-nose from 0.6.0 to 0.7.0
- deps: bump mlua from 0.11.5 to 0.11.6
- deps: bump minijinja from 2.14.0 to 2.15.1
- deps: bump minijinja-contrib from 2.14.0 to 2.15.1
- deps: bump siphasher from 1.0.1 to 1.0.2
- deps: bump iana-time-zone from 0.1.64 to 0.1.65
- deps: bump hono from 4.11.4 to 4.11.7 (MCP)
- build: add color feature to build and test workflows
- build: add magika feature to publishing workflows
- docs: updated luau documentation to reflect bundled Luau 0.706
- docs: sniff is now also 🤖-powered with its use of Magika mime-type detection
Fixed
- tests: fix flaky color test_get_theme test (now ignored due to environment dependencies)
- tests: fix flaky search JSON test by using semantic rather than byte-by-byte compare
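Why a semantic comparison makes a JSON test less flaky: two JSON documents can carry the same data while differing in key order and whitespace, so a byte-by-byte comparison fails even though nothing meaningful changed. A small Python sketch with made-up payloads, not the actual test data:

```python
import json

expected = '{"count": 2, "rows": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}'
actual = '{\n  "rows": [{"name": "a", "id": 1}, {"name": "b", "id": 2}],\n  "count": 2\n}'

# Byte-by-byte: sensitive to key order and formatting.
bytes_equal = expected == actual

# Semantic: parse both sides and compare the resulting values.
semantically_equal = json.loads(expected) == json.loads(actual)

print(bytes_equal, semantically_equal)
```

Array order still matters in the semantic comparison, which is usually the desired behavior for row-oriented output.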
Full Changelog: 15.0.0...15.0.1
15.0.0
[15.0.0] - 2026-01-26 🖖🏻 "The Mind Meld Release" 🖖🏽
This is the biggest release of qsv yet thanks to many expert contributions from the community!
- @kulnor's deep expertise in statistics and data standards has been instrumental in enhancing qsv's data analysis capabilities across the entire qsv suite! His well-crafted issue reports, detailed design proposals, thorough testing and detailed documentation on top of our weekly mind-melds have vastly improved commands like frequency, stats, moarstats and describegpt. His contributions and advocacy have been invaluable and I've learned a lot from him.
- @ws-garcia's research on the Table Uniformity Method (TUM), the algorithm behind the revamped sniff command, will be the linchpin behind our upcoming next-gen CKAN harvester. Though it took a while, our implementation is now complete and achieves 99.55% accuracy on the W3C-CSVW test suite.
- @gurgeous' new color command contribution makes viewing CSVs in the terminal a joy! His attention to detail and design aesthetics have resulted in a command that is both functional and visually appealing, with more features on the way!
- If you look at the recent commit history, you can see I went on a Claude-bender over the holiday break 🤖, collaborating heavily with @claude (running Opus 4.5, appropriately enough) to build up qsv's Generative AI capabilities in describegpt and its US Census-aware MCP server.
π Major Features
An entire section courtesy of @kulnor's mind-melds.
Enhanced frequency Command
Powerful new filtering and display options:
- --no-float: Exclude Float columns from frequency analysis
- --pct-nulls: Include NULL values in percentage calculations
- --null-sorted: Sort NULL values with other entries (not at end)
- --no-other: Exclude the "Other" aggregation category
- --null-text: Customize the NULL display text
- --stats-filter: Luau-based column filtering using statistics
  - Filter columns based on any stats field (nullcount, cardinality, type, etc.)
  - Full Luau expression support for complex conditions
- Omit stats in JSON output when using --weight
Enhanced describegpt Command
AI-powered data description gets smarter. Now optimized to work with LM Studio and openai/gpt-oss-20b out-of-the-box:
- --frequency-options/--freq-opts: Pass options to underlying frequency command
- --enum-threshold Integration: Control enum constraint compilation thresholds
- file: Prefix Support: Load prompts from files with file:my_prompt.txt
- CLI Supersedes Environment Variables: Command-line options take precedence
- Updated LLM Base URLs: Current endpoints for major providers
- Robust Frequency Parsing: Better handling of frequency output formats
- QSV_TEST_DESCRIBEGPT: Environment variable for testing describegpt features
Enhanced stats Command
- File Metadata in JSON: JSON output now includes source file information
- Removed --dataset-stats: Statistics are now always populated (was optional flag)
Enhanced transpose Command
- --select Option: Select specific columns during transposition
  - Uses standard qsv select syntax
  - Filter columns before wide-to-long transformation
Revamped sniff Command
Complete overhaul of CSV sniffing capabilities with state-of-the-art detection algorithms:
- csv-nose Integration: Replaced qsv-sniffer with csv-nose for more robust and accurate detection using @ws-garcia's TUM algorithm
- Magika-Powered Inference: Feature-gated integration with Google's Magika for advanced, AI-powered file type detection
- Inference labels for detected types
- Confidence scores for type predictions
- 1-Based Field Numbering: More intuitive field indexing
- Robust Remote URLs: Improved handling of remote CSV sources
- 'Unknown' Fallback: Graceful handling of undetectable data types
NEW: color Command by @gurgeous
A vibrant new command for displaying CSVs as colorized, pretty-printed tables:
- Pretty Tables: Transform your CSVs into beautiful, readable terminal output
- Row Numbers (--row-numbers): Add line numbers for easy reference
- Custom Titles (--title): Add descriptive headers to your output
- Color Themes (--color): Choose from multiple color schemes
- Placeholder Support: Configurable placeholders for empty values
- Environment Variables: QSV_TERMWIDTH (max 1000) and QSV_FORCE_COLOR support
- Microoptimized: Fast rendering even for large datasets
Enhanced MCP Server
Major improvements to the Model Context Protocol server, making qsv even more AI-native:
Token Optimization
- 66-76% token reduction in tool definitions
- Removed redundant defaults and test_file fields from schemas
- Streamlined tools and prompts for efficient LLM consumption
Tool Lazy Loading
- Tool Search: Dynamically discover available tools and load them as required
- Expose-All-Tools Mode: Option to expose the complete tool catalog
- Universal --help: Even deeper help across all MCP-exposed commands if the Agent needs more information
Documentation & Integration
- Census Integration Guide: If you have the US Census' Official MCP Server installed, prime @claude to use it together with qsv efficiently to do deep research and analysis on data without overrunning the context window.
- Updated Claude/MCP Documentation: Comprehensive Documentation
- qsv Prompts: Pre-built prompts for common data wrangling tasks
- SkillExecutor Unit Tests: Robust testing for skill execution
Infrastructure & Quality
Testing
- Test suite expanded to 2,448 tests
- Comprehensive coverage for new MCP features
- SkillExecutor unit tests added
Documentation
- DeepWiki Badge: Added project documentation badge
- Emoji Legend: Added 🖥️ for UI commands, Luau logos for scripting
- COMMAND_DEPENDENCIES.md: New comprehensive command dependency documentation (by @kulnor)
- Detailed Examples: Enhanced examples for numerous commands, formatted to be both human and AI-readable
- Magika in Version Metadata: File type detection engine now shown in version info
Dependencies
Major Updates
- reqwest: 0.12 → 0.13
- jsonschema: 0.39 → 0.40
- crossterm: 0.28.1 → 0.29.0
- csv-nose: 0.2.0 → 0.6.0
- sysinfo: 0.37.2 → 0.38.0
- rust_decimal: 1.39.0 → 1.40.0
Minor Updates
- zmij: 1.0.13 → 1.0.17
- flexi_logger: 0.31.7 → 0.31.8
- cmov: 0.4.3 → 0.4.5
- filetime: 0.2.26 → 0.2.27
- get-size2: 0.7.3 → 0.7.4
- hono: 4.11.3 → 4.11.4
- lodash: 4.17.21 → 4.17.23
- Polars: Latest upstream
CI/Actions
- actions/checkout: 4 → 6
- actions/setup-python: 6.1.0 → 6.2.0
Other
- Patched calamine fork with unreleased fixes
- MSRV: Rust 1.93
Environment Variables
New
- QSV_MCP_MAX_EXAMPLES: Maximum examples per MCP tool
- QSV_TERMWIDTH: Terminal width for color command (max 1000)
- QSV_FORCE_COLOR: Force color output
- QSV_TEST_DESCRIBEGPT: Enable describegpt testing mode
Updated
- QSV_PREAMBLE_ROWS: Enhanced preamble detection
- Various QSV_STATS_* and QSV_FORCE_* variables
Migration Notes
Breaking Changes
- stats command: --dataset-stats option removed
  - Statistics are now always computed
  - No migration needed if not using this flag
- sniff command: Field numbering changed to 1-based
  - Scripts parsing field numbers may need adjustment
  - More consistent with other qsv commands
Added
- feat: NEW color command for pretty-printed colorized tables by @gurgeous
- feat: frequency add --no-float option to exclude Float columns
- feat: frequency add --pct-nulls option for NULL percentage calculations
- feat: frequency add --null-sorted option for sorting NULL values
- feat: frequency add --no-other option to exclude Other category
- feat: frequency add --null-text option for custom NULL display
- feat: frequency add --stats-filter for Luau-based column filtering
- feat: describegpt add --frequency-options/--freq-opts option
- feat: describegpt add --enum-threshold integration
- feat: describegpt add file: prefix support for prompt files
- feat: stats add file metadata to JSON output
- feat: transpose add --select option for column selection
- feat: sniff integrate csv-nose for improved CSV detection
- feat: sniff add Magika-powered file type inference (feature-gated)
- feat: mcp add Tool Search capability
- feat: mcp add expose-all-tools mode
- feat: mcp add universal --help support
- feat: mcp add subcommand enum support
- feat: mcp add QSV_MCP_MAX_EXAMPLES configuration
- docs: add COMMAND_DEPENDENCIES.md by @kulnor
- docs: add DeepWiki badge
- docs: add emoji legend for UI commands and Luau
- docs: add Census integration g...
14.0.0
[14.0.0] - 2026-01-12 "The qsv MCP for Everyone Release"
Building on our 13.0.0 "AI-native Agent" release last week, qsv 14.0.0 is dedicated to making AI integration seamless, reliable, and easy for everyone.
Previously, installing the qsv MCP Server required a full-fledged development environment and familiarity with command line tools and was not readily usable by non-developers.
This release transforms the qsv MCP Server from a powerful developer tool into a user-friendly, transparently integrated Claude Desktop data-wrangling agent with robust cross-platform support, automatic updates, and comprehensive testing infrastructure.
MCP Desktop Extension (Bundle) - One-Click Installation
The new MCP Desktop Extension provides a streamlined installation experience for Claude Desktop users:
- User-Friendly Package - Pre-configured bundle with automatic qsv binary detection - and if not found, provide installation guidance1
- Cross-Platform Support - Works seamlessly on macOS, Windows, and Linux
- Smart Data-wrangling - its deep knowledge of qsv insulates the User from the nitty-gritty details of the comprehensive toolkit with its hundreds of options, while ensuring fast, effective operations
- Token Efficient - Despite this deep knowledge, the MCP server is still token efficient by including intelligent contextual guidance to help Claude make optimal decisions (USE WHEN, COMMON PATTERNS, ERROR PREVENTION, PERFORMANCE HINTS prompt guidance along with lazy-loading of full qsv --help text when more info is required)
- Security Enhanced - Raw Data is not sent to Claude, only statistical metadata2
- Welcome Experience - Includes prompts and examples to get started quickly
- Seamlessly works with both Claude Code and the just launched Claude Cowork! Take qsv beyond data-wrangling chats and unlock even greater potential with an agentic qsv.
The Desktop Extension follows the official MCP Bundle (MCPB) manifest specification v0.3, ensuring compatibility with Claude Desktop and future MCP-compatible applications.
See the MCP documentation for installation instructions.
Breaking Changes
- MCP Skills: qsv-skill-gen binary removed - use qsv --update-mcp-skills instead (requires mcp feature flag)
Added
- feat: MCP Desktop Extension - user friendly installation of qsv MCP Server #3296
- feat: MCP Server: numerous QoL improvements to MCP Desktop Bundle #3298
- feat: MCP skills auto update #3292
- feat: MCP - add expert guidance, common patterns, MCP optimized descriptions & usage hints #3303
- feat: MCP skills generator now extracts performance hints (📇 indexed, 🤯 memory-intensive, 😣 proportional memory) from README.md command table
- feat: MCP Server automatically enables --stats-jsonl flag for stats command to create cache for smart commands
- feat: MCP enhanced tool descriptions with intelligent guidance - USE WHEN, COMMON PATTERNS, ERROR PREVENTION hints
- feat: MCP parameter enhancements with examples for common options (selection, delimiter, etc.)
- feat: MCP comprehensive pipeline tool description with workflows and limitations
- feat: MCP enhanced filesystem tools (list_files, set_working_dir, get_working_dir) with usage guidance
- feat: MCP add auto-detection of qsv binary path for Desktop Extension 5c09672e
- feat: MCP various Quality-of-Life UI/UX improvements b5b338f6
- feat: MCP enhance Desktop Extension with validation and fixes e2e20551
- feat: MCP add prompts for welcome message and examples 2672a74b
- feat: Claude Code GitHub App integration - PR review and issue assistance workflows #3312
- tests: MCP add CI test workflow for qsv MCP server 8732fee3
- docs: MCP add comprehensive Claude Code (CLI) documentation 97a88c4e
- docs: MCP add an MCP Server-specific CLAUDE.md e7e5f9e1
- docs: add qsv pro download badges to README and update description #3295
- docs: add alt text to all download badges cc1c3819
- docs: add mise alternate installation documentation #3304
- docs: MCP update skills markdown documentation #3308
- docs: add MCP Server environment variables section to ENVIRONMENT_VARIABLES.md & dotenv.template
Changed
- refactor: MCP Server - removed applydp command (datapusher+ specific, not needed for general use)
- refactor: MCP use qsv --update-mcp-skill instead of separate qsv-skill-gen binary 13380ba1
- refactor: MCP remove qsv-skill-gen binary, make it an option in qsv gated behind mcp feature flag 9c771ee6
- refactor: MCP more robust output processing - use temp output file and stdout intelligently #3291
- refactor: MCP qsv-skill-gen.rs to preserve positional docopt args when generating skills JSON file 9618a25c
- refactor: MCP make output/temp file processing smarter 207274c7
- refactor: MCP use directory type for filesystem config to clarify restricted access 9650fb41
- refactor: MCP added null checks before iterating arrays 2d0747ab
- refactor: MCP fixed TS output directory to account for prod and test builds b0b12a40
- refactor: MCP address all issues identified during Copilot review 27027e50
- refactor: MCP optimize tokens use - extract concise command descriptions from README #3307
- refactor: MCP fine-tune select guidance 37964123
- docs: with MCP fully implemented - update the logo to make the horse robotic 33f3b9f5
- docs: comprehensive STATS_DEFINITION.md update b443ccc4
- chore: address valid robustness issues in last Copilot review 55a5a300
- chore: delete CITATION.cff file and just depend on Zenodo integration which auto-assigns a DOI on release 9b981b8c
- deps: bump polars to 0.52.0 at py-1.37.1 tag 3bbad1ea
- deps: bump atoi_simd and calamine c7cd928f
- deps: bump data-encoding from 2.9.0 to 2.10.0 09bf3c33
- deps: bump unicase from 2.8.1 to 2.9.0 99f66a3b
- deps: bump csvlens to 15.1 and remove our patched fork d588e36e
- deps: use latest csvlens with marked row export fd706255
- deps: bump blake3 to 1.8.3 and remove our patched fork 05f0efbb
- deps: bump toml from 0.9.10+spec-1.1.0 to 0.9.11+spec-1.1.0 2330b1d2
- deps: bump zerocopy from 0.8.32 to 0.8.33 950564d1
- build(deps): bump serde_json from 1.0.148 to 1.0.149 #3290
- build(deps): bump @modelcontextprotocol/sdk from 1.25.1 to 1.25.2 #3293
- build(deps): bump indexmap from 2.12.1 to 2.13.0 #3294
- build(deps): bump libc from 0.2.179 to 0.2.180 #3299
- build(deps): bump zmij from 1.0.12 to 1.0.13 #3305
- build(deps): bump actions/checkout from 4 to 6 #3309
- build(deps): bump actions/setup-node from 4 to 6 #3310
- deps: bump nightly from 2025-10-24 to 2026-01-09; same as polars f77ea524
- bumped several indirect dependencies
- applied select clippy & Codacy suggestions
- applied several GH Copilot and Claude review suggestions
- bumped nightly from 2025-10-24 to 2026-01-09, same as polars
Fixed
- fix: stats use .get() instead of [] indexing to avoid panics on missing keys when using old stats cache file #3306
- fix: MCP force add tsconfig.json #3301
- fix: MCP correct manifest.json to match official spec v0.3 c783cf2c
- fix: MCP expand template variables in config paths 3177cfe1
- fix: MCP address Copilot review issues in package-mcpb.js ec37b7c7
- fix: MCP replace execSync with execFileSync for security reasons 5209c751
- fix: MCP add promise-based deduplication for metadata cache to prevent race conditions https...
13.0.0
[13.0.0] - 2026-01-06 🦾 "The Statistical Data-Wrangling Agent Release" 🤖
We welcome 2026 with qsv 13.0.0 - a major milestone that transforms qsv into an AI-native Agent!
This is in addition to the online AI-Chatbot for CKAN portals we released last September and the expanded describegpt command we released last month. We continue our march towards even more AI/ML/Graph/FAIR and Data Librarian/Concierge/Advisor/Analyst capabilities across the datHere suite in the coming months, as we embark on a strategic partnership with the Open Knowledge Foundation to Strengthen Open, FAIR, AI-Ready Data Infrastructure powered by CKAN.
This release introduces first-class support for AI agents through three major new capabilities:
MCP Server - Model Context Protocol Integration
qsv now ships with a built-in Model Context Protocol (MCP) Server enabling seamless integration with AI Chatbots starting with Claude Desktop.
- Local Data - Its "zero-copy" inspired approach allows you to wrangle very large datasets - WITHOUT sending raw data1, only sending statistical metadata to Claude! This is not only good for security and privacy reasons - it overcomes Claude's upload size limit, saves tokens and improves performance!
- 22 MCP Tools: 20 common qsv commands as individual tools + 1 generic tool to access all other 46 commands + 1 pipeline tool
- Natural Language Interface: No need to remember command syntax
- Pipeline Support: Chain multiple operations together seamlessly
See the MCP documentation for detailed setup instructions.
Claude Agent SDK Helper Utilities
New Agent Skills infrastructure provides:
- qsv-skill-gen CLI - Generate skill definitions for AI agents
  - Parses qsv USAGE text using qsv-docopt to generate JSON skill definitions. This allows quick update of Agent Skills as commands and options are added & modified.
  - Shell-safe example generation with proper quoting
- Comprehensive documentation for integrating qsv into your own AI solutions!
moarstats - Massive Statistical Expansion
The moarstats command received substantial enhancements, adding 24+ MOAR statistical measures:
Advanced Univariate Statistics:
- Bimodality Coefficient - Detect multimodal distributions
- Normalized Entropy - Scaled information content measure (0-1)
- Atkinson Index - Inequality measure with configurable epsilon parameter
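For readers who want to see what these three measures compute, here is a minimal Python sketch using textbook definitions: Shannon entropy divided by its maximum, the Atkinson index with inequality-aversion parameter epsilon, and Sarle's bimodality coefficient in its simple population form, (skewness² + 1) / kurtosis. The function names are hypothetical and the exact variants are assumptions; moarstats may use different estimators, for example sample-size corrections.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy divided by log(k), k = number of distinct values (0 to 1)."""
    counts = Counter(values)
    n, k = len(values), len(counts)
    if k <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def atkinson_index(xs, epsilon=1.0):
    """Atkinson inequality index for strictly positive values."""
    n = len(xs)
    mu = sum(xs) / n
    if epsilon == 1.0:
        # Limit case: one minus the ratio of geometric mean to arithmetic mean.
        geo_mean = math.exp(sum(math.log(x) for x in xs) / n)
        return 1.0 - geo_mean / mu
    ede = (sum(x ** (1.0 - epsilon) for x in xs) / n) ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede / mu


def bimodality_coefficient(xs):
    """Population form of Sarle's coefficient: (skewness^2 + 1) / kurtosis."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # not excess kurtosis
    return (skewness ** 2 + 1) / kurtosis


print(normalized_entropy(["a", "b", "a", "b"]))   # evenly spread categories
print(atkinson_index([1.0, 9.0], epsilon=0.5))    # unequal incomes
print(bimodality_coefficient([0, 0, 0, 1, 1, 1])) # two sharp modes
```

As a rule of thumb, a bimodality coefficient above 5/9 (the value for a uniform distribution) is read as a hint of bimodality or multimodality.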
Bivariate Statistics:
- Pearson's correlation - Linear correlation coefficient
- Spearman's rank correlation - Monotonic relationship measure
- Kendall's tau - Concordance-based correlation
- Covariance - Joint variability measure
- Mutual Information - Information-theoretic dependency
- Normalized Mutual Information - Scaled mutual information (0-1)
- Multi-dataset joins - --join-inputs for bivariate analysis ACROSS datasets
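The three correlation measures in the list above differ in what they capture, which a short Python sketch makes visible. These are textbook definitions written for clarity rather than speed. The function names are hypothetical, and the choice of Kendall's tau-a (no tie correction) is an assumption, not a statement about which variant moarstats computes.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Pearson correlation of the ranks: measures monotonic association."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau_a(xs, ys):
    """(concordant - discordant) pairs divided by the total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# A monotonic but non-linear relationship: the rank-based measures reach 1,
# while Pearson stays below 1.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 100]
print(pearson(xs, ys), spearman(xs, ys), kendall_tau_a(xs, ys))
```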
XSD Type Mapping:
- Automatic inference of W3C XML Schema Definition (XSD) datatypes
- Smart XSD Gregorian date type inferencing with "quick" and "thorough" modes (#3259)
- Support for gYear, gMonth, gDay, gMonthDay, gYearMonth validation
See STATS_DEFINITIONS.md for a comprehensive list of the ~100 statistical metrics qsv compiles!
Breaking Changes
- lens: Default behavior changed to NOT stream from stdin (use explicit flag if needed)
- moarstats: Output now includes additional columns (xsd_type, bivariate stats)
Added
- feat: qsv MCP server #3269
- feat: MCP - expanded file selector for more supported tabular file formats; auto index for files larger than 10mb #3278
- feat: added Claude Agent Skills SDK support 🤖 #3264
- feat: moarstats add "xsd_type" column #3242
- feat: moarstats add Atkinson Index with configurable inequality aversion parameter, Normalized Entropy & Bimodal Coefficient #3243
- feat: moarstats add bivariate stats #3247
- feat: moarstats add normalized mutual info #3256
- feat: moarstats add --force and --jobs options #3253
- feat: moarstats add "xsd_subtype" Gregorian date data types inferencing with --xsd-gdate-scan having fast (default) and comprehensive modes #3259
- feat: qsvdp enable join command that moarstats uses #3252
- docs: added comprehensive stats documentation #3240
Changed
- refactor: describegpt - consolidate JSON response parsing; cache handling; and make DuckDB & Polars error handling more consistent #3241
- refactor: frequency reduce duplication introduced by --weight option #3236
- perf: frequency precompute other_prefix for performance 2dc75ee
- perf: frequency simplify apply_limits* helper functions f0b7f9c
- perf: pivotp convert directly to PlSmallStr for performance b7dbb3f
- refactor: MCP Server to optimize for Local Access to Files #3272
- refactor: MCP Server improvements #3274
- refactor: MCP Server remove examples from ci tests #3277
- refactor: MCP Server add LIFO converted cache #3280
- refactor: MCP Server moar refactoring after tests #3282
- perf: moarstats much faster bivariate calculation #3248
- perf: moarstats optimize non-streaming bivariate stats compilation #3250
- refactor: qsv Skills Agent #3267
- deps: polars bump to rev c241260 #3276
- build(deps): bump itoa from 1.0.16 to 1.0.17 by @dependabot[bot] in #3239
- build(deps): bump human-panic from 2.0.4 to 2.0.5 by @dependabot[bot] in #3234
- build(deps): bump human-panic from 2.0.5 to 2.0.6 by @dependabot[bot] in #3249
- build(deps): bump libc from 0.2.178 to 0.2.179 by @dependabot[bot] in #3265
- build(deps): bump redis from 1.0.1 to 1.0.2 by @dependabot[bot] in #3232
- build(deps): bump rfd from 0.16.0 to 0.17.0 by @dependabot[bot] in #3279
- build(deps): bump rfd from 0.17.0 to 0.17.1 by @dependabot[bot] in #3284
- build(deps): bump serde_json from 1.0.147 to 1.0.148 by @dependabot[bot] in #3238
- build(deps): bump serial_test from 3.2.0 to 3.3.0 by @dependabot[bot] in #3273
- build(deps): bump serial_test from 3.3.0 to 3.3.1 by @dependabot[bot] in #3275
- build(deps): bump tokio from 1.48.0 to 1.49.0 by @dependabot[bot] in #3266
- build(deps): bump url from 2.5.7 to 2.5.8 by @dependabot[bot] in #3286
- build(deps): numerous zmij bumps from 0.1.7 to 1.0.12
- bumped several indirect dependencies
- applied select clippy & Codacy suggestions
- applied several GH Copilot and Claude review suggestions
Fixed
- fix: refresh_cpu_all() -> refresh_cpu_list(sysinfo::CpuRefreshKind::nothing())… #3261
- fix: stats remove redundant check 0977ebf
- fix: moarstats correct kendall_tau formula cf16543
- fix: describegpt and util::run_qsv_cmd - add special case for sample as it expects output differently 6b6039f
- fix: CVE-2025-66414 security vulnerability GHSA-w48q-cv73-mx4w
- fix: RUSTSEC-2026-0001 (rkyv bump) c2d4937
- typo: Portugese → Portuguese
- typo: stats asummes → assumes
AI Contributors
- @jqnatividad collaborated with and orchestrated @Copilot, Claude Code, Cursor and Gemini using various models
Full Changelog: 12.0.0...13.0.0
12.0.0
[12.0.0] - 2025-12-24
Stuff your virtual stocking and jingle your data bells - qsv 12.0.0 slides down the chimney packed fuller than Santa's sleigh! Unwrap delightful surprises like the shiny new moarstats command, gift-wrapped weighted statistics, and AI-powered FAIR metadata inferencing now speaking in multiple languages (no elf translation required). As the star on top, meet TOON - the brand new LLM-optimized, token-efficient format - ready to sleigh your AI projects all through 2026. Ho-ho-hold my data, this update's a festive feast!
Special thanks to @kulnor for advocating, brainstorming & testing many of the new features below!
Major Features
NEW: moarstats Command
A powerful new command for "moar" advanced statistical analysis, providing statistics beyond what the stats command offers:
- Comprehensive Statistics: More than 50 advanced statistical measures, including:
  - Detailed outlier analysis (count, sum, average)
  - Winsorized and trimmed means (5%, 10%, 20%, 25%)
  - Multiple dispersion measures (IQR to range ratio, quartile coefficient of dispersion)
  - Distribution statistics (skewness, multiple kurtosis measures)
- Advanced Option (--advanced): Access computationally intensive statistics:
  - Gini coefficient for inequality measurement
  - Excess Kurtosis to measure "tailedness" of the distribution
  - Shannon Entropy for data diversity analysis
- Available on all binary variants for universal access
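As a rough illustration of what the --advanced statistics measure, here is a minimal Python sketch of textbook definitions. It is not the moarstats implementation: the function names are hypothetical, and the choice of population moments and base-2 logarithms is an assumption, since these notes do not say which estimators the command uses.

```python
import math
from collections import Counter


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on the sorted sample; ranks start at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1) / n


def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / (m2 ** 2) - 3.0


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    print(gini([0, 0, 0, 10]))               # one holder owns everything
    print(excess_kurtosis([1, 2, 3, 4, 5]))  # flatter than a normal curve
    print(shannon_entropy(["a", "b", "a", "b"]))
```

A Gini value near 0 indicates evenly spread values, negative excess kurtosis indicates lighter tails than a normal distribution, and higher entropy indicates more diverse values.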
Enhanced describegpt Command
Major enhancements to AI-powered data description capabilities:
- ↩️ Minijinja Template Engine Integration:
  - Custom prompt templating with full Minijinja and Minijinja-contrib filters
  - More powerful and flexible prompt customization
- Multilingual Support:
  - --language option for generating descriptions in any language/dialect
  - Languages: Spanish, Portuguese, Italian, Japanese, Hindi, Arabic
  - Dialects: Franglais, Taglish, Pennsylvania Dutch
  - Constructed Languages: Klingon, High Valyrian, Quenya
  - Personalities: Snoop Dogg, Hans Rosling, Christopher Walken
  - Personas: Gen Z Slang, Silly, Emoji-loving Santa
  - Automatic language detection in --prompt mode
  - SQL comments also generated in requested language
- Advanced Features:
  - --addl-columns option with detailed attribution and system metadata
  - --export-prompt <file> to save the default prompts to the specified file. This file can then be tailored and used with the --prompt-file <file> option.
  - Iterative, session-based SQL RAG with --prompt option
  - Sampling in prompt mode for better SQL generation
  - Lookup table and CKAN support for controlled vocabularies
  - Convenience values for --addl-cols-list (i.e., "everything", "everything!", "moar", "moar!")
Weighted Statistics Support
Comprehensive weighted statistics implementation across multiple commands:
- stats Command (--weight <column>):
  - Weighted mean, standard deviation, variance
  - Weighted MAD (Median Absolute Deviation) and percentiles
  - Weighted modes and antimodes
  - Weighted harmonic and geometric means
  - All weighted calculations handle non-finite values gracefully
- frequency Command (--weight <column>):
  - Weighted frequency distributions
  - Proper handling of weighted "Other" and "ALL UNIQUE" categories
  - Non-finite weights automatically skipped
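To make the weighted calculations concrete, the following Python sketch shows one common way to compute a weighted mean, a weighted variance, and a weighted frequency table. It is not qsv's code: the function names are hypothetical, the variance is the simple weight-normalized (population-style) form, and skipping any pair with a non-finite value or weight is an assumption modeled on the "automatically skipped" behavior described for frequency.

```python
import math
from collections import defaultdict


def weighted_mean_var(values, weights):
    """Return (weighted mean, weight-normalized variance).

    Assumption: pairs with a non-finite value or weight are skipped.
    """
    pairs = [
        (x, w)
        for x, w in zip(values, weights)
        if math.isfinite(x) and math.isfinite(w)
    ]
    total_w = sum(w for _, w in pairs)
    if total_w == 0:
        return float("nan"), float("nan")
    mean = sum(x * w for x, w in pairs) / total_w
    var = sum(w * (x - mean) ** 2 for x, w in pairs) / total_w
    return mean, var


def weighted_frequency(categories, weights):
    """Sum the weights per category, skipping non-finite weights."""
    totals = defaultdict(float)
    for category, w in zip(categories, weights):
        if math.isfinite(w):
            totals[category] += w
    return dict(totals)


if __name__ == "__main__":
    print(weighted_mean_var([1.0, 3.0, 100.0], [1.0, 3.0, float("nan")]))
    print(weighted_frequency(["a", "b", "a", "c"], [2.0, 1.0, 0.5, float("inf")]))
```

In this sketch a row with weight 3 contributes as if it appeared three times, which is the usual reading of a weight column for survey or aggregated data.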
Token-Oriented Object Notation (TOON) Format Support
- A compact, human-readable encoding of the JSON data model for LLM prompts
- Commands Supporting TOON:
  - describegpt --format TOON
  - frequency --toon
- Benefits: More readable than JSON, easier to parse than CSV for hierarchical data, and a terse, more token-efficient format targeted at LLMs
stats Command Enhancements
- Percentile Improvements:
  - --percentile-list special values: "deciles" and "quintiles"
  - Percentile labels now include prefix before value (e.g., "p50: 42.5")
  - Validation of percentile-list on startup
- New Columns: Added n_counts for more detailed count information
- Performance Optimizations:
  - Optimized Stats struct layout
  - Eliminated redundant, unnecessary sorting
  - Removed redundant filtering for weighted stats functions
  - Microoptimizations throughout
transpose Command
- New --long Option: Transform data from wide to long format
  - Column selection support using select syntax
  - Streaming implementation per GitHub Copilot review suggestions
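For readers unfamiliar with the wide-to-long reshape, here is a generic Python sketch of the idea. It is not the transpose --long implementation, and the output layout (one tuple of id, field name, value per cell) is an assumption for illustration; these notes do not specify the command's exact output columns.

```python
import csv
import io


def wide_to_long(csv_text, id_column):
    """Yield (id, field, value) for every non-id cell, one row at a time.

    Rows are produced lazily, so memory use does not grow with input size.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for record in reader:
        for name, value in record.items():
            if name != id_column:
                yield record[id_column], name, value


if __name__ == "__main__":
    wide = "id,height,weight\n1,170,65\n2,182,80\n"
    for row in wide_to_long(wide, "id"):
        print(row)
```

Each wide row with N non-id columns becomes N long rows, which is often easier to filter, group, or load into a database.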
diff Command
- upgraded csv-diff from 0.1.1 to the faster 0.1.2, improving performance in optimal cases by up to 25%
lens Command
- Aligned --no-streaming-stdin behavior with csvlens upstream
Output Format Changes
schema Command
- Updated $schema from Draft 7 to JSON Schema Draft 2020-12
⚡ Performance Improvements
suite-wide
- replaced the already fast ryu float-to-string conversion crate with the even faster zmij crate (https://vitaut.net/posts/2025/faster-dtoa/)
stats Command
- Optimized Stats struct memory layout
- Eliminated redundant sorting operations
- Removed unnecessary clone operations
- Better handling of real-world data (assumes no infinity values)
frequency Command
- Microoptimizations for faster frequency computation
- Optimized top_n/bottom_n retrieval
Bug Fixes
frequency Command
- Fixed behavior when compiling weighted frequencies with ALL_UNIQUE
- Fixed issue where "Other (0),0,0,0" could appear in output
- Proper handling of non-finite weights (automatically skipped)
Infrastructure & Quality
Testing
- Test suite expanded from 2,060 to 2,380 tests
- Comprehensive test coverage for all new features
- Weighted statistics thoroughly tested
- Advanced moarstats options validated
Code Quality
- Extensive GitHub Copilot review integration
- Multiple refactoring passes for code clarity
- Clippy suggestions incorporated throughout
- Better error handling and edge case management
FAIR Principles
- Added CITATION.cff (by @rzmk) for academic citation
- Added Zenodo DOI badge for dataset citation
- Enhanced FAIRification of qsv as a research tool
Documentation Improvements
Statistical Documentation
- Comprehensive documentation for statistics produced by stats command (by @kulnor) WIP
- Enhanced usage text for stats, frequency, and moarstats
- Better examples throughout documentation
Command Documentation
- Updated describegpt with multilingual examples
- Added controlled tag vocabulary examples
- Enhanced TOON format documentation
- Better SQL RAG workflow documentation
Migration Notes
Breaking Changes
- schema command: $schema output changed from Draft 7 to Draft 2020-12
  - Most schemas should be compatible
  - Validation tools must support JSON Schema Draft 2020-12
- stats command: Output now includes percentile label prefixes
  - Example: the 50th percentile value is now shown as "p50: 10" instead of just the value "10"
  - May affect parsing scripts that expect raw numbers
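Scripts that parse percentile values may need a small adjustment. The Python sketch below is one possible way to accept both the old raw form and the new labeled form. The helper name is hypothetical, and it assumes a single "pNN: value" label per string, as in the example above; if a cell holds several labeled values, split it first.

```python
import re

# Matches a labeled percentile such as "p50: 10" or "p99.9: 42.5".
_LABELED = re.compile(r"^\s*p(\d+(?:\.\d+)?)\s*:\s*(.+?)\s*$")


def parse_percentile(cell):
    """Return (percentile, value_text).

    The percentile is None when the cell has no "pNN:" prefix (old format).
    The value is returned as text so the caller decides how to convert it.
    """
    match = _LABELED.match(cell)
    if match:
        return float(match.group(1)), match.group(2)
    return None, cell.strip()


if __name__ == "__main__":
    print(parse_percentile("p50: 10"))  # new, labeled format
    print(parse_percentile("10"))       # old, raw format
```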
Added
- feat: describegpt add --add-cols and --addl-cols-list <list> options #3179
- feat: describegpt add --language option #3184
- feat: describegpt use minijinja engine for prompt processing #3188
- feat: describegpt add language autodetection in --prompt (chat) mode #3193
- feat: describegpt sampling in prompt mode for better SQL generation… #3198
- feat: describegpt add --prompt sessions for iterative SQL RAG refinement #3200
- feat: describegpt add TOON format support #3205
- feat: frequency add TOON format #3206
- feat: frequency add weighted frequencies #3218
- feat: add new moarstats command #3207
- feat: moarstats add even moar! Now with detailed outliers info! #3208
- feat: moarstats - add configurable ...
11.0.2
[11.0.2] - 2025-12-08
qsv 11.0.2 brings significant enhancements to larger-than-memory data processing, AI-powered metadata inferencing, JSON Schema inferencing & validation, and data viewing capabilities, along with important bug fixes and performance improvements.
All in preparation for at-scale, secure, interactive, "zero-copy" "Data Steward-in-the-Loop" FAIRification on the desktop in qsv pro.
Major Features
stats & frequency
- Larger than Memory Files: stats & frequency can now handle arbitrarily large files, even when "advanced" statistics are enabled, thanks to a new dynamic parallel chunk sizing algorithm! (example stats, frequency)
- N Counts: Added "n_counts" (n_negative, n_zero and n_positive) columns to stats output for more detailed count information for numeric fields.
describegpt
The describegpt command has received substantial improvements for AI-powered metadata inferencing:
- "Neuro-Procedural" Data Dictionaries: combines deterministically computed statistics and frequency distribution data with AI-inferred Human-Friendly Labels and Descriptions to compile an expanded Data Dictionary (not quite "neuro-symbolic" (YET!))
- Chat with your Data!: Improved DuckDB and Polars SQL guidance means more reliable transformations of your Natural Language queries to SQL - leading to fast, deterministic, reproducible, hallucination-free answers! (example, SQL result)
- Format Option: Replaced --json flag with --format option for more flexible output formatting
  - Supports multiple output formats - Markdown (default), TSV and JSON
  - Removed --jsonl option for cleaner API
- Controlled Tag Vocabulary: New tag vocabulary system for consistent categorization
  - --tag-vocab option to specify controlled vocabulary
  - Lookup support for tag vocabularies - retrieve a tag vocabulary from a local or remote CSV using http://, https://, dathere:// and ckan:// URL schemes.
- Enhanced Boolean Inference: --infer-boolean is now enabled by default for better data type detection
- Performance Metrics: Added elapsed time tracking to monitor processing duration
- Improved Prompt Templates: Updated default description prompt with PII/PHI alerts and better attribution metadata
schema & validate
Enhanced JSON Schema inference and validation capabilities:
- Strict Formats: New --strict-formats option for stricter JSON Schema format validation, enforcing JSON Schema format constraints for email, hostname & IP address (IPv4/IPv6) formats.
- Output Option: New --output option for specifying schema output destination
  - Polars schema now uses consistent naming conventions across commands
  - Updated joinp, pivotp, and sqlp commands to use new .pschema.json naming convention
- Configurable Email Validation: validate has numerous options to tweak email validation - taking advantage of schema's email format constraint inferencing.
sample time-series sampling
A new --timeseries sampling method with grouping (hourly, daily, weekly), adaptive sampling (prefer business hours or weekends), and various aggregations (mean, sum, min, max) within each interval, with configurable starting points (first, last or random).
lens "real-time" Features
Enhanced CSV viewing capabilities with csvlens integration:
- Auto-Reload: New --auto-reload option to automatically reload file when it changes
  - Useful for monitoring live data files
- Streaming stdin: New --streaming-stdin option for real-time data viewing
  - Supports viewing data as it's being piped in
- Row Marking: Updated csvlens dependency with row marking feature
Breaking Changes
- describegpt: --json flag replaced with --format option
- describegpt: --jsonl option removed
- schema, joinp, pivotp, sqlp: Updated Polars schema naming conventions (existing workflows should work but output format may differ slightly)
Added
- Created Event Logo Archive with AI-generated seasonal/version logos
- describegpt: add controlled vocabulary support for tags #3122
- describegpt: add elapsed time #3168
- describegpt: add lookup support #3170
- excel: add --cell option #3133
- frequency: add dynamic parallel chunk sizing #3135
- lens: add --auto-reload option #3128
- lens: add --streaming-stdin option #3171
- sample: add timeseries sampling options #3130
- schema: infer addl JSON Schema predefined formats - email, ipv4, ipv6, hostname #3125
- schema: add --output option and standardize Polars Schema file name #3126
- stats: dynamic parallel chunk sizing with indexed files #3134
- stats: add n_negative, n_zero, n_positive count columns #3157
- validate: add email validation options #3148
- tests: add tests for https://100.dathere.com/lessons/4 by @rzmk in #3151
- Added Claude AI guidance for contributors
- Enhanced --version output with more comprehensive system metadata
Changed
- refactor: describegpt improve tags inferencing with Tag Vocabulary #3139
- feat: describegpt - major refactor #3143
- feat: describegpt improved Polars SQL processing #3147
- feat: describegpt replace --json option with --format option supporting 3 formats - markdown, json and TSV; remove --jsonl option #3167
- refactor: frequency & stats - parallel chunk sizing - allow forcing of cpu based chunking #3138
- Align partition stdin handling with split/stats pattern by @Copilot in #3162
- deps: use latest polars upstream with new SQL fixes and features (pola-rs/polars@e1be17f)
- build(deps): bump actions/setup-python from 6.0.0 to 6.1.0 by @dependabot[bot] in #3120
- build(deps): bump actix-web from 4.12.0 to 4.12.1 by @dependabot[bot] in #3127
- build(deps): bump flate2 from 1.1.5 to 1.1.7 by @dependabot[bot] in #3159
- build(deps): bump jsonschema from 0.37.1 to 0.37.2 by @dependabot[bot] in #3129
- build(deps): bump jsonschema from 0.37.2 to 0.37.3 by @dependabot[bot] in #3131
- build(deps): bump jsonschema from 0.37.3 to 0.37.4 by @dependabot[bot] in #3140
- build(deps): bump log from 0.4.28 to 0.4.29 by @dependabot[bot] in #3150
- build(deps): bump minijinja from 2.12.0 to 2.13.0 by @dependabot[bot] in #3142
- build(deps): bump minijinja-contrib from 2.12.0 to 2.13.0 by @dependabot[bot] in #3141
- build(deps): bump pyo3 from 0.27.1 to 0.27.2 by @dependabot[bot] in #3137
- build(deps): bump qsv-stats from 0.40.0 to 0.41.0 by @dependabot[bot] in #3136
- build(deps): bump qsv-stats from 0.41.0 to 0.42.0 by @dependabot[bot] in #3156
- build(deps): bump qsv-stats from 0.42.0 to 0.43.0 by @dependabot[bot] in #3169
- build(deps): bump rfd from 0.15.4 to 0.16.0 by @dependabot[bot] in #3121
- build(deps): bump uuid from 1.18.1 to 1.19.0 by @dependabot[bot] in #3146
- Improved qsvpy build process for Apple Silicon
- Updated GitHub Actions workflows for better reliability
- bumped several indirect dependencies
- applied select clippy & Codacy suggestions
- Improved dependency version management
- Better feature flag handling
Fixed
- fix: apply panic on empty selection #3165
- fix: more robust snappy and file extension detection #3166
- fix: partition add proper stdin handling regression introduced when --limit option was added #3161
- Fix broken layout of environment variable documentation by @tmtmtmtm in #3163
Removed
New Contributors
- @Copilot made their first contribution in #3162
*...
10.0.0
[10.0.0] - 2025-11-23
Highlights:
- Enhanced Data Dictionary: describegpt now features an expanded default prompt (v4.0) that generates more comprehensive data dictionaries.
- Parallel Search/Replace Operations: search, searchset, and replace commands now support parallel execution when working with indexed CSV files, delivering significant performance improvements for large datasets.
- Search/Replace Exact Match Options: Added --exact option to search, searchset, and replace commands for precise string matching without regex patterns.
- Enhanced SQL Capabilities: sqlp now supports arbitrary expressions in SQL JOIN constraints, named window references, and new SQL functions including row_number, rank, dense_rank, and array_to_string.
- Improved pivotp Performance: Updated to use Polars' new lazy pivot API with --maintain-order flag for predictable output ordering.
- Luau 0.701: Updated embedded Luau from 0.697 to 0.701 with additional pattern matching documentation and tests.
Added
- search & searchset: add --exact option for literal string matching #3094
- search: parallel search when file is indexed #3096
- searchset: parallel execution when indexed #3097
- replace: add --exact option e73d9bf
- replace: parallel execution when indexed #3098
- sqlp: added support for arbitrary expressions in SQL JOIN constraints d47c44e & 0d2402b
- sqlp: added support for row_number, rank, and dense_rank SQL window functions #3115
- sqlp: added support for named window references #3118
- sqlp: added support for array_to_string list evaluation 64cbf34
- pivotp: added --maintain-order flag for predictable output ordering 02dca12
- describegpt: default-prompt-file v4.0 with expanded Data Dictionary generation 4db0d18
- luau: expanded documentation for string functions using pattern matching a7344e3 & 2dcc9a4
- util::mem_file_check: added platform adjustment factor 421be84
- benchmarks: v7.0 added search & searchset indexed parallel benchmarks 55df784
- benchmarks: v7.1.0 added replace_indexed_parallel benchmark 05c89d8
Changed
- describegpt: refactored for improved reliability 1433bf1 & b6190a4
- frequency: special rank of 0 now assigned to <ALL_UNIQUE> rows effa13b
- frequency: microoptimizations 775bb88 & 29ec7af
- search, searchset & replace: now parallelizable with an index, with significant performance improvements 45fc83d
- search: use faster, non-allocating par_sort_unstable_by_key for improved performance 5f50f23
- search: optimize --quick option 1fc1b85
- search: --preview-match option forces sequential search 017ca6f
- search, searchset & replace: sort chunks instead of raw data for better performance 5b58cb8
- searchset: microoptimizations for performance c4ce324
- replace: remove unneeded index rebuild logic cfdba60
- pivotp: refactored to adapt to Polars' new lazy pivot API #3102
- excel: microoptimize hot loop and formula retrieval f141c1b & 17780b5
- stats: cache repetitive expensive env_var access in hot path a6ad0ce
- stats: multiple microoptimizations 2f41c33 & 9bf43e5 & 00958a1
- validate: updated to jsonschema 0.37.x with improved error handling f45693d & c7ad5d2 & b9ea447
- luau: updated embedded Luau from 0.697 to 0.701 8885dce
- deps: bump polars to latest upstream with numerous SQL and LazyFrame improvements
- deps: bump jsonschema from 0.34 to 0.37.1
- deps: bump syn from 2.0.109 to 2.0.110 d207524
- deps: bump quick-xml from 0.38.3 to 0.38.4 11a5ae4
- deps: bump geosuggest-core from 0.8.1 to 0.8.2 baf3194
- deps: bump geosuggest-utils from 0.8.1 to 0.8.2 c5bcd1b
- deps: bump governor from 0.10.1 to 0.10.2 b0068ef
- deps: bump gzp from 2.0.1 to 2.0.2 2a0b901
- deps: bump indexmap from 2.12.0 to 2.12.1 afa9c1f
- deps: bump mlua from 0.11.4 to 0.11.5 49eedb9
- deps: bump signal-hook-registry from 1.4.6 to 1.4.7 5c2e705
- deps: bump calamine to 0.32 (removed git dependency) 449f162
- deps: bump cached to latest upstream (removed patched fork) 508d1ce
- deps: bump actions/checkout from 5 to 6 f76e009
- deps: removed hashbrown patched fork ad30460
- deps: removed grex patched fork 88cd3fc
- deps: updated Cargo.lock file multiple times with indirect dependency updates
- docs: updated rust-version requirement to 1.91 c288d4d
- docs: prebuilt binaries on Linux and Windows x86_64 are no longer compiled with target-cpu=native 5f892a1
- docs: expanded note about Illegal Instruction (SIGILL) faults and portable builds e4df784
- docs: describegpt update with expanded Data Dictionary example and link to defaults d722afd & cedcd41 & bba4f76
- applied select clippy lint suggestions
- bumped several indirect dependencies
Fixed
- count: should still work with "broken" CSVs when polars feature is enabled #3104
- describegpt: more robust SQL escaping to prevent SQL injection e958329
- excel: formula retrieval bug on error b894515
- excel: reverted mistaken alloc optimization for trim path b37361a
- index: added check to confirm that only uncompressed CSV files can be indexed 1be485b
- sqlp: unnest workaround for test compatibility 54d079b
- sqlp: corrected array_to_string test 6c661ac
- docs: fixed typo QSV_MEMORY_HEADROOM_PCT -> QSV_FREEMEMORY_HEADROOM_PCT f15d03e
Removed
- deps: removed polars crates (polars-utils, polars-ops) that are no longer needed a7785f6
- publish: removed target-cpu=native as it causes SIGILL on GitHub Action Runners fd74f8f
Full Changelog: 9.1.0...10.0.0
9.1.0
[9.1.0] - 2025-11-03
FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:
- frequency received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
- describegpt now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
- qsv-stats - the engine that powers both stats and frequency commands - has been further optimized with the 0.40.0 release, to compile summary statistics as fast as possible - even for very large files - often one to two orders of magnitude faster (10 to 100x faster) than typical Python-based tools.
- Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
- the csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second!
These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.
The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.
It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.
Added
- frequency: add --pretty-json option c67fd06
- frequency: add --rank-strategy option #3075
- frequency: add --null-text option #3082
Changed
- describegpt: explicitly use frequency's dense rank strategy dc3f270
- describegpt: allow --prompt to be loaded from a text file b11a10c
- describegpt: use much faster BLAKE3 hash for cache key
- frequency: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
- lens: bumped csvlens from 0.13.0 to 0.14.0
- lens: automatically set to monochrome mode when using --find option 8539869
- luau: bumped embedded Luau from 0.694 to 0.697 3e68e29
- stats: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
- table: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
- tests: change default Python to 3.13
- docs: documented that Extended Input Support does .zip auto-decompression
- docs: documented Limited Extended Input Support
- use latest qsv-tuned csv crate with performance optimizations
- build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
- build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
- deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
- build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
- build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
- applied several clippy lint suggestions
- bumped several indirect dependencies
- align nightly to 2025-10-24, the same nightly as Polars
- bumped MSRV to Rust 1.91
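The change of the default frequency rank-strategy from min ("1224") to dense ("1223"), listed among the changes above, is easiest to see on a small example. The Python sketch below illustrates the two standard tie-handling schemes; the function is hypothetical and is not taken from qsv's source.

```python
def rank_counts(counts, strategy="dense"):
    """Rank counts from highest to lowest; tied counts share a rank.

    "min":   after a tie, the next rank skips ahead       (1, 2, 2, 4)
    "dense": after a tie, the next rank is consecutive    (1, 2, 2, 3)
    """
    if strategy not in ("min", "dense"):
        raise ValueError(f"unknown strategy: {strategy}")
    ordered = sorted(counts, reverse=True)
    ranks = []
    for position, count in enumerate(ordered, start=1):
        if ranks and count == ordered[position - 2]:
            ranks.append(ranks[-1])       # tie: reuse the previous rank
        elif strategy == "min":
            ranks.append(position)        # position in the sorted list
        else:
            ranks.append(ranks[-1] + 1 if ranks else 1)
    return list(zip(ordered, ranks))


if __name__ == "__main__":
    sample = [10, 7, 7, 3]
    print(rank_counts(sample, "min"))
    print(rank_counts(sample, "dense"))
```

Scripts that filter frequency output by rank (for example, "rank <= 3") can return more rows under dense ranking than under min ranking, so such filters are worth rechecking after upgrading.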
Fixed
- describegpt: add SQL escaping to eliminate SQL injection attack vector; add .csv extension to --sql-output when Polars SQL query runs successfully ad52a35
- frequency: fix --select option always returning <ALL_UNIQUE> #3082
- fixed some publishing workflows
Removed
- Removed SHA256 and replaced with much faster, parallelizable BLAKE3 hash #3072 and #3080
- publish: removed maximize-build-space step in workflows as it was not working as advertised
- tests: removed target-cpu=native RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults
Full Changelog: 8.1.1...9.1.0
