15 Feb 16:53

376266e

16.1.0 Latest

[16.1.0] - 2026-02-15 📊 "The Accelerated Civic Intelligence (ACI) Release" 📊

Statistical analysis gets faster and more robust; User & Agent Experience (UAX) improvements keep the CLI parser, docs, shell completions, and MCP tool definitions in sync from a single source; and the qsv MCP Server gets leaner and smarter.

With a properly configured environment, a User can team up with several AI Agents for accelerated analysis of large, real-world, messy data (raw datasets, presentations, reports, spreadsheets, etc.) without first uploading it all to the cloud or manually wrangling it into shape. Analysis that would otherwise take days, if not weeks, to compile can be done in a few minutes.

🌟 Major Features

New `pragmastat` Command

A pragmatic statistical toolkit by @AndreyAkinshin: compute robust, median-of-pairwise statistics with the Pragmastat library. It is designed for messy, heavy-tailed, or outlier-prone data where the mean and standard deviation can mislead. See pragmastat.dev for details on the underlying algorithms and design philosophy.
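To show why median-of-pairwise statistics resist outliers, here is a minimal, hypothetical Python sketch. It is not the Pragmastat library or qsv's code; it assumes the classical Hodges-Lehmann estimator (median of pairwise averages, self-pairs included) as the "center" and the Shamos estimator (median of absolute pairwise differences) as the "spread". Both are O(n²) here and only meant for small inputs.

```python
from itertools import combinations, combinations_with_replacement
from statistics import median


def hl_center(xs):
    """Hodges-Lehmann center: median of (x_i + x_j) / 2 over all i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))


def shamos_spread(xs):
    """Shamos spread: median of |x_i - x_j| over all i < j."""
    return median(abs(a - b) for a, b in combinations(xs, 2))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 1000.0]  # one extreme outlier
    print("mean:", sum(data) / len(data))
    print("center:", hl_center(data))
    print("spread:", shamos_spread(data))
```

Running it prints a mean of 202.0 but a center of 3.0 and a spread of 2.5: the single outlier drags the mean far from the bulk of the data, while the pairwise medians barely move.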

Frequency Cache System

The new --frequency-jsonl option for the frequency command creates a JSONL cache (analogous to stats --stats-jsonl) that accelerates repeated frequency analysis. It uses a hybrid strategy for high-cardinality columns, with configurable thresholds.
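The notes don't spell out the cache layout or how the two thresholds combine, so the following sketch is purely hypothetical. It assumes a column counts as high-cardinality when either its distinct-value count or its distinct-to-row ratio reaches a threshold (stand-ins for QSV_FREQ_HIGH_CARD_THRESHOLD and QSV_FREQ_HIGH_CARD_THRESHOLD_PCT, with the ratio expressed as a fraction), and that such columns keep only their top-N entries. The JSON field names and default values are invented for illustration and are not qsv's cache schema.

```python
import json
from collections import Counter

# Hypothetical defaults; qsv's real defaults and semantics may differ.
HIGH_CARD_THRESHOLD = 1000      # absolute distinct-value count
HIGH_CARD_THRESHOLD_PCT = 0.90  # distinct values / rows, as a fraction


def is_high_cardinality(cardinality, rowcount,
                        abs_threshold=HIGH_CARD_THRESHOLD,
                        pct_threshold=HIGH_CARD_THRESHOLD_PCT):
    """Flag a column when either threshold is reached (assumed rule)."""
    if rowcount == 0:
        return False
    return cardinality >= abs_threshold or cardinality / rowcount >= pct_threshold


def cache_records(columns, top_n=10):
    """Yield one JSON line per column: full counts, or top-N only if high-card."""
    for name, values in columns.items():
        counts = Counter(values)
        high = is_high_cardinality(len(counts), len(values))
        kept = counts.most_common(top_n) if high else counts.most_common()
        yield json.dumps({
            "field": name,
            "cardinality": len(counts),
            "high_cardinality": high,
            "frequencies": [[value, count] for value, count in kept],
        })
```

The payoff of such a split is that low-cardinality columns can be answered entirely from the cache, while mostly-unique columns (IDs, free text) don't bloat it with one entry per row.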

Improved UAX: Unified Documentation & Shell Completions

A new docopt-based parsing system now generates markdown documentation, shell completions, and MCP tool definitions from the same USAGE text that powers qsv's CLI parsing. Everything stays in sync automatically — no more drift between help text, docs, completions and AI tooling.

--generate-help-md flag produces polished markdown docs with section navigation, emoji legends, clickable URLs, and argument/option tables that are both Human and Agent-friendly.
Shell completions are now auto-generated, replacing 68 manually maintained completion files.
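The single-source idea can be sketched as one parser feeding several generators. The Python below is a hypothetical illustration, not qsv's docopt-based generator: it pulls options out of a made-up docopt-style USAGE block for an imaginary `demo` command and emits both a markdown options table and a bash `complete -W` line, so editing the USAGE text updates both outputs at once.

```python
import re

USAGE = """
Usage:
    demo [options] [<input>]

demo options:
    -s, --select <arg>     Select a subset of columns.
    -l, --limit <arg>      Limit output to the N most common values.
    --no-float             Skip Float columns.

Common options:
    -h, --help             Display this message.
"""

# Matches indented lines like "    -s, --select <arg>     Description".
OPTION_RE = re.compile(r"^\s+(?:(-\w),\s+)?(--[\w-]+)(?:\s+(<[^>]+>))?\s{2,}(.+)$")


def parse_options(usage):
    """Return (short, long, arg, description) tuples from docopt-style text."""
    options = []
    for line in usage.splitlines():
        m = OPTION_RE.match(line)
        if m:
            options.append(m.groups())
    return options


def to_markdown(options):
    """Render the parsed options as a markdown table."""
    rows = ["| Option | Argument | Description |", "|---|---|---|"]
    for short, long, arg, desc in options:
        flag = f"{short}, {long}" if short else long
        rows.append(f"| `{flag}` | {arg or ''} | {desc} |")
    return "\n".join(rows)


def to_bash_completion(command, options):
    """Render a bash completion rule offering every long option."""
    words = " ".join(long for _, long, _, _ in options)
    return f'complete -W "{words}" {command}'
```

Because both outputs are derived from the same parsed tuples, adding an option to USAGE cannot leave the docs and completions out of step.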

qsv MCP Server: Leaner Architecture

The qsv_pipeline tool has been removed in favor of direct sequential command execution. In practice, agents were already calling commands one at a time, and removing the pipeline abstraction made the server simpler, more predictable, and easier to debug. Additional MCP improvements include:

Extended AI agent guidance to take advantage of frequency and stats caches
Seamless support for Google Gemini CLI thanks to @kulnor's continuing contributions
Major codebase refactoring: deduplicated helpers, extracted filesystem tools, fixed TypeScript `any` types, and various bug fixes

Detailed MCP changes are documented in the MCP CHANGELOG.

Added

feat: pragmastat command — pragmatic statistical toolkit with parallelism, progress bar, and memcheck (by @AndreyAkinshin)
feat: frequency --frequency-jsonl — JSONL frequency cache with hybrid strategy for high-cardinality columns
feat: --generate-help-md flag — auto-generate markdown docs from USAGE text with section navigation, emoji legends, and clickable URLs
docs: add QSV_FREQ_HIGH_CARD_THRESHOLD and QSV_FREQ_HIGH_CARD_THRESHOLD_PCT env vars

Changed

perf: stats — skip redundant modes tracking, reduce allocations, optimize cache line layout, deterministic antimode sorting
perf: pragmastat — reduce redundant computations, add parallelism
perf: frequency — use sort_unstable_by for faster sorting; parallel computation for high-cardinality columns
refactor: shell completions auto-generated from USAGE text (removed 68 manual files)
refactor: describegpt — disambiguate "Other" bucket from literal "Other" in Data Dictionary Examples column
deps: bump anstream from 0.6.21 to 1.0.0
deps: bump futures to 0.3.32
deps: bump jsonschema from 0.41 to 0.42
deps: bump libc from 0.2.180 to 0.2.181
deps: bump memmap2 from 0.9.9 to 0.9.10
deps: bump polars to latest upstream
deps: bump pyo3 from 0.28.0 to 0.28.1
deps: bump quickcheck from 1.0.3 to 1.1.0
deps: bump rand from 0.9 to 0.10, rand_hc to 0.5, rand_xoshiro to 0.8
deps: bump sysinfo from 0.37.2 to 0.38.2
deps: bump tempfile from 3.24.0 to 3.25.0
deps: bump toml from 0.9.12 to 1.0.1
deps: bump uuid from 1.20.0 to 1.21.0
deps: bump zmij from 1.0.20 to 1.0.21
deps: update csv patched fork MSRV to 1.93

Fixed

fix: frequency — normalize delimiter for cache compatibility; deterministic output with secondary sort key; hybrid cache for high-cardinality columns
fix: stats — remove unsafe block; deterministic antimode sorting
fix(help): section detection, acronym casing, and option word-wrap in markdown generation

Removed

removed 68 manual shell completion files (now auto-generated from USAGE text)

Full Changelog: 16.0.0...16.1.0

Contributors

kulnor and AndreyAkinshin

Assets 15

qsv-16.1.0-aarch64-apple-darwin.zip

sha256:557b558032f1320e77b356f944000ed03ab6c271590608f40f2b1c615defc67f

181 MB 2026-02-15T18:20:39Z
qsv-16.1.0-aarch64-pc-windows-msvc.zip

sha256:653099ca2f74bdd1892504ef56dfc570006988914b35a66255574db39f4c71ec

46.3 MB 2026-02-15T18:30:39Z
qsv-16.1.0-aarch64-unknown-linux-gnu.zip

sha256:d1b9a3e2a01f91fbcdcc75b0f12073ce62ec3afc683ad505d12969857d0e8ac5

54.6 MB 2026-02-15T18:00:56Z
qsv-16.1.0-geocode-index.rkyv

sha256:57c4cb039e9bfed1f3a3d1fefadfc0f15c95ae53181a09eb0088227c71609c58

21.3 MB 2026-02-15T16:52:40Z
qsv-16.1.0-geocode-index.rkyv.cities15000

sha256:569ca2dfa20d2f5cf066e39b0a11ca26afec7f320433ff4321ab1a1152130b79

21.3 MB 2026-02-15T16:52:29Z
qsv-16.1.0-geocode-index.rkyv.cities15000.sz

sha256:40304f73481a39ece950b392c3594eea813e6f1c9a7eb43fc087039e7b48c40d

8.52 MB 2026-02-15T16:52:24Z
qsv-16.1.0-powerpc64le-unknown-linux-gnu.zip

sha256:2982b5a45b90753cfea8c85fe0b8b0419fca2d4059684690cf5388850e601285

21.8 MB 2026-02-15T17:34:16Z
qsv-16.1.0-s390x-unknown-linux-gnu.zip

sha256:0ea6d92daa83b9b82fcf553083b61e09e3eddfdb752fde0d50a9bc1e966ee70b

24 MB 2026-02-15T17:32:54Z
qsv-16.1.0-x86_64-pc-windows-gnu.zip

sha256:778d4535d3ab2199d27a83ceaf3a7f26d9daea2a6b4f3dc3cc83ed07e0e770fe

111 MB 2026-02-15T19:00:40Z
qsv-16.1.0-x86_64-pc-windows-msvc.zip

sha256:53a07338b4c1d75529ceea8385ebcf8a9c056177ed857925e1ee15c54d67f034

276 MB 2026-02-15T19:25:49Z
Source code (zip)

2026-02-15T16:51:37Z
Source code (tar.gz)

2026-02-15T16:51:37Z

09 Feb 04:29

jqnatividad

16.0.0

692fa5e

[16.0.0] - 2026-02-08 🤖 "The AI-Native Release" 🤖

This release makes qsv deeply AI-native, from smarter date detection that flows through to Polars schemas, to an MCP Plugin layer that lets AI agents wield qsv as a first-class data tool.

Claude Desktop, Code, and Cowork users can now use qsv's powerful data-wrangling capabilities directly within their AI workflows, with intelligent guidance and seamless integration. Google Gemini is now also supported thanks to @kulnor.

🌟 Major Features

Smarter Date/DateTime Detection

qsv can now automatically detect date and datetime columns and carry that knowledge through the entire pipeline:

stats --dates-whitelist sniff is now the default: qsv sniffs the first 1000 rows to identify date/datetime candidate columns, which then undergo full date/datetime type inference
schema auto-detects Date/DateTime columns when generating Polars schemas (.pschema.json)
DateTime type support in Polars schema parsing — temporal types are preserved through sqlp, joinp, and Parquet conversion
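A minimal sketch of sample-based date sniffing follows. It is hypothetical and much simpler than qsv's detection: it reads at most the first 1000 data rows, accepts only ISO 8601 dates/datetimes via Python's `date.fromisoformat` and `datetime.fromisoformat` (an assumption standing in for whatever formats qsv actually recognizes), and flags a column as a candidate only when every non-empty sampled value parses.

```python
import csv
import io
from datetime import date, datetime

SNIFF_ROWS = 1000  # mirrors the "first 1000 rows" described above


def _is_temporal(value):
    """Assumed rule: accept ISO 8601 dates or datetimes only."""
    for parse in (date.fromisoformat, datetime.fromisoformat):
        try:
            parse(value)
            return True
        except ValueError:
            pass
    return False


def sniff_date_columns(csv_text, max_rows=SNIFF_ROWS):
    """Return header names whose non-empty sampled values all look temporal."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    candidates = {h: True for h in headers}
    seen = {h: 0 for h in headers}  # non-empty values checked per column
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        for header, value in zip(headers, row):
            if value:
                seen[header] += 1
                if candidates[header] and not _is_temporal(value):
                    candidates[header] = False
    # Columns that were empty throughout the sample are not candidates.
    return [h for h in headers if candidates[h] and seen[h] > 0]
```

Sampling keeps the sniff cheap; the candidates it returns are only a shortlist for the full type-inference pass, not a final verdict.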

Hardened Stats Cache

The stats cache system that accelerates frequency, schema, tojsonl, sqlp, joinp, pivotp, diff, and sample is now more robust:

Simplified API: Removed dataset_stats from get_stats_records(), streamlining all downstream consumers
Safe fallback: Corrupted or unparsable cache files are gracefully handled instead of erroring out
Auto-regeneration: Stats cache regenerates on parse error rather than failing
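The fallback behavior above boils down to treating an unreadable cache as a cache miss. Here is a minimal Python sketch, assuming a JSONL cache with one record per line; `compute_stats` and the cache file name are hypothetical placeholders, not qsv's code or file layout.

```python
import json
import tempfile
from pathlib import Path


def compute_stats(csv_path):
    """Hypothetical stand-in for the expensive stats pass over csv_path."""
    return [{"field": "example", "type": "String"}]


def load_stats_cache(csv_path, cache_path):
    """Return cached stats records, rebuilding the cache if missing or corrupt."""
    cache = Path(cache_path)
    try:
        lines = cache.read_text(encoding="utf-8").splitlines()
        records = [json.loads(line) for line in lines if line.strip()]
        if records:
            return records
    except (OSError, ValueError):
        # OSError: missing/unreadable file; ValueError covers JSONDecodeError
        # and UnicodeDecodeError. Either way, fall through and regenerate.
        pass
    records = compute_stats(csv_path)
    cache.write_text("".join(json.dumps(r) + "\n" for r in records), encoding="utf-8")
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_file = Path(tmp) / "data.stats.jsonl"
        cache_file.write_text("{not valid json\n")  # simulate a corrupted cache
        print(load_stats_cache("data.csv", cache_file))
```

Downstream consumers then never see a parse error from the cache itself; at worst they pay for one regeneration.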

Enhanced MCP Server (16.0.0)

The qsv MCP Server receives its largest update yet — see MCP CHANGELOG for full details.

Breaking Changes

diff command: --force option removed
- Was used for short-circuiting diffs based on dataset_stats
- No longer needed after stats cache API simplification
to command: parquet subcommand removed
- Use the dedicated qsv_to_parquet MCP tool or sqlp for Parquet output

Added

feat: stats — add 'sniff' support for --dates-whitelist
feat: schema — auto-detect Date/DateTime columns for Polars schema via sniff
feat: Support DateTime type in Polars schema parsing

Changed

refactor: stats — make --dates-whitelist sniff the default
perf: Use foldhash HashMap/HashSet across codebase for faster hashing
- Replaces std::collections with foldhash in 14 modules
- foldhash is much faster than the default SipHash-based hasher in std::collections for non-cryptographic hashing
refactor: stats Remove dataset_stats from stats cache system
- Simplified get_stats_records() API
- Centralized rowcount handling in sample command
- Adapted diff, pivotp, sample, and other commands to new API
refactor: stats Stats cache now regenerates on parse error (improved robustness)
refactor: stats Safe fallback on corrupted stats cache
refactor: pivotp use sparsity for suggestions and uniqueness_ratio for pivot heuristics
refactor: sample lazily compute row_count only for sampling methods that need it
deps: bump async-compression to 0.4.39
deps: bump bytes from 1.11.0 to 1.11.1
deps: bump calamine to 0.33
deps: bump csv-nose from 0.7.0 to 0.8.0
deps: bump csvlens to latest upstream (PR merged)
deps: bump geosuggest to latest upstream
deps: bump flate2 from 1.1.8 to 1.1.9
deps: bump jsonschema from 0.40.0 to 0.41 (latest upstream with unreleased perf improvements)
deps: bump polars from 0.52.0 at py-1.38.1 tag to 0.53
deps: bump pyo3 from 0.27.2 to 0.28.0
deps: bump redis from 1.0.2 to 1.0.3
deps: bump regex from 1.12.2 to 1.12.3
deps: bump reqwest from 0.13.1 to 0.13.2
deps: bump zerocopy from 0.8.35 to 0.8.36
deps: bump zip from 6 to 7
deps: bump zmij from 1.0.17 to 1.0.20
deps: bump bundled Luau from 0.706 to 0.708
deps: bump @modelcontextprotocol/sdk (MCP)
applied several clippy lint suggestions
applied several GH Copilot and Claude review suggestions

Fixed

fix: frequency column selection when using --select option in different order
- Now lookup cardinality by column name instead of index
- Handles user-selected/reordered column subsets correctly
fix: sample handle missing min weight in stats cache
fix: validate adapt tests to jsonschema 0.40.2 error message format changes
fix: joinp switch pschema serialization to serde_json for compound type support
fix: excel adjust jsonl path usage caused by calamine 0.33 release
fix: stats return sentinel when sniff finds no date columns
fix: config — QSV_NO_HEADERS environment variable being ignored; split no_headers into explicit setter and CLI flag method

Removed

removed to parquet subcommand in favor of dedicated qsv_to_parquet MCP tool and sqlp Parquet output support
removed cargo install instructions from README: qsv is rarely cargo-installable because it regularly uses patched forks, and cargo install doesn't support git dependencies.

Full Changelog: 15.0.1...16.0.0

Contributors

kulnor

28 Jan 12:38

jqnatividad

15.0.1

5ba35e7

[15.0.1] - 2026-01-28

Oops, we celebrated color and the magika-powered revamped sniff but forgot to actually enable them in the release prebuilts! 🤦🏻‍♂️
This patch enables the new color command and turns on magika, along with several fixes and dependency bumps.

Changed

deps: bump polars to latest upstream
deps: bump csv-nose from 0.6.0 to 0.7.0
deps: bump mlua from 0.11.5 to 0.11.6
deps: bump minijinja from 2.14.0 to 2.15.1
deps: bump minijinja-contrib from 2.14.0 to 2.15.1
deps: bump siphasher from 1.0.1 to 1.0.2
deps: bump iana-time-zone from 0.1.64 to 0.1.65
deps: bump hono from 4.11.4 to 4.11.7 (MCP)
build: add color feature to build and test workflows
build: add magika feature to publishing workflows
docs: updated luau documentation to reflect bundled Luau 0.706
docs: sniff is now also 🤖-powered with its use of Magika mime-type detection

Fixed

tests: fix flaky color test_get_theme test (now ignored due to environment dependencies)
tests: fix flaky search JSON test by using semantic rather than byte-by-byte compare

Full Changelog: 15.0.0...15.0.1

26 Jan 14:27

jqnatividad

15.0.0

7d4a18b

[15.0.0] - 2026-01-26 🖖🏻 "The Mind Meld Release" 🖖🏽

This is the biggest release of qsv yet thanks to many expert contributions from the community!

@kulnor's deep expertise in statistics and data standards has been instrumental in enhancing qsv's data analysis capabilities across the entire qsv suite! His well-crafted issue reports, detailed design proposals, thorough testing and detailed documentation on top of our weekly mind-melds have vastly improved commands like frequency, stats, moarstats and describegpt. His contributions and advocacy have been invaluable and I've learned a lot from him.
@ws-garcia's research on the Table Uniformity Method (TUM), the algorithm behind the revamped sniff command, will be the linchpin of our upcoming next-gen CKAN harvester. Though it took a while, our implementation is now complete and achieves 99.55% accuracy on the W3C-CSVW test suite.
@gurgeous' new color command contribution makes viewing CSVs in the terminal a joy! His attention to detail and design aesthetics have resulted in a command that is both functional and visually appealing, with more features on the way!
If you look at the recent commit history, you can see I went on a Claude-bender over the holiday break 🤖, collaborating heavily with @claude (running Opus 4.5, appropriately enough) to build up qsv's Generative AI capabilities in describegpt and its US Census-aware MCP server.

🌟 Major Features

An entire section courtesy of @kulnor's mind-melds.

Enhanced `frequency` Command

Powerful new filtering and display options:

--no-float: Exclude Float columns from frequency analysis
--pct-nulls: Include NULL values in percentage calculations
--null-sorted: Sort NULL values with other entries (not at end)
--no-other: Exclude the "Other" aggregation category
--null-text: Customize the NULL display text
--stats-filter: Luau-based column filtering using statistics
- Filter columns based on any stats field (nullcount, cardinality, type, etc.)
- Full Luau expression support for complex conditions
Omit stats in JSON output when using --weight

Enhanced `describegpt` Command

AI-powered data description gets smarter, and is now optimized to work out of the box with LM Studio and openai/gpt-oss-20b:

--frequency-options / --freq-opts: Pass options to underlying frequency command
--enum-threshold Integration: Control enum constraint compilation thresholds
file: Prefix Support: Load prompts from files with file:my_prompt.txt
CLI Supersedes Environment Variables: Command-line options take precedence
Updated LLM Base URLs: Current endpoints for major providers
Robust Frequency Parsing: Better handling of frequency output formats
QSV_TEST_DESCRIBEGPT: Environment variable for testing describegpt features
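Two of these behaviors, the `file:` prefix and CLI-over-environment precedence, can be illustrated with a small hypothetical Python sketch. The helper names and the `DEMO_PROMPT` variable are made up for illustration; describegpt's actual option and environment-variable names are not shown here.

```python
import os
from pathlib import Path


def resolve_option(cli_value, env_var, default=None):
    """CLI value wins over the environment variable, which wins over the default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(env_var, default)


def load_prompt(value):
    """Return prompt text, reading it from disk when written as "file:<path>"."""
    prefix = "file:"
    if value.startswith(prefix):
        return Path(value[len(prefix):]).read_text(encoding="utf-8")
    return value
```

For example, `load_prompt(resolve_option(None, "DEMO_PROMPT", "file:my_prompt.txt"))` would read `my_prompt.txt` only when neither a CLI value nor the environment variable is set.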

Enhanced `stats` Command

File Metadata in JSON: JSON output now includes source file information
Removed --dataset-stats: Statistics are now always populated (was optional flag)

Enhanced `transpose` Command

--select Option: Select specific columns during transposition
- Uses standard qsv select syntax
- Filter columns before wide-to-long transformation

Revamped `sniff` Command

Complete overhaul of CSV sniffing capabilities with state-of-the-art detection algorithms:

csv-nose Integration: Replaced qsv-sniffer with csv-nose for more robust and accurate detection using @ws-garcia's TUM algorithm
Magika-Powered Inference: Feature-gated integration with Google's Magika for advanced, AI-powered file type detection
- Inference labels for detected types
- Confidence scores for type predictions
1-Based Field Numbering: More intuitive field indexing
Robust Remote URLs: Improved handling of remote CSV sources
'Unknown' Fallback: Graceful handling of undetectable data types

NEW: `color` Command by @gurgeous

A vibrant new command for displaying CSVs as colorized, pretty-printed tables:

Pretty Tables: Transform your CSVs into beautiful, readable terminal output
Row Numbers (--row-numbers): Add line numbers for easy reference
Custom Titles (--title): Add descriptive headers to your output
Color Themes (--color): Choose from multiple color schemes
Placeholder Support: Configurable placeholders for empty values
Environment Variables: QSV_TERMWIDTH (max 1000) and QSV_FORCE_COLOR support
Microoptimized: Fast rendering even for large datasets

Enhanced MCP Server

Major improvements to the Model Context Protocol server, making qsv even more AI-native:

Token Optimization 🚀

66-76% token reduction in tool definitions
Removed redundant defaults and test_file fields from schemas
Streamlined tool and prompts for efficient LLM consumption

Tool Lazy Loading

Tool Search: Dynamically discover available tools and load them as required
Expose-All-Tools Mode: Option to expose the complete tool catalog
Universal --help: Even deeper help across all MCP-exposed commands if the Agent needs more information

Documentation & Integration

Census Integration Guide: If you have the US Census' Official MCP Server installed, prime @claude to use it together with qsv efficiently to do deep research and analysis on data without overrunning the context window.
Updated Claude/MCP Documentation: Comprehensive Documentation
qsv Prompts: Pre-built prompts for common data wrangling tasks
SkillExecutor Unit Tests: Robust testing for skill execution

🏗️ Infrastructure & Quality

Testing

Test suite expanded to 2,448 tests
Comprehensive coverage for new MCP features
SkillExecutor unit tests added

Documentation

DeepWiki Badge: Added project documentation badge
Emoji Legend: Added 🖥️ for UI commands, Luau logos for scripting
COMMAND_DEPENDENCIES.md: New comprehensive command dependency documentation (by @kulnor)
Detailed Examples: Enhanced examples for numerous commands, formatted to be both human and AI-readable
Magika in Version Metadata: File type detection engine now shown in version info

📦 Dependencies

Major Updates

reqwest: 0.12 → 0.13
jsonschema: 0.39 → 0.40
crossterm: 0.28.1 → 0.29.0
csv-nose: 0.2.0 → 0.6.0
sysinfo: 0.37.2 → 0.38.0
rust_decimal: 1.39.0 → 1.40.0

Minor Updates

zmij: 1.0.13 → 1.0.17
flexi_logger: 0.31.7 → 0.31.8
cmov: 0.4.3 → 0.4.5
filetime: 0.2.26 → 0.2.27
get-size2: 0.7.3 → 0.7.4
hono: 4.11.3 → 4.11.4
lodash: 4.17.21 → 4.17.23
Polars: Latest upstream

CI/Actions

actions/checkout: 4 → 6
actions/setup-python: 6.1.0 → 6.2.0

Other

Patched calamine fork with unreleased fixes
MSRV: Rust 1.93

🌍 Environment Variables

New

QSV_MCP_MAX_EXAMPLES: Maximum examples per MCP tool
QSV_TERMWIDTH: Terminal width for color command (max 1000)
QSV_FORCE_COLOR: Force color output
QSV_TEST_DESCRIBEGPT: Enable describegpt testing mode

Updated

QSV_PREAMBLE_ROWS: Enhanced preamble detection
Various QSV_STATS_* and QSV_FORCE_* variables

Migration Notes

Breaking Changes

stats command: --dataset-stats option removed
- Statistics are now always computed
- No migration needed if not using this flag
sniff command: Field numbering changed to 1-based
- Scripts parsing field numbers may need adjustment
- More consistent with other qsv commands

Added

feat: NEW color command for pretty-printed colorized tables by @gurgeous
feat: frequency add --no-float option to exclude Float columns
feat: frequency add --pct-nulls option for NULL percentage calculations
feat: frequency add --null-sorted option for sorting NULL values
feat: frequency add --no-other option to exclude Other category
feat: frequency add --null-text option for custom NULL display
feat: frequency add --stats-filter for Luau-based column filtering
feat: describegpt add --frequency-options / --freq-opts option
feat: describegpt add --enum-threshold integration
feat: describegpt add file: prefix support for prompt files
feat: stats add file metadata to JSON output
feat: transpose add --select option for column selection
feat: sniff integrate csv-nose for improved CSV detection
feat: sniff add Magika-powered file type inference (feature-gated)
feat: mcp add Tool Search capability
feat: mcp add expose-all-tools mode
feat: mcp add universal --help support
feat: mcp add subcommand enum support
feat: mcp add QSV_MCP_MAX_EXAMPLES configuration
docs: add COMMAND_DEPENDENCIES.md by @kulnor
docs: add DeepWiki badge
docs: add emoji legend for UI commands and Luau
docs: add Census integration g...

Contributors

claude, kulnor, and 2 other contributors

13 Jan 03:26

jqnatividad

14.0.0

b56f355

[14.0.0] - 2026-01-12 📦 "The qsv MCP for Everyone Release" 🎁

Building on our 13.0.0 "AI-native Agent" release last week, qsv 14.0.0 is dedicated to making AI integration seamless, reliable, and easy for everyone.

Previously, installing the qsv MCP Server required a full-fledged development environment and familiarity with command-line tools, so it was not readily usable by non-developers.

This release transforms the qsv MCP Server from a powerful developer tool into a user-friendly, transparently integrated Claude Desktop data-wrangling agent with robust cross-platform support, automatic updates, and comprehensive testing infrastructure.

MCP Desktop Extension (Bundle) - One-Click Installation

The new MCP Desktop Extension provides a streamlined installation experience for Claude Desktop users:

User-Friendly Package - Pre-configured bundle with automatic qsv binary detection; if the binary is not found, it provides installation guidance¹
Cross-Platform Support - Works seamlessly on macOS, Windows, and Linux
Smart Data-wrangling - Its deep knowledge of qsv insulates the User from the nitty-gritty details of the comprehensive toolkit and its hundreds of options, while ensuring fast, effective operations
Token Efficient - Despite this deep knowledge, the MCP server is still token efficient by including intelligent contextual guidance to help Claude make optimal decisions (USE WHEN, COMMON PATTERNS, ERROR PREVENTION, PERFORMANCE HINTS prompt guidance along with lazy-loading of full qsv --help text when more info is required)
Security Enhanced - Raw Data is not sent to Claude, only statistical metadata²
Welcome Experience - Includes prompts and examples to get started quickly
Seamlessly works with both Claude Code and the just launched Claude Cowork! Take qsv beyond data-wrangling chats and unlock even greater potential with an agentic qsv.

The Desktop Extension follows the official MCP Bundle (MCPB) manifest specification v0.3, ensuring compatibility with Claude Desktop and future MCP-compatible applications.

See the MCP documentation for installation instructions.

Breaking Changes

MCP Skills: qsv-skill-gen binary removed - use qsv --update-mcp-skills instead (requires mcp feature flag)

Added

feat: MCP Desktop Extension - user friendly installation of qsv MCP Server #3296
feat: MCP Server: numerous QoL improvements to MCP Desktop Bundle #3298
feat: MCP skills auto update #3292
feat: MCP - add expert guidance, common patterns, MCP optimized descriptions & usage hints #3303
feat: MCP skills generator now extracts performance hints (📇 indexed, 🤯 memory-intensive, 😣 proportional memory) from README.md command table
feat: MCP Server automatically enables --stats-jsonl flag for stats command to create cache for smart commands
feat: MCP enhanced tool descriptions with intelligent guidance - USE WHEN, COMMON PATTERNS, ERROR PREVENTION hints
feat: MCP parameter enhancements with examples for common options (selection, delimiter, etc.)
feat: MCP comprehensive pipeline tool description with workflows and limitations
feat: MCP enhanced filesystem tools (list_files, set_working_dir, get_working_dir) with usage guidance
feat: MCP add auto-detection of qsv binary path for Desktop Extension 5c09672e
feat: MCP various Quality-of-Life UI/UX improvements b5b338f6
feat: MCP enhance Desktop Extension with validation and fixes e2e20551
feat: MCP add prompts for welcome message and examples 2672a74b
feat: Claude Code GitHub App integration - PR review and issue assistance workflows #3312
tests: MCP add CI test workflow for qsv MCP server 8732fee3
docs: MCP add comprehensive Claude Code (CLI) documentation 97a88c4e
docs: MCP add an MCP Server-specific CLAUDE.md e7e5f9e1
docs: add qsv pro download badges to README and update description #3295
docs: add alt text to all download badges cc1c3819
docs: add mise alternate installation documentation #3304
docs: MCP update skills markdown documentation #3308
docs: add MCP Server environment variables section to ENVIRONMENT_VARIABLES.md & dotenv.template

Changed

refactor: MCP Server - removed applydp command (datapusher+ specific, not needed for general use)
refactor: MCP use qsv --update-mcp-skill instead of separate qsv-skill-gen binary 13380ba1
refactor: MCP remove qsv-skill-gen binary, make it an option in qsv gated behind mcp feature flag 9c771ee6
refactor: MCP more robust output processing - use temp output file and stdout intelligently #3291
refactor: MCP qsv-skill-gen.rs to preserve positional docopt args when generating skills JSON file 9618a25c
refactor: MCP make output/temp file processing smarter 207274c7
refactor: MCP use directory type for filesystem config to clarify restricted access 9650fb41
refactor: MCP added null checks before iterating arrays 2d0747ab
refactor: MCP fixed TS output directory to account for prod and test builds b0b12a40
refactor: MCP address all issues identified during Copilot review 27027e50
refactor: MCP optimize tokens use - extract concise command descriptions from README #3307
refactor: MCP fine-tune select guidance 37964123
docs: with MCP fully implemented - update the logo to make the horse robotic 33f3b9f5
docs: comprehensive STATS_DEFINITION.md update b443ccc4
chore: address valid robustness issues in last Copilot review 55a5a300
chore: delete CITATION.cff file and just depend on Zenodo integration which auto-assigns a DOI on release 9b981b8c
deps: bump polars to 0.52.0 at py-1.37.1 tag 3bbad1ea
deps: bump atoi_simd and calamine c7cd928f
deps: bump data-encoding from 2.9.0 to 2.10.0 09bf3c33
deps: bump unicase from 2.8.1 to 2.9.0 99f66a3b
deps: bump csvlens to 15.1 and remove our patched fork d588e36e
deps: use latest csvlens with marked row export fd706255
deps: bump blake3 to 1.8.3 and remove our patched fork 05f0efbb
deps: bump toml from 0.9.10+spec-1.1.0 to 0.9.11+spec-1.1.0 2330b1d2
deps: bump zerocopy from 0.8.32 to 0.8.33 950564d1
build(deps): bump serde_json from 1.0.148 to 1.0.149 #3290
build(deps): bump @modelcontextprotocol/sdk from 1.25.1 to 1.25.2 #3293
build(deps): bump indexmap from 2.12.1 to 2.13.0 #3294
build(deps): bump libc from 0.2.179 to 0.2.180 #3299
build(deps): bump zmij from 1.0.12 to 1.0.13 #3305
build(deps): bump actions/checkout from 4 to 6 #3309
build(deps): bump actions/setup-node from 4 to 6 #3310
deps: bump nightly from 2025-10-24 to 2026-01-09; same as polars f77ea524
bumped several indirect dependencies
applied select clippy & Codacy suggestions
applied several GH Copilot and Claude review suggestions

Fixed

fix: stats use .get() instead of [] indexing to avoid panics on missing keys when using old stats cache file #3306
fix: MCP force add tsconfig.json #3301
fix: MCP correct manifest.json to match official spec v0.3 c783cf2c
fix: MCP expand template variables in config paths 3177cfe1
fix: MCP address Copilot review issues in package-mcpb.js ec37b7c7
fix: MCP replace execSync with execFileSync for security reasons 5209c751
fix: MCP add promise-based deduplication for metadata cache to prevent race conditions https...
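Promise-based deduplication means concurrent requests for the same key share one in-flight fetch instead of racing. The MCP server is written in TypeScript; to keep this document's examples in one language, the hypothetical sketch below shows the same pattern with Python's asyncio, where a shared `asyncio.Task` plays the role of the shared Promise. `MetadataCache` and `demo` are illustrative names, not the server's API.

```python
import asyncio


class MetadataCache:
    """Concurrent get() calls for one key await a single shared fetch task."""

    def __init__(self, fetch):
        self._fetch = fetch  # async callable: key -> metadata
        self._tasks = {}     # key -> asyncio.Task (in flight or completed)

    async def get(self, key):
        task = self._tasks.get(key)
        if task is None:
            task = asyncio.create_task(self._fetch(key))
            self._tasks[key] = task
        try:
            return await task
        except Exception:
            # Forget a failed fetch so a later call can retry, unless a
            # newer task has already replaced it.
            if self._tasks.get(key) is task:
                del self._tasks[key]
            raise


async def demo():
    calls = []

    async def fetch(key):
        calls.append(key)
        await asyncio.sleep(0.01)  # simulate slow I/O
        return {"file": key}

    cache = MetadataCache(fetch)
    results = await asyncio.gather(*(cache.get("data.csv") for _ in range(5)))
    return len(calls), results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Five concurrent `get` calls trigger exactly one `fetch`; completed tasks stay in the dict, so later calls reuse the cached result.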

¹ The qsv MCP Server is at v14.1.0, incorporating several fixes.
² Note that statistical metadata is not anonymized and may disclose sensitive information. See #3289.

06 Jan 13:15

jqnatividad

13.0.0

1ec4696

[13.0.0] - 2026-01-06 🦾 "The Statistical Data-Wrangling Agent Release" 🤖

We welcome 2026 with qsv 13.0.0 - a major milestone that transforms qsv into an AI-native Agent!

This builds on the online AI Chatbot for CKAN portals we released last September and the expanded describegpt command we released last month. In the coming months, we continue our march towards even more AI/ML/Graph/FAIR and Data Librarian/Concierge/Advisor/Analyst capabilities across the datHere suite, as we embark on a strategic partnership with the Open Knowledge Foundation to Strengthen Open, FAIR, AI-Ready Data Infrastructure powered by CKAN.

This release introduces first-class support for AI agents through three major new capabilities:

MCP Server - Model Context Protocol Integration

qsv now ships with a built-in Model Context Protocol (MCP) Server enabling seamless integration with AI Chatbots starting with Claude Desktop.

Local Data - Its "zero-copy"-inspired approach lets you wrangle very large datasets WITHOUT sending raw data¹; only statistical metadata is sent to Claude! This is not only good for security and privacy; it also overcomes Claude's upload size limit, saves tokens, and improves performance!
22 MCP Tools: 20 common qsv commands as individual tools + 1 generic tool to access all other 46 commands + 1 pipeline tool
Natural Language Interface: No need to remember command syntax
Pipeline Support: Chain multiple operations together seamlessly

See the MCP documentation for detailed setup instructions.

Claude Agent SDK Helper Utilities

New Agent Skills infrastructure provides:

qsv-skill-gen CLI - Generate skill definitions for AI agents
Parses qsv USAGE text using qsv-docopt to generate JSON skill definitions. This allows quick updates of Agent Skills as commands and options are added and modified.
Shell-safe example generation with proper quoting
Comprehensive documentation on integrating qsv into your own AI agent solutions!

`moarstats` - Massive Statistical Expansion

The moarstats command received substantial enhancements, adding 24+ MOAR statistical measures:

Advanced Univariate Statistics:

Bimodality Coefficient - Detect multimodal distributions
Normalized Entropy - Scaled information content measure (0-1)
Atkinson Index - Inequality measure with configurable epsilon parameter
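The notes don't give moarstats' exact formulas, so the following is a hypothetical Python sketch using common textbook definitions: Sarle's bimodality coefficient with bias-corrected sample skewness and excess kurtosis, Shannon entropy normalized by log(k) where k is the number of distinct values (normalizing by log(n) is another convention), and the Atkinson index with inequality-aversion parameter epsilon. qsv's implementation may differ in corrections and edge-case handling.

```python
import math
from collections import Counter


def bimodality_coefficient(xs):
    """Sarle's bimodality coefficient; values above 5/9 (~0.555) hint at bimodality."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("zero variance")
    g1 = m3 / m2 ** 1.5    # biased (moment-based) skewness
    g2 = m4 / m2 ** 2 - 3  # biased excess kurtosis
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def normalized_entropy(values):
    """Shannon entropy divided by log(k), k = number of distinct values (0..1)."""
    counts = Counter(values)
    k, n = len(counts), len(values)
    if k <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def atkinson_index(xs, epsilon=0.5):
    """Atkinson inequality index for strictly positive values (0 = perfect equality)."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("values must be strictly positive")
    mean = sum(xs) / len(xs)
    if epsilon == 1:
        geo_mean = math.exp(sum(math.log(x) for x in xs) / len(xs))
        return 1 - geo_mean / mean
    power = 1 - epsilon
    # Equally-distributed-equivalent value divided by the mean.
    ede_ratio = (sum((x / mean) ** power for x in xs) / len(xs)) ** (1 / power)
    return 1 - ede_ratio
```

Larger epsilon weights the low end of the distribution more heavily, which is why the index is exposed with a configurable parameter.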

Bivariate Statistics:

Pearson's correlation - Linear correlation coefficient
Spearman's rank correlation - Monotonic relationship measure
Kendall's tau - Concordance-based correlation
Covariance - Joint variability measure
Mutual Information - Information-theoretic dependency
Normalized Mutual Information - Scaled mutual information (0-1)
Multi-dataset joins - --join-inputs for bivariate analysis ACROSS datasets

XSD Type Mapping:

Automatic inference of W3C XML Schema Definition (XSD) datatypes
Smart XSD Gregorian date type inferencing with "quick" and "thorough" modes (#3259)
Support for gYear, gMonth, gDay, gMonthDay, gYearMonth validation
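The five Gregorian types have distinct lexical shapes in W3C XML Schema 1.1 Part 2: Datatypes (gYear `2001`, gYearMonth `2001-10`, gMonthDay `--05-01`, gMonth `--05`, gDay `---15`, each with an optional timezone). Below is a hypothetical Python sketch of a purely lexical classifier; it is not qsv's "quick"/"thorough" inference and skips range checks (e.g. month `13` would pass).

```python
import re

# Optional timezone suffix: Z, or +hh:mm / -hh:mm.
_TZ = r"(?:Z|[+-][0-9]{2}:[0-9]{2})?"

# Lexical shapes only; the patterns do not overlap, so order is cosmetic.
XSD_GREGORIAN = {
    "gYearMonth": re.compile(rf"-?[0-9]{{4,}}-[0-9]{{2}}{_TZ}"),
    "gYear": re.compile(rf"-?[0-9]{{4,}}{_TZ}"),
    "gMonthDay": re.compile(rf"--[0-9]{{2}}-[0-9]{{2}}{_TZ}"),
    "gMonth": re.compile(rf"--[0-9]{{2}}{_TZ}"),
    "gDay": re.compile(rf"---[0-9]{{2}}{_TZ}"),
}


def xsd_gregorian_type(value):
    """Return the matching XSD Gregorian type name, or None (lexical check only)."""
    for name, pattern in XSD_GREGORIAN.items():
        if pattern.fullmatch(value):
            return name
    return None
```

Note that a full date such as `2001-10-05` matches none of these shapes, so it falls through to `None` and would be handled by regular date inference instead.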

See STATS_DEFINITIONS.md for a comprehensive list of the ~100 statistical metrics qsv compiles!

Breaking Changes

lens: Default behavior changed to NOT stream from stdin (use explicit flag if needed)
moarstats: Output now includes additional columns (xsd_type, bivariate stats)

Added

feat: qsv MCP server #3269
feat: MCP - expanded file selector for more supported tabular file formats; auto index for files larger than 10mb #3278
feat: added Claude Agent Skills SDK support 🤖 #3264
feat: moarstats add "xsd_type" column #3242
feat: moarstats add Atkinson Index with configurable inequality aversion parameter, Normalized Entropy & Bimodality Coefficient #3243
feat: moarstats add bivariate stats #3247
feat: moarstats add normalized mutual info #3256
feat: moarstats add --force and --jobs options #3253
feat: moarstats add "xsd_subtype" Gregorian date data types inferencing with --xsd-gdate-scan having fast (default) and comprehensive modes #3259
feat: qsvdp enable join command that moarstats uses #3252
docs: added comprehensive stats documentation #3240

Changed

refactor: describegpt - consolidate JSON response parsing; cache handling; and make DuckDB & Polars error handling more consistent #3241
refactor: frequency reduce duplication introduced by --weight option #3236
perf: frequency precompute other_prefix for performance 2dc75ee
perf: frequency simplify apply_limits* helper functions f0b7f9c
perf: pivotp convert directly to PlSmallStr for performance b7dbb3f
refactor: MCP Server to optimize for Local Access to Files #3272
refactor: MCP Server improvements #3274
refactor: MCP Server remove examples from ci tests #3277
refactor: MCP Server add LIFO converted cache #3280
refactor: MCP Server moar refactoring after tests #3282
perf: moarstats much faster bivariate calculation #3248
perf: moarstats optimize non-streaming bivariate stats compilation #3250
refactor: qsv Skills Agent #3267
deps: polars bump to rev c241260 #3276
build(deps): bump itoa from 1.0.16 to 1.0.17 by @dependabot[bot] in #3239
build(deps): bump human-panic from 2.0.4 to 2.0.5 by @dependabot[bot] in #3234
build(deps): bump human-panic from 2.0.5 to 2.0.6 by @dependabot[bot] in #3249
build(deps): bump libc from 0.2.178 to 0.2.179 by @dependabot[bot] in #3265
build(deps): bump redis from 1.0.1 to 1.0.2 by @dependabot[bot] in #3232
build(deps): bump rfd from 0.16.0 to 0.17.0 by @dependabot[bot] in #3279
build(deps): bump rfd from 0.17.0 to 0.17.1 by @dependabot[bot] in #3284
build(deps): bump serde_json from 1.0.147 to 1.0.148 by @dependabot[bot] in #3238
build(deps): bump serial_test from 3.2.0 to 3.3.0 by @dependabot[bot] in #3273
build(deps): bump serial_test from 3.3.0 to 3.3.1 by @dependabot[bot] in #3275
build(deps): bump tokio from 1.48.0 to 1.49.0 by @dependabot[bot] in #3266
build(deps): bump url from 2.5.7 to 2.5.8 by @dependabot[bot] in #3286
build(deps): numerous bumps zmij from 0.1.7 to 1.0.12
bumped several indirect dependencies
applied select clippy & Codacy suggestions
applied several GH Copilot and Claude review suggestions

Fixed

fix: refresh_cpu_all() -> refresh_cpu_list(sysinfo::CpuRefreshKind::nothing())… #3261
fix: stats remove redundant check 0977ebf
fix: moarstats correct kendall_tau formula cf16543
fix: describegpt and util::run_qsv_cmd - add special case for sample as it expects output differently 6b6039f
fix: CVE-2025-66414 security vulnerability GHSA-w48q-cv73-mx4w
fix: RUSTSEC-2026-0001 (rkyv bump) c2d4937
typo: Portugese → Portuguese
typo: stats asummes → assumes

AI Contributors

@jqnatividad collaborated with and orchestrated @Copilot, Claude Code, Cursor and Gemini using various models

Full Changelog: 12.0.0...13.0.0

Note that statistical metadata is not anonymized and will disclose potentially sensitive information. See #3289.

Contributors

jqnatividad and dependabot


24 Dec 14:14

jqnatividad

12.0.0

bd21fa3

12.0.0

[12.0.0] - 2025-12-24 🎄

Stuff your virtual stocking and jingle your data bells - qsv 12.0.0 slides down the chimney packed fuller than Santa’s sleigh! Unwrap delightful surprises like the shiny new moarstats command, gift-wrapped weighted statistics, and AI-powered FAIR metadata inferencing now speaking in multiple languages (no elf translation required). As the star on top, meet TOON - the brand new LLM-optimized, token-efficient format - ready to sleigh your AI projects all through 2026. Ho-ho-hold my data, this update’s a festive feast!

Special thanks to @kulnor for advocating, brainstorming & testing many of the new features below!

🌟 Major Features

NEW: moarstats Command

A powerful new command for "moar" advanced statistical analysis, providing statistics beyond what the stats command offers:

Comprehensive Statistics: More than 50 advanced statistical measures, including:
- Detailed outlier analysis (count, sum, average)
- Winsorized and trimmed means (5%, 10%, 20%, 25%)
- Multiple dispersion measures (IQR to range ratio, quartile coefficient of dispersion)
- Distribution statistics (skewness, multiple kurtosis measures)
Advanced Option (--advanced): Access computationally intensive statistics:
- Gini coefficient for inequality measurement
- Excess Kurtosis to measure "tailedness" of the distribution
- Shannon Entropy for data diversity analysis
Available on all binary variants for universal access
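The trimmed and winsorized means and the Gini coefficient mentioned above follow standard definitions that can be sketched in a few lines of Python. This is a hypothetical illustration, not moarstats' implementation; in particular, rounding the tail count down with `floor(proportion * n)` and omitting any small-sample correction for Gini are assumptions.

```python
import math

# Hypothetical sketches of robust means and the Gini coefficient; not moarstats' code.


def trimmed_mean(values, proportion):
    """Mean after dropping floor(proportion * n) values from each tail (0 <= proportion < 0.5)."""
    xs = sorted(values)
    k = math.floor(proportion * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Mean after clamping the k lowest and k highest values to the nearest kept value."""
    xs = sorted(values)
    n = len(xs)
    k = math.floor(proportion * n)
    low, high = xs[k], xs[n - k - 1]
    return sum(min(max(x, low), high) for x in xs) / n


def gini(values):
    """Gini coefficient: sum of |xi - xj| over all ordered pairs / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)
```

Trimming discards the extreme values, while winsorizing keeps the sample size and pulls the extremes in, so the two can differ noticeably on skewed data.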

Enhanced describegpt Command

Major enhancements to AI-powered data description capabilities:

⛩️ Minijinja Template Engine Integration:
- Custom prompt templating with full Minijinja and Minijinja-contrib filters
- More powerful and flexible prompt customization
Multilingual Support:
- --language option for generating descriptions in any language/dialect
  - Languages: Spanish, Portuguese, Italian, Japanese, Hindi, Arabic
  - Dialects: Franglais, Taglish, Pennsylvania Dutch
  - Constructed Languages: Klingon, High Valyrian, Quenya
  - Personalities: Snoop Dogg, Hans Rosling, Christopher Walken
  - Personas: Gen Z Slang, Silly, Emoji-loving Santa
- Automatic language detection in --prompt mode
- SQL comments also generated in requested language
Advanced Features:
- --addl-columns option with detailed attribution and system metadata
- --export-prompt <file> to save the default prompts to the specified file.
  This file can then be tailored and used with the --prompt-file <file> option.
- Iterative, session-based SQL RAG with --prompt option
- Sampling in prompt mode for better SQL generation
- Lookup table and CKAN support for controlled vocabularies
- Convenience values for --addl-cols-list
  (i.e., "everything", "everything!", "moar", "moar!")

Weighted Statistics Support

Comprehensive weighted statistics implementation across multiple commands:

stats Command (--weight <column>):
- Weighted mean, standard deviation, variance
- Weighted MAD (Median Absolute Deviation) and percentiles
- Weighted modes and antimodes
- Weighted harmonic and geometric means
- All weighted calculations handle non-finite values gracefully
frequency Command (--weight <column>):
- Weighted frequency distributions
- Proper handling of weighted "Other" and "ALL UNIQUE" categories
- Non-finite weights automatically skipped
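A minimal sketch of how weighted statistics and weighted frequencies can skip non-finite weights, as described above. The function names are hypothetical, the variance is the population-style form `sum(w * (x - mean)^2) / sum(w)` (qsv may use a different weighting convention), and weights are assumed to be non-negative.

```python
import math
from collections import defaultdict

# Hypothetical helpers; the variance convention is an assumption, not qsv's.


def _finite_pairs(values, weights):
    """Drop observations whose weight is NaN or infinite."""
    return [(x, w) for x, w in zip(values, weights) if math.isfinite(w)]


def weighted_mean(values, weights):
    pairs = _finite_pairs(values, weights)
    total_w = sum(w for _, w in pairs)
    return sum(x * w for x, w in pairs) / total_w


def weighted_variance(values, weights):
    """Weighted population variance: sum(w * (x - mean)^2) / sum(w)."""
    pairs = _finite_pairs(values, weights)
    total_w = sum(w for _, w in pairs)
    mu = sum(x * w for x, w in pairs) / total_w
    return sum(w * (x - mu) ** 2 for x, w in pairs) / total_w


def weighted_frequencies(values, weights):
    """Total weight per distinct value, skipping non-finite weights."""
    freq = defaultdict(float)
    for x, w in _finite_pairs(values, weights):
        freq[x] += w
    return dict(freq)
```

Filtering weights once in a shared helper keeps the "skip non-finite weights" rule consistent across every weighted statistic.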

Token-Oriented Object Notation (TOON) Format Support

A compact, human-readable encoding of the JSON data model for LLM prompts
Commands Supporting TOON:
- describegpt --format TOON
- frequency --toon
Benefits: a terse, token-efficient format targeted at LLMs that is more readable than JSON
and easier to parse than CSV for hierarchical data
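To make TOON's tabular form concrete, this sketch encodes a list of uniform objects as a `key[N]{fields}:` header followed by one indented, comma-separated row per object. It is a simplified, hypothetical encoder, not qsv's or the reference implementation: it only handles uniform arrays of flat objects and uses a reduced quoting rule (quote empty strings, padded strings, and strings containing commas, colons, quotes or newlines).

```python
import json

# Hypothetical, simplified TOON encoder for uniform arrays of flat objects only.


def toon_scalar(value):
    """Render a scalar; quote strings only when the simplified rule requires it."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    s = str(value)
    needs_quotes = s == "" or s != s.strip() or any(c in s for c in ',:"\n')
    return json.dumps(s) if needs_quotes else s


def toon_table(key, rows):
    """Encode a list of dicts that all share the same keys as a TOON tabular array."""
    fields = list(rows[0].keys())
    if any(list(r.keys()) != fields for r in rows):
        raise ValueError("tabular form requires uniform rows")
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for r in rows:
        lines.append("  " + ",".join(toon_scalar(r[f]) for f in fields))
    return "\n".join(lines)
```

Because field names appear once in the header instead of once per object, the savings over JSON grow with the number of rows.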

stats Command Enhancements

Percentile Improvements:
- --percentile-list special values: "deciles" and "quintiles"
- Percentile labels now include prefix before value (e.g., "p50: 42.5")
- Validation of percentile-list on startup
New Columns: Added n_counts for more detailed count information
Performance Optimizations:
- Optimized Stats struct layout
- Eliminated redundant, unnecessary sorting
- Removed redundant filtering for weighted stats functions
- Microoptimizations throughout

transpose Command

New --long Option: Transform data from wide to long format
- Column selection support using select syntax
- Streaming implementation per GitHub Copilot review suggestions
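Wide-to-long reshaping (sometimes called "melting") turns each non-identifier column into its own row. The generator below is a hypothetical sketch of that idea, processing one row at a time in the spirit of the streaming implementation; the exact column layout that `transpose --long` emits is not described here, so the `(id..., field, value)` tuple shape is an assumption.

```python
import csv
import io

# Hypothetical melt sketch; not qsv's transpose --long implementation.


def wide_to_long(rows, id_fields, value_fields):
    """Yield one (id values..., field name, value) tuple per value column.

    Written as a generator so each wide row is consumed and emitted before
    the next one is read, without materializing the whole table.
    """
    for row in rows:
        ids = tuple(row[f] for f in id_fields)
        for field in value_fields:
            yield (*ids, field, row[field])


wide = io.StringIO("city,2023,2024\nOslo,10,12\nLima,7,9\n")
long_rows = list(wide_to_long(csv.DictReader(wide), ["city"], ["2023", "2024"]))
```

Each wide row with one identifier and two value columns becomes two long rows.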

diff Command

Upgraded csv-diff from 0.1.1 to the faster 0.1.2, improving performance
by up to 25% in optimal cases 🚀

lens Command

Aligned --no-streaming-stdin behavior with csvlens upstream

📊 Output Format Changes

schema Command

Updated $schema from Draft 7 to JSON Schema Draft 2020-12

⚡ Performance Improvements

suite-wide

Replaced the already fast ryu float-to-string conversion crate with the even
faster zmij crate (https://vitaut.net/posts/2025/faster-dtoa/)

stats Command

Optimized Stats struct memory layout
Eliminated redundant sorting operations
Removed unnecessary clone operations
Better handling of real-world data (assumes no infinity values)

frequency Command

Microoptimizations for faster frequency computation
Optimized top_n/bottom_n retrieval

🐛 Bug Fixes

frequency Command

Fixed behavior when compiling weighted frequencies with ALL_UNIQUE
Fixed issue where "Other (0),0,0,0" could appear in output
Proper handling of non-finite weights (automatically skipped)

🏗️ Infrastructure & Quality

Testing

Test suite expanded from 2,060 to 2,380 tests
Comprehensive test coverage for all new features
Weighted statistics thoroughly tested
Advanced moarstats options validated

Code Quality

Extensive GitHub Copilot review integration
Multiple refactoring passes for code clarity
Clippy suggestions incorporated throughout
Better error handling and edge case management

FAIR Principles

Added CITATION.cff (by @rzmk) for academic citation
Added Zenodo DOI badge for dataset citation
Enhanced FAIRification of qsv as a research tool

📚 Documentation Improvements

Statistical Documentation

Comprehensive documentation (work in progress) for the statistics produced by the stats command (by @kulnor)
Enhanced usage text for stats, frequency, and moarstats
Better examples throughout documentation

Command Documentation

Updated describegpt with multilingual examples
Added controlled tag vocabulary examples
Enhanced TOON format documentation
Better SQL RAG workflow documentation

Migration Notes

Breaking Changes

schema command: $schema output changed from Draft 7 to Draft 2020-12
- Most schemas should be compatible
- Validation tools must support JSON Schema Draft 2020-12
stats command: Output now includes percentile label prefixes
- Example: the 50th percentile is now reported as "p50: 10" instead of just "10"
- May affect parsing scripts that expect raw numbers
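Scripts that previously read raw percentile values can accept both the old and new forms during migration. This hypothetical helper parses a single entry such as `p50: 10`. How multiple percentiles are delimited within one stats cell is not covered here, so splitting a cell into entries is left to the caller.

```python
# Hypothetical migration helper; the "label: value" shape follows the example above.
def parse_percentile(entry):
    """Return (label, value) for 'p50: 10'; a bare number yields (None, value)."""
    entry = entry.strip()
    label, sep, number = entry.partition(":")
    if sep:
        return label.strip(), float(number)
    return None, float(entry)
```

Returning `None` as the label for bare numbers lets the same parsing code handle output from both older and newer qsv versions.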

Added

feat: describegpt add --add-cols and --addl-cols-list <list> options #3179
feat: describegpt add --language option #3184
feat: describegpt use minijinja engine for prompt processing #3188
feat: describegpt add language autodetection in --prompt (chat) mode #3193
feat: describegpt sampling in prompt mode for better SQL generation… #3198
feat: describegpt add --prompt sessions for iterative SQL RAG refinement #3200
feat: describegpt add TOON format support #3205
feat: frequency add TOON format #3206
feat: frequency add weighted frequencies #3218
feat: add new moarstats command #3207
feat: moarstats add even moar! Now with detailed outliers info! #3208
feat: moarstats - add configurable ...

Contributors

kulnor, dependabot, and rzmk


08 Dec 06:09

jqnatividad

11.0.2

1b160d0

11.0.2

[11.0.2] - 2025-12-08

qsv 11.0.2 brings significant enhancements to larger-than-memory data processing, AI-powered metadata inferencing, JSON Schema inferencing & validation, and data viewing capabilities, along with important bug fixes and performance improvements.

All in preparation for at-scale, secure, interactive, "zero-copy" "Data Steward-in-the-Loop" FAIRification on the desktop in qsv pro.

🌟 Major Features

`stats` & `frequency`

Larger-than-Memory Files: thanks to a new dynamic parallel chunk sizing algorithm, stats & frequency can now handle arbitrarily large files, even when "advanced" statistics are enabled! (examples: stats, frequency)
N Counts: Added "n_counts" (n_negative, n_zero and n_positive) columns to stats output for more detailed count information for numeric fields.

`describegpt`

The describegpt command has received substantial improvements for AI-powered metadata inferencing:

"Neuro-Procedural" Data Dictionaries: combines deterministically computed statistics and frequency distribution data with AI-inferred Human-Friendly Labels and Descriptions to compile an expanded Data Dictionary (not quite "neuro-symbolic" (YET!))
Chat with your Data!: Improved DuckDB and Polars SQL guidance means more reliable transformations of your Natural Language queries to SQL - leading to fast, deterministic, reproducible, hallucination-free answers! (example, SQL result)
Format Option: Replaced --json flag with --format option for more flexible output formatting
- Supports multiple output formats - Markdown (default), TSV and JSON
- Removed --jsonl option for cleaner API
Controlled Tag Vocabulary: New tag vocabulary system for consistent categorization
- --tag-vocab option to specify controlled vocabulary
- Lookup support for tag vocabularies - retrieve a tag vocabulary from a local or remote CSV
  using http://, https://, dathere:// and ckan:// URL schemes.
Enhanced Boolean Inference: --infer-boolean is now enabled by default for better data type detection
Performance Metrics: Added elapsed time tracking to monitor processing duration
Improved Prompt Templates: Updated default description prompt with PII/PHI alerts and better attribution metadata

`schema` & `validate`

Enhanced JSON Schema inference and validation capabilities:

Strict Formats: New --strict-formats option for stricter JSON Schema format validation,
enforcing format constraints for email, hostname and IP address (IPv4/IPv6) values.
Output Option: New --output option for specifying schema output destination
- Polars schema now uses consistent naming conventions across commands
- Updated joinp, pivotp, and sqlp commands to use new .pschema.json naming convention
Configurable Email Validation: validate has numerous options to tweak email validation,
taking advantage of schema's email format constraint inferencing.
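The format constraints above can be approximated in Python with the standard `ipaddress` module plus simple regular expressions. This is a sketch under stated assumptions, not the checks `schema` or `validate` actually perform: the hostname rule is RFC 1123-style, the email check is deliberately loose, and `infer_format` (a hypothetical helper) reports a format only when every non-empty value satisfies it, trying IPv4 before hostname because dotted IPv4 addresses also match the hostname pattern.

```python
import ipaddress
import re

# Hypothetical sketch: approximate JSON Schema "format" checks, not qsv's rules.
# RFC 1123-style label: 1-63 letters/digits/hyphens, no leading or trailing hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_hostname(s):
    return 0 < len(s) <= 253 and all(LABEL.fullmatch(p) for p in s.split("."))


def is_ipv4(s):
    try:
        ipaddress.IPv4Address(s)
        return True
    except ValueError:  # AddressValueError subclasses ValueError
        return False


def is_ipv6(s):
    try:
        ipaddress.IPv6Address(s)
        return True
    except ValueError:
        return False


def is_email(s):
    """Deliberately loose: exactly one '@', non-empty local part, hostname-like domain."""
    local, sep, domain = s.rpartition("@")
    return bool(sep) and bool(local) and "@" not in local and is_hostname(domain)


def infer_format(values):
    """Return the first format that every non-empty value satisfies, else None."""
    present = [v for v in values if v]
    checks = [("email", is_email), ("ipv4", is_ipv4), ("ipv6", is_ipv6), ("hostname", is_hostname)]
    for name, check in checks:
        if present and all(check(v) for v in present):
            return name
    return None
```

Ordering the checks from most to least specific matters: without it, a column of IPv4 addresses would be reported as hostnames.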

`sample` time-series sampling

A new --timeseries sampling method with grouping (hourly, daily, weekly),
adaptive sampling (preferring business hours or weekends), aggregation within each
interval (mean, sum, min, max) and configurable starting points (first, last or random).
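Interval grouping with per-interval aggregation can be sketched with the standard library alone. This hypothetical example buckets `(timestamp, value)` pairs by hour, day or Monday-start week and applies mean, sum, min or max. It does not model `sample`'s adaptive sampling or its first/last/random starting points, and the week-start convention is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical sketch of interval grouping + aggregation; not qsv's sample --timeseries.
AGGREGATORS = {"mean": mean, "sum": sum, "min": min, "max": max}


def bucket_start(ts, interval):
    """Truncate a datetime to the start of its hourly, daily or weekly bucket."""
    if interval == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if interval == "daily":
        return day
    if interval == "weekly":
        return day - timedelta(days=day.weekday())  # weeks assumed to start on Monday
    raise ValueError(f"unknown interval: {interval}")


def aggregate_by_interval(records, interval, how="mean"):
    """Group (timestamp, value) pairs by bucket and aggregate each bucket."""
    buckets = defaultdict(list)
    for ts, value in records:
        buckets[bucket_start(ts, interval)].append(value)
    agg = AGGREGATORS[how]
    return {start: agg(vals) for start, vals in sorted(buckets.items())}


records = [
    (datetime(2025, 12, 1, 9, 15), 10.0),
    (datetime(2025, 12, 1, 9, 45), 20.0),
    (datetime(2025, 12, 3, 10, 5), 5.0),
]
hourly_means = aggregate_by_interval(records, "hourly")
```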

`lens` "real-time" Features

Enhanced CSV viewing capabilities with csvlens integration:

Auto-Reload: New --auto-reload option to automatically reload file when it changes
- Useful for monitoring live data files
Streaming stdin: New --streaming-stdin option for real-time data viewing
- Supports viewing data as it's being piped in
Row Marking: Updated csvlens dependency with row marking feature

Breaking Changes

describegpt: --json flag replaced with --format option
describegpt: --jsonl option removed
schema, joinp, pivotp, sqlp: Updated Polars schema naming conventions
(existing workflows should work but output format may differ slightly)

Added

Created Event Logo Archive with AI-generated seasonal/version logos
describegpt: add controlled vocabulary support for tags #3122
describegpt: add elapsed time #3168
describegpt: add lookup support #3170
excel: add --cell option #3133
frequency: add dynamic parallel chunk sizing #3135
lens: add --auto-reload option #3128
lens: add --streaming-stdin option #3171
sample: add timeseries sampling options #3130
schema: infer addl JSON Schema predefined formats - email, ipv4, ipv6, hostname #3125
schema: add --output option and standardize Polars Schema file name #3126
stats: dynamic parallel chunk sizing with indexed files #3134
stats: add n_negative, n_zero, n_positive count columns #3157
validate: add email validation options #3148
tests: add tests for https://100.dathere.com/lessons/4 by @rzmk in #3151
Added Claude AI guidance for contributors
Enhanced --version output with more comprehensive system metadata

Changed

refactor: describegpt improve tags inferencing with Tag Vocabulary #3139
feat: describegpt - major refactor #3143
feat: describegpt improved Polars SQL processing #3147
feat: describegpt replace --json option with --format option supporting 3 formats - markdown, json and TSV; remove --jsonl option #3167
refactor: frequency & stats - parallel chunk sizing - allow forcing of cpu based chunking #3138
Align partition stdin handling with split/stats pattern by @Copilot in #3162
deps: use latest polars upstream with new SQL fixes and features (pola-rs/polars@e1be17f)
build(deps): bump actions/setup-python from 6.0.0 to 6.1.0 by @dependabot[bot] in #3120
build(deps): bump actix-web from 4.12.0 to 4.12.1 by @dependabot[bot] in #3127
build(deps): bump flate2 from 1.1.5 to 1.1.7 by @dependabot[bot] in #3159
build(deps): bump jsonschema from 0.37.1 to 0.37.2 by @dependabot[bot] in #3129
build(deps): bump jsonschema from 0.37.2 to 0.37.3 by @dependabot[bot] in #3131
build(deps): bump jsonschema from 0.37.3 to 0.37.4 by @dependabot[bot] in #3140
build(deps): bump log from 0.4.28 to 0.4.29 by @dependabot[bot] in #3150
build(deps): bump minijinja from 2.12.0 to 2.13.0 by @dependabot[bot] in #3142
build(deps): bump minijinja-contrib from 2.12.0 to 2.13.0 by @dependabot[bot] in #3141
build(deps): bump pyo3 from 0.27.1 to 0.27.2 by @dependabot[bot] in #3137
build(deps): bump qsv-stats from 0.40.0 to 0.41.0 by @dependabot[bot] in #3136
build(deps): bump qsv-stats from 0.41.0 to 0.42.0 by @dependabot[bot] in #3156
build(deps): bump qsv-stats from 0.42.0 to 0.43.0 by @dependabot[bot] in #3169
build(deps): bump rfd from 0.15.4 to 0.16.0 by @dependabot[bot] in #3121
build(deps): bump uuid from 1.18.1 to 1.19.0 by @dependabot[bot] in #3146
Improved qsvpy build process for Apple Silicon
Updated GitHub Actions workflows for better reliability
bumped several indirect dependencies
applied select clippy & Codacy suggestions
Improved dependency version management
Better feature flag handling

Fixed

fix: apply panic on empty selection #3165
fix: more robust snappy and file extension detection #3166
fix: partition add proper stdin handling regression introduced when --limit option was added #3161
Fix broken layout of environment variable documentation by @tmtmtmtm in #3163

Removed

describegpt: remove --jsonl option #3167
chore: remove jemalloc support #3153

New Contributors

@Copilot made their first contribution in #3162

*...

Contributors

tmtmtmtm, dependabot, and rzmk


23 Nov 22:43

jqnatividad

10.0.0

76e74a1

10.0.0

[10.0.0] - 2025-11-23

Highlights:

Enhanced Data Dictionary: describegpt now features an expanded default prompt (v4.0) that generates more comprehensive data dictionaries.
Parallel Search/Replace Operations: search, searchset, and replace commands now support parallel execution when working with indexed CSV files, delivering significant performance improvements for large datasets.
Search/Replace Exact Match Options: Added --exact option to search, searchset, and replace commands for precise string matching without regex patterns.
Enhanced SQL Capabilities: sqlp now supports arbitrary expressions in SQL JOIN constraints, named window references, and new SQL functions including row_number, rank, dense_rank, and array_to_string.
Improved pivotp Performance: Updated to use Polars' new lazy pivot API with --maintain-order flag for predictable output ordering.
Luau 0.701: Updated embedded Luau from 0.697 to 0.701 with additional pattern matching documentation and tests.

Added

search & searchset: add --exact option for literal string matching #3094
search: parallel search when file is indexed #3096
searchset: parallel execution when indexed #3097
replace: add --exact option e73d9bf
replace: parallel execution when indexed #3098
sqlp: added support for arbitrary expressions in SQL JOIN constraints d47c44e & 0d2402b
sqlp: added support for row_number, rank, and dense_rank SQL window functions #3115
sqlp: added support for named window references #3118
sqlp: added support for array_to_string list evaluation 64cbf34
pivotp: added --maintain-order flag for predictable output ordering 02dca12
describegpt: default-prompt-file v4.0 with expanded Data Dictionary generation 4db0d18
luau: expanded documentation for string functions using pattern matching a7344e3 & 2dcc9a4
util::mem_file_check: added platform adjustment factor 421be84
benchmarks: v7.0 added search & searchset indexed parallel benchmarks 55df784
benchmarks: v7.1.0 added replace_indexed_parallel benchmark 05c89d8

Changed

describegpt: refactored for improved reliability 1433bf1 & b6190a4
frequency: special rank of 0 now assigned to <ALL_UNIQUE> rows effa13b
frequency: microoptimizations 775bb88 & 29ec7af
search, searchset & replace: now parallelizable with an index, with significant performance improvements 45fc83d
search: use faster, non-allocating par_sort_unstable_by_key for improved performance 5f50f23
search: optimize --quick option 1fc1b85
search: --preview-match option forces sequential search 017ca6f
search, searchset & replace: sort chunks instead of raw data for better performance 5b58cb8
searchset: microoptimizations for performance c4ce324
replace: remove unneeded index rebuild logic cfdba60
pivotp: refactored to adapt to Polars' new lazy pivot API #3102
excel: microoptimize hot loop and formula retrieval f141c1b & 17780b5
stats: cache repetitive expensive env_var access in hot path a6ad0ce
stats: multiple microoptimizations 2f41c33 & 9bf43e5 & 00958a1
validate: updated to jsonschema 0.37.x with improved error handling f45693d & c7ad5d2 & b9ea447
luau: updated embedded Luau from 0.697 to 0.701 8885dce
deps: bump polars to latest upstream with numerous SQL and LazyFrame improvements
deps: bump jsonschema from 0.34 to 0.37.1
deps: bump syn from 2.0.109 to 2.0.110 d207524
deps: bump quick-xml from 0.38.3 to 0.38.4 11a5ae4
deps: bump geosuggest-core from 0.8.1 to 0.8.2 baf3194
deps: bump geosuggest-utils from 0.8.1 to 0.8.2 c5bcd1b
deps: bump governor from 0.10.1 to 0.10.2 b0068ef
deps: bump gzp from 2.0.1 to 2.0.2 2a0b901
deps: bump indexmap from 2.12.0 to 2.12.1 afa9c1f
deps: bump mlua from 0.11.4 to 0.11.5 49eedb9
deps: bump signal-hook-registry from 1.4.6 to 1.4.7 5c2e705
deps: bump calamine to 0.32 (removed git dependency) 449f162
deps: bump cached to latest upstream (removed patched fork) 508d1ce
deps: bump actions/checkout from 5 to 6 f76e009
deps: removed hashbrown patched fork ad30460
deps: removed grex patched fork 88cd3fc
deps: updated Cargo.lock file multiple times with indirect dependency updates
docs: updated rust-version requirement to 1.91 c288d4d
docs: prebuilt binaries on Linux and Windows x86_64 are no longer compiled with target-cpu=native 5f892a1
docs: expanded note about Illegal Instruction (SIGILL) faults and portable builds e4df784
docs: describegpt update with expanded Data Dictionary example and link to defaults d722afd & cedcd41 & bba4f76
applied select clippy lint suggestions
bumped several indirect dependencies

Fixed

count: should still work with "broken" CSVs when polars feature is enabled #3104
describegpt: more robust SQL escaping to prevent SQL injection e958329
excel: formula retrieval bug on error b894515
excel: reverted mistaken alloc optimization for trim path b37361a
index: added check to confirm that only uncompressed CSV files can be indexed 1be485b
sqlp: unnest workaround for test compatibility 54d079b
sqlp: corrected array_to_string test 6c661ac
docs: fixed typo QSV_MEMORY_HEADROOM_PCT -> QSV_FREEMEMORY_HEADROOM_PCT f15d03e

Removed

deps: removed polars crates (polars-utils, polars-ops) that are no longer needed a7785f6
publish: removed target-cpu=native as it causes SIGILL on GitHub Action Runners fd74f8f

Full Changelog: 9.1.0...10.0.0


03 Nov 20:52

jqnatividad

9.1.0

7718a14

9.1.0

[9.1.0] - 2025-11-03

FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:

frequency received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
describegpt now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
qsv-stats - the engine that powers both stats and frequency commands - has been further optimized with the 0.40.0 release, to compile summary statistics as fast as possible - even for very large files - often one to two orders of magnitude faster (10 to 100x faster) than typical Python-based tools.
Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
The csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second!¹

These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.

The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.

It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.

Added

frequency: add --pretty-json option c67fd06
frequency: add --rank-strategy option #3075
frequency: add --null-text option #3082

Changed

describegpt: explicitly use frequency's dense rank strategy dc3f270
describegpt: allow --prompt to be loaded from a text file b11a10c
describegpt: use much faster BLAKE3 hash for cache key
frequency: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
lens: bumped csvlens from 0.13.0 to 0.14.0
lens: automatically set to monochrome mode when using --find option 8539869
luau: bumped embedded Luau from 0.694 to 0.697 3e68e29
stats: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
table: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
tests: change default Python to 3.13
docs: documented that Extended Input Support (🗄️) does .zip auto-decompression
docs: documented Limited Extended Input Support (🗃️)
use latest qsv-tuned csv crate with performance optimizations
build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
applied several clippy lint suggestions
bumped several indirect dependencies
align nightly to 2025-10-24, the same nightly as Polars
bumped MSRV to Rust 1.91
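The `frequency` rank-strategy change above (min, AKA "1224", to dense, AKA "1223") is easiest to see on a small example. This hypothetical `rank` helper sorts counts in descending order and shows how tied counts are ranked under each strategy. It illustrates the two conventions and is not `frequency`'s implementation.

```python
# Hypothetical illustration of min vs dense ranking; not qsv's frequency code.
def rank(counts, strategy="dense"):
    """Pair each count (sorted descending) with its rank under `strategy`.

    "min"   -> "1224": tied counts share the lowest position; the next rank skips.
    "dense" -> "1223": tied counts share a rank; the next rank is consecutive.
    """
    if strategy not in ("min", "dense"):
        raise ValueError(f"unsupported strategy: {strategy}")
    ordered = sorted(counts, reverse=True)
    result = []
    prev = None
    dense_rank = min_rank = 0
    for position, count in enumerate(ordered, start=1):
        if count != prev:
            dense_rank += 1      # next consecutive rank
            min_rank = position  # rank jumps to the current position
            prev = count
        result.append((count, min_rank if strategy == "min" else dense_rank))
    return result
```

For counts of 10, 8, 8 and 5, min ranking yields 1, 2, 2, 4 while dense ranking yields 1, 2, 2, 3, so dense ranks never leave gaps after ties.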

Fixed

describegpt: add SQL escaping to eliminate SQL injection attack vector; add .csv extension to --sql-output when Polars SQL query runs successfully ad52a35
frequency: fix --select option always returning <ALL_UNIQUE> #3082
fixed some publishing workflows

Removed

Removed SHA256 and replaced it with the much faster, parallelizable BLAKE3 hash #3072 and #3080
publish: removed maximize-build-space step in workflows as it was not working as advertised
tests: removed target-cpu=native RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults

Full Changelog: 8.1.1...9.1.0

¹ See the validate_no_schema benchmark.

Contributors

dependabot

