Releases: dathere/qsv
9.1.0
[9.1.0] - 2025-11-03
FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:
- frequency received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
- describegpt now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
- qsv-stats - the engine that powers both the stats and frequency commands - has been further optimized with the 0.40.0 release, to compile summary statistics as fast as possible - even for very large files - often one to two orders of magnitude faster (10 to 100x faster) than typical Python-based tools.
- Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
- the csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second!
These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.
The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.
It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.
Added
- frequency: add --pretty-json option c67fd06
- frequency: add --rank-strategy option #3075
- frequency: add --null-text option #3082
Changed
- describegpt: explicitly use frequency's dense rank strategy dc3f270
- describegpt: allow --prompt to be loaded from a text file b11a10c
- describegpt: use much faster BLAKE3 hash for cache key
- frequency: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
- lens: bumped csvlens from 0.13.0 to 0.14.0
- lens: automatically set to monochrome mode when using --find option 8539869
- luau: bumped embedded Luau from 0.694 to 0.697 3e68e29
- stats: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
- table: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
- tests: change default Python to 3.13
- docs: documented that Extended Input Support (🗄️) does .zip auto-decompression
- docs: documented Limited Extended Input Support (🗃️)
- use latest qsv-tuned csv crate with performance optimizations
- build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
- build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
- deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
- build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
- build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
- applied several clippy lint suggestions
- bumped several indirect dependencies
- align nightly to 2025-10-24, the same nightly as Polars
- bumped MSRV to Rust 1.91
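The change of the default rank strategy from min ("1224") to dense ("1223") is easiest to see on a small frequency table. The following is a minimal Python sketch of the two strategies, not qsv's implementation; the function name `rank_counts` and the sample data are assumptions made for illustration.

```python
from collections import Counter


def rank_counts(values, strategy="dense"):
    """Return (value, count, rank) rows sorted by descending count.

    Hypothetical helper: "min" gives tied rows the position of the first
    tied row (1224), "dense" never leaves gaps after a tie (1223).
    """
    if strategy not in ("min", "dense"):
        raise ValueError(f"unknown strategy: {strategy}")
    rows = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    ranked = []
    prev_count = None
    rank = 0
    for position, (value, count) in enumerate(rows, start=1):
        if count != prev_count:
            # A new count starts a new rank: "min" jumps to the row position,
            # "dense" simply advances by one.
            rank = position if strategy == "min" else rank + 1
            prev_count = count
        ranked.append((value, count, rank))
    return ranked


data = ["a"] * 5 + ["b"] * 3 + ["c"] * 3 + ["d"]
print([r for _, _, r in rank_counts(data, "min")])    # [1, 2, 2, 4]
print([r for _, _, r in rank_counts(data, "dense")])  # [1, 2, 2, 3]
```

With dense ranking, the highest rank equals the number of distinct counts, which is one plausible reason to prefer it when ranks are read as "n-th most frequent".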
Fixed
- describegpt: add SQL escaping to eliminate SQL injection attack vector; add .csv extension to --sql-output when Polars SQL query runs successfully ad52a35
- frequency: fix --select option always returning <ALL_UNIQUE> #3082
- fixed some publishing workflows
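The release notes do not say how describegpt escapes values, so the following is only a sketch of the general technique: standard SQL quote doubling for literals and identifiers. The helper names are hypothetical, and Python's built-in sqlite3 stands in for the Polars SQL engine.

```python
import sqlite3


def quote_literal(text: str) -> str:
    """Escape a value for use as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Escape a table or column name by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


# A value that would terminate the literal early if it were pasted in unescaped.
malicious = "x'; DROP TABLE t; --"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("complaint type" TEXT)')
conn.execute("INSERT INTO t VALUES (?)", (malicious,))

query = (
    "SELECT count(*) FROM t WHERE "
    f"{quote_identifier('complaint type')} = {quote_literal(malicious)}"
)
matches = conn.execute(query).fetchone()[0]
print(matches)  # 1: the whole string was treated as data, not as SQL
```

Where the engine supports bound parameters (as in the INSERT above), those are generally preferable to string escaping.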
Removed
- Removed SHA256 and replaced with much faster, parallelizable BLAKE3 hash #3072 and #3080
- publish: removed maximize-build-space step in workflows as it was not working as advertised
- tests: removed target-cpu=native RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults
Full Changelog: 8.1.1...9.1.0
8.1.1
[8.1.1] - 2025-10-22
Added
Changed
- deps: use latest version of qsv-tuned csv crate 7523e08
- deps: unpin zip from 4.6 and bump to 6 now that geosuggest uses it 957ad6d
- build(deps): bump dns-lookup from 3.0.0 to 3.0.1 by @dependabot[bot] in #3057
- build(deps): bump geosuggest-utils from 0.8.0 to 0.8.1 by @dependabot[bot] in #3058
- build(deps): bump geosuggest-core from 0.8.0 to 0.8.1 by @dependabot[bot] in #3059
- build(deps): bump memmap2 from 0.9.8 to 0.9.9 by @dependabot[bot] in #3060
- build(deps): bump pyo3 from 0.27.0 to 0.27.1 by @dependabot[bot] in #3061
- tweaked several publishing and test GH Actions workflows
- applied clippy::to_string_in_format_args lint suggestion
- bumped several indirect dependencies
Fixed
- use latest csvlens patched fork that fixes panic when using stdin input 34154e6
Full Changelog: 8.1.0...8.1.1
8.1.0
[8.1.0] - 2025-10-20
This minor release features:
- qsv on IBM Z mainframes (s390x)! - now that we have endianness detection, even adding a prebuilt binary for it.
- describegpt: Output Kind and Token Usage have been added to the output making it easier to parse responses and track LLM costs.
- python: with the latest pyO3.rs 0.27 crate, we're setting the stage to drop support for Python 3.12 and below, targeting free-threaded Python exclusively starting with the 9.0 release. This should allow us to massively boost performance by parallelizing py workloads. It will also power the upcoming FAIRification commands.
- a tuned csv fork based on the just released csv 1.4 crate, increasing performance suite-wide.
Added
- describegpt: add Kind and Token Usage to output a21e117
- add big-endian handling for big-endian platforms (e.g. s390x-unknown-linux-gnu) #3045
- add s390x prebuilt binary (qsv now runs on IBM Z Mainframes!) a3f455c
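The notes do not say which code paths needed big-endian handling. As a general illustration of why a big-endian target such as s390x needs it, here is a small Python sketch showing how the same integer is laid out in each byte order, and how decoding with an explicit byte order stays portable. The helper name `read_u32_le` is hypothetical.

```python
import struct
import sys


def read_u32_le(buf: bytes) -> int:
    """Decode a 4-byte little-endian unsigned integer on any host."""
    return int.from_bytes(buf, byteorder="little")


value = 0x01020304
little = value.to_bytes(4, "little")  # least significant byte first
big = value.to_bytes(4, "big")        # most significant byte first
native = struct.pack("=I", value)     # whatever the host uses

print(sys.byteorder)
# Native layout matches exactly one of the two explicit layouts.
print(native == (little if sys.byteorder == "little" else big))  # True
# Decoding with an explicit byte order gives the same answer everywhere.
print(read_u32_le(little) == value)  # True
```

Code that reinterprets raw bytes using the host's native order works by accident on little-endian machines and silently produces wrong values on big-endian ones, which is the class of bug explicit handling avoids.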
Changed
- datefmt: Replace localzone crate with iana-time-zone crate #3048
- geoconvert: Improved with the latest geozero fixes needed for Datapusher+ processing of GeoJSON and SHP files.
- python: micro-optimize to remove unnecessary clone; use more idiomatic error_result handling - 777aa14
- docs: update badges with PowerPC Linux GNU, Windows ARM64 MSVC, remove macOS Intel by @rzmk in #3036
- deps: bump bitflags from 2.9.4 to 2.10.0 8d65c1b
- deps: bumped csv crate to 1.4 and reapplied qsv optimizations. For more info, see 4e2f2a0
- deps: bump csvs_convert patch fork 8aa398f
- deps: bump geozero to latest upstream with unreleased fixes - 0a9d1b3
- deps: bump polars to 0.51.0 at py-1.35.0-beta-1 tag
- deps: bump socket2 from 0.6.0 to 0.6.1
- deps: bump whatlang to 0.18 e80e9c0
- build(deps): bump actions/setup-python from 5.0.0 to 6.0.0 by @dependabot[bot] in #3030
- build(deps): bump actix-governor from 0.8.0 to 0.10.0 by @dependabot[bot] in #3046
- build(deps): bump gzp from 1.0.1 to 2.0.0 by @dependabot[bot] in #3033
- build(deps): bump github/codeql-action from 3 to 4 by @dependabot[bot] in #3034
- build(deps): bump flexi_logger from 0.31.4 to 0.31.5 by @dependabot[bot] in #3032
- build(deps): bump flexi_logger from 0.31.5 to 0.31.6 by @dependabot[bot] in #3035
- build(deps): bump flexi_logger from 0.31.6 to 0.31.7 by @dependabot[bot] in #3038
- build(deps): bump libc from 0.2.176 to 0.2.177 by @dependabot[bot] in #3040
- build(deps): bump pyo3 from 0.26.0 to 0.27.0 by @dependabot[bot] in #3055
- build(deps): bump qsv_docopt from 1.8.0 to 1.9.0 by @dependabot[bot] in #3041
- build(deps): bump regex from 1.11.3 to 1.12.1 by @dependabot[bot] in #3043
- build(deps): bump regex from 1.12.1 to 1.12.2 by @dependabot[bot] in #3050
- build(deps): bump reqwest from 0.12.23 to 0.12.24 by @dependabot[bot] in #3049
- build(deps): bump rust_decimal from 1.38.0 to 1.39.0 by @dependabot[bot] in #3047
- build(deps): bump simd-json from 0.16.0 to 0.17.0 by @dependabot[bot] in #3031
- build(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 by @dependabot[bot] in #3053
- build(deps): bump tokio from 1.47.1 to 1.48.0 by @dependabot[bot] in #3052
- applied select clippy lint suggestions
- updated indirect dependencies
Fixed
- headers: fix stdin handling without an explicit "-" for stdin input #3039
Removed
- removed Python 3.10 prebuilts as pyo3 0.27 no longer supports it and Python 3.10 is no longer maintained
- deps: removed patched fork of time-rs now that 0.3.43 has been released fde03b3
Full Changelog: 8.0.0...8.1.0
8.0.0
[8.0.0] - 2025-10-06
1
Findable, Accessible, Interoperable & Reusable (FAIR) Data is AI-Ready Data.
A week and a half after launching our "People's API" AI Chatbot and "AI-Ready" service, we fine-tune qsv further, as it powers the FAIRification engine that allows us to "open your data" (as a verb) - to infer and calculate AI-Ready, FAIR metadata at blazing speed even for large datasets.
This release features:
- describegpt fixes and improvements
- table can now produce "aligned" TSV and Fixed Width format files
- validate now has Extended Input Support in its RFC 4180 validation mode
- extdedup fixed to dedupe arbitrarily large csv or text files
- luau upgraded from 0.690 to 0.693
- PowerPC64 pre-built binaries - making it more convenient to use qsv on this "power"ful 😉 platform that's widely used in research (thanks to IBM-provided access to its native GitHub Action ppc64le runners! For the next release - qsv on IBM Z Mainframes!)
These changes set the stage for even more advanced, powerful, configurable FAIRification capabilities to
make ALL your Data AI-Ready, Useful, Usable & Used by Machines & Humans alike.
Added
- table: add leftendtab alignment option #3004
- table: add leftfwf (Fixed Width Format) alignment option 590c861
- validate: add Extended Input Support to RFC 4180 validation mode #3012
- added PowerPC64 LE Linux prebuilt
Changed
- describegpt: fine-tuned default LLM Prompt template (v3.1.0) 00e52a3 6b09b7e 5be7f2e
- luau: bump embedded Luau from 0.690 to 0.693 #3017
- schema: make Decimal Type Scale configurable for polars schema with QSV_POLARS_DECIMAL_SCALE env var - f20edd5
- updated optimized csv crate, adding non-allocating StringRecord::trim() and more inline()s 4a1c82a
- deps: bump calamine to 0.31.0 bd7a04c
- deps: Bump polars to 0.51.0 from 0.50.0 at py-1.33.1 tag #2995
- deps: bump polars to 0.51.0 at py-1.34.0-beta.4 tag at revision b973cac (latest upstream) #3022
- deps: bump polars to 0.51.0 at py-1.35.0 tag revision b973cac 4164875
- deps: replace tabwriter with renamed fork qsv-tabwriter #3010
- deps: use patched fork of whatlang-rs. Though our PR was merged, there is still no new release 6afff4f
- build(deps): bump base62 from 2.2.2 to 2.2.3 by @dependabot[bot] in #3003
- build(deps): bump bytemuck from 1.23.2 to 1.24.0 by @dependabot[bot] in #3026
- build(deps): bump chrono from 0.4.41 to 0.4.42 by @dependabot[bot] in #2974
- build(deps): bump fancy-regex from 0.16.1 to 0.16.2 by @dependabot[bot] in #3000
- build(deps): bump flate2 from 1.1.2 to 1.1.3 by @dependabot[bot] in #3027
- build(deps): bump flexi_logger from 0.31.2 to 0.31.3 by @dependabot[bot] in #3005
- build(deps): bump flexi_logger from 0.31.3 to 0.31.4 by @dependabot[bot] in #3008
- build(deps): bump indexmap from 2.11.0 to 2.11.1 by @dependabot[bot] in #2973
- build(deps): bump indexmap from 2.11.1 to 2.11.3 by @dependabot[bot] in #2993
- build(deps): bump indexmap from 2.11.3 to 2.11.4 by @dependabot[bot] in #2999
- build(deps): bump libc from 0.2.175 to 0.2.176 by @dependabot[bot] in #3009
- build(deps): bump mlua from 0.11.3 to 0.11.4 by @dependabot[bot] in #3021
- build(deps): bump regex from 1.11.2 to 1.11.3 by @dependabot[bot] in #3011
- build(deps): bump redis from 0.32.5 to 0.32.6 by @dependabot[bot] in #3016
- build(deps): bump qsv-stats from 0.38.0 to 0.39.0 by @dependabot[bot] in #3028
- build(deps): bump qsv-stats from 0.39.0 to 0.39.1 by @dependabot[bot] in #3029
- build(deps): bump redis from 0.32.6 to 0.32.7 by @dependabot[bot] in #3025
- build(deps): bump serde from 1.0.219 to 1.0.223 by @dependabot[bot] in #2983
- build(deps): bump serde from 1.0.223 to 1.0.224 by @dependabot[bot] in #2988
- build(deps): bump serde from 1.0.224 to 1.0.225 by @dependabot[bot] in #2994
- build(deps): bump serde from 1.0.225 to 1.0.226 by @dependabot[bot] in #3002
- build(deps): bump serde from 1.0.226 to 1.0.227 by @dependabot[bot] in #3014
- build(deps): bump serde from 1.0.227 to 1.0.228 by @dependabot[bot] in #3019
- build(deps): bump serde_json from 1.0.143 to 1.0.145 by @dependabot[bot] in #2981
- build(deps): bump semver from 1.0.26 to 1.0.27 by @dependabot[bot] in #2982
- build(deps): bump sysinfo from 0.37.0 to 0.37.1 by @dependabot[bot] in #3015
- build(deps): bump sysinfo from 0.37.1 to 0.37.2 by @dependabot[bot] in #3024
- build(deps): bump tempfile from 3.21.0 to 3.22.0 by @dependabot[bot] in #2975
- build(deps): bump tempfile from 3.22.0 to 3.23.0 by @dependabot[bot] in #3007
- build(deps): bump toml from 0.9.6 to 0.9.7 by @dependabot[bot] in #3001
- pin zip to 4.6, as zip 5 has features that are not widely adopted b231a23
- applied select clippy lint suggestions
- updated indirect dependencies
- bumped MSRV to Rust 1.90
Fixed
- describegpt: init cache vars even when --no-cache is used #2970
- describegpt: --base-url option being ignored #2977
- schema: delimiter detection #2998
- extdedup: really use memmapped ondisk hash table #3020
Removed:
- removed powerpc64-le cross-compilation directive now that we have access to IBM-provided native PowerPC GH Action runner 9659bfc
- removed macOS on Intel (x86_64-apple-darwin) prebuilt binaries
Full Changelog: 7.1.0...8.0.0
1. SangyaPundir, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg
7.1.0
[7.1.0] - 2025-09-06
🇮🇹 csv,conf,v9 edition 🍝
Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN! For this feature release, we polished describegpt a bit more for the occasion...
Towards the "People's API"! Verso l'API del Popolo! (Answering People/Policymaker Interface)
🚀 Enhanced describegpt Command
- Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
- Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
- Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
- Conditional SQL Results: Implement conditional --sql-results format for more efficient "SQL RAG" processing - i.e. if the generated SQL query executes successfully, the results are saved to the specified file with a .csv extension. If a "SQL hallucination" fails, the file is saved with a .sql extension instead for the user to tweak and edit.
- TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
- Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
- Disk Cache by Default: The disk cache is now enabled by default for better performance
- TOML Configuration: Migrate from JSON to more readable TOML format for more easily modifiable prompt files. (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
- Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)
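The conditional --sql-results behavior described above boils down to choosing the file extension from the query outcome. A minimal Python sketch of that decision follows; the function name is hypothetical, and whether qsv replaces or appends to an extension already present on the requested path is not stated in the notes (this sketch replaces it).

```python
from pathlib import Path


def sql_results_path(requested: str, query_succeeded: bool) -> Path:
    """Pick the output path: .csv for query results, .sql for a failed query.

    Assumption: any existing suffix on the requested path is replaced.
    """
    suffix = ".csv" if query_succeeded else ".sql"
    return Path(requested).with_suffix(suffix)


print(sql_results_path("answer", True))   # answer.csv
print(sql_results_path("answer", False))  # answer.sql
```

Keeping the failed query on disk as .sql lets the user correct it by hand and rerun it, rather than losing the LLM's attempt.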
partition Command Enhancements
- New --limit Option: Implement --limit option to set the maximum number of open files
- Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets
Added
- describegpt: add configurable frequency limit #2950
- describegpt: migrate prompt file from JSON to easier-to-edit TOML format #2954
- describegpt: refactor default prompt file; add --fewshot-examples option #2955
- describegpt: add TogetherAI support for models endpoint #2965
- partition: add --limit option #2960
- added Windows ARM64 prebuilt binaries
Changed
- describegpt: enable disk cache by default #2951
- describegpt: Polars SQL generation tweaks #2958
- python: replace deprecated with_gil with attach #2949. This sets the stage for "free-threaded" Python 3.14 support when it's released in October 2025. Buh-bye GIL!
- deps: bump embedded Luau from 0.688 to 0.690 #2967
- deps: bump Polars to 0.50.0 at py-1.33.0 tag
- build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2962
- build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in #2963
- build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in #2961
- build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in #2948
- build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in #2946
- build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in #2956
- build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in #2952
- applied select clippy lints
- updated indirect dependencies
Full Changelog: 7.0.1...7.1.0
7.0.1
[7.0.1] - 2025-08-28
A patch release with some minor bug fixes, benchmark tweaks and build system improvements.
Added
- publish: add dedicated powerpc64le-unknown-linux-gnu publishing workflow (WIP)
Changed
- docs: describegpt expanded error message about LLM URL or API key
- deps: remove planus pinned dependency
Fixed
- fix: geocode --batch 0 causes panic when polars feature is enabled
- publish: remove luau feature from x86_64-pc-windows builds that was causing builds to fail
- publish: remove powerpc64le from main publish workflow
- benchmarks: updated to v6.8.0 with fixes to luau and clustered sample benchmarks
Full Changelog: 7.0.0...7.0.1
7.0.0
[7.0.0] - 2025-08-28
🥳 Open Weights with Open Data, Local LLM 🤖 edition 🚀
This is the biggest release yet - 470+ commits since v6.0.1! Packed with new AI-powered features, fixes and significant performance improvements suite-wide!
With the release of OpenAI's gpt-oss open-weight reasoning model earlier this month setting the stage, we continue on our "Automagical Metadata" journey by revamping describegpt.
🤖 Revamped describegpt - AI-Powered Metadata Inferencing and Data Analysis:
- Intelligent Metadata Generation: Automatically generate comprehensive metadata - Data Dictionaries, Description and Tags for your Datasets using Large Language Models (LLM) prompted with summary statistics and frequency tables as detailed context - without sending your data to the cloud! Even if you elect to use a cloud-based LLM, your Raw Data is never sent.
- Chat with your Data: If your prompt can be answered using this high-quality, high-resolution Metadata, describegpt will answer it! If your prompt is not remotely related to the data, it will politely refuse - "I'm sorry, I can only answer questions about the Dataset."
- Auto SQL RAG Mode: Should the LLM decide that it doesn't have the necessary information in the metadata it compiled to answer your prompt, it will automatically enter SQL Retrieval-Augmented Generation (RAG) mode - using the rich metadata instead as context to craft an expert-level, deterministic, reproducible, "hallucination-free" SQL query to respond to your prompt.
- Database Engine Support: If DuckDB is installed or the Polars feature is enabled, and --sql-results <ANSWER.CSV> is specified - an optimized SQL query will be automatically executed with the query results saved to the specified file. As both DuckDB and Polars are purpose-built OLAP engines that support direct queries (no database pre-loading required), you get answers in a few seconds - even for very large datasets.
- Multi-LLM Support: Works with any OpenAI-API compatible LLM - with special support for local LLMs like Ollama, Jan and LM Studio, with the ability to customize model behavior with the --addl-props option.
- Advanced Caching: Disk and Redis caching support for performance and cost optimization.
- Flexible Prompting: Custom prompt files and built-in intelligent templates for various analysis tasks.
Check out these examples using a 1 million row sample of NYC's 311 data!
- --all option produces a Data Dictionary, Description and Tags - Markdown, JSON
- --prompt "What are the top 10 complaint types per community board and borough?" - SQL result
- --prompt "How tall is the Empire State Building?" - "I'm sorry, I can only answer questions about the Dataset."
On top of other improvements in Datapusher+ with its new Jinja-based "metadata suggestion engine" - we're using this AI-inferred metadata along with other precalcs to prepopulate DCATv3 (both US and European profiles) and Croissant metadata fields that are otherwise too hard and expensive to compile manually.
The inferred and precalculated metadata values are offered as "suggestions", using a UI/UX purpose-built to facilitate interactive metadata curation chats.
This allows Data Stewards to compile high-quality, high-resolution metadata catalogs with an accelerated "Data Steward in the Loop" data ingestion and metadata curation workflow.
If you want to see and learn more, we're Bologna-bound to attend csv,conf,v9 to present and share how we're using this to auto-infer metadata in CKAN. Hope to see you there!
Towards the People's API!
(Answering People/Policymaker Interface)
📊 Enhanced frequency Command:
- Rank Column: Ranking of frequency results for better data insights
- JSON Output Mode: New --json option not only provides structured output beyond the default CSV format - it also takes advantage of JSON's nested support to include 15 additional summary statistics per field
- Performance Boost: Speed improvements with SIMD-accelerated number parsing, remaining performant even with the added functionality
⚡ stats Command Improvements:
- Faster Still: Enabled by improvements in the underlying qsv-stats crate
- Improved Precision: Faster, streamlined precision calculation
- SIMD Number Parsing: Hardware-accelerated parsing for int/float values
- Unix Epoch Support: Proper handling of Unix timestamp 0 as valid date
- Enhanced Date Inference: Better date and boolean type inference capabilities
🔧 validate & schema Enhancements:
- Fancy Regex Support: You can now use "advanced" regex features in your JSON Schema patterns with the --fancy-regex option. Previously, you could only use the standard Rust regex engine, which does not support backreferences or look-arounds (for performance reasons)
- JSON Schema Improvements: Better error handling and format validation options
- Schema Validation Refinements: More granular validation control with --no-format-validation
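To make "backreferences or look-arounds" concrete, here are two patterns of the kind that need a backtracking-capable engine. This is an illustrative sketch using Python's re module as a stand-in, since it supports both features; the patterns themselves are made-up examples, not taken from qsv.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured.
doubled_word = re.compile(r"^(\w+) \1$")

# Look-ahead: require at least one digit without consuming any characters,
# then match eight or more alphanumerics.
has_digit = re.compile(r"^(?=.*\d)[A-Za-z\d]{8,}$")

print(bool(doubled_word.match("hello hello")))  # True
print(bool(doubled_word.match("hello world")))  # False
print(bool(has_digit.match("passw0rdX")))       # True
print(bool(has_digit.match("passwordX")))       # False
```

Engines that guarantee linear-time matching typically reject both constructs, which is the performance trade-off the release note alludes to.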
🔄 rename Reverted and Improved:
When pairwise renaming was introduced in v6.0.0, it broke some workflows. It's now fixed by introducing two modes:
- Positional Mode: Renaming by position is now once again the default
- Pairwise Mode: New --pairwise flag for column renaming by column pairs
🗂️ partition Improvements:
- Case-Insensitive Safety: Improved case-aware partitioning algorithm. Previously, case-insensitive file systems like macOS APFS and Windows NTFS were causing incorrect partitioning of case-sensitive values
- Faster still: With better use of I/O buffering - with deferred, batched, async writes instead of after every record
Added
- frequency add rank info to frequency table #2878
- frequency add --json output option #2868
- validate add --fancy-regex option #2845
- add CPU-accelerated, mem-mapped, chunked sha256 file checksum helper #2909
Changed
- apply use SIMD-accelerated base64-simd crate for Encode64 and Decode64 operations #2863
- stats faster precision calculation #2852
- perf: Use simd_json instead of serde_json to serialize to JSON #2884
- refactor: create and use reqwest client helpers to eliminate redundant code #2888
- perf: Faster parallelized sha256 hash file #2918
- refactor: describegpt #2890
- refactor: describegpt setting --timeout to 0 sets no timeout #2891
- refactor: describegpt more refinements #2892
- feat: describegpt refactor round3 #2893
- feat: describegpt disk & redis caching #2895
- refactor: describegpt #2896
- refactor: describegpt create get_cache_key helper; customizable stats options #2902
- feat: describegpt auto SQL RAG for --prompt #2904
- feat: describegpt major refactor #2913
- refactor: describegpt default promptfile is now embedded in qsv binary; fine-tune tests #2924
- feat: describegpt returning reasoning with --json option #2926
- feat: describegpt add DuckDB support in SQL RAG mode #2929
- feat: describegpt various DuckD...
6.0.1
[6.0.1] - 2025-07-12
This is a patch release with bug fixes and minor improvements.
Changed
- feat: updated completions for qsv v6.0.0 by @rzmk in #2838
- docs: updated sample schema.json based on NYC311 1M row sample benchmark data
- docs: updated sample stats output using NYC 311 1M row sample benchmark data
- build(deps): bump chrono-tz from 0.10.3 to 0.10.4 by @dependabot[bot] in #2839
- build(deps): bump qsv-stats from 0.35.0 to 0.36.0 by @dependabot[bot] in #2840
- bumped indirect dependencies
- Added benchmark_data.* to .gitignore
Fixed
- geocode: make --batch=0 mode more robust by setting a minimum batch size of 1,000 rows 2fa90bc
- jsonl: correct batchsize calculation to use input file instead of output file for line counting 742dc77
- benchmarks: fixed benchmarks with unescaped parameters with embedded spaces ad95596
Removed
- Removed retired publishing workflows (linux-glibc-231-musl-123 and wix-installer)
Full Changelog: 6.0.0...6.0.1
6.0.0
Highlights:
This is a major release with significant improvements and new features!
🔍 Enhanced lens command:
- File prompt support: You can now load prompts from files using the new file: support, making it easier to reuse complex prompts
- Wrap mode option: Added --wrap-mode option for better text display control when viewing data
- Improved examples: Enhanced usage examples and documentation
🔄 Improved rename command:
- Pair-based renaming: Easier column renaming with more intuitive syntax for bulk operations.
📊 Enhanced sort command:
- Natural sorting: Added --natural option for human-friendly sorting (e.g., "file1.txt", "file2.txt", "file10.txt" sort "naturally"; previously, lexicographical sorting would order them as "file1.txt", "file10.txt", "file2.txt")
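The difference between lexicographical and natural ordering can be reproduced in a few lines. This is a minimal Python sketch of one common way to build a natural sort key, not necessarily the algorithm qsv uses; `natural_key` is a hypothetical name.

```python
import re


def natural_key(text: str):
    """Split text into digit and non-digit runs so numbers compare numerically.

    Each part is tagged so integers are only ever compared with integers
    and strings with strings.
    """
    parts = re.split(r"(\d+)", text)
    return [(0, int(p)) if p.isdigit() else (1, p.lower()) for p in parts]


names = ["file10.txt", "file2.txt", "file1.txt"]

print(sorted(names))                   # ['file1.txt', 'file10.txt', 'file2.txt']
print(sorted(names, key=natural_key))  # ['file1.txt', 'file2.txt', 'file10.txt']
```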
⚡ Performance improvements:
- Memory optimizations: Multiple performance enhancements across frequency, stats, validate, and transpose commands
- Buffer optimizations: Improved I/O performance with better buffer sizing for various operations
- Polars engine upgrade: Updated to the latest Polars 0.49.x series for better performance and stability
🔧 Enhanced validation:
- Robust JSON Schema validation: More granular error messages and better schema validation
- Improved error reporting: Clearer messages to help debug validation issues
- UTF-8 handling: Better handling of invalid records with improved debug output
🌐 Geocoding improvements:
- Updated geosuggest: Bumped to version 0.8 with direct index update support for better geocoding performance
🔗 SQL enhancements (joinp and sqlp):
- Decimal comma support: Added --decimal-comma option for writing operations, improving international data support
- Better validation: Enhanced delimiter and decimal comma validation
🏗️ Infrastructure updates:
- Rust 1.88 MSRV: Updated minimum supported Rust version
- Dependency updates: Comprehensive updates to all major dependencies including Polars, Tokio, and many others
- Compilation optimizations: Various improvements for faster builds and better runtime performance
Added
New Features:
- lens: add file: support to load prompts from files #2805
- lens: add --wrap-mode option #2805
- rename: pair-based renaming for easier bulk column renaming #2806
- sort: add --natural option for natural/human-friendly sorting #2808
- schema: set JSON Schema description to the command line used for generation
- joinp & sqlp: add --decimal-comma option for writing operations
- joinp: add --decimal-comma and --delimiter validation
- sqlp: add --decimal-comma and --delimiter validation
- validate: more robust JSON Schema schema validation with granular error messages
- validate: show invalid record in debug format for UTF-8 failures
- Enhanced completions for qsv v5.1.0 and v6.0.0
Documentation & Examples:
- lens: improved examples in usage text
- schema: expand examples and add -P shortcut for --prompt option
- sqlp: update description to note support for input beyond CSVs
- Polars SQL documentation noting it's a PostgreSQL dialect
- Added link to Polars 0.49.0 release notes
- MSRV documentation updated to Rust 1.88
- Additional conditions for when to use "portable" binaries
Changed
Performance Improvements:
- frequency: microoptimize null value handling and preallocate vectors
- stats: preallocate with_capacity for Unsorted struct and coefficient of variation handling improvements
- transpose: performance refactoring with optimized buffer handling
- validate: microoptimizations for JSON instance handling and buffer capacity improvements
- apply: bigger reader buffer as apply is batch oriented
- Enabled setter for read and write buffer sizing configuration
- Various microoptimizations across commands
Polars Engine Updates:
- Bumped Polars from 0.48 to 0.49.x series
- Adapted to new Polars PlPath API
- Updated to use latest Polars upstream throughout development cycle
- Enabled simd-json compiler hints feature on nightly builds
Dependency Updates:
- Major updates:
- Polars: 0.48 → 0.49.x
- Tokio: 1.45.1 → 1.46.1
- qsv-stats: 0.33.0 → 0.35.0
- kiddo: 5.0.3 → 5.2.2
- indexmap: 2.9.0 → 2.10.0
- calamine: updated to latest upstream
- redis: 0.32.2 → 0.32.3
- sysinfo: 0.35.2 → 0.36.0
- geosuggest: bumped to 0.8
- Build dependencies:
- flexi_logger: 0.31.0 → 0.31.2
- arboard: 3.5.0 → 3.6.0
- minijinja: 2.10.2 → 2.11.0
- minijinja-contrib: 2.10.2 → 2.11.0
- zip: 4.1.0 → 4.3.0
- reqwest: 0.12.20 → 0.12.22
- indicatif: 0.17.11 → 0.17.12
- phf: 0.11.3 → 0.12.1
- human-panic: 2.0.2 → 2.0.3
- jaq-std: 2.1.1 → 2.1.2
- jaq-core: 2.2.0 → 2.2.1
- jaq-json: 1.1.2 → 1.1.3
Code Quality & Maintenance:
- Applied clippy lint suggestions including collapsible_if, needless_return, redundant_clone, and manual_is_multiple_of
- Updated MSRV to Rust 1.88
- Set nightly to 2025-06-27
- Removed hardware-lock-elision feature on parking_lot
- No longer use similar-asserts crate, reverted to standard assert_eq
- Better TOML formatting
- Removed unneeded dependency aliases
- Various code refactoring for better maintainability
Infrastructure:
- Updated csvlens integration with natural sorting support
- Switched dependency management approaches for better upstream compatibility
- Pin plist to 1.7.3 to avoid unnecessary quick-xml bumps
- Use latest calamine upstream consistently
Fixed
- validate: clearer JSON Schema schema error messages to differentiate validation types
- round_num(): should return an empty string if dec_f64.is_nan()
- joinp: non-equi-join test result order deterministic issues
- Enhanced Snappy file decompression robustness
- Fixed geometric mean calculation in stats
- Better UTF-8 record validation with debug output
- Various test adjustments to account for dependency updates and behavior changes
- Resolved several clippy warnings and code quality issues
Test Updates:
- rename: add pair-renaming tests
- sort: add natural sort tests
- joinp: add decimal_comma tests
- sqlp: add decimal-comma validation tests
- validate: add JSON Schema schema validation tests
- stats: adjust test cases for qsv-stats 0.35.0 changes
- excel: re-enable and revert formula tests based on upstream changes
Development Notes
Benchmarks:
- Comprehensive benchmarking for versions 5.1.0 and 6.6.1
- Performance comparisons available for major operations
Continuous Integration:
- Multiple dependency updates via Dependabot automation
- Comprehensive test coverage maintained throughout development
- Regular upstream synchronization with Polars and other major dependencies
Pull Requests
NOTE: The changelog entries below only document changes with a corresponding PR. Several changes were committed to master directly and are documented in the release highlights above.
Added
- lens: add --wrap-mode option in #2805
- rename: add pair-based renaming in #2806
- sort: add --natural sort option in #2808
Changed
- geocode: now uses the faster geosuggest 0.8 crate. index-update subcommand now generates command to use geosuggest crate directly to update/create the index instead of doing it internally.
- schema: when generating JSON schema, description property set to cmdline used to generate the JSON schema in #2796
- sqlp & joinp: --decimal-comma option is not only for parsing input CSVs, it's also used when writing output CSVs in #2800
- transpose: performance refactoring in #2827
- validate improved JSON Schema schema validation in #2803
- update completions for qsv v5.1.0 by @rzmk in #2804
- dep: bump polars to latest upstream - adapt to PlPath api reqt in #2822
- perf: bump to faster geosuggest to 0.8 in #2837
- build(deps): bump arboard from 3.5.0 to 3.6.0 by @dependabot[bot] in #2814
- build(deps): bump flexi_logger from 0.31.0 to 0.31.1 by @dependabot[bot] in #2801
- build(deps): bump flexi_logger from 0.31.1 to 0.31.2 by @dependabot[bot] in #2812
- build(deps): bump libc from 0.2.173 to 0.2.174 by @dependabot[bot] in #2794
- build(deps): bump human-panic from 2.0.2 to 2.0.3 by @dependabot[bot] in #2833
- build(deps): bump indicatif from 0.17.11 to 0.17.12 by @dependabot[bot] in #2818
- build(deps): bump jaq-std from 2.1.1 to 2.1.2 by @dependabot[bot] in #2830
- build(deps): bump jaq-core from 2.2.0 to 2.2.1 by @dependabot[bot] in #2831
- build(deps): bump jaq-json from 1.1.2 to 1.1.3 by @dependabot[bot] in #2832
- build(deps): bump minijinja from 2.10.2 to 2.11.0 by @dependabot[bot] in #2815
- build(deps): bump minijinja-contrib from 2.10.2 to 2.11.0 by @dependabot[bot] in #2816
- build(deps): bump phf from 0.11.3 to 0.12.1 by @dependabot[bot] in #2797
...
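The --decimal-comma entry under Changed above says the option now applies in both directions: when parsing input and when writing output. A minimal Python sketch of that symmetry follows. Both helper names are hypothetical, and this is not how sqlp or joinp implement it (they delegate to Polars).

```python
def parse_decimal_comma(field: str) -> float:
    """Read a number that uses a comma as its decimal separator."""
    return float(field.replace(",", "."))

def format_decimal_comma(value: float) -> str:
    """Write a number using a comma as its decimal separator."""
    return repr(value).replace(".", ",")

# Round trip: the text that was read is the text that is written back.
round_trip = format_decimal_comma(parse_decimal_comma("1,5"))
```

Files that use a decimal comma typically use a field delimiter other than a comma, such as a semicolon, so that values like 1,5 are not split into two fields.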
5.1.0
[5.1.0] - 2025-06-17
Highlights
- lens is now colorful by default, with a --monochrome option to turn it off:
  qsv lens /tmp/NYC_311_SR_2010-2020-sample-1M.csv
- lens can now have custom prompts with the --prompt option (with support for ANSI escape codes to format the prompt). Meant to be paired with the --echo-column <colname> option, e.g.:
  qsv lens --prompt $'\033[1;5;31mBlinking red, bold text\033[0m' --echo-column 'Unique Key' \
    /tmp/NYC_311_SR_2010-2020-sample-1M.csv
- the qsv-stats crate - the underlying engine behind the central stats, frequency and "smart" commands - got a lot of love in this release
- validate got a tad faster while decreasing its memory footprint. The new --no-format-validation option now also allows you to ignore all JSON Schema "format" keywords (e.g. date, email, url, currency, etc.) when validating CSVs.
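The escape sequence in the --prompt example above is a standard ANSI SGR (Select Graphic Rendition) sequence: code 1 is bold, 5 is blink, 31 is a red foreground, and 0 resets all attributes. The Python sketch below builds the same string; the helper name sgr is an assumption for illustration and is not part of qsv or csvlens.

```python
ESC = "\033"  # the escape character, written as \033 in the shell example

def sgr(text: str, *codes: int) -> str:
    """Wrap text in an SGR sequence for the given codes, then reset with code 0."""
    params = ";".join(str(code) for code in codes)
    return f"{ESC}[{params}m{text}{ESC}[0m"

# Same attributes as the shell example: bold (1), blink (5), red foreground (31).
prompt = sgr("Blinking red, bold text", 1, 5, 31)
```

The resulting string is what the shell's $'...' quoting produces for the prompt argument in the example, so it could be passed to --prompt from a script in the same way.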
Added
- lens: add --prompt option, add examples to regex-enabled options #2772
- lens: add --monochrome option, otherwise, columns displayed in different colors #2761
- validate: add --no-format-validation option when in JSON Schema mode #2762
- docs: add shell completions badges by @rzmk in #2760
- feat: added criterion trim algorithm microbenchmarks #2789
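To illustrate what skipping "format" keywords means for the validate entry above, here is a toy Python sketch. It is not the JSON Schema validator qsv uses: the FORMAT_CHECKS table, the is_valid function, and the check_formats flag are all hypothetical, and the regular expressions are deliberately simplistic.

```python
import re

# Toy format checkers; real validators cover many more formats with stricter rules.
FORMAT_CHECKS = {
    "email": lambda s: re.fullmatch(r"[^@\s]+@[^@\s]+", s) is not None,
    "date": lambda s: re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None,
}

def is_valid(value, schema: dict, check_formats: bool = True) -> bool:
    """Validate one field against a schema with an optional "format" keyword."""
    # The "type" keyword is always enforced.
    if schema.get("type") == "string" and not isinstance(value, str):
        return False
    # The "format" keyword is enforced only when check_formats is on.
    fmt = schema.get("format")
    if check_formats and fmt in FORMAT_CHECKS:
        return FORMAT_CHECKS[fmt](value)
    return True

schema = {"type": "string", "format": "email"}
strict = is_valid("not-an-email", schema)                        # format enforced
lenient = is_valid("not-an-email", schema, check_formats=False)  # format ignored
```

In this sketch the other keywords still apply when formats are ignored, which matches the entry's description of ignoring only the "format" keywords.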
Changed
- frequency: performance microoptimizations - use stats cache column cardinality to pre-alloc & size frequency hash tables
- geocode: refactor regex handling for performance & maintainability
- json: preserve key order #2777
- stats: performance microoptimizations - use unwrap_unchecked() instead of just unwrap() in hot sampling functions
- validate: major refactoring for added performance/memory efficiency
- chore: temporarily use qsv-calamine until a new calamine is released #2790
- Bump cpc from 1.9 to 2 #2770
- deps: bump criterion from 0.5 to 0.6 #2791
- deps: use latest csvlens upstream with colorful columns https://github.com/dathere/qsv/commit/f2c9322e33a0ac335dafec10a490c871d3de0a6c
- deps: temporarily use qsv-calamine until a new calamine is released #2790
- deps: bump our patched forks of cached, csvs_convert, json-objects-to-csv, jsonschema, localzone, rfd, self_update until PRs are merged or new releases are made
- deps: bump zip from 3 to 4 in 75909d2
- deps: bump polars to 0.48.1 at 49ce57a revision
- build(deps): bump atoi_simd from 0.16.0 to 0.16.1 by @dependabot in #2766
- build(deps): bump bytemuck from 1.23.0 to 1.23.1 by @dependabot in #2778
- build(deps): bump flate2 from 1.1.1 to 1.1.2 by @dependabot in #2781
- build(deps): bump flexi_logger from 0.30.1 to 0.30.2 by @dependabot in #2765
- build(deps): bump flexi_logger from 0.30.2 to 0.31.0 by @dependabot in #2793
- build(deps): bump hashbrown from 0.15.3 to 0.15.4 by @dependabot in #2779
- build(deps): bump libc from 0.2.172 to 0.2.173 by @dependabot in #2787
- build(deps): bump mimalloc from 0.1.46 to 0.1.47 by @dependabot in #2792
- build(deps): bump mlua from 0.10.3 to 0.10.5 by @dependabot in #2758
- build(deps): bump num_cpus from 1.16.0 to 1.17.0 by @dependabot in #2771
- build(deps): bump parking_lot from 0.12.3 to 0.12.4 by @dependabot in #2768
- build(deps): bump pyo3 from 0.25.0 to 0.25.1 by @dependabot in #2785
- deps: upgrade qsv-stats from 0.32 to 0.33, which features major memory and performance optimizations behind the stats & frequency commands #2786
- deps: bump redis from 0.29.5 to 0.32
- build(deps): bump reqwest from 0.12.15 to 0.12.16 by @dependabot in #2764
- build(deps): bump reqwest from 0.12.16 to 0.12.18 by @dependabot in #2767
- build(deps): bump reqwest from 0.12.18 to 0.12.19 by @dependabot in #2773
- build(deps): bump reqwest from 0.12.19 to 0.12.20 by @dependabot in #2782
- build(deps): bump rust_decimal from 1.37.1 to 1.37.2 by @dependabot in #2788
- build(deps): bump smallvec from 1.15.0 to 1.15.1 by @dependabot in #2780
- build(deps): bump sysinfo from 0.35.1 to 0.35.2 by @dependabot in #2774
- build(deps): bump titlecase from 3.5.0 to 3.6.0 by @dependabot in #2775
- build(deps): bump tokio from 1.45.0 to 1.45.1 by @dependabot in #2759
- build(deps): bump uuid from 1.16.0 to 1.17.0 by @dependabot in #2757
- applied select clippy suggestions
- updated indirect dependencies
- set Rust nightly to 2025-05-21, the same nightly Polars uses 872ade1
Fixed:
- fix: frequency recover from non-fatal absence of stats cache, instead of panicking b2821a0
- fix: flaky json tests caused by hardcoding name of intermediate file - 62ca310
- fix: flaky reverse property tests by handling BOM characters cefd490
- fix: util::process_input helper does not honor QSV_SKIP_FORMAT_CHECK when processing dir input #2784
Full Changelog: 5.0.3...5.1.0


