03 Nov 20:52

7718a14

9.1.0

[9.1.0] - 2025-11-03

FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:

frequency received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
describegpt now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
qsv-stats - the engine that powers both stats and frequency commands - has been further optimized with the 0.40.0 release, to compile summary statistics as fast as possible - even for very large files - often one to two orders of magnitude faster (10 to 100x faster) than typical Python-based tools.
Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
the csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second!¹

These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.

The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.

It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.

Added

frequency: add --pretty-json option c67fd06
frequency: add --rank-strategy option #3075
frequency: add --null-text option #3082

Changed

describegpt: explicitly use frequency's dense rank strategy dc3f270
describegpt: allow --prompt to be loaded from a text file b11a10c
describegpt: use much faster BLAKE3 hash for cache key
frequency: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
lens: bumped csvlens from 0.13.0 to 0.14.0
lens: automatically set to monochrome mode when using --find option 8539869
luau: bumped embedded Luau from 0.694 to 0.697 3e68e29
stats: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
table: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
tests: change default Python to 3.13
docs: documented that Extended Input Support (🗄️) does .zip auto-decompression
docs: documented Limited Extended Input Support (🗃️)
use latest qsv-tuned csv crate with performance optimizations
build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
applied several clippy lint suggestions
bumped several indirect dependencies
align nightly to 2025-10-24, the same nightly as Polars
bumped MSRV to Rust 1.91
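
The change of frequency's default rank strategy from min ("1224") to dense ("1223") affects how tied counts are numbered. A minimal Python sketch of the two schemes follows; it is not qsv's implementation, and the function names and sample counts are illustrative assumptions:

```python
def min_rank(counts):
    """Min ("1224") ranking: ties share the lowest rank, then the next rank skips ahead."""
    ordered = sorted(counts, reverse=True)
    return [ordered.index(c) + 1 for c in ordered]


def dense_rank(counts):
    """Dense ("1223") ranking: ties share a rank, and the next rank is consecutive."""
    ordered = sorted(counts, reverse=True)
    distinct = sorted(set(ordered), reverse=True)
    return [distinct.index(c) + 1 for c in ordered]


counts = [50, 30, 30, 10]  # hypothetical frequency counts, already tallied
print(min_rank(counts))    # [1, 2, 2, 4]
print(dense_rank(counts))  # [1, 2, 2, 3]
```

With dense ranking, the rank of the last row equals the number of distinct count values, which is often easier to reason about downstream.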

Fixed

describegpt: add SQL escaping to eliminate SQL injection attack vector; add .csv extension to --sql-output when Polars SQL query runs successfully ad52a35
frequency: fix --select option always returning <ALL_UNIQUE> #3082
fixed some publishing workflows
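
For context on the describegpt SQL escaping fix: the usual way to embed an untrusted string in a SQL string literal is to double any embedded single quotes. The sketch below illustrates that general technique only; `sql_quote`, the table name and the sample value are hypothetical and are not taken from describegpt's code:

```python
def sql_quote(value: str) -> str:
    """Return value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


# Hypothetical, attacker-controlled value interpolated into a query.
name = "x'; DROP TABLE t; --"
query = f"SELECT * FROM t WHERE name = {sql_quote(name)}"
print(query)
# SELECT * FROM t WHERE name = 'x''; DROP TABLE t; --'
```

After escaping, the whole value stays inside one string literal, so the trailing statement is treated as data rather than executed.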

Removed

Removed SHA256 and replaced it with the much faster, parallelizable BLAKE3 hash #3072 and #3080
publish: removed maximize-build-space step in workflows as it was not working as advertised
tests: removed target-cpu=native RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults
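
To illustrate what a hash-derived cache key looks like, here is a rough Python sketch. BLAKE3 is not in Python's standard library, so `hashlib.blake2b` is swapped in purely as a stand-in; the choice of key inputs (file contents plus prompt) and the `cache_key` helper are assumptions for illustration, not qsv's actual key derivation:

```python
import hashlib


def cache_key(*parts: bytes) -> str:
    """Derive a hex cache key from byte strings, length-prefixing each part."""
    h = hashlib.blake2b(digest_size=32)  # stand-in for BLAKE3
    for part in parts:
        # The length prefix keeps (b"ab", b"c") and (b"a", b"bc") from colliding.
        h.update(len(part).to_bytes(8, "little"))
        h.update(part)
    return h.hexdigest()


key_a = cache_key(b"file contents", b"prompt text")
key_b = cache_key(b"file contents", b"another prompt")
print(key_a != key_b)  # True: a different prompt yields a different key
print(len(key_a))      # 64 hex characters for a 32-byte digest
```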

Full Changelog: 8.1.1...9.1.0

see validate_no_schema benchmark ↩

Contributors

dependabot


22 Oct 02:28

jqnatividad

8.1.1

c23ea3a

8.1.1

[8.1.1] - 2025-10-22

Added

docs: Seeded developer documentation for index/stats/frequency modules by @kulnor in #3056

Changed

deps: use latest version of qsv-tuned csv crate 7523e08
deps: unpin zip from 4.6 and bump to 6 now that geosuggest uses it 957ad6d
build(deps): bump dns-lookup from 3.0.0 to 3.0.1 by @dependabot[bot] in #3057
build(deps): bump geosuggest-utils from 0.8.0 to 0.8.1 by @dependabot[bot] in #3058
build(deps): bump geosuggest-core from 0.8.0 to 0.8.1 by @dependabot[bot] in #3059
build(deps): bump memmap2 from 0.9.8 to 0.9.9 by @dependabot[bot] in #3060
build(deps): bump pyo3 from 0.27.0 to 0.27.1 by @dependabot[bot] in #3061
tweaked several publishing and test GH Actions workflows
applied clippy::to_string_in_format_args lint suggestion
bumped several indirect dependencies

Fixed

use latest csvlens patched fork that fixes panic when using stdin input 34154e6

New Contributors

@kulnor made their first contribution in #3056

Full Changelog: 8.1.0...8.1.1

Contributors

kulnor and dependabot


20 Oct 10:43

jqnatividad

8.1.0

6ea24ae

8.1.0

[8.1.0] - 2025-10-20

This minor release features:

qsv on IBM Z mainframes (s390x)! Now that we have endianness detection, we're even adding a prebuilt binary for it.
describegpt: Output Kind and Token Usage have been added to the output making it easier to parse responses and track LLM costs.
python: with the latest PyO3 0.27 crate, we're setting the stage to drop support for Python 3.12 and below, targeting free-threaded Python exclusively starting with the 9.0 release. This should allow us to massively boost performance by parallelizing py workloads.
It will also power the upcoming FAIRification commands.
a tuned csv fork based on the just released csv 1.4 crate, increasing performance suite-wide.

Added

describegpt: add Kind and Token Usage to output a21e117
add big-endian handling for big-endian platforms (e.g. s390x-unknown-linux-gnu) #3045
add s390x prebuilt binary (qsv now runs on IBM Z Mainframes!) a3f455c
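
Why big-endian platforms such as s390x need explicit handling: the same bytes decode to different integers depending on byte order, so binary data has to be read with a fixed, declared order rather than the host's native one. A small Python sketch of the general idea (not qsv's code; `read_u32_le` is a hypothetical helper):

```python
import sys

raw = bytes([0x00, 0x00, 0x01, 0x00])  # the same 4 bytes, read two ways

as_little = int.from_bytes(raw, "little")  # 0x00010000
as_big = int.from_bytes(raw, "big")        # 0x00000100
print(as_little, as_big)  # 65536 256


def read_u32_le(buf: bytes) -> int:
    """Decode a u32 stored little-endian, regardless of the host byte order."""
    return int.from_bytes(buf[:4], "little")


# sys.byteorder reports the host order: "little" on x86_64, "big" on s390x.
print(sys.byteorder, read_u32_le(raw))
```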

Changed

datefmt: Replace localzone crate with iana-time-zone crate #3048
geoconvert: Improved with the latest geozero fixes needed for Datapusher+ processing of GeoJSON and SHP files.
python: micro-optimize to remove unnecessary clone; use more idiomatic error_result handling - 777aa14
docs: update badges with PowerPC Linux GNU, Windows ARM64 MSVC, remove macOS Intel by @rzmk in #3036
deps: bump bitflags from 2.9.4 to 2.10.0 8d65c1b
deps: bumped csv crate to 1.4 and reapplied qsv optimizations. For more info, see 4e2f2a0
deps: bump csvs_convert patch fork 8aa398f
deps: bump geozero to latest upstream with unreleased fixes - 0a9d1b3
deps: bump polars to 0.51.0 at py-1.35.0-beta-1 tag
deps: bump socket2 from 0.6.0 to 0.6.1
deps: bump whatlang to 0.18 e80e9c0
build(deps): bump actions/setup-python from 5.0.0 to 6.0.0 by @dependabot[bot] in #3030
build(deps): bump actix-governor from 0.8.0 to 0.10.0 by @dependabot[bot] in #3046
build(deps): bump gzp from 1.0.1 to 2.0.0 by @dependabot[bot] in #3033
build(deps): bump github/codeql-action from 3 to 4 by @dependabot[bot] in #3034
build(deps): bump flexi_logger from 0.31.4 to 0.31.5 by @dependabot[bot] in #3032
build(deps): bump flexi_logger from 0.31.5 to 0.31.6 by @dependabot[bot] in #3035
build(deps): bump flexi_logger from 0.31.6 to 0.31.7 by @dependabot[bot] in #3038
build(deps): bump libc from 0.2.176 to 0.2.177 by @dependabot[bot] in #3040
build(deps): bump pyo3 from 0.26.0 to 0.27.0 by @dependabot[bot] in #3055
build(deps): bump qsv_docopt from 1.8.0 to 1.9.0 by @dependabot[bot] in #3041
build(deps): bump regex from 1.11.3 to 1.12.1 by @dependabot[bot] in #3043
build(deps): bump regex from 1.12.1 to 1.12.2 by @dependabot[bot] in #3050
build(deps): bump reqwest from 0.12.23 to 0.12.24 by @dependabot[bot] in #3049
build(deps): bump rust_decimal from 1.38.0 to 1.39.0 by @dependabot[bot] in #3047
build(deps): bump simd-json from 0.16.0 to 0.17.0 by @dependabot[bot] in #3031
build(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 by @dependabot[bot] in #3053
build(deps): bump tokio from 1.47.1 to 1.48.0 by @dependabot[bot] in #3052
applied select clippy lint suggestions
updated indirect dependencies

Fixed

headers: fix stdin handling without explicit - for stdin input #3039

Removed

removed Python 3.10 prebuilts as pyo3 0.27 no longer supports it and Python 3.10 is no longer maintained
deps: removed patched fork of time-rs now that 0.3.43 has been released fde03b3

Full Changelog: 8.0.0...8.1.0

Contributors

dependabot and rzmk


06 Oct 00:43

jqnatividad

8.0.0

0773001

8.0.0

[8.0.0] - 2025-10-06

[Figure: FAIR data principles]¹
Findable, Accessible, Interoperable & Reusable (FAIR) Data is AI-Ready Data.

A week and a half after launching our "People's API" AI Chatbot and "AI-Ready" service, we fine-tune qsv further, as it powers the FAIRification engine that allows us to "open your data" (as a verb) - to infer and calculate AI-Ready, FAIR metadata at blazing speed even for large datasets.

This release features:

describegpt fixes and improvements
table can now produce "aligned" TSV and Fixed Width format files
validate now has Extended Input Support in its RFC 4180 validation mode
extdedup fixed to dedupe arbitrarily large csv or text files
luau upgraded from 0.690 to 0.693
PowerPC64 pre-built binaries - making it more convenient to use qsv on this "power"ful 😉 platform that's widely used in research (thanks to IBM-provided access to its native GitHub Action ppc64le runners! For the next release - qsv on IBM Z Mainframes!)

These changes set the stage for even more advanced, powerful, configurable FAIRification capabilities to make ALL your Data AI-Ready, Useful, Usable & Used by Machines & Humans alike.

Added

table: add leftendtab alignment option #3004
table: add leftfwf (Fixed Width Format) alignment option 590c861
validate: add Extended Input Support to RFC 4180 validation mode #3012
added PowerPC64 LE Linux prebuilt

Changed

describegpt: fine-tuned default LLM Prompt template (v3.1.0) 00e52a3 6b09b7e 5be7f2e
luau: bump embedded Luau from 0.690 to 0.693 #3017
schema: make Decimal Type Scale configurable for polars schema with QSV_POLARS_DECIMAL_SCALE env var - f20edd5
updated optimized csv crate, adding non-allocating StringRecord::trim() and more inline()s 4a1c82a
deps: bump calamine to 0.31.0 bd7a04c
deps: Bump polars to 0.51.0 from 0.50.0 at py-1.33.1 tag #2995
deps: bump polars to 0.51.0 at py-1.34.0-beta.4 tag at revision b973cac (latest upstream) #3022
deps: bump polars to 0.51.0 at py-1.35.0 tag revision b973cac 4164875
deps: replace tabwriter with renamed fork qsv-tabwriter #3010
deps: use patched fork of whatlang-rs. Though our PR was merged, there is still no new release 6afff4f
build(deps): bump base62 from 2.2.2 to 2.2.3 by @dependabot[bot] in #3003
build(deps): bump bytemuck from 1.23.2 to 1.24.0 by @dependabot[bot] in #3026
build(deps): bump chrono from 0.4.41 to 0.4.42 by @dependabot[bot] in #2974
build(deps): bump fancy-regex from 0.16.1 to 0.16.2 by @dependabot[bot] in #3000
build(deps): bump flate2 from 1.1.2 to 1.1.3 by @dependabot[bot] in #3027
build(deps): bump flexi_logger from 0.31.2 to 0.31.3 by @dependabot[bot] in #3005
build(deps): bump flexi_logger from 0.31.3 to 0.31.4 by @dependabot[bot] in #3008
build(deps): bump indexmap from 2.11.0 to 2.11.1 by @dependabot[bot] in #2973
build(deps): bump indexmap from 2.11.1 to 2.11.3 by @dependabot[bot] in #2993
build(deps): bump indexmap from 2.11.3 to 2.11.4 by @dependabot[bot] in #2999
build(deps): bump libc from 0.2.175 to 0.2.176 by @dependabot[bot] in #3009
build(deps): bump mlua from 0.11.3 to 0.11.4 by @dependabot[bot] in #3021
build(deps): bump regex from 1.11.2 to 1.11.3 by @dependabot[bot] in #3011
build(deps): bump redis from 0.32.5 to 0.32.6 by @dependabot[bot] in #3016
build(deps): bump qsv-stats from 0.38.0 to 0.39.0 by @dependabot[bot] in #3028
build(deps): bump qsv-stats from 0.39.0 to 0.39.1 by @dependabot[bot] in #3029
build(deps): bump redis from 0.32.6 to 0.32.7 by @dependabot[bot] in #3025
build(deps): bump serde from 1.0.219 to 1.0.223 by @dependabot[bot] in #2983
build(deps): bump serde from 1.0.223 to 1.0.224 by @dependabot[bot] in #2988
build(deps): bump serde from 1.0.224 to 1.0.225 by @dependabot[bot] in #2994
build(deps): bump serde from 1.0.225 to 1.0.226 by @dependabot[bot] in #3002
build(deps): bump serde from 1.0.226 to 1.0.227 by @dependabot[bot] in #3014
build(deps): bump serde from 1.0.227 to 1.0.228 by @dependabot[bot] in #3019
build(deps): bump serde_json from 1.0.143 to 1.0.145 by @dependabot[bot] in #2981
build(deps): bump semver from 1.0.26 to 1.0.27 by @dependabot[bot] in #2982
build(deps): bump sysinfo from 0.37.0 to 0.37.1 by @dependabot[bot] in #3015
build(deps): bump sysinfo from 0.37.1 to 0.37.2 by @dependabot[bot] in #3024
build(deps): bump tempfile from 3.21.0 to 3.22.0 by @dependabot[bot] in #2975
build(deps): bump tempfile from 3.22.0 to 3.23.0 by @dependabot[bot] in #3007
build(deps): bump toml from 0.9.6 to 0.9.7 by @dependabot[bot] in #3001
pin zip to 4.6, as zip 5 has features that are not widely adopted b231a23
applied select clippy lint suggestions
updated indirect dependencies
bumped MSRV to Rust 1.90

Fixed

describegpt: init cache vars even when --no-cache is used #2970
describegpt: --base-url option being ignored #2977
schema: delimiter detection #2998
extdedup: really use memmapped ondisk hash table #3020

Removed

removed powerpc64-le cross-compilation directive now that we have access to IBM-provided native PowerPC GH Action runner 9659bfc
removed macOS on Intel (x86_64-apple-darwin) prebuilt binaries

Full Changelog: 7.1.0...8.0.0

SangyaPundir, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg ↩

Contributors

dependabot


06 Sep 16:07

jqnatividad

7.1.0

df89a22

7.1.0

[7.1.0] - 2025-09-06

🇮🇹 csv,conf,v9 edition 🍝


Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN! For this feature release, we polished `describegpt` a bit more for the occasion... *Towards the "People's API"! Verso l'API del Popolo!* (Answering People/Policymaker Interface)

🚀 Enhanced `describegpt` Command

Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
Conditional SQL Results: Implement conditional --sql-results format for more efficient "SQL RAG" processing. If the generated SQL query executes successfully, the results are saved to the specified file with a .csv extension. If a "SQL hallucination" fails, the file is saved with a .sql extension instead, for the user to tweak and edit.
TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
Disk Cache by Default: The disk cache is now enabled by default for better performance
TOML Configuration: Migrate from JSON to more readable TOML format for more easily modifiable prompt files.
(see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)
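
The conditional --sql-results behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not describegpt's implementation: Python's built-in `sqlite3` stands in for Polars SQL/DuckDB, and `save_sql_result`, the table and the file names are hypothetical:

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def save_sql_result(conn: sqlite3.Connection, sql: str, out_stem: Path) -> Path:
    """Run sql; write rows to <stem>.csv on success, or the query to <stem>.sql on failure."""
    try:
        cursor = conn.execute(sql)
        rows = cursor.fetchall()
    except sqlite3.Error:
        # The query failed: keep it around so the user can edit and rerun it.
        out_path = out_stem.with_suffix(".sql")
        out_path.write_text(sql)
        return out_path
    out_path = out_stem.with_suffix(".csv")
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(rows)
    return out_path


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (borough TEXT, n INTEGER)")
conn.execute("INSERT INTO t VALUES ('BRONX', 3)")
out_dir = Path(tempfile.mkdtemp())
ok = save_sql_result(conn, "SELECT borough, n FROM t", out_dir / "answer")
bad = save_sql_result(conn, "SELEC borough FROM t", out_dir / "answer2")
print(ok.suffix, bad.suffix)  # .csv .sql
```

The file extension then tells the caller at a glance whether it holds an answer or a query that still needs fixing.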

`partition` Command Enhancements

New --limit Option: Implement --limit option to set the maximum number of open files
Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets
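
One way to cap the number of simultaneously open files when partitioning on a high-cardinality column is to collect the distinct keys first and then write them out in batches. The sketch below shows that general idea only; it rescans the input once per batch, and the `partition` function, its parameters and the sample rows are assumptions, not qsv's actual algorithm:

```python
import csv
import tempfile
from pathlib import Path


def partition(rows, key_index, out_dir: Path, limit: int) -> int:
    """Write one CSV per key value, keeping at most `limit` files open at a time."""
    # First pass: collect the distinct key values.
    keys = sorted({row[key_index] for row in rows})
    # Then handle the keys in batches of `limit`, rescanning the rows per batch.
    for start in range(0, len(keys), limit):
        batch = keys[start:start + limit]
        files = {k: (out_dir / f"{k}.csv").open("w", newline="") for k in batch}
        writers = {k: csv.writer(f) for k, f in files.items()}
        for row in rows:
            if row[key_index] in writers:
                writers[row[key_index]].writerow(row)
        for f in files.values():
            f.close()
    return len(keys)


rows = [["a", "1"], ["b", "2"], ["a", "3"], ["c", "4"]]
out_dir = Path(tempfile.mkdtemp())
n = partition(rows, 0, out_dir, limit=2)
print(n, sorted(p.name for p in out_dir.iterdir()))  # 3 ['a.csv', 'b.csv', 'c.csv']
```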

Added

describegpt: add configurable frequency limit #2950
describegpt: migrate prompt file from JSON to the easier-to-edit TOML format #2954
describegpt: refactor default prompt file; add --fewshot-examples option #2955
describegpt: add TogetherAI support for models endpoint #2965
partition: add --limit option #2960
added Windows ARM64 prebuilt binaries

Changed

describegpt: enable disk cache by default #2951
describegpt: Polars SQL generation tweaks #2958
python: replace deprecated with_gil with attach #2949. This sets the stage for "free-threaded" Python 3.14 support when it's released in October 2025. Buh-bye GIL!
deps: bump embedded Luau from 0.688 to 0.690 #2967
deps: bump Polars to 0.50.0 at py-1.33.0 tag
build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2962
build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in #2963
build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in #2961
build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in #2948
build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in #2946
build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in #2956
build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in #2952
applied select clippy lints
updated indirect dependencies

Full Changelog: 7.0.1...7.1.0

Contributors

dependabot


29 Aug 03:06

jqnatividad

7.0.1

aa404c3

7.0.1

[7.0.1] - 2025-08-28

A patch release with some minor bug fixes, benchmark tweaks and build system improvements.

Added

publish: add dedicated powerpc64le-unknown-linux-gnu publishing workflow (WIP)

Changed

docs: describegpt expanded error message about LLM URL or API key
deps: remove planus pinned dependency

Fixed

fix: geocode --batch 0 causes panic when polars feature is enabled
publish: remove luau feature from x86_64-pc-windows builds that was causing builds to fail
publish: remove powerpc64le from main publish workflow
benchmarks: updated to v6.8.0 with fixes to luau and clustered sample benchmarks

Full Changelog: 7.0.0...7.0.1


28 Aug 14:13

jqnatividad

7.0.0

b63901f

7.0.0

[7.0.0] - 2025-08-28

🥳 Open Weights with Open Data, Local LLM 🤖 edition 🚀

This is the biggest release yet - 470+ commits since v6.0.1! Packed with new AI-powered features, fixes and significant performance improvements suite-wide!

With the release of OpenAI's gpt-oss open-weight reasoning model earlier this month setting the stage, we continue on our "Automagical Metadata" journey by revamping describegpt.

🤖 Revamped describegpt - AI-Powered Metadata Inferencing and Data Analysis:

Intelligent Metadata Generation: Automatically generate comprehensive metadata - Data Dictionaries, Description and Tags for your Datasets using Large Language Models (LLM) prompted with summary statistics and frequency tables as detailed context - without sending your data to the cloud!
Even if you elect to use a cloud-based LLM, your Raw Data is never sent.
Chat with your Data: If your prompt can be answered using this high-quality, high-resolution Metadata, describegpt will answer it! If your prompt is not remotely related to the data, it will politely refuse - "I'm sorry, I can only answer questions about the Dataset."
Auto SQL RAG Mode: Should the LLM decide that it doesn't have the necessary information in the metadata it compiled to answer your prompt, it will automatically enter SQL Retrieval-Augmented Generation (RAG) mode - using the rich metadata instead as context to craft an expert-level, deterministic, reproducible, "hallucination-free" SQL query¹ to respond to your prompt.
Database Engine Support: If DuckDB is installed or the Polars feature is enabled, and --sql-results <ANSWER.CSV> is specified - an optimized SQL query will be automatically executed with the query results saved to the specified file.
As both DuckDB and Polars are purpose-built OLAP engines that support direct queries (no database pre-loading required), you get answers in a few seconds² - even for very large datasets.
Multi-LLM Support: Works with any OpenAI-API compatible LLM - with special support for local LLMs like Ollama, Jan and LM Studio, with the ability to customize model behavior with the --addl-props option.
Advanced Caching: Disk and Redis caching support for performance and cost optimization.
Flexible Prompting: Custom prompt files and built-in intelligent templates for various analysis tasks.

Check out these examples using a 1 million row sample of NYC's 311 data!

--all option produces a Data Dictionary, Description and Tags - Markdown, JSON
--prompt "What are the top 10 complaint types per community board and borough?" - SQL result
--prompt "How tall is the Empire State Building?" - "I'm sorry, I can only answer questions about the Dataset."

On top of other improvements in Datapusher+ with its new Jinja-based "metadata suggestion engine" - we're using this AI-inferred metadata along with other precalcs to prepopulate DCATv3 (both US and European profiles) and Croissant metadata fields that are otherwise too hard and expensive to compile manually.

The inferred and precalculated metadata values are offered as "suggestions", using a UI/UX purpose-built to facilitate interactive metadata curation chats.

This allows Data Stewards to compile high-quality, high-resolution metadata catalogs with an accelerated "Data Steward in the Loop" data ingestion and metadata curation workflow.

If you want to see and learn more, we're Bologna-bound to attend csv,conf,v9 to present and share how we're using this to auto-infer metadata in CKAN. Hope to see you there!

Towards the People's API!

(Answering People/Policymaker Interface)

📊 Enhanced frequency Command:

Rank Column: Ranking of frequency results for better data insights
JSON Output Mode: New --json option not only provides structured output beyond the default CSV format - it also takes advantage of JSON's nested support to include 15 additional summary statistics per field
Performance Boost: Speed improvements with SIMD-accelerated number parsing, remaining performant even with the added functionality

⚡ stats Command Improvements:

Faster Still: Enabled by improvements in the underlying qsv-stats crate
Improved Precision: Faster, streamlined precision calculation
SIMD Number Parsing: Hardware-accelerated parsing for int/float values
Unix Epoch Support: Proper handling of Unix timestamp 0 as valid date
Enhanced Date Inference: Better date and boolean type inference capabilities

🔧 validate & schema Enhancements:

Fancy Regex Support: You can now use "advanced" regex features in your JSON Schema patterns with the --fancy-regex option. Previously, you could only use the standard Rust regex engine, which does not support backreferences or look-arounds (for performance reasons)
JSON Schema Improvements: Better error handling and format validation options
Schema Validation Refinements: More granular validation control with --no-format-validation
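
As a quick illustration of the "advanced" features mentioned above (look-arounds and backreferences), here are two patterns that rely on them. Python's `re` module is used only because it supports both; the patterns and sample strings are made up for this example and say nothing about how --fancy-regex is implemented:

```python
import re

# Look-ahead: at least one digit and one uppercase letter, 8+ characters.
strong_id = re.compile(r"^(?=.*\d)(?=.*[A-Z]).{8,}$")
# Backreference: the same word repeated, e.g. "the the".
doubled_word = re.compile(r"\b(\w+)\s+\1\b")

print(bool(strong_id.match("Abcdefg1")))            # True
print(bool(strong_id.match("abcdefgh")))            # False
print(bool(doubled_word.search("the the cat")))     # True
```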

🔄 rename Reverted and Improved:

When pairwise renaming was introduced in v6.0.0, it broke some workflows. It's now fixed by introducing two modes:

Positional Mode: Renaming by position is now once again the default
Pairwise Mode: New --pairwise flag for column renaming by column pairs
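
The difference between the two rename modes can be sketched as below. This is a conceptual illustration, not qsv's code: the function names are hypothetical, and the rule that positional mode expects exactly one new name per column is an assumption:

```python
def rename_positional(headers, new_names):
    """Replace all headers by position (assumes one new name per column)."""
    if len(new_names) != len(headers):
        raise ValueError("expected one new name per column")
    return list(new_names)


def rename_pairwise(headers, pairs):
    """Rename only the columns named in (old, new) pairs; others are kept."""
    mapping = dict(pairs)
    return [mapping.get(h, h) for h in headers]


headers = ["id", "nm", "ttl"]
print(rename_positional(headers, ["id", "name", "title"]))  # ['id', 'name', 'title']
print(rename_pairwise(headers, [("nm", "name")]))           # ['id', 'name', 'ttl']
```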

🗂️ partition Improvements:

Case-Insensitive Safety: Improved case-aware partitioning algorithm. Previously, case-insensitive file systems like macOS APFS and Windows NTFS were causing incorrect partitioning of case-sensitive values
Faster still: With better use of I/O buffering - with deferred, batched, async writes instead of after every record
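
To see why case-insensitive file systems are a problem for partitioning: values such as "NY" and "ny" are distinct, but "NY.csv" and "ny.csv" name the same file there. One possible way to avoid the collision is sketched below; the suffixing scheme and the `safe_filenames` helper are hypothetical and not necessarily what partition does:

```python
def safe_filenames(values):
    """Map each distinct value to a filename unique under case-insensitive comparison."""
    used = set()  # casefolded filenames already assigned
    names = {}
    for value in values:
        if value in names:
            continue
        candidate, n = f"{value}.csv", 0
        # Add a numeric suffix until the name no longer clashes when case is ignored.
        while candidate.casefold() in used:
            n += 1
            candidate = f"{value}_{n}.csv"
        used.add(candidate.casefold())
        names[value] = candidate
    return names


print(safe_filenames(["NY", "ny", "Ny", "CA"]))
# {'NY': 'NY.csv', 'ny': 'ny_1.csv', 'Ny': 'Ny_2.csv', 'CA': 'CA.csv'}
```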

Added

frequency add rank info to frequency table #2878
frequency add --json output option #2868
validate add --fancy-regex option #2845
add CPU-accelerated, mem-mapped, chunked sha256 file checksum helper #2909

Changed

apply use SIMD-accelerated base64-simd crate for Encode64 and Decode64 operations #2863
stats faster precision calculation #2852
perf: Use simd_json instead of serde_json to serialize to JSON #2884
refactor: create and use reqwest client helpers to eliminate redundant code #2888
perf: Faster parallelized sha256 hash file #2918
refactor: describegpt #2890
refactor: describegpt setting --timeout to 0 sets no timeout #2891
refactor: describegpt more refinements #2892
feat: describegpt refactor round3 #2893
feat: describegpt disk & redis caching #2895
refactor: describegpt #2896
refactor: describegpt create get_cache_key helper; customizable stats options #2902
feat: describegpt auto SQL RAG for --prompt #2904
feat: describegpt major refactor #2913
refactor: describegpt default promptfile is now embedded in qsv binary; fine-tune tests #2924
feat: describegpt returning reasoning with --json option #2926
feat: describegpt add DuckDB support in SQL RAG mode #2929
feat: describegpt various DuckD...

LLMs can still hallucinate a syntactically wrong SQL query. But once a valid SQL query is generated, it's fully reproducible. ↩
Depending on your LLM setup, SQL query generation may take some time. Once generated, however, the SQL query itself will be blazing-fast. ↩

Contributors

abobov and dependabot


12 Jul 13:49

jqnatividad

6.0.1

2d3272e

6.0.1

[6.0.1] - 2025-07-12

This is a patch release with bug fixes and minor improvements.

Changed

feat: updated completions for qsv v6.0.0 by @rzmk in #2838
docs: updated sample schema.json based on NYC311 1M row sample benchmark data
docs: updated sample stats output using NYC 311 1M row sample benchmark data
build(deps): bump chrono-tz from 0.10.3 to 0.10.4 by @dependabot[bot] in #2839
build(deps): bump qsv-stats from 0.35.0 to 0.36.0 by @dependabot[bot] in #2840
bumped indirect dependencies
Added benchmark_data.* to .gitignore

Fixed

geocode: make --batch=0 mode more robust by setting a minimum batch size of 1,000 rows 2fa90bc
jsonl: correct batchsize calculation to use input file instead of output file for line counting 742dc77
benchmarks: fixed benchmarks with unescaped parameters with embedded spaces ad95596

Removed

Removed retired publishing workflows (linux-glibc-231-musl-123 and wix-installer)

Full Changelog: 6.0.0...6.0.1

Contributors

dependabot and rzmk


11 Jul 12:10

jqnatividad

6.0.0

90cd5c2

6.0.0

Highlights:

This is a major release with significant improvements and new features!

🔍 Enhanced lens command:

File prompt support: You can now load prompts from files using the new file: support, making it easier to reuse complex prompts
Wrap mode option: Added --wrap-mode option for better text display control when viewing data
Improved examples: Enhanced usage examples and documentation

🔄 Improved rename command:

Pair-based renaming: Easier column renaming with more intuitive syntax for bulk operations.

📊 Enhanced sort command:

Natural sorting: Added --natural option for human-friendly sorting (e.g., "file1.txt", "file2.txt", "file10.txt" sort "naturally"; lexicographical sorting would instead order them as "file1.txt", "file10.txt", "file2.txt")
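
A common way to implement natural sorting is to split each string into digit and non-digit runs and compare the digit runs numerically. The Python sketch below shows that general technique with a hypothetical `natural_key` helper; it is not the code behind --natural:

```python
import re


def natural_key(text: str):
    """Split text into digit and non-digit runs so numbers compare numerically."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, odd positions hold the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


files = ["file10.txt", "file2.txt", "file1.txt"]
print(sorted(files))                   # ['file1.txt', 'file10.txt', 'file2.txt']
print(sorted(files, key=natural_key))  # ['file1.txt', 'file2.txt', 'file10.txt']
```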

⚡ Performance improvements:

Memory optimizations: Multiple performance enhancements across frequency, stats, validate, and transpose commands
Buffer optimizations: Improved I/O performance with better buffer sizing for various operations
Polars engine upgrade: Updated to the latest Polars 0.49.x series for better performance and stability

🔧 Enhanced validation:

Robust JSON Schema validation: More granular error messages and better schema validation
Improved error reporting: Clearer messages to help debug validation issues
UTF-8 handling: Better handling of invalid records with improved debug output

🌐 Geocoding improvements:

Updated geosuggest: Bumped to version 0.8 with direct index update support for better geocoding performance

🔗 SQL enhancements (joinp and sqlp):

Decimal comma support: Added --decimal-comma option for writing operations, improving international data support
Better validation: Enhanced delimiter and decimal comma validation
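
For readers unfamiliar with decimal commas: many locales write 3.14 as "3,14", which clashes with a comma field delimiter. The sketch below illustrates parsing, writing and a plausible option check; the helper names and the exact validation rule are assumptions, not the behavior of joinp or sqlp:

```python
def parse_decimal_comma(text: str) -> float:
    """Parse a number written with a decimal comma, e.g. "3,14" -> 3.14."""
    return float(text.replace(",", "."))


def format_decimal_comma(value: float, places: int = 2) -> str:
    """Format a float with a decimal comma, e.g. 2.5 -> "2,50"."""
    return f"{value:.{places}f}".replace(".", ",")


def check_options(delimiter: str, decimal_comma: bool) -> None:
    """Reject a comma delimiter when commas also mark the decimal point."""
    if decimal_comma and delimiter == ",":
        raise ValueError("decimal comma requires a non-comma delimiter")


print(parse_decimal_comma("3,14"))  # 3.14
print(format_decimal_comma(2.5))    # 2,50
check_options(";", decimal_comma=True)  # accepted: semicolon-delimited output
```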

🏗️ Infrastructure updates:

Rust 1.88 MSRV: Updated minimum supported Rust version
Dependency updates: Comprehensive updates to all major dependencies including Polars, Tokio, and many others
Compilation optimizations: Various improvements for faster builds and better runtime performance

Added

New Features:

lens: add file: support to load prompts from files #2805
lens: add --wrap-mode option #2805
rename: pair-based renaming for easier bulk column renaming #2806
sort: add --natural option for natural/human-friendly sorting #2808
schema: set JSON Schema description to the command line used for generation
joinp & sqlp: add --decimal-comma option for writing operations
joinp: add --decimal-comma and --delimiter validation
sqlp: add --decimal-comma and --delimiter validation
validate: more robust JSON Schema schema validation with granular error messages
validate: show invalid record in debug format for UTF-8 failures
Enhanced completions for qsv v5.1.0 and v6.0.0

Documentation & Examples:

lens: improved examples in usage text
schema: expand examples and add -P shortcut for --prompt option
sqlp: update description to note support for input beyond CSVs
Polars SQL documentation noting it's a PostgreSQL dialect
Added link to Polars 0.49.0 release notes
MSRV documentation updated to Rust 1.88
Additional conditions for when to use "portable" binaries

Changed

Performance Improvements:

frequency: microoptimize null value handling and preallocate vectors
stats: preallocate with_capacity for Unsorted struct and coefficient of variation handling improvements
transpose: performance refactoring with optimized buffer handling
validate: microoptimizations for JSON instance handling and buffer capacity improvements
apply: bigger reader buffer as apply is batch oriented
Enabled setter for read and write buffer sizing configuration
Various microoptimizations across commands

Polars Engine Updates:

Bumped Polars from 0.48 to 0.49.x series
Adapted to new Polars PlPath API
Updated to use latest Polars upstream throughout development cycle
Enabled simd-json compiler hints feature on nightly builds

Dependency Updates:

Major updates:
- Polars: 0.48 → 0.49.x
- Tokio: 1.45.1 → 1.46.1
- qsv-stats: 0.33.0 → 0.35.0
- kiddo: 5.0.3 → 5.2.2
- indexmap: 2.9.0 → 2.10.0
- calamine: updated to latest upstream
- redis: 0.32.2 → 0.32.3
- sysinfo: 0.35.2 → 0.36.0
- geosuggest: bumped to 0.8
Build dependencies:
- flexi_logger: 0.31.0 → 0.31.2
- arboard: 3.5.0 → 3.6.0
- minijinja: 2.10.2 → 2.11.0
- minijinja-contrib: 2.10.2 → 2.11.0
- zip: 4.1.0 → 4.3.0
- reqwest: 0.12.20 → 0.12.22
- indicatif: 0.17.11 → 0.17.12
- phf: 0.11.3 → 0.12.1
- human-panic: 2.0.2 → 2.0.3
- jaq-std: 2.1.1 → 2.1.2
- jaq-core: 2.2.0 → 2.2.1
- jaq-json: 1.1.2 → 1.1.3

Code Quality & Maintenance:

Applied clippy lint suggestions including collapsible_if, needless_return, redundant_clone, and manual_is_multiple_of
Updated MSRV to Rust 1.88
Set nightly to 2025-06-27
Removed hardware-lock-elision feature on parking_lot
No longer use similar-asserts crate, reverted to standard assert_eq
Better TOML formatting
Removed unneeded dependency aliases
Various code refactoring for better maintainability

Infrastructure:

Updated csvlens integration with natural sorting support
Switched dependency management approaches for better upstream compatibility
Pin plist to 1.7.3 to avoid unnecessary quick-xml bumps
Use latest calamine upstream consistently

Fixed

validate: clearer JSON Schema schema error messages to differentiate validation types
round_num(): should return an empty string if dec_f64.is_nan()
joinp: non-equi-join test result order deterministic issues
Enhanced Snappy file decompression robustness
Fixed geometric mean calculation in stats
Better UTF-8 record validation with debug output
Various test adjustments to account for dependency updates and behavior changes
Resolved several clippy warnings and code quality issues

Test Updates:

rename: add pair-renaming tests
sort: add natural sort tests
joinp: add decimal_comma tests
sqlp: add decimal-comma validation tests
validate: add JSON Schema schema validation tests
stats: adjust test cases for qsv-stats 0.35.0 changes
excel: re-enable and revert formula tests based on upstream changes

Development Notes

Benchmarks:

Comprehensive benchmarking for versions 5.1.0 and 6.6.1
Performance comparisons available for major operations

Continuous Integration:

Multiple dependency updates via Dependabot automation
Comprehensive test coverage maintained throughout development
Regular upstream synchronization with Polars and other major dependencies

Pull Requests

NOTE: The changelog entries below only document changes with a corresponding PR. Several changes were committed to master directly and are documented in the release highlights above.

Added

lens: add --wrap-mode option in #2805
rename: add pair-based renaming in #2806
sort: add --natural sort option in #2808
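
For readers unfamiliar with natural sorting, the sketch below shows the general idea behind an option like sort --natural: runs of digits are compared by numeric value, so "file2" orders before "file10". This is an illustrative comparator only, not the code qsv uses. The names natural_cmp and strip_zeros are hypothetical, and only ASCII digits are handled.

```rust
use std::cmp::Ordering;

/// Illustrative natural-order comparison (hypothetical, ASCII digits only).
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].is_ascii_digit() && b[j].is_ascii_digit() {
            // Find the end of each digit run.
            let (si, sj) = (i, j);
            while i < a.len() && a[i].is_ascii_digit() { i += 1; }
            while j < b.len() && b[j].is_ascii_digit() { j += 1; }
            // Without leading zeros, a longer run is a larger number;
            // equal-length runs compare digit by digit.
            let da = strip_zeros(&a[si..i]);
            let db = strip_zeros(&b[sj..j]);
            let ord = da.len().cmp(&db.len()).then_with(|| da.cmp(db));
            if ord != Ordering::Equal { return ord; }
        } else {
            // Non-digit bytes compare as in a plain byte-wise sort.
            if a[i] != b[j] { return a[i].cmp(&b[j]); }
            i += 1;
            j += 1;
        }
    }
    // If one string is a prefix of the other, the shorter one sorts first.
    (a.len() - i).cmp(&(b.len() - j))
}

/// Drop leading ASCII zeros from a run of digits.
fn strip_zeros(d: &[u8]) -> &[u8] {
    let n = d.iter().take_while(|&&c| c == b'0').count();
    &d[n..]
}
```

Sorting ["file10", "file2", "file1"] with sort_by(|a, b| natural_cmp(a, b)) gives file1, file2, file10, whereas a plain lexicographic sort would place file10 before file2.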

Changed

geocode: now uses the faster geosuggest 0.8 crate. The index-update subcommand now generates a command that uses the geosuggest crate directly to update or create the index, instead of doing so internally.
schema: when generating a JSON Schema, the description property is now set to the command line used to generate it in #2796
sqlp & joinp: the --decimal-comma option now applies not only when parsing input CSVs but also when writing output CSVs in #2800
transpose: performance refactoring in #2827
validate: improved JSON Schema schema validation in #2803
update completions for qsv v5.1.0 by @rzmk in #2804
dep: bump polars to latest upstream - adapt to PlPath API requirement in #2822
perf: bump to the faster geosuggest 0.8 in #2837
build(deps): bump arboard from 3.5.0 to 3.6.0 by @dependabot[bot] in #2814
build(deps): bump flexi_logger from 0.31.0 to 0.31.1 by @dependabot[bot] in #2801
build(deps): bump flexi_logger from 0.31.1 to 0.31.2 by @dependabot[bot] in #2812
build(deps): bump libc from 0.2.173 to 0.2.174 by @dependabot[bot] in #2794
build(deps): bump human-panic from 2.0.2 to 2.0.3 by @dependabot[bot] in #2833
build(deps): bump indicatif from 0.17.11 to 0.17.12 by @dependabot[bot] in #2818
build(deps): bump jaq-std from 2.1.1 to 2.1.2 by @dependabot[bot] in #2830
build(deps): bump jaq-core from 2.2.0 to 2.2.1 by @dependabot[bot] in #2831
build(deps): bump jaq-json from 1.1.2 to 1.1.3 by @dependabot[bot] in #2832
build(deps): bump minijinja from 2.10.2 to 2.11.0 by @dependabot[bot] in #2815
build(deps): bump minijinja-contrib from 2.10.2 to 2.11.0 by @dependabot[bot] in #2816
build(deps): bump phf from 0.11.3 to 0.12.1 by @dependabot[bot] in #2797
...

Contributors

dependabot and rzmk

17 Jun 10:41

jqnatividad

5.1.0

e92c59b

5.1.0

[5.1.0] - 2025-06-17

Highlights

lens is now colorful by default, with a --monochrome option to turn it off:
```
qsv lens /tmp/NYC_311_SR_2010-2020-sample-1M.csv
```

lens can now display custom prompts with the --prompt option, which supports ANSI escape codes for formatting. It is meant to be paired with the --echo-column <colname> option, e.g.:
```
qsv lens --prompt $'\033[1;5;31mBlinking red, bold text\033[0m' --echo-column 'Unique Key' \
 /tmp/NYC_311_SR_2010-2020-sample-1M.csv
```

the qsv-stats crate - the underlying engine behind the central stats, frequency and "smart" commands - got a lot of love in this release.
validate got a tad faster while decreasing its memory footprint. The new --no-format-validation option now also allows you to ignore all JSON Schema "format" keywords (e.g. date, email, url, currency, etc.) when validating CSVs.

Added

lens: add --prompt option, add examples to regex-enabled options #2772
lens: add --monochrome option, otherwise, columns displayed in different colors #2761
validate: add --no-format-validation option when in JSON Schema mode #2762
docs: add shell completions badges by @rzmk in #2760
feat: added criterion trim algorithm microbenchmarks #2789

Changed

frequency: performance microoptimizations - use stats cache column cardinality to pre-alloc & size frequency hash tables
geocode: refactor regex handling for performance & maintainability
json: preserve key order #2777
stats: performance microoptimizations - use unwrap_unchecked() instead of just unwrap() in hot sampling functions
validate: major refactoring for added performance/memory efficiency
chore: temporarily use qsv-calamine until a new calamine is released #2790
Bump cpc from 1.9 to 2 #2770
deps: bump criterion from 0.5 to 0.6 #2791
deps: use latest csvlens upstream with colorful columns https://github.com/dathere/qsv/commit/f2c9322e33a0ac335dafec10a490c871d3de0a6c
deps: temporarily use qsv-calamine until a new calamine is released #2790
deps: bump our patched forks of cached, csvs_convert, json-objects-to-csv, jsonschema, localzone, rfd, self_update until PRs are merged or new releases are made
deps: bump zip from 3 to 4 in 75909d2
deps: bump polars to 0.48.1 at 49ce57a revision
build(deps): bump atoi_simd from 0.16.0 to 0.16.1 by @dependabot in #2766
build(deps): bump bytemuck from 1.23.0 to 1.23.1 by @dependabot in #2778
build(deps): bump flate2 from 1.1.1 to 1.1.2 by @dependabot in #2781
build(deps): bump flexi_logger from 0.30.1 to 0.30.2 by @dependabot in #2765
build(deps): bump flexi_logger from 0.30.2 to 0.31.0 by @dependabot in #2793
build(deps): bump hashbrown from 0.15.3 to 0.15.4 by @dependabot in #2779
build(deps): bump libc from 0.2.172 to 0.2.173 by @dependabot in #2787
build(deps): bump mimalloc from 0.1.46 to 0.1.47 by @dependabot in #2792
build(deps): bump mlua from 0.10.3 to 0.10.5 by @dependabot in #2758
build(deps): bump num_cpus from 1.16.0 to 1.17.0 by @dependabot in #2771
build(deps): bump parking_lot from 0.12.3 to 0.12.4 by @dependabot in #2768
build(deps): bump pyo3 from 0.25.0 to 0.25.1 by @dependabot in #2785
deps: upgrade qsv-stats from 0.32 to 0.33, which features major memory and performance optimizations behind the stats & frequency commands #2786
deps: bump redis from 0.29.5 to 0.32
build(deps): bump reqwest from 0.12.15 to 0.12.16 by @dependabot in #2764
build(deps): bump reqwest from 0.12.16 to 0.12.18 by @dependabot in #2767
build(deps): bump reqwest from 0.12.18 to 0.12.19 by @dependabot in #2773
build(deps): bump reqwest from 0.12.19 to 0.12.20 by @dependabot in #2782
build(deps): bump rust_decimal from 1.37.1 to 1.37.2 by @dependabot in #2788
build(deps): bump smallvec from 1.15.0 to 1.15.1 by @dependabot in #2780
build(deps): bump sysinfo from 0.35.1 to 0.35.2 by @dependabot in #2774
build(deps): bump titlecase from 3.5.0 to 3.6.0 by @dependabot in #2775
build(deps): bump tokio from 1.45.0 to 1.45.1 by @dependabot in #2759
build(deps): bump uuid from 1.16.0 to 1.17.0 by @dependabot in #2757
applied select clippy suggestions
updated indirect dependencies
set Rust nightly to 2025-05-21, the same nightly Polars uses 872ade1

Fixed

fix: frequency recover from non-fatal absence of stats cache, instead of panicking b2821a0
fix: flaky json tests caused by hardcoding name of intermediate file - 62ca310
fix: flaky reverse property tests by handling BOM characters cefd490
fix: util::process_input helper does not honor QSV_SKIP_FORMAT_CHECK when processing dir input #2784
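
Two frequency entries in this release fit together: the hash tables are pre-sized from the column cardinality in the stats cache, and a missing stats cache is no longer fatal. The sketch below shows the general pattern under simplified assumptions. count_values and cardinality_hint are hypothetical names, and qsv's actual data structures and cache lookup are not shown.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: build a frequency table for one column.
/// `cardinality_hint` stands in for a value read from a stats cache;
/// `None` models the cache being absent.
fn count_values<'a>(
    values: &[&'a str],
    cardinality_hint: Option<usize>,
) -> HashMap<&'a str, u64> {
    let mut counts = match cardinality_hint {
        // Known cardinality: allocate once up front and avoid rehashing.
        Some(n) => HashMap::with_capacity(n),
        // No cache: fall back to a growable table instead of failing.
        None => HashMap::new(),
    };
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
}
```

The counts are identical with or without the hint. The hint only affects how often the table has to grow while it is being filled.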

Full Changelog: 5.0.3...5.1.0

Contributors

dependabot and rzmk