Releases: dathere/qsv

9.1.0

03 Nov 20:52

[9.1.0] - 2025-11-03

FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:

  • frequency received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
  • describegpt now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
  • qsv-stats - the engine that powers both stats and frequency commands - has been further optimized with the 0.40.0 release, to compile summary statistics as fast as possible - even for very large files - often one to two orders of magnitude faster (10 to 100x faster) than typical Python-based tools.
  • Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
  • the csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second! [1]

These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.

The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.

It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.


Added

  • frequency: add --pretty-json option c67fd06
  • frequency: add --rank-strategy option #3075
  • frequency: add --null-text option #3082

Changed

  • describegpt: explicitly use frequency's dense rank strategy dc3f270
  • describegpt: allow --prompt to be loaded from a text file b11a10c
  • describegpt: use much faster BLAKE3 hash for cache key
  • frequency: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
  • lens: bumped csvlens from 0.13.0 to 0.14.0
  • lens: automatically set to monochrome mode when using --find option 8539869
  • luau: bumped embedded Luau from 0.694 to 0.697 3e68e29
  • stats: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
  • table: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
  • tests: change default Python to 3.13
  • docs: documented that Extended Input Support (🗄️) does .zip auto-decompression
  • docs: documented Limited Extended Input Support (🗃️)
  • use latest qsv-tuned csv crate with performance optimizations
  • build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
  • build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
  • deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
  • build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
  • build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
  • applied several clippy lint suggestions
  • bumped several indirect dependencies
  • align nightly to 2025-10-24, the same nightly as Polars
  • bumped MSRV to Rust 1.91
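
The change of frequency's default rank strategy from min ("1224") to dense ("1223") is easiest to see on tied counts. The following is a minimal Python sketch of the two ranking rules, not qsv's implementation; the function name `rank_counts` and the sample data are made up for illustration.

```python
from collections import Counter

def rank_counts(counts, strategy="dense"):
    """Rank values by descending count.

    "min":   tied values share the lowest rank and the next rank skips ("1224").
    "dense": tied values share a rank and the next rank does not skip ("1223").
    """
    if strategy not in ("min", "dense"):
        raise ValueError(f"unknown strategy: {strategy}")
    # sorted() is stable, so tied values keep their original order.
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    ranked, prev_count, rank = [], None, 0
    for position, (value, count) in enumerate(ordered, start=1):
        if count != prev_count:
            # A new count starts a new rank: its position (min) or the next integer (dense).
            rank = position if strategy == "min" else rank + 1
            prev_count = count
        ranked.append((value, count, rank))
    return ranked

counts = Counter(["a"] * 5 + ["b"] * 3 + ["c"] * 3 + ["d"])
print([r for _, _, r in rank_counts(counts, "min")])    # [1, 2, 2, 4]
print([r for _, _, r in rank_counts(counts, "dense")])  # [1, 2, 2, 3]
```

With dense ranking, the highest rank equals the number of distinct counts, which can make "top N" cut-offs easier to reason about.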

Fixed

  • describegpt: add SQL escaping to eliminate SQL injection attack vector; add .csv extension to --sql-output when Polars SQL query runs successfully ad52a35
  • frequency: fix --select option always returning <ALL_UNIQUE> #3082
  • fixed some publishing workflows

Removed

  • Removed SHA256 and replaced with much faster, parallelizable BLAKE3 hash #3072 and #3080
  • publish: removed maximize-build-space step in workflows as it was not working as advertised
  • tests: removed target-cpu=native RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults
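
As a rough illustration of why a hash makes a convenient cache key, here is a small Python sketch. It is not how qsv builds its keys: what goes into the key (file bytes plus an options dictionary) is an assumption, and because Python's standard library has no BLAKE3, the sketch defaults to SHA-256 and accepts any hashlib-style constructor through the hypothetical `hasher` parameter.

```python
import hashlib
import json

def cache_key(file_bytes, options, hasher=hashlib.sha256):
    """Derive a hex cache key from file contents plus the options used."""
    h = hasher()
    h.update(file_bytes)
    # A separator byte keeps the contents and the options from running together.
    h.update(b"\x00")
    # sort_keys makes the key independent of dictionary ordering.
    h.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    return h.hexdigest()

key_a = cache_key(b"id,name\n1,ann\n", {"model": "local", "limit": 10})
key_b = cache_key(b"id,name\n1,ann\n", {"limit": 10, "model": "local"})
key_c = cache_key(b"id,name\n1,bob\n", {"model": "local", "limit": 10})
print(key_a == key_b)  # True: same inputs, different option order
print(key_a == key_c)  # False: the file contents changed
```

Any change to the data or the options yields a different key, so stale cache entries are simply never looked up again.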

Full Changelog: 8.1.1...9.1.0

  1. see validate_no_schema benchmark

8.1.1

22 Oct 02:28

[8.1.1] - 2025-10-22

Added

  • docs: Seeded developer documentation for index/stats/frequency modules by @kulnor in #3056

Changed

  • deps: use latest version of qsv-tuned csv crate 7523e08
  • deps: unpin zip from 4.6 and bump to 6 now that geosuggest uses it 957ad6d
  • build(deps): bump dns-lookup from 3.0.0 to 3.0.1 by @dependabot[bot] in #3057
  • build(deps): bump geosuggest-utils from 0.8.0 to 0.8.1 by @dependabot[bot] in #3058
  • build(deps): bump geosuggest-core from 0.8.0 to 0.8.1 by @dependabot[bot] in #3059
  • build(deps): bump memmap2 from 0.9.8 to 0.9.9 by @dependabot[bot] in #3060
  • build(deps): bump pyo3 from 0.27.0 to 0.27.1 by @dependabot[bot] in #3061
  • tweaked several publishing and test GH Actions workflows
  • applied clippy::to_string_in_format_args lint suggestion
  • bumped several indirect dependencies

Fixed

  • use latest csvlens patched fork that fixes panic when using stdin input 34154e6

New Contributors

Full Changelog: 8.1.0...8.1.1

8.1.0

20 Oct 10:43

[8.1.0] - 2025-10-20

This minor release features:

  • qsv on IBM Z mainframes (s390x)! Now that we have endianness detection, qsv handles big-endian platforms, and we've even added a prebuilt binary for it.
  • describegpt: Output Kind and Token Usage have been added to the output making it easier to parse responses and track LLM costs.
  • python: with the latest pyO3.rs 0.27 crate, we're setting the stage to drop support for Python 3.12 and below, targeting free-threaded Python exclusively starting with the 9.0 release. This should allow us to massively boost performance by parallelizing py workloads.
    It will also power the upcoming FAIRification commands.
  • a tuned csv fork based on the just released csv 1.4 crate, increasing performance suite-wide.

Added

  • describegpt: add Kind and Token Usage to output a21e117
  • add big-endian handling for big-endian platforms (e.g. s390x-unknown-linux-gnu) #3045
  • add s390x prebuilt binary (qsv now runs on IBM Z Mainframes!) a3f455c
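
For readers unfamiliar with why big-endian platforms need special handling, the Python sketch below shows the same four bytes decoding to different integers depending on the assumed byte order. It is a generic illustration, not qsv code; `read_u32_le` is a hypothetical helper.

```python
import struct
import sys

def read_u32_le(buf):
    """Decode a 4-byte little-endian unsigned integer on any platform."""
    return int.from_bytes(buf[:4], "little")

# "<I" always packs little-endian, regardless of the host CPU.
raw = struct.pack("<I", 65536)

print(sys.byteorder)               # host byte order, e.g. "little" on x86_64, "big" on s390x
print(read_u32_le(raw))            # 65536 on every platform
print(int.from_bytes(raw, "big"))  # 256: the same bytes misread as big-endian
```

Stating the byte order explicitly when reading and writing binary data keeps on-disk formats portable between little-endian and big-endian hosts.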

Changed

  • datefmt: Replace localzone crate with iana-time-zone crate #3048
  • geoconvert: Improved with the latest geozero fixes needed for Datapusher+ processing of GeoJSON and SHP files.
  • python: micro-optimize to remove unnecessary clone; use more idiomatic error_result handling - 777aa14
  • docs: update badges with PowerPC Linux GNU, Windows ARM64 MSVC, remove macOS Intel by @rzmk in #3036
  • deps: bump bitflags from 2.9.4 to 2.10.0 8d65c1b
  • deps: bumped csv crate to 1.4 and reapplied qsv optimizations. For more info, see 4e2f2a0
  • deps: bump csvs_convert patch fork 8aa398f
  • deps: bump geozero to latest upstream with unreleased fixes - 0a9d1b3
  • deps: bump polars to 0.51.0 at py-1.35.0-beta-1 tag
  • deps: bump socket2 from 0.6.0 to 0.6.1
  • deps: bump whatlang to 0.18 e80e9c0
  • build(deps): bump actions/setup-python from 5.0.0 to 6.0.0 by @dependabot[bot] in #3030
  • build(deps): bump actix-governor from 0.8.0 to 0.10.0 by @dependabot[bot] in #3046
  • build(deps): bump gzp from 1.0.1 to 2.0.0 by @dependabot[bot] in #3033
  • build(deps): bump github/codeql-action from 3 to 4 by @dependabot[bot] in #3034
  • build(deps): bump flexi_logger from 0.31.4 to 0.31.5 by @dependabot[bot] in #3032
  • build(deps): bump flexi_logger from 0.31.5 to 0.31.6 by @dependabot[bot] in #3035
  • build(deps): bump flexi_logger from 0.31.6 to 0.31.7 by @dependabot[bot] in #3038
  • build(deps): bump libc from 0.2.176 to 0.2.177 by @dependabot[bot] in #3040
  • build(deps): bump pyo3 from 0.26.0 to 0.27.0 by @dependabot[bot] in #3055
  • build(deps): bump qsv_docopt from 1.8.0 to 1.9.0 by @dependabot[bot] in #3041
  • build(deps): bump regex from 1.11.3 to 1.12.1 by @dependabot[bot] in #3043
  • build(deps): bump regex from 1.12.1 to 1.12.2 by @dependabot[bot] in #3050
  • build(deps): bump reqwest from 0.12.23 to 0.12.24 by @dependabot[bot] in #3049
  • build(deps): bump rust_decimal from 1.38.0 to 1.39.0 by @dependabot[bot] in #3047
  • build(deps): bump simd-json from 0.16.0 to 0.17.0 by @dependabot[bot] in #3031
  • build(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 by @dependabot[bot] in #3053
  • build(deps): bump tokio from 1.47.1 to 1.48.0 by @dependabot[bot] in #3052
  • applied select clippy lint suggestions
  • updated indirect dependencies

Fixed

  • headers: fix stdin handling without explicit - for stdin input #3039

Removed

  • removed Python 3.10 prebuilts as pyo3 0.27 no longer supports it and Python 3.10 is no longer maintained
  • deps: removed patched fork of time-rs now that 0.3.43 has been released fde03b3

Full Changelog: 8.0.0...8.1.0

8.0.0

06 Oct 00:43

[8.0.0] - 2025-10-06

[Image: FAIR data / AI-Ready data banner] [1]
Findable, Accessible, Interoperable & Reusable (FAIR) Data is AI-Ready Data.

A week and a half after launching our "People's API" AI Chatbot and "AI-Ready" service, we fine-tune qsv further, as it powers the FAIRification engine that allows us to "open your data" (as a verb) - to infer and calculate AI-Ready, FAIR metadata at blazing speed even for large datasets.

This release features:

These changes set the stage for even more advanced, powerful, configurable FAIRification capabilities to make ALL your Data AI-Ready, Useful, Usable & Used by Machines & Humans alike.

Added

  • table: add leftendtab alignment option #3004
  • table: add leftfwf (Fixed Width Format) alignment option 590c861
  • validate: add Extended Input Support to RFC 4180 validation mode #3012
  • added PowerPC64 LE Linux prebuilt

Changed

  • describegpt: fine-tuned default LLM Prompt template (v3.1.0) 00e52a3 6b09b7e 5be7f2e
  • luau: bump embedded Luau from 0.690 to 0.693 #3017
  • schema: make Decimal Type Scale configurable for polars schema with QSV_POLARS_DECIMAL_SCALE env var - f20edd5
  • updated optimized csv crate, adding non-allocating StringRecord::trim() and more inline()s 4a1c82a
  • deps: bump calamine to 0.31.0 bd7a04c
  • deps: Bump polars to 0.51.0 from 0.50.0 at py-1.33.1 tag #2995
  • deps: bump polars to 0.51.0 at py-1.34.0-beta.4 tag at revision b973cac (latest upstream) #3022
  • deps: bump polars to 0.51.0 at py-1.35.0 tag revision b973cac 4164875
  • deps: replace tabwriter with renamed fork qsv-tabwriter #3010
  • deps: use patched fork of whatlang-rs. Though our PR was merged, there is still no new release 6afff4f
  • build(deps): bump base62 from 2.2.2 to 2.2.3 by @dependabot[bot] in #3003
  • build(deps): bump bytemuck from 1.23.2 to 1.24.0 by @dependabot[bot] in #3026
  • build(deps): bump chrono from 0.4.41 to 0.4.42 by @dependabot[bot] in #2974
  • build(deps): bump fancy-regex from 0.16.1 to 0.16.2 by @dependabot[bot] in #3000
  • build(deps): bump flate2 from 1.1.2 to 1.1.3 by @dependabot[bot] in #3027
  • build(deps): bump flexi_logger from 0.31.2 to 0.31.3 by @dependabot[bot] in #3005
  • build(deps): bump flexi_logger from 0.31.3 to 0.31.4 by @dependabot[bot] in #3008
  • build(deps): bump indexmap from 2.11.0 to 2.11.1 by @dependabot[bot] in #2973
  • build(deps): bump indexmap from 2.11.1 to 2.11.3 by @dependabot[bot] in #2993
  • build(deps): bump indexmap from 2.11.3 to 2.11.4 by @dependabot[bot] in #2999
  • build(deps): bump libc from 0.2.175 to 0.2.176 by @dependabot[bot] in #3009
  • build(deps): bump mlua from 0.11.3 to 0.11.4 by @dependabot[bot] in #3021
  • build(deps): bump regex from 1.11.2 to 1.11.3 by @dependabot[bot] in #3011
  • build(deps): bump redis from 0.32.5 to 0.32.6 by @dependabot[bot] in #3016
  • build(deps): bump qsv-stats from 0.38.0 to 0.39.0 by @dependabot[bot] in #3028
  • build(deps): bump qsv-stats from 0.39.0 to 0.39.1 by @dependabot[bot] in #3029
  • build(deps): bump redis from 0.32.6 to 0.32.7 by @dependabot[bot] in #3025
  • build(deps): bump serde from 1.0.219 to 1.0.223 by @dependabot[bot] in #2983
  • build(deps): bump serde from 1.0.223 to 1.0.224 by @dependabot[bot] in #2988
  • build(deps): bump serde from 1.0.224 to 1.0.225 by @dependabot[bot] in #2994
  • build(deps): bump serde from 1.0.225 to 1.0.226 by @dependabot[bot] in #3002
  • build(deps): bump serde from 1.0.226 to 1.0.227 by @dependabot[bot] in #3014
  • build(deps): bump serde from 1.0.227 to 1.0.228 by @dependabot[bot] in #3019
  • build(deps): bump serde_json from 1.0.143 to 1.0.145 by @dependabot[bot] in #2981
  • build(deps): bump semver from 1.0.26 to 1.0.27 by @dependabot[bot] in #2982
  • build(deps): bump sysinfo from 0.37.0 to 0.37.1 by @dependabot[bot] in #3015
  • build(deps): bump sysinfo from 0.37.1 to 0.37.2 by @dependabot[bot] in #3024
  • build(deps): bump tempfile from 3.21.0 to 3.22.0 by @dependabot[bot] in #2975
  • build(deps): bump tempfile from 3.22.0 to 3.23.0 by @dependabot[bot] in #3007
  • build(deps): bump toml from 0.9.6 to 0.9.7 by @dependabot[bot] in #3001
  • pin zip to 4.6, as zip 5 has features that are not widely adopted b231a23
  • applied select clippy lint suggestions
  • updated indirect dependencies
  • bumped MSRV to Rust 1.90

Fixed

  • describegpt: init cache vars even when --no-cache is used #2970
  • describegpt: --base-url option being ignored #2977
  • schema: delimiter detection #2998
  • extdedup: really use memmapped ondisk hash table #3020

Removed

  • removed powerpc64-le cross-compilation directive now that we have access to IBM-provided native PowerPC GH Action runner 9659bfc
  • removed macOS on Intel (x86_64-apple-darwin) prebuilt binaries

Full Changelog: 7.1.0...8.0.0


  1. SangyaPundir, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg

7.1.0

06 Sep 16:07
df89a22

[7.1.0] - 2025-09-06

🇮🇹 csv,conf,v9 edition 🍝

   
Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN!

For this feature release, we polished describegpt a bit more for the occasion...

Towards the "People's API"! Verso l'API del Popolo!
(Answering People/Policymaker Interface)

🚀 Enhanced describegpt Command

  • Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
  • Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
  • Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
  • Conditional SQL Results: Implement conditional --sql-results format for more efficient "SQL RAG" processing. If the generated SQL query executes successfully, the results are saved to the specified file with a .csv extension. If a "SQL hallucination" fails, the query is saved with a .sql extension instead, so the user can tweak and edit it.
  • TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
  • Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
  • Disk Cache by Default: The disk cache is now enabled by default for better performance
  • TOML Configuration: Migrate from JSON to more readable TOML format for more easily modifiable prompt files.
    (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
  • Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)
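
The conditional --sql-results behavior described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not describegpt's code: `choose_sql_output`, `run_query` and the two stub runners are hypothetical names, and the sketch returns the path and contents instead of writing a file.

```python
from pathlib import Path

def choose_sql_output(base_path, sql, run_query):
    """Return (output path, contents): .csv results on success, .sql text on failure."""
    base = Path(base_path)
    try:
        csv_text = run_query(sql)
    except Exception:
        # The query failed: keep the SQL itself so it can be edited and retried.
        return base.with_suffix(".sql"), sql
    return base.with_suffix(".csv"), csv_text

def working_engine(sql):
    """Stand-in for a SQL engine that succeeds and returns CSV text."""
    return "total\n42\n"

def failing_engine(sql):
    """Stand-in for a SQL engine that rejects the query."""
    raise ValueError("syntax error")

ok_path, ok_body = choose_sql_output("answer", "SELECT 42 AS total", working_engine)
bad_path, bad_body = choose_sql_output("answer", "SELEC 42", failing_engine)
print(ok_path.name)   # answer.csv
print(bad_path.name)  # answer.sql
```

A downstream step can then branch on the file extension alone to tell query results from a query that still needs fixing.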

partition Command Enhancements

  • New --limit Option: Implement --limit option to set the maximum number of open files
  • Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets
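
One way to picture a batched partitioner that respects a limit on open files is sketched below in Python. It is an assumption-laden illustration, not qsv's algorithm: in-memory lists stand in for open output files, the input is re-scanned once per batch of keys, and `partition_in_batches` is a hypothetical name.

```python
import csv
import io

def partition_in_batches(csv_text, column, limit):
    """Partition rows by `column`, holding at most `limit` outputs open at once."""
    # First pass: collect the distinct partition keys in order of appearance.
    keys, seen = [], set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[column] not in seen:
            seen.add(row[column])
            keys.append(row[column])

    outputs, max_open = {}, 0
    # Second stage: handle the keys in batches of at most `limit`.
    for start in range(0, len(keys), limit):
        open_outputs = {key: [] for key in keys[start:start + limit]}
        max_open = max(max_open, len(open_outputs))
        for row in csv.DictReader(io.StringIO(csv_text)):
            if row[column] in open_outputs:
                open_outputs[row[column]].append(row)
        outputs.update(open_outputs)  # this batch's outputs are now "closed"
    return outputs, max_open

data = "city,n\nA,1\nB,2\nA,3\nC,4\n"
parts, max_open = partition_in_batches(data, "city", limit=2)
print(sorted(parts))    # ['A', 'B', 'C']
print(len(parts["A"]))  # 2
print(max_open)         # 2
```

The trade-off is extra read passes in exchange for a bounded number of simultaneously open outputs, which matters when a column has very high cardinality.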

Added

  • describegpt: add configurable frequency limit #2950
  • describegpt: migrate prompt file from JSON to easier-to-edit TOML format #2954
  • describegpt: refactor default prompt file; add --fewshot-examples option #2955
  • describegpt: add TogetherAI support for models endpoint #2965
  • partition: add --limit option #2960
  • added Windows ARM64 prebuilt binaries

Changed

  • describegpt: enable disk cache by default #2951
  • describegpt: Polars SQL generation tweaks #2958
  • python: replace deprecated with_gil with attach #2949. This sets the stage for "free-threaded" Python 3.14 support when it's released in October 2025. Buh-bye GIL!
  • deps: bump embedded Luau from 0.688 to 0.690 #2967
  • deps: bump Polars to 0.50.0 at py-1.33.0 tag
  • build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2962
  • build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in #2963
  • build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in #2961
  • build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in #2948
  • build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in #2946
  • build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in #2956
  • build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in #2952
  • applied select clippy lints
  • updated indirect dependencies

Full Changelog: 7.0.1...7.1.0

7.0.1

29 Aug 03:06
aa404c3

[7.0.1] - 2025-08-28

A patch release with some minor bug fixes, benchmark tweaks and build system improvements.

Added

  • publish: add dedicated powerpc64le-unknown-linux-gnu publishing workflow (WIP)

Changed

  • docs: describegpt expanded error message about LLM URL or API key
  • deps: remove planus pinned dependency

Fixed

  • fix: geocode --batch 0 causes panic when polars feature is enabled
  • publish: remove luau feature from x86_64-pc-windows builds that was causing builds to fail
  • publish: remove powerpc64le from main publish workflow
  • benchmarks: updated to v6.8.0 with fixes to luau and clustered sample benchmarks

Full Changelog: 7.0.0...7.0.1

7.0.0

28 Aug 14:13

[7.0.0] - 2025-08-28

🥳 Open Weights with Open Data, Local LLM 🤖 edition 🚀

This is the biggest release yet - 470+ commits since v6.0.1! Packed with new AI-powered features, fixes and significant performance improvements suite-wide!

With the release of OpenAI's gpt-oss open-weight reasoning model earlier this month setting the stage, we continue on our "Automagical Metadata" journey by revamping describegpt.

🤖 Revamped describegpt - AI-Powered Metadata Inferencing and Data Analysis:

  • Intelligent Metadata Generation: Automatically generate comprehensive metadata - Data Dictionaries, Description and Tags for your Datasets using Large Language Models (LLM) prompted with summary statistics and frequency tables as detailed context - without sending your data to the cloud!
    Even if you elect to use a cloud-based LLM, your Raw Data is never sent.
  • Chat with your Data: If your prompt can be answered using this high-quality, high-resolution Metadata, describegpt will answer it! If your prompt is not remotely related to the data, it will politely refuse - "I'm sorry, I can only answer questions about the Dataset."
  • Auto SQL RAG Mode: Should the LLM decide that it doesn't have the necessary information in the metadata it compiled to answer your prompt, it will automatically enter SQL Retrieval-Augmented Generation (RAG) mode - using the rich metadata instead as context to craft an expert-level, deterministic, reproducible, "hallucination-free" SQL query [1] to respond to your prompt.
  • Database Engine Support: If DuckDB is installed or the Polars feature is enabled, and --sql-results <ANSWER.CSV> is specified - an optimized SQL query will be automatically executed with the query results saved to the specified file.
    As both DuckDB and Polars are purpose-built OLAP engines that support direct queries (no database pre-loading required), you get answers in a few seconds [2] - even for very large datasets.
  • Multi-LLM Support: Works with any OpenAI-API compatible LLM - with special support for local LLMs like Ollama, Jan and LM Studio, with the ability to customize model behavior with the --addl-props option.
  • Advanced Caching: Disk and Redis caching support for performance and cost optimization.
  • Flexible Prompting: Custom prompt files and built-in intelligent templates for various analysis tasks.

Check out these examples using a 1 million row sample of NYC's 311 data!

On top of other improvements in Datapusher+ with its new Jinja-based "metadata suggestion engine" - we're using this AI-inferred metadata along with other precalcs to prepopulate DCATv3 (both US and European profiles) and Croissant metadata fields that are otherwise too hard and expensive to compile manually.

The inferred and precalculated metadata values are offered as "suggestions", using a UI/UX purpose-built to facilitate interactive metadata curation chats.

This allows Data Stewards to compile high-quality, high-resolution metadata catalogs with an accelerated "Data Steward in the Loop" data ingestion and metadata curation workflow.

If you want to see and learn more, we're Bologna-bound to attend csv,conf,v9 to present and share how we're using this to auto-infer metadata in CKAN. Hope to see you there!

Towards the People's API!

(Answering People/Policymaker Interface)


📊 Enhanced frequency Command:

  • Rank Column: Ranking of frequency results for better data insights
  • JSON Output Mode: New --json option not only provides structured output beyond the default CSV format - it also takes advantage of JSON's nested support to include 15 additional summary statistics per field
  • Performance Boost: Speed improvements with SIMD-accelerated number parsing, remaining performant even with the added functionality

stats Command Improvements:

  • Faster Still: Enabled by improvements in the underlying qsv-stats crate
  • Improved Precision: Faster, streamlined precision calculation
  • SIMD Number Parsing: Hardware-accelerated parsing for int/float values
  • Unix Epoch Support: Proper handling of Unix timestamp 0 as valid date
  • Enhanced Date Inference: Better date and boolean type inference capabilities
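
The Unix epoch item above concerns timestamp 0, which is a legitimate date (1970-01-01 UTC) but is easy to discard by accident when zero is treated as "no value". The Python sketch below illustrates that general pitfall; it is an assumption about the class of bug, not a description of the qsv fix, and `parse_unix_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone

def parse_unix_timestamp(text):
    """Parse integer seconds since the epoch; return None only for non-numeric input."""
    try:
        seconds = int(text)
    except ValueError:
        return None
    # Deliberately no `if not seconds:` shortcut here: 0 is a valid timestamp.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(parse_unix_timestamp("0").isoformat())  # 1970-01-01T00:00:00+00:00
print(parse_unix_timestamp("n/a"))            # None
```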

🔧 validate & schema Enhancements:

  • Fancy Regex Support: You can now use "advanced" regex features with your JSON Schema patterns with the --fancy-regex option. Previously, you could only use the standard Rust regex engine, which does not support backreferences or look-arounds (for performance reasons)
  • JSON Schema Improvements: Better error handling and format validation options
  • Schema Validation Refinements: More granular validation control with --no-format-validation
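
For readers unsure what backreferences and look-arounds are, here is a short Python illustration using the standard `re` module, which supports both. The patterns are made-up examples, and nothing here reflects how qsv wires up --fancy-regex.

```python
import re

# Backreference: \1 must repeat exactly what group 1 matched.
doubled_word = re.compile(r"\b(\w+) \1\b")

# Look-ahead: (?=.*\d) requires a digit somewhere ahead without consuming input.
min8_with_digit = re.compile(r"^(?=.*\d).{8,}$")

print(doubled_word.search("this is is a typo").group(1))  # is
print(bool(min8_with_digit.match("passw0rd")))            # True
print(bool(min8_with_digit.match("password")))            # False
```

Both constructs can require backtracking, which is why engines that guarantee linear-time matching leave them out.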

🔄 rename Reverted and Improved:

When pairwise renaming was introduced in v6.0.0, it broke some workflows. It's now fixed by introducing two modes:

  • Positional Mode: Renaming by position is now once again the default
  • Pairwise Mode: New --pairwise flag for column renaming by column pairs

🗂️ partition Improvements:

  • Case-Insensitive Safety: Improved case-aware partitioning algorithm. Previously, case-insensitive file systems like macOS APFS and Windows NTFS were causing incorrect partitioning of case-sensitive values
  • Faster still: With better use of I/O buffering - with deferred, batched, async writes instead of after every record
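
The case-sensitivity problem is that values such as "NY" and "ny" map to the same file name on a case-insensitive file system, so their rows end up mixed. The Python sketch below shows one possible way to keep such values apart; the suffix scheme and the name `safe_partition_names` are hypothetical, not qsv's actual naming rules.

```python
def safe_partition_names(values):
    """Map each distinct value to a file name that is unique even when case is ignored."""
    names, seen_folded = {}, {}
    for value in values:
        if value in names:
            continue
        folded = value.casefold()
        collisions = seen_folded.get(folded, 0)
        seen_folded[folded] = collisions + 1
        # The first spelling keeps the plain name; later spellings get a numeric suffix.
        # (A real tool would also guard against a value that already looks like "ny_1".)
        names[value] = f"{value}.csv" if collisions == 0 else f"{value}_{collisions}.csv"
    return names

names = safe_partition_names(["NY", "ny", "Ny", "CA", "NY"])
print(names)  # {'NY': 'NY.csv', 'ny': 'ny_1.csv', 'Ny': 'Ny_2.csv', 'CA': 'CA.csv'}
```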

Added

  • frequency: add rank info to frequency table #2878
  • frequency: add --json output option #2868
  • validate: add --fancy-regex option #2845
  • add CPU-accelerated, mem-mapped, chunked sha256 file checksum helper #2909

Changed

  • apply: use SIMD-accelerated base64-simd crate for Encode64 and Decode64 operations #2863
  • stats: faster precision calculation #2852
  • perf: Use simd_json instead of serde_json to serialize to JSON #2884
  • refactor: create and use reqwest client helpers to eliminate redundant code #2888
  • perf: Faster parallelized sha256 hash file #2918
  • refactor: describegpt #2890
  • refactor: describegpt setting --timeout to 0 sets no timeout #2891
  • refactor: describegpt more refinements #2892
  • feat: describegpt refactor round3 #2893
  • feat: describegpt disk & redis caching #2895
  • refactor: describegpt #2896
  • refactor: describegpt create get_cache_key helper; customizable stats options #2902
  • feat: describegpt auto SQL RAG for --prompt #2904
  • feat: describegpt major refactor #2913
  • refactor: describegpt default promptfile is now embedded in qsv binary; fine-tune tests #2924
  • feat: describegpt returning reasoning with --json option #2926
  • feat: describegpt add DuckDB support in SQL RAG mode #2929
  • feat: describegpt various DuckD...
  1. LLMs can still hallucinate a syntactically wrong SQL query. But once a valid SQL query is generated, it's fully reproducible.

  2. Depending on your LLM setup, SQL query generation may take some time. Once generated however, the SQL query itself will be blazing-fast.

6.0.1

12 Jul 13:49

[6.0.1] - 2025-07-12

This is a patch release with bug fixes and minor improvements.


Changed

  • feat: updated completions for qsv v6.0.0 by @rzmk in #2838
  • docs: updated sample schema.json based on NYC311 1M row sample benchmark data
  • docs: updated sample stats output using NYC 311 1M row sample benchmark data
  • build(deps): bump chrono-tz from 0.10.3 to 0.10.4 by @dependabot[bot] in #2839
  • build(deps): bump qsv-stats from 0.35.0 to 0.36.0 by @dependabot[bot] in #2840
  • bumped indirect dependencies
  • Added benchmark_data.* to .gitignore

Fixed

  • geocode: make --batch=0 mode more robust by setting a minimum batch size of 1,000 rows 2fa90bc
  • jsonl: correct batchsize calculation to use input file instead of output file for line counting 742dc77
  • benchmarks: fixed benchmarks with unescaped parameters with embedded spaces ad95596

Removed

  • Removed retired publishing workflows (linux-glibc-231-musl-123 and wix-installer)

Full Changelog: 6.0.0...6.0.1

6.0.0

11 Jul 12:10

Highlights:

This is a major release with significant improvements and new features!

🔍 Enhanced lens command:

  • File prompt support: You can now load prompts from files using the new file: support, making it easier to reuse complex prompts
  • Wrap mode option: Added --wrap-mode option for better text display control when viewing data
  • Improved examples: Enhanced usage examples and documentation

🔄 Improved rename command:

  • Pair-based renaming: Easier column renaming with more intuitive syntax for bulk operations.

📊 Enhanced sort command:

  • Natural sorting: Added --natural option for human-friendly sorting (e.g., "file1.txt", "file2.txt", "file10.txt" sorts "naturally"; previously lexicographical sorting would sort it as "file1.txt", "file10.txt", "file2.txt")
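
The difference between lexicographical and natural ordering can be reproduced with a small Python sort key. This is a common textbook approach shown for illustration, not the algorithm behind --natural; `natural_key` is a hypothetical helper.

```python
import re

def natural_key(text):
    """Split text into alternating text/digit chunks so digit runs compare as numbers."""
    parts = re.split(r"(\d+)", text)
    # re.split with a capture group alternates: text, digits, text, digits, ...
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

files = ["file10.txt", "file2.txt", "file1.txt"]
print(sorted(files))                   # ['file1.txt', 'file10.txt', 'file2.txt']
print(sorted(files, key=natural_key))  # ['file1.txt', 'file2.txt', 'file10.txt']
```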

⚡ Performance improvements:

  • Memory optimizations: Multiple performance enhancements across frequency, stats, validate, and transpose commands
  • Buffer optimizations: Improved I/O performance with better buffer sizing for various operations
  • Polars engine upgrade: Updated to the latest Polars 0.49.x series for better performance and stability

🔧 Enhanced validation:

  • Robust JSON Schema validation: More granular error messages and better schema validation
  • Improved error reporting: Clearer messages to help debug validation issues
  • UTF-8 handling: Better handling of invalid records with improved debug output

🌐 Geocoding improvements:

  • Updated geosuggest: Bumped to version 0.8 with direct index update support for better geocoding performance

🔗 SQL enhancements (joinp and sqlp):

  • Decimal comma support: Added --decimal-comma option for writing operations, improving international data support
  • Better validation: Enhanced delimiter and decimal comma validation
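
To show why a decimal comma interacts with the delimiter, here is a minimal Python sketch of writing and parsing decimal-comma values. The validation rule (rejecting a comma delimiter when a decimal comma is used) is an assumption made for the example, and `write_rows` and `parse_decimal_comma` are hypothetical helpers, not joinp/sqlp internals.

```python
import csv
import io

def parse_decimal_comma(text):
    """Parse a number written with a decimal comma, e.g. "3,14"."""
    return float(text.replace(",", "."))

def write_rows(rows, delimiter=";", decimal_comma=False):
    """Write rows as delimited text, optionally formatting floats with a decimal comma."""
    if decimal_comma and delimiter == ",":
        # Assumed rule: a comma cannot be both field delimiter and decimal separator.
        raise ValueError("decimal comma requires a delimiter other than ','")

    def fmt(value):
        if decimal_comma and isinstance(value, float):
            return repr(value).replace(".", ",")
        return value

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([fmt(v) for v in row])
    return buf.getvalue()

print(write_rows([["pi", 3.14]], delimiter=";", decimal_comma=True))  # pi;3,14
print(parse_decimal_comma("3,14"))                                    # 3.14
```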

🏗️ Infrastructure updates:

  • Rust 1.88 MSRV: Updated minimum supported Rust version
  • Dependency updates: Comprehensive updates to all major dependencies including Polars, Tokio, and many others
  • Compilation optimizations: Various improvements for faster builds and better runtime performance

Added

New Features:

  • lens: add file: support to load prompts from files #2805
  • lens: add --wrap-mode option #2805
  • rename: pair-based renaming for easier bulk column renaming #2806
  • sort: add --natural option for natural/human-friendly sorting #2808
  • schema: set JSON Schema description to the command line used for generation
  • joinp & sqlp: add --decimal-comma option for writing operations
  • joinp: add --decimal-comma and --delimiter validation
  • sqlp: add --decimal-comma and --delimiter validation
  • validate: more robust JSON Schema schema validation with granular error messages
  • validate: show invalid record in debug format for UTF-8 failures
  • Enhanced completions for qsv v5.1.0 and v6.0.0

Documentation & Examples:

  • lens: improved examples in usage text
  • schema: expand examples and add -P shortcut for --prompt option
  • sqlp: update description to note support for input beyond CSVs
  • Polars SQL documentation noting it's a PostgreSQL dialect
  • Added link to Polars 0.49.0 release notes
  • MSRV documentation updated to Rust 1.88
  • Additional conditions for when to use "portable" binaries

Changed

Performance Improvements:

  • frequency: microoptimize null value handling and preallocate vectors
  • stats: preallocate with_capacity for Unsorted struct and coefficient of variation handling improvements
  • transpose: performance refactoring with optimized buffer handling
  • validate: microoptimizations for JSON instance handling and buffer capacity improvements
  • apply: bigger reader buffer as apply is batch oriented
  • Enabled setter for read and write buffer sizing configuration
  • Various microoptimizations across commands

Polars Engine Updates:

  • Bumped Polars from 0.48 to 0.49.x series
  • Adapted to new Polars PlPath API
  • Updated to use latest Polars upstream throughout development cycle
  • Enabled simd-json compiler hints feature on nightly builds

Dependency Updates:

  • Major updates:

    • Polars: 0.48 → 0.49.x
    • Tokio: 1.45.1 → 1.46.1
    • qsv-stats: 0.33.0 → 0.35.0
    • kiddo: 5.0.3 → 5.2.2
    • indexmap: 2.9.0 → 2.10.0
    • calamine: updated to latest upstream
    • redis: 0.32.2 → 0.32.3
    • sysinfo: 0.35.2 → 0.36.0
    • geosuggest: bumped to 0.8
  • Build dependencies:

    • flexi_logger: 0.31.0 → 0.31.2
    • arboard: 3.5.0 → 3.6.0
    • minijinja: 2.10.2 → 2.11.0
    • minijinja-contrib: 2.10.2 → 2.11.0
    • zip: 4.1.0 → 4.3.0
    • reqwest: 0.12.20 → 0.12.22
    • indicatif: 0.17.11 → 0.17.12
    • phf: 0.11.3 → 0.12.1
    • human-panic: 2.0.2 → 2.0.3
    • jaq-std: 2.1.1 → 2.1.2
    • jaq-core: 2.2.0 → 2.2.1
    • jaq-json: 1.1.2 → 1.1.3

Code Quality & Maintenance:

  • Applied clippy lint suggestions including collapsible_if, needless_return, redundant_clone, and manual_is_multiple_of
  • Updated MSRV to Rust 1.88
  • Set nightly to 2025-06-27
  • Removed hardware-lock-elision feature on parking_lot
  • No longer use similar-asserts crate, reverted to standard assert_eq
  • Better TOML formatting
  • Removed unneeded dependency aliases
  • Various code refactoring for better maintainability

Infrastructure:

  • Updated csvlens integration with natural sorting support
  • Switched dependency management approaches for better upstream compatibility
  • Pin plist to 1.7.3 to avoid unnecessary quick-xml bumps
  • Use latest calamine upstream consistently

Fixed

  • validate: clearer JSON Schema schema error messages to differentiate validation types
  • round_num(): should return an empty string if dec_f64.is_nan()
  • joinp: non-equi-join test result order deterministic issues
  • Enhanced Snappy file decompression robustness
  • Fixed geometric mean calculation in stats
  • Better UTF-8 record validation with debug output
  • Various test adjustments to account for dependency updates and behavior changes
  • Resolved several clippy warnings and code quality issues
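
The round_num() fix listed above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not qsv's code: the function name round_num_sketch, its signature, and the use of plain f64 formatting are all hypothetical.

```rust
/// Hypothetical stand-in for a rounding helper. The name, signature and
/// formatting approach are assumptions, not qsv's actual round_num().
fn round_num_sketch(value: f64, places: usize) -> String {
    // NaN has no meaningful rounded form, so return an empty string
    // instead of the literal text "NaN".
    if value.is_nan() {
        return String::new();
    }
    // `.*` takes the precision from the argument that precedes the value.
    format!("{:.*}", places, value)
}
```

With this guard, a NaN input produces an empty field in the output rather than a "NaN" token that downstream tools would have to special-case.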

Test Updates:

  • rename: add pair-renaming tests
  • sort: add natural sort tests
  • joinp: add decimal_comma tests
  • sqlp: add decimal-comma validation tests
  • validate: add JSON Schema schema validation tests
  • stats: adjust test cases for qsv-stats 0.35.0 changes
  • excel: re-enable and revert formula tests based on upstream changes

Development Notes

Benchmarks:

  • Comprehensive benchmarking for versions 5.1.0 and 6.6.1
  • Performance comparisons available for major operations

Continuous Integration:

  • Multiple dependency updates via Dependabot automation
  • Comprehensive test coverage maintained throughout development
  • Regular upstream synchronization with Polars and other major dependencies

Pull Requests

NOTE: The changelog entries below only document changes with a corresponding PR. Several changes were committed to master directly and are documented in the release highlights above.

Added

  • lens: add --wrap-mode option in #2805
  • rename: add pair-based renaming in #2806
  • sort: add --natural sort option in #2808
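
For readers unfamiliar with the term, a natural sort compares runs of digits by numeric value, so that file2 sorts before file10. The sketch below shows one way to write such a comparator in Rust. It is not taken from qsv's source; natural_cmp and strip_zeros are hypothetical helper names, and only ASCII digits are handled.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: compares two strings "naturally", treating each run
/// of ASCII digits as a number. Not qsv's implementation.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].is_ascii_digit() && b[j].is_ascii_digit() {
            // Find the end of the digit run on each side.
            let (si, sj) = (i, j);
            while i < a.len() && a[i].is_ascii_digit() {
                i += 1;
            }
            while j < b.len() && b[j].is_ascii_digit() {
                j += 1;
            }
            // Ignore leading zeros so that run length reflects magnitude.
            let ra = strip_zeros(&a[si..i]);
            let rb = strip_zeros(&b[sj..j]);
            // A longer run is a larger number; equal lengths compare digit by digit.
            let ord = ra.len().cmp(&rb.len()).then_with(|| ra.cmp(rb));
            if ord != Ordering::Equal {
                return ord;
            }
        } else {
            // Non-digit bytes compare as-is.
            if a[i] != b[j] {
                return a[i].cmp(&b[j]);
            }
            i += 1;
            j += 1;
        }
    }
    // The string with bytes left over sorts last.
    (a.len() - i).cmp(&(b.len() - j))
}

/// Hypothetical helper: drops leading b'0' bytes from a digit run.
fn strip_zeros(run: &[u8]) -> &[u8] {
    let first = run.iter().position(|&c| c != b'0').unwrap_or(run.len());
    &run[first..]
}
```

Passing the comparator to sort_by, as in `names.sort_by(|x, y| natural_cmp(x, y))`, orders a list of names this way, whereas a plain byte-wise sort would place file10 ahead of file2.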

Changed

  • geocode: now uses the faster geosuggest 0.8 crate. The index-update subcommand now generates a command that uses the geosuggest crate directly to update/create the index, instead of doing it internally.
  • schema: when generating a JSON Schema, the description property is now set to the command line used to generate it in #2796
  • sqlp & joinp: the --decimal-comma option now applies not only when parsing input CSVs but also when writing output CSVs in #2800
  • transpose: performance refactoring in #2827
  • validate: improved JSON Schema schema validation in #2803
  • update completions for qsv v5.1.0 by @rzmk in #2804
  • dep: bump polars to latest upstream - adapt to PlPath API requirement in #2822
  • perf: bump to the faster geosuggest 0.8 in #2837
  • build(deps): bump arboard from 3.5.0 to 3.6.0 by @dependabot[bot] in #2814
  • build(deps): bump flexi_logger from 0.31.0 to 0.31.1 by @dependabot[bot] in #2801
  • build(deps): bump flexi_logger from 0.31.1 to 0.31.2 by @dependabot[bot] in #2812
  • build(deps): bump libc from 0.2.173 to 0.2.174 by @dependabot[bot] in #2794
  • build(deps): bump human-panic from 2.0.2 to 2.0.3 by @dependabot[bot] in #2833
  • build(deps): bump indicatif from 0.17.11 to 0.17.12 by @dependabot[bot] in #2818
  • build(deps): bump jaq-std from 2.1.1 to 2.1.2 by @dependabot[bot] in #2830
  • build(deps): bump jaq-core from 2.2.0 to 2.2.1 by @dependabot[bot] in #2831
  • build(deps): bump jaq-json from 1.1.2 to 1.1.3 by @dependabot[bot] in #2832
  • build(deps): bump minijinja from 2.10.2 to 2.11.0 by @dependabot[bot] in #2815
  • build(deps): bump minijinja-contrib from 2.10.2 to 2.11.0 by @dependabot[bot] in #2816
  • build(deps): bump phf from 0.11.3 to 0.12.1 by @dependabot[bot] in #2797
    ...

5.1.0

17 Jun 10:41

[5.1.0] - 2025-06-17

Highlights

  • lens is now colorful by default, with a --monochrome option to turn it off:

     qsv lens /tmp/NYC_311_SR_2010-2020-sample-1M.csv
    
  • lens can now have custom prompts with the --prompt option (with support for ANSI escape codes to format the prompt). Meant to be paired with the --echo-column <colname> option, e.g.:

    qsv lens --prompt $'\033[1;5;31mBlinking red, bold text\033[0m' --echo-column 'Unique Key' \
     /tmp/NYC_311_SR_2010-2020-sample-1M.csv
    

  • the qsv-stats crate - the underlying engine behind the central stats, frequency and "smart" commands - got a lot of love in this release
  • validate got a tad faster while decreasing its memory footprint. The new --no-format-validation option now also allows you to ignore all JSON Schema "format" keywords (e.g. date, email, url, currency, etc.) when validating CSVs.

Added

  • lens: add --prompt option, add examples to regex-enabled options #2772
  • lens: add --monochrome option; without it, columns are displayed in different colors #2761
  • validate: add --no-format-validation option when in JSON Schema mode #2762
  • docs: add shell completions badges by @rzmk in #2760
  • feat: added criterion trim algorithm microbenchmarks #2789

Changed

  • frequency: performance microoptimizations - use stats cache column cardinality to pre-alloc & size frequency hash tables
  • geocode: refactor regex handling for performance & maintainability
  • json: preserve key order #2777
  • stats: performance microoptimizations - use unwrap_unchecked() instead of just unwrap() in hot sampling functions
  • validate: major refactoring for added performance/memory efficiency
  • chore: temporarily use qsv-calamine until a new calamine is released #2790
  • Bump cpc from 1.9 to 2 #2770
  • deps: bump criterion from 0.5 to 0.6 #2791
  • deps: use latest csvlens upstream with colorful columns f2c9322
  • deps: bump our patched forks of cached, csvs_convert, json-objects-to-csv, jsonschema, localzone, rfd, self_update until PRs are merged or new releases are made
  • deps: bump zip from 3 to 4 in 75909d2
  • deps: bump polars to 0.48.1 at 49ce57a revision
  • build(deps): bump atoi_simd from 0.16.0 to 0.16.1 by @dependabot in #2766
  • build(deps): bump bytemuck from 1.23.0 to 1.23.1 by @dependabot in #2778
  • build(deps): bump flate2 from 1.1.1 to 1.1.2 by @dependabot in #2781
  • build(deps): bump flexi_logger from 0.30.1 to 0.30.2 by @dependabot in #2765
  • build(deps): bump flexi_logger from 0.30.2 to 0.31.0 by @dependabot in #2793
  • build(deps): bump hashbrown from 0.15.3 to 0.15.4 by @dependabot in #2779
  • build(deps): bump libc from 0.2.172 to 0.2.173 by @dependabot in #2787
  • build(deps): bump mimalloc from 0.1.46 to 0.1.47 by @dependabot in #2792
  • build(deps): bump mlua from 0.10.3 to 0.10.5 by @dependabot in #2758
  • build(deps): bump num_cpus from 1.16.0 to 1.17.0 by @dependabot in #2771
  • build(deps): bump parking_lot from 0.12.3 to 0.12.4 by @dependabot in #2768
  • build(deps): bump pyo3 from 0.25.0 to 0.25.1 by @dependabot in #2785
  • deps: upgrade qsv-stats from 0.32 to 0.33, which features major memory and performance optimizations behind the stats & frequency commands #2786
  • deps: bump redis from 0.29.5 to 0.32
  • build(deps): bump reqwest from 0.12.15 to 0.12.16 by @dependabot in #2764
  • build(deps): bump reqwest from 0.12.16 to 0.12.18 by @dependabot in #2767
  • build(deps): bump reqwest from 0.12.18 to 0.12.19 by @dependabot in #2773
  • build(deps): bump reqwest from 0.12.19 to 0.12.20 by @dependabot in #2782
  • build(deps): bump rust_decimal from 1.37.1 to 1.37.2 by @dependabot in #2788
  • build(deps): bump smallvec from 1.15.0 to 1.15.1 by @dependabot in #2780
  • build(deps): bump sysinfo from 0.35.1 to 0.35.2 by @dependabot in #2774
  • build(deps): bump titlecase from 3.5.0 to 3.6.0 by @dependabot in #2775
  • build(deps): bump tokio from 1.45.0 to 1.45.1 by @dependabot in #2759
  • build(deps): bump uuid from 1.16.0 to 1.17.0 by @dependabot in #2757
  • applied select clippy suggestions
  • updated indirect dependencies
  • set Rust nightly to 2025-05-21, the same nightly Polars uses 872ade1
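
The frequency item in the list above mentions using a known column cardinality to pre-allocate and size hash tables. A minimal sketch of that general idea follows. It assumes the cardinality is already available as a plain usize; the function name count_with_known_cardinality is hypothetical, and this is not qsv's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: counts occurrences of each value, pre-sizing the
/// table from a known number of distinct values (for example, one read
/// from a stats cache) so the table need not grow and rehash while filling.
fn count_with_known_cardinality<'a>(
    values: &[&'a str],
    cardinality: usize,
) -> HashMap<&'a str, u64> {
    // Reserve room for `cardinality` distinct keys up front.
    let mut counts = HashMap::with_capacity(cardinality);
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
}
```

If the cardinality hint is too low, the map still grows as needed, so the hint affects only performance and never correctness.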

Fixed

  • fix: frequency now recovers from the non-fatal absence of the stats cache instead of panicking b2821a0
  • fix: flaky json tests caused by hardcoding name of intermediate file - 62ca310
  • fix: flaky reverse property tests by handling BOM characters cefd490
  • fix: util::process_input helper does not honor QSV_SKIP_FORMAT_CHECK when processing dir input #2784

Full Changelog: 5.0.3...5.1.0