3.0.0
[3.0.0] - 2025-02-13
Highlights:
- `sample`: Five new sampling methods! In addition to reservoir & indexed, added bernoulli, systematic, stratified, weighted & cluster sampling. And they're all memory efficient, so you should be able to sample arbitrarily large datasets!
- `stats`: Added "sortiness" [-1 (Descending) to 1 (Ascending)] & "uniqueness_ratio" [0 (many repeated values) to 1 (All unique values)] stats.
  The qsv-stats engine was also optimized to squeeze out more performance, with `stats` now 2.6x faster while using less memory despite the addition of these new stats.
- `diff`: is now a "smart" command. It uses the stats cache to short-circuit diffs if files are identical per their fingerprint hashes, and to validate that the diff key column is all unique.
- The stats cache has been refactored, improving performance for "smart" commands:
  - `frequency` is not only 3.3x faster, it uses far less memory as it no longer needs to maintain hashmaps for columns with all unique values.
  - `tojsonl` is 2.25x faster
  - `schema` is 1.4x faster
- `luau` got a major performance boost with the v0.660 engine upgrade, taking advantage of several compiler optimizations. `luau` is now up to 3.1x faster!
- `validate` had a major performance regression, slowing from 3.295 seconds in v2.1.0 to 13.159 seconds in v2.2.1 in the benchmarks. 4x slower! With the jsonschema 0.29 crate update, `validate` now clocks in at 3.022 seconds!
- `template` also got a big boost and is now 2.9x faster with the minijinja 2.7 crate update.
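
To make the ranges of the two new statistics concrete, here is a minimal Python sketch. It is not qsv's implementation: the formulas are assumptions chosen only to reproduce the documented ranges (sortiness as the net share of ascending versus descending adjacent pairs, uniqueness_ratio as distinct values over total values), and the function names are hypothetical.

```python
from typing import Hashable, Sequence


def sortiness(values: Sequence) -> float:
    """Assumed definition: (ascending pairs - descending pairs) / total adjacent pairs.

    Returns 1.0 for strictly ascending input and -1.0 for strictly descending input.
    Equal neighbours contribute 0 in this sketch; qsv may treat ties differently.
    """
    if len(values) < 2:
        return 0.0
    score = 0
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            score += 1
        elif curr < prev:
            score -= 1
    return score / (len(values) - 1)


def uniqueness_ratio(values: Sequence[Hashable]) -> float:
    """Assumed definition: number of distinct values divided by number of values."""
    if not values:
        return 0.0
    return len(set(values)) / len(values)


print(sortiness([1, 2, 3, 4]))           # 1.0
print(sortiness([4, 3, 2, 1]))           # -1.0
print(uniqueness_ratio(["a", "a", "b", "b"]))  # 0.5
```

Under these assumptions, a column whose uniqueness_ratio is 1 has all unique values, which is the case the notes above describe `frequency` exploiting to avoid building hashmaps.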
Added
- `joinp`: additional `joinp` `asof` join sort and match options #2486
- `stats`: add "sortiness" statistic #2499
- `stats`: add uniqueness_ratio #2521
- `stats` & `frequency`: add `--vis-whitespace` option. Fulfills #2501 #2503
- `sample`: add more sampling methods (in addition to indexed and reservoir - added bernoulli, systematic, stratified, weighted & cluster sampling) and made them all memory efficient so we can sample arbitrarily large datasets: #2507 & #2511
- `diff`: make `diff` a "smart" command. Fulfills #2493 and #2509 #2518
- benchmarks: added new benchmarks for `sample` for new sampling methods d758c54
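
As an illustration of why these sampling methods can be memory efficient, the sketch below shows textbook streaming versions of three of them in Python. It is not qsv's code: the function names, parameters, and default seed are hypothetical, and the point is only that each method makes a single pass and holds at most the sample itself in memory.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Iterable[T], p: float, seed: int = 42) -> Iterator[T]:
    """Keep each row independently with probability p; O(1) memory."""
    rng = random.Random(seed)
    for row in rows:
        if rng.random() < p:
            yield row


def systematic_sample(rows: Iterable[T], step: int, seed: int = 42) -> Iterator[T]:
    """Keep every step-th row, starting from a random offset; O(1) memory."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    for i, row in enumerate(rows):
        if i % step == start:
            yield row


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Algorithm R: uniform sample of k rows from a stream of unknown length; O(k) memory."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# The input is consumed lazily, so it never has to fit in memory.
sample = reservoir_sample(range(1_000_000), k=5)
print(len(sample))  # 5
```

Stratified, weighted, and cluster sampling need extra per-group or per-weight bookkeeping and are left out of this sketch.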
Changed
- `luau`: bump from 0.653 to 0.660 and optimize for performance 4402df6 de429b4 07ff8b8 3211f5c
- `stats`: compute string len stats only for string columns #2495
- contrib(completions): update qsv completions for qsv 2.2.1 by @rzmk in #2494
- deps: bump polars to latest upstream after its py-1.22.0 release
- deps: backported csv-core 0.1.12 fix to our qsv-optimized csv-core fork dathere/rust-csv@5d0916e
- build(deps): bump actions/setup-python from 5.3.0 to 5.4.0 by @dependabot in #2488
- build(deps): bump bytes from 1.9.0 to 1.10.0 by @dependabot in #2497
- build(deps): bump data-encoding from 2.7.0 to 2.8.0 by @dependabot in #2512
- build(deps): bump geosuggest-core from 0.6.5 to 0.6.6 by @dependabot in #2520
- build(deps): bump geosuggest-utils from 0.6.5 to 0.6.6 by @dependabot in #2519
- build(deps): bump jsonschema from 0.28.3 to 0.29.0 by @dependabot in #2510
- build(deps): bump minijinja from 2.6.0 to 2.7.0 by @dependabot in #2489
- build(deps): bump mlua from 0.10.2 to 0.10.3 by @dependabot in #2485
- build(deps): bump qsv-stats from 0.27.0 to 0.28.0 by @dependabot in #2496
- build(deps): bump qsv-stats from 0.28.0 to 0.29.0 by @dependabot in #2498
- build(deps): bump qsv-stats from 0.29.0 to 0.30.0 by @dependabot in #2505
- chore: Bump rand to 0.9 #2504
- build(deps): bump simple-home-dir from 0.4.6 to 0.4.7 by @dependabot in #2515
- build(deps): bump uuid from 1.12.1 to 1.13.1 by @dependabot in #2500
- bumped numerous indirect dependencies to latest versions
- applied select clippy lint suggestions
- bumped MSRV to latest Rust stable - v1.84.1
Fixed
- docs: QSV_AUTOINDEX => QSV_AUTOINDEX_SIZE typo. Fixes #2479 #2484
- fix: `search` & `searchset` off by 1 when using `--flag` option. Fixes #2508 #2513
Full Changelog: 2.2.1...3.0.0