Commit 3967131

committed
.
1 parent c5c92e1 commit 3967131

14 files changed: +462 -312 lines changed

.github/workflows/R-CMD-check-mt.yaml

Lines changed: 0 additions & 61 deletions
This file was deleted.

.github/workflows/R-CMD-check.yaml

Lines changed: 3 additions & 4 deletions
@@ -22,12 +22,10 @@ jobs:
 fail-fast: false
 matrix:
 config:
+- {os: macOS-latest, r: 'devel', http-user-agent: 'release'}
 - {os: macOS-latest, r: 'release'}
-
+- {os: windows-latest, r: 'devel', http-user-agent: 'release'}
 - {os: windows-latest, r: 'release'}
-# Use 3.6 to trigger usage of RTools35
-# - {os: windows-latest, r: '3.6'}
-
 - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
 - {os: ubuntu-latest, r: 'release'}
 - {os: ubuntu-latest, r: 'oldrel-1'}
@@ -61,4 +59,5 @@ jobs:
 
 - uses: r-lib/actions/check-r-package@v2
 with:
+args: 'c("--no-manual", "--as-cran")'
 upload-snapshots: true
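
Note on the `args` line: the value is an R expression giving the flags for `R CMD check`. A rough local equivalent can be sketched as below, assuming the `rcmdcheck` package is installed and that the action forwards these flags to `rcmdcheck::rcmdcheck()`. The helper name `run_local_check` and the `error_on` setting are illustrative assumptions, not part of this commit.

```r
# Hypothetical helper: run roughly the same check locally.
# Assumes the rcmdcheck package is installed; it is not part of this commit.
run_local_check <- function(pkg_dir = ".") {
  if (!requireNamespace("rcmdcheck", quietly = TRUE)) {
    stop("The 'rcmdcheck' package is required for this sketch")
  }
  rcmdcheck::rcmdcheck(
    path = pkg_dir,
    args = c("--no-manual", "--as-cran"),  # same flags as the workflow line
    error_on = "warning"                   # assumption: treat warnings as failures
  )
}
```

Calling `run_local_check()` from the package root would build and check the package with the same two flags.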

.github/workflows/rhub.yaml

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ on:
 id:
 description: 'Unique ID. You can leave this empty now.'
 type: string
+env:
+QS_EXTENDED_TESTS: true
 
 jobs:
 
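
Note on `QS_EXTENDED_TESTS`: the diff does not show how the package consumes this variable. One plausible pattern is for a test script to read it with `Sys.getenv()` and gate slower tests on it. The sketch below uses only base R; the helper name `extended_tests_enabled` is hypothetical and not taken from the package's test suite.

```r
# Hypothetical helper, not taken from the package's test suite.
# Treats "true" (any letter case) as enabled, anything else as disabled.
extended_tests_enabled <- function() {
  identical(tolower(Sys.getenv("QS_EXTENDED_TESTS", unset = "")), "true")
}

if (extended_tests_enabled()) {
  message("Running extended tests")
} else {
  message("Skipping extended tests")
}
```

Comparing against the lowercase string keeps the check tolerant of `true`, `TRUE`, or `True` in the workflow file.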

ChangeLog

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 Version 0.1.7 (2026-01-11)
+* Replace `ATTRIB` and `SET_ATTRIB` calls (no longer in API)
 * Enable TBB by default (via RcppParallel); use configure --without-TBB to force disable
 * Add zstd file substitution helpers `zstd_in` and `zstd_out`
 * Add ZSTD bindings section to the vignette

R/zstd_file_functions.R

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,6 @@
 #' Zstd file helpers
 #'
-#' Helpers for compressing and decompressing zstd files, plus wrappers that
-#' read or write using other functions.
+#' Helpers for compressing and decompressing zstd files.
 #'
 #' @name zstd_file_functions
 NULL
@@ -12,6 +11,7 @@ NULL
 #'
 #' @usage zstd_compress_file(input_file, output_file, compress_level = qopt("compress_level"))
 #'
+#' @name zstd_compress_file
 #' @param input_file Path to the input file.
 #' @param output_file Path to the output file.
 #' @param compress_level The compression level used.
@@ -34,6 +34,7 @@ NULL
 #'
 #' @usage zstd_decompress_file(input_file, output_file)
 #'
+#' @name zstd_decompress_file
 #' @param input_file Path to the input file.
 #' @param output_file Path to the output file.
 #'

README.md

Lines changed: 82 additions & 33 deletions
@@ -78,6 +78,55 @@ qs_save(data, "myfile.qs2")
 data <- qs_read("myfile.qs2", validate_checksum = TRUE)
 ```
 
+# Bindings to ZSTD compression library
+
+The package exposes the ZSTD compression library for both in memory data
+and file workflows.
+
+## In memory compression and decompression
+
+Use these functions when you already have raw vectors in memory and want
+direct control of compression.
+
+``` r
+x <- serialize(mtcars, connection = NULL)
+xz <- zstd_compress_raw(x, compress_level = 3)
+x2 <- zstd_decompress_raw(xz)
+stopifnot(identical(x, x2))
+```
+
+## File compression
+
+These functions mirror typical file compression tools and keep the
+workflow simple when you want explicit input and output files.
+
+``` r
+infile <- tempfile()
+writeBin(as.raw(1:5), infile)
+zfile <- tempfile(fileext = ".zst")
+zstd_compress_file(infile, zfile, compress_level = 1)
+outfile <- tempfile()
+zstd_decompress_file(zfile, outfile)
+stopifnot(identical(readBin(infile, "raw", 5), readBin(outfile, "raw", 5)))
+```
+
+## zstd_in and zstd_out
+
+These generic wrappers substitute a zstd compressed file for a normal
+file path, so you can add zstd compression support to existing functions
+for reading and writing data.
+
+``` r
+# library(data.table)
+save_file <- tempfile(fileext = ".csv.zst")
+
+# write out zstd compressed table
+zstd_out(data.table::fwrite, mtcars, file = save_file)
+
+# read in zstd compressed table
+dt <- zstd_in(data.table::fread, file = save_file)
+```
+
 # The qdata format
 
 The package also introduces the `qdata` format which has its own
@@ -103,7 +152,7 @@ A summary across 4 datasets is presented below.
 #### Single-threaded
 
 | Algorithm | Compression | Save Time (s) | Read Time (s) |
-| --------------- | ----------- | ------------- | ------------- |
+|-----------------|-------------|---------------|---------------|
 | qs2 | 7.96 | 13.4 | 50.4 |
 | qdata | 8.45 | 10.5 | 34.8 |
 | base::serialize | 1.1 | 8.87 | 51.4 |
@@ -115,24 +164,24 @@ A summary across 4 datasets is presented below.
 #### Multi-threaded (8 threads)
 
 | Algorithm | Compression | Save Time (s) | Read Time (s) |
-| ----------- | ----------- | ------------- | ------------- |
+|-------------|-------------|---------------|---------------|
 | qs2 | 7.96 | 3.79 | 48.1 |
 | qdata | 8.45 | 1.98 | 33.1 |
 | fst | 2.59 | 5.05 | 46.6 |
 | parquet | 8.29 | 20.2 | 37.0 |
 | qs (legacy) | 7.97 | 3.21 | 52.0 |
 
-- `qs2`, `qdata` and `qs` with `compress_level = 3`
-- `parquet` via the `arrow` package using zstd `compression_level = 3`
-- `base::serialize` with `ascii = FALSE` and `xdr = FALSE`
+- `qs2`, `qdata` and `qs` with `compress_level = 3`
+- `parquet` via the `arrow` package using zstd `compression_level = 3`
+- `base::serialize` with `ascii = FALSE` and `xdr = FALSE`
 
 **Datasets used**
 
-- `1000 genomes non-coding VCF` 1000 genomes non-coding variants (2743
-MB)
-- `B-cell data` B-cell mouse data, Greiff 2017 (1057 MB)
-- `IP location` IPV4 range data with location information (198 MB)
-- `Netflix movie ratings` Netflix ML prediction dataset (571 MB)
+- `1000 genomes non-coding VCF` 1000 genomes non-coding variants (2743
+MB)
+- `B-cell data` B-cell mouse data, Greiff 2017 (1057 MB)
+- `IP location` IPV4 range data with location information (198 MB)
+- `Netflix movie ratings` Netflix ML prediction dataset (571 MB)
 
 These datasets are openly licensed and represent a combination of
 numeric and text data across multiple domains. See
@@ -181,32 +230,32 @@ The following global options control the behavior of the `qs2`
 functions. These global options can be queried or modified using `qopt`
 function.
 
-- **compress\_level**
-The default compression level used when compressing data.
-**Default:** `3L`
+- **compress_level**
+The default compression level used when compressing data.
+**Default:** `3L`
 
-- **shuffle**
-A logical flag indicating whether to allow byte shuffling during
-compression.
-**Default:** `TRUE`
+- **shuffle**
+A logical flag indicating whether to allow byte shuffling during
+compression.
+**Default:** `TRUE`
 
-- **nthreads**
-The number of threads used for compression and decompression.
-**Default:** `1L`
+- **nthreads**
+The number of threads used for compression and decompression.
+**Default:** `1L`
 
-- **validate\_checksum**
-A logical flag indicating whether to validate the stored checksum
-when reading data.
-**Default:** `FALSE`
+- **validate_checksum**
+A logical flag indicating whether to validate the stored checksum when
+reading data.
+**Default:** `FALSE`
 
-- **warn\_unsupported\_types**
-For `qd_save`, a logical flag indicating whether to warn when saving
-an object with unsupported types.
-**Default:** `TRUE`
+- **warn_unsupported_types**
+For `qd_save`, a logical flag indicating whether to warn when saving
+an object with unsupported types.
+**Default:** `TRUE`
 
-- **use\_alt\_rep**
-For `qd_read`, a logical flag indicating whether to use ALTREP when
-reading in string data.
-**Default:** `FALSE`
+- **use_alt_rep**
+For `qd_read`, a logical flag indicating whether to use ALTREP when
+reading in string data.
+**Default:** `FALSE`
 
------
+------------------------------------------------------------------------
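
Note on the global options listed in the README hunk above: the README says they can be queried or modified with `qopt`, and the roxygen `@usage` line in `R/zstd_file_functions.R` shows the query form `qopt("compress_level")`. The sketch below illustrates query, modify, and restore. The setter form `qopt(parameter, value = ...)` is an assumption about the signature and should be checked against `?qopt`; the guard lets the snippet run even when `qs2` is not installed.

```r
# Sketch only. Assumes qs2 is installed and that qopt(parameter, value)
# sets an option when `value` is supplied (check ?qopt to confirm).
if (requireNamespace("qs2", quietly = TRUE)) {
  old_level <- qs2::qopt("compress_level")        # query the current value
  qs2::qopt("compress_level", value = 5L)         # modify it for later calls
  qs2::qopt("compress_level", value = old_level)  # restore the previous value
}
```

Restoring the previous value keeps the change from leaking into the rest of the session.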

man/zstd_compress_file.Rd

Lines changed: 0 additions & 28 deletions
This file was deleted.

man/zstd_decompress_file.Rd

Lines changed: 0 additions & 28 deletions
This file was deleted.

man/zstd_file_functions.Rd

Lines changed: 45 additions & 0 deletions
Some generated files are not rendered by default.
