@@ -78,6 +78,55 @@ qs_save(data, "myfile.qs2")
7878data <- qs_read("myfile.qs2", validate_checksum = TRUE)
7979```
8080
81+ # Bindings to ZSTD compression library
82+
83+ The package exposes the ZSTD compression library for both in-memory data
84+ and file workflows.
85+
86+ ## In-memory compression and decompression
87+
88+ Use these functions when you already have raw vectors in memory and want
89+ direct control of compression.
90+
91+ ``` r
92+ x <- serialize(mtcars, connection = NULL)
93+ xz <- zstd_compress_raw(x, compress_level = 3)
94+ x2 <- zstd_decompress_raw(xz)
95+ stopifnot(identical(x, x2))
96+ ```
97+
98+ ## File compression
99+
100+ These functions mirror typical file compression tools and keep the
101+ workflow simple when you want explicit input and output files.
102+
103+ ``` r
104+ infile <- tempfile()
105+ writeBin(as.raw(1:5), infile)
106+ zfile <- tempfile(fileext = ".zst")
107+ zstd_compress_file(infile, zfile, compress_level = 1)
108+ outfile <- tempfile()
109+ zstd_decompress_file(zfile, outfile)
110+ stopifnot(identical(readBin(infile, "raw", 5), readBin(outfile, "raw", 5)))
111+ ```
112+
113+ ## zstd_in and zstd_out
114+
115+ These generic wrappers substitute a zstd compressed file for a normal
116+ file path, so you can add zstd compression support to existing functions
117+ for reading and writing data.
118+
119+ ``` r
120+ # library(data.table)
121+ save_file <- tempfile(fileext = ".csv.zst")
122+
123+ # write out zstd compressed table
124+ zstd_out(data.table::fwrite, mtcars, file = save_file)
125+
126+ # read in zstd compressed table
127+ dt <- zstd_in(data.table::fread, file = save_file)
128+ ```
129+
81130# The qdata format
82131
83132The package also introduces the `qdata` format which has its own
@@ -103,7 +152,7 @@ A summary across 4 datasets is presented below.
103152#### Single-threaded
104153
105154| Algorithm | Compression | Save Time (s) | Read Time (s) |
106- | --------------- | ----------- | ------------- | ------------- |
155+ | ----------------- | ------------- | --------------- | --------------- |
107156| qs2 | 7.96 | 13.4 | 50.4 |
108157| qdata | 8.45 | 10.5 | 34.8 |
109158| base::serialize | 1.1 | 8.87 | 51.4 |
@@ -115,24 +164,24 @@ A summary across 4 datasets is presented below.
115164#### Multi-threaded (8 threads)
116165
117166| Algorithm | Compression | Save Time (s) | Read Time (s) |
118- | ----------- | ----------- | ------------- | ------------- |
167+ | ------------- | ------------- | --------------- | --------------- |
119168| qs2 | 7.96 | 3.79 | 48.1 |
120169| qdata | 8.45 | 1.98 | 33.1 |
121170| fst | 2.59 | 5.05 | 46.6 |
122171| parquet | 8.29 | 20.2 | 37.0 |
123172| qs (legacy) | 7.97 | 3.21 | 52.0 |
124173
125- - `qs2`, `qdata` and `qs` with `compress_level = 3`
126- - `parquet` via the `arrow` package using zstd `compression_level = 3`
127- - `base::serialize` with `ascii = FALSE` and `xdr = FALSE`
174+ - `qs2`, `qdata` and `qs` with `compress_level = 3`
175+ - `parquet` via the `arrow` package using zstd `compression_level = 3`
176+ - `base::serialize` with `ascii = FALSE` and `xdr = FALSE`
128177
129178** Datasets used**
130179
131- - `1000 genomes non-coding VCF` 1000 genomes non-coding variants (2743
132- MB)
133- - `B-cell data` B-cell mouse data, Greiff 2017 (1057 MB)
134- - `IP location` IPV4 range data with location information (198 MB)
135- - `Netflix movie ratings` Netflix ML prediction dataset (571 MB)
180+ - `1000 genomes non-coding VCF` 1000 genomes non-coding variants (2743
181+ MB)
182+ - `B-cell data` B-cell mouse data, Greiff 2017 (1057 MB)
183+ - `IP location` IPV4 range data with location information (198 MB)
184+ - `Netflix movie ratings` Netflix ML prediction dataset (571 MB)
136185
137186These datasets are openly licensed and represent a combination of
138187numeric and text data across multiple domains. See
@@ -181,32 +230,32 @@ The following global options control the behavior of the `qs2`
181230functions. These global options can be queried or modified using `qopt`
182231function.
183232
184- - **compress\_level**
185- The default compression level used when compressing data.
186- **Default:** `3L`
233+ - **compress_level**
234+ The default compression level used when compressing data.
235+ **Default:** `3L`
187236
188- - **shuffle**
189- A logical flag indicating whether to allow byte shuffling during
190- compression.
191- **Default:** `TRUE`
237+ - **shuffle**
238+ A logical flag indicating whether to allow byte shuffling during
239+ compression.
240+ **Default:** `TRUE`
192241
193- - **nthreads**
194- The number of threads used for compression and decompression.
195- **Default:** `1L`
242+ - **nthreads**
243+ The number of threads used for compression and decompression.
244+ **Default:** `1L`
196245
197- - **validate\_checksum**
198- A logical flag indicating whether to validate the stored checksum
199- when reading data.
200- **Default:** `FALSE`
246+ - **validate_checksum**
247+ A logical flag indicating whether to validate the stored checksum when
248+ reading data.
249+ **Default:** `FALSE`
201250
202- - **warn\_unsupported\_types**
203- For `qd_save`, a logical flag indicating whether to warn when saving
204- an object with unsupported types.
205- **Default:** `TRUE`
251+ - **warn_unsupported_types**
252+ For `qd_save`, a logical flag indicating whether to warn when saving
253+ an object with unsupported types.
254+ **Default:** `TRUE`
206255
207- - **use\_alt\_rep**
208- For `qd_read`, a logical flag indicating whether to use ALTREP when
209- reading in string data.
210- **Default:** `FALSE`
256+ - **use_alt_rep**
257+ For `qd_read`, a logical flag indicating whether to use ALTREP when
258+ reading in string data.
259+ **Default:** `FALSE`
211260
212- -----
261+ ------------------------------------------------------------------------
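
The global options listed in the last hunk could be exercised roughly as follows. This is a minimal sketch, not part of the commit: it assumes the `qs2` package is installed and that `qopt(parameter, value = NULL)` returns the current value when `value` is omitted and sets it otherwise. The temporary file and the level `5L` are illustrative choices.

```r
# Sketch only: assumes qs2 is installed and that qopt() has the
# signature qopt(parameter, value = NULL).
library(qs2)

# Remember the current default compression level so it can be restored.
old_level <- qopt("compress_level")

# Change the default used by subsequent save calls (5L is an arbitrary choice).
qopt("compress_level", value = 5L)

# Round-trip a small data frame; qs_save picks up the new default level.
tmp <- tempfile(fileext = ".qs2")
qs_save(mtcars, tmp)
restored <- qs_read(tmp, validate_checksum = TRUE)
stopifnot(identical(mtcars, restored))

# Restore the previous setting so the change does not leak into other code.
qopt("compress_level", value = old_level)
unlink(tmp)
```

Saving and restoring the old value keeps the change local to one script, which matters because these options are global to the session.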