Commit 14cd71e
feat: add compression level configuration for JSON/CSV writers (#18954)
## Which issue does this PR close?
Closes #18947
## Rationale for this change
Currently, DataFusion uses default compression levels when writing
compressed JSON and CSV files. For ZSTD, this means level 3, which
prioritizes speed over compression ratio. Users working with large
datasets who want to optimize for storage costs or network transfer have
no way to increase the compression level.
This is particularly important for cloud data lake scenarios where
storage and egress costs can be significant.
## What changes are included in this PR?
- Add `compression_level: Option<u32>` field to `JsonOptions` and
`CsvOptions` in `config.rs`
- Add `convert_async_writer_with_level()` method to
`FileCompressionType` (non-breaking API extension)
- Keep original `convert_async_writer()` as a convenience wrapper for
backward compatibility
- Update `JsonWriterOptions` and `CsvWriterOptions` with
`compression_level` field
- Update `ObjectWriterBuilder` to support compression level
- Update JSON and CSV sinks to pass compression level through the write
pipeline
- Update proto definitions and conversions for serialization support
- Fix unrelated unused import warning in `udf.rs` (conditional
compilation for debug-only imports)
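
The non-breaking extension described in the list above (a new level-aware method, with the original method kept as a thin wrapper) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Codec`, `EncoderPlan`, `plan_writer()`, and `plan_writer_with_level()` are hypothetical stand-ins for `FileCompressionType`, the boxed async writer, `convert_async_writer()`, and `convert_async_writer_with_level()`. It is also an assumption that a level is simply ignored for uncompressed output.

```rust
/// Hypothetical stand-in for a compression codec; not DataFusion's actual type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Codec {
    Zstd,
    Gzip,
    Uncompressed,
}

/// Describes the encoder that would be built. A real implementation would
/// return a boxed async writer rather than a plain description.
#[derive(Debug, PartialEq)]
pub struct EncoderPlan {
    pub codec: Codec,
    /// `None` means "use the codec's default level".
    pub level: Option<u32>,
}

impl Codec {
    /// New, level-aware entry point (hypothetical name).
    pub fn plan_writer_with_level(self, level: Option<u32>) -> EncoderPlan {
        // Assumption: uncompressed output has no level to apply, so drop it.
        let level = match self {
            Codec::Uncompressed => None,
            _ => level,
        };
        EncoderPlan { codec: self, level }
    }

    /// Original entry point, signature unchanged: it delegates with `None`,
    /// so existing callers keep getting the codec's default level.
    pub fn plan_writer(self) -> EncoderPlan {
        self.plan_writer_with_level(None)
    }
}
```

Because the old method only forwards `None`, callers that never mention a level compile and behave as before, while new callers can opt in with `Some(level)`.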
## Are these changes tested?
The changes follow the existing patterns used throughout the codebase.
The implementation was verified by:
- Building successfully with `cargo build`
- Running existing tests with `cargo test --package datafusion-proto`
- All 131 proto integration tests pass
## Are there any user-facing changes?
Yes, users can now specify compression level when writing JSON/CSV
files:
```rust
use datafusion::common::config::JsonOptions;
use datafusion::common::parsers::CompressionTypeVariant;

let json_opts = JsonOptions {
    compression: CompressionTypeVariant::ZSTD,
    compression_level: Some(9), // Higher compression
    ..Default::default()
};
```
**Supported compression levels:**
- ZSTD: 1-22 (default: 3)
- GZIP: 0-9 (default: 6)
- BZIP2: 1-9 (default: 9)
- XZ: 0-9 (default: 6)
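
As a rough illustration of how the ranges and defaults listed above could be checked before a writer is built, here is a small self-contained sketch. The PR description does not say whether out-of-range levels are rejected, clamped, or passed through to the codec, so the reject-with-error behavior below is an assumption. `Codec`, `level_range()`, `default_level()`, and `resolve_level()` are hypothetical names, not DataFusion APIs.

```rust
use std::ops::RangeInclusive;

/// Hypothetical codec enum mirroring the list above; not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Codec {
    Zstd,
    Gzip,
    Bzip2,
    Xz,
}

impl Codec {
    /// Supported level range, taken from the PR description.
    pub fn level_range(self) -> RangeInclusive<u32> {
        match self {
            Codec::Zstd => 1..=22,
            Codec::Gzip => 0..=9,
            Codec::Bzip2 => 1..=9,
            Codec::Xz => 0..=9,
        }
    }

    /// Default level, taken from the PR description.
    pub fn default_level(self) -> u32 {
        match self {
            Codec::Zstd => 3,
            Codec::Gzip => 6,
            Codec::Bzip2 => 9,
            Codec::Xz => 6,
        }
    }
}

/// Resolve an optional user-supplied level to a concrete one.
/// Assumption: out-of-range values are reported as errors rather than clamped.
pub fn resolve_level(codec: Codec, requested: Option<u32>) -> Result<u32, String> {
    match requested {
        None => Ok(codec.default_level()),
        Some(level) if codec.level_range().contains(&level) => Ok(level),
        Some(level) => Err(format!(
            "compression level {} is outside {:?} for {:?}",
            level,
            codec.level_range(),
            codec
        )),
    }
}
```

With this sketch, `resolve_level(Codec::Zstd, None)` falls back to the default of 3, while `resolve_level(Codec::Bzip2, Some(0))` returns an error because BZIP2 levels start at 1.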
**This is a non-breaking change** - the original
`convert_async_writer()` method signature is preserved for backward
compatibility.
Co-authored-by: Andrew Lamb <[email protected]>

1 parent 1e4bd75 commit 14cd71e
File tree
16 files changed, +195 -8 lines changed
- datafusion
- common/src
- file_options
- datasource-csv/src
- datasource-json/src
- datasource/src
- write
- proto-common
- proto
- src
- from_proto
- generated
- to_proto
- proto
- src
- generated
- logical_plan
- tests/cases