Default to ZSTD compression when writing Parquet #981
@@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None
     def write_parquet(
         self,
         path: str | pathlib.Path,
-        compression: str = "uncompressed",
+        compression: str = "ZSTD",
         compression_level: int | None = None,
     ) -> None:
         """Execute the :py:class:`DataFrame` and write the results to a Parquet file.

         Args:
             path: Path of the Parquet file to write.
-            compression: Compression type to use.
-            compression_level: Compression level to use.
-        """
+            compression: Compression type to use. Default is "ZSTD".
+            compression_level: Compression level to use. For ZSTD, the
+                recommended range is 1 to 22, with the default being 4. Higher levels
+                provide better compression but slower speed.
+        """
+        if compression == "ZSTD":
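To illustrate the default-level behaviour the new docstring describes, here is a minimal sketch in plain Python. The helper name `resolve_compression_level` is hypothetical and not part of datafusion-python. Only the ZSTD default of 4 comes from the docstring; the case-insensitive comparison is an assumption.

```python
from typing import Optional

# Default level taken from the docstring in the diff above.
ZSTD_DEFAULT_LEVEL = 4


def resolve_compression_level(
    compression: str, compression_level: Optional[int]
) -> Optional[int]:
    """Hypothetical helper: choose the level to hand to the Parquet writer."""
    # Assumption: codec names are compared case-insensitively.
    if compression.lower() == "zstd" and compression_level is None:
        return ZSTD_DEFAULT_LEVEL
    # An explicit level, or a codec with no documented default, passes through.
    return compression_level
```

With this sketch, `resolve_compression_level("ZSTD", None)` yields the documented default, while an explicit level such as 10 is returned unchanged.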
"zstd" => Compression::ZSTD(
    ZstdLevel::try_new(verify_compression_level(compression_level)? as i32)
        .map_err(|e| PyValueError::new_err(format!("{e}")))?,
),
Compression levels are tested in:
datafusion-python/python/tests/test_dataframe.py
Lines 1093 to 1106 in 63b13da
@pytest.mark.parametrize(
    "compression, compression_level",
    [("gzip", 12), ("brotli", 15), ("zstd", 23), ("wrong", 12)],
)
def test_write_compressed_parquet_wrong_compression_level(
    df, tmp_path, compression, compression_level
):
    path = tmp_path
    with pytest.raises(ValueError):
        df.write_parquet(
            str(path),
            compression=compression,
            compression_level=compression_level,
        )
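The cases in this test can be mimicked without DataFusion using a small stand-in validator. This is a sketch, not the library's implementation: `check_level` and `LEVEL_RANGES` are hypothetical names. The ZSTD range of 1 to 22 comes from the docstring in the diff. The gzip and brotli bounds are assumptions chosen to be consistent with the failing cases listed in the test.

```python
# Inclusive (min, max) compression level per codec.
LEVEL_RANGES = {
    "zstd": (1, 22),    # range stated in the write_parquet docstring
    "gzip": (0, 10),    # assumption, consistent with 12 being rejected
    "brotli": (0, 11),  # assumption, consistent with 15 being rejected
}


def check_level(compression: str, level: int) -> int:
    """Hypothetical validator: return the level or raise ValueError."""
    bounds = LEVEL_RANGES.get(compression.lower())
    if bounds is None:
        raise ValueError(f"unsupported compression type: {compression!r}")
    low, high = bounds
    if not low <= level <= high:
        raise ValueError(
            f"{compression} level must be between {low} and {high}, got {level}"
        )
    return level


# The same four cases as the parametrized test; each one should be rejected.
rejected = []
for codec, level in [("gzip", 12), ("brotli", 15), ("zstd", 23), ("wrong", 12)]:
    try:
        check_level(codec, level)
    except ValueError:
        rejected.append(codec)
```

Raising `ValueError` for both an unknown codec and an out-of-range level matches what the test expects from `write_parquet`, which is why a single `pytest.raises(ValueError)` covers all four parameter sets.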