Commit d103d88
chore: remove LZO Parquet compression (#19726)
## Which issue does this PR close?

- Closes #19720.

## Rationale for this change

Choosing LZO compression currently produces an error, and it is unlikely to ever be supported, so the best option moving forward is to remove it altogether and update the docs.

## What changes are included in this PR?

- Removed LZO from the `parse_compression_string()` function
- Removed docs
- Updated expected test output

## Are these changes tested?

Yes

## Are there any user-facing changes?

A user choosing LZO as the compression codec will get a clear error message:

```
Unknown or unsupported parquet compression: lzo. Valid values are: uncompressed, snappy, gzip(level), brotli(level), lz4, zstd(level), and lz4_raw.
```
1 parent f9697c1 commit d103d88

File tree

5 files changed: +37 −41 lines changed

datafusion/common/src/config.rs

Lines changed: 2 additions & 2 deletions
@@ -772,7 +772,7 @@ config_namespace! {

     /// (writing) Sets default parquet compression codec.
     /// Valid values are: uncompressed, snappy, gzip(level),
-    /// lzo, brotli(level), lz4, zstd(level), and lz4_raw.
+    /// brotli(level), lz4, zstd(level), and lz4_raw.
     /// These values are not case sensitive. If NULL, uses
     /// default parquet writer setting
     ///

@@ -2499,7 +2499,7 @@ config_namespace_with_hashmap! {

     /// Sets default parquet compression codec for the column path.
     /// Valid values are: uncompressed, snappy, gzip(level),
-    /// lzo, brotli(level), lz4, zstd(level), and lz4_raw.
+    /// brotli(level), lz4, zstd(level), and lz4_raw.
     /// These values are not case-sensitive. If NULL, uses
     /// default parquet options
     pub compression: Option<String>, transform = str::to_lowercase, default = None

datafusion/common/src/file_options/parquet_writer.rs

Lines changed: 1 addition & 5 deletions
@@ -341,10 +341,6 @@ pub fn parse_compression_string(
             level,
         )?))
     }
-    "lzo" => {
-        check_level_is_none(codec, &level)?;
-        Ok(parquet::basic::Compression::LZO)
-    }
     "brotli" => {
         let level = require_level(codec, level)?;
         Ok(parquet::basic::Compression::BROTLI(BrotliLevel::try_new(

@@ -368,7 +364,7 @@ pub fn parse_compression_string(
     _ => Err(DataFusionError::Configuration(format!(
         "Unknown or unsupported parquet compression: \
         {str_setting}. Valid values are: uncompressed, snappy, gzip(level), \
-        lzo, brotli(level), lz4, zstd(level), and lz4_raw."
+        brotli(level), lz4, zstd(level), and lz4_raw."
     ))),
     }
 }
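The change above can be illustrated with a minimal, self-contained sketch of how a compression-string parser behaves once the `"lzo"` arm is removed: the string now falls through to the catch-all error arm. The names below (`parse_compression`, `Codec`) are simplified stand-ins for illustration, not the actual DataFusion API.

```rust
// Simplified stand-ins for parquet::basic::Compression variants.
#[derive(Debug, PartialEq)]
enum Codec {
    Uncompressed,
    Snappy,
    Lz4,
    Lz4Raw,
    Gzip(u32),
    Brotli(u32),
    Zstd(i32),
}

// Parse strings like "snappy" or "zstd(3)", case-insensitively.
fn parse_compression(setting: &str) -> Result<Codec, String> {
    let s = setting.to_lowercase();
    // Split "zstd(3)" into a codec name and an optional level.
    let (name, level) = match s.split_once('(') {
        Some((name, rest)) => {
            let level = rest
                .strip_suffix(')')
                .ok_or_else(|| format!("missing closing paren in: {s}"))?;
            (name, Some(level.to_string()))
        }
        None => (s.as_str(), None),
    };
    // Codecs with a level require one; parse it as an integer.
    let lvl = |l: Option<String>| -> Result<i64, String> {
        l.ok_or_else(|| format!("{name} requires a level"))?
            .parse()
            .map_err(|e| format!("bad level: {e}"))
    };
    match name {
        "uncompressed" => Ok(Codec::Uncompressed),
        "snappy" => Ok(Codec::Snappy),
        "lz4" => Ok(Codec::Lz4),
        "lz4_raw" => Ok(Codec::Lz4Raw),
        "gzip" => Ok(Codec::Gzip(lvl(level)? as u32)),
        "brotli" => Ok(Codec::Brotli(lvl(level)? as u32)),
        "zstd" => Ok(Codec::Zstd(lvl(level)? as i32)),
        // "lzo" is intentionally absent: it now lands here with every other
        // unknown codec and produces the error shown in the PR description.
        _ => Err(format!(
            "Unknown or unsupported parquet compression: {s}. Valid values are: \
             uncompressed, snappy, gzip(level), brotli(level), lz4, zstd(level), and lz4_raw."
        )),
    }
}

fn main() {
    assert_eq!(parse_compression("ZSTD(3)"), Ok(Codec::Zstd(3)));
    assert!(parse_compression("lzo").is_err());
}
```

Removing the match arm rather than keeping it as an explicit error keeps the valid-values list and the accepted codecs in one place, so the error message cannot drift out of sync with what the parser actually accepts.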

datafusion/sqllogictest/test_files/information_schema.slt

Lines changed: 1 addition & 1 deletion
@@ -373,7 +373,7 @@ datafusion.execution.parquet.bloom_filter_on_read true (reading) Use any availab
 datafusion.execution.parquet.bloom_filter_on_write false (writing) Write bloom filters for all columns when creating parquet files
 datafusion.execution.parquet.coerce_int96 NULL (reading) If true, parquet reader will read columns of physical type int96 as originating from a different resolution than nanosecond. This is useful for reading data from systems like Spark which stores microsecond resolution timestamps in an int96 allowing it to write values with a larger date range than 64-bit timestamps with nanosecond resolution.
 datafusion.execution.parquet.column_index_truncate_length 64 (writing) Sets column index truncate length
-datafusion.execution.parquet.compression zstd(3) (writing) Sets default parquet compression codec. Valid values are: uncompressed, snappy, gzip(level), lzo, brotli(level), lz4, zstd(level), and lz4_raw. These values are not case sensitive. If NULL, uses default parquet writer setting Note that this default setting is not the same as the default parquet writer setting.
+datafusion.execution.parquet.compression zstd(3) (writing) Sets default parquet compression codec. Valid values are: uncompressed, snappy, gzip(level), brotli(level), lz4, zstd(level), and lz4_raw. These values are not case sensitive. If NULL, uses default parquet writer setting Note that this default setting is not the same as the default parquet writer setting.
 datafusion.execution.parquet.created_by datafusion (writing) Sets "created by" property
 datafusion.execution.parquet.data_page_row_count_limit 20000 (writing) Sets best effort maximum number of rows in data page
 datafusion.execution.parquet.data_pagesize_limit 1048576 (writing) Sets best effort maximum size of data page in bytes

docs/source/user-guide/configs.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ The following configuration settings are available:
 | datafusion.execution.parquet.write_batch_size | 1024 | (writing) Sets write_batch_size in bytes |
 | datafusion.execution.parquet.writer_version | 1.0 | (writing) Sets parquet writer version valid values are "1.0" and "2.0" |
 | datafusion.execution.parquet.skip_arrow_metadata | false | (writing) Skip encoding the embedded arrow metadata in the KV_meta This is analogous to the `ArrowWriterOptions::with_skip_arrow_metadata`. Refer to <https://docs.rs/parquet/53.3.0/parquet/arrow/arrow_writer/struct.ArrowWriterOptions.html#method.with_skip_arrow_metadata> |
-| datafusion.execution.parquet.compression | zstd(3) | (writing) Sets default parquet compression codec. Valid values are: uncompressed, snappy, gzip(level), lzo, brotli(level), lz4, zstd(level), and lz4_raw. These values are not case sensitive. If NULL, uses default parquet writer setting Note that this default setting is not the same as the default parquet writer setting. |
+| datafusion.execution.parquet.compression | zstd(3) | (writing) Sets default parquet compression codec. Valid values are: uncompressed, snappy, gzip(level), brotli(level), lz4, zstd(level), and lz4_raw. These values are not case sensitive. If NULL, uses default parquet writer setting Note that this default setting is not the same as the default parquet writer setting. |
 | datafusion.execution.parquet.dictionary_enabled | true | (writing) Sets if dictionary encoding is enabled. If NULL, uses default parquet writer setting |
 | datafusion.execution.parquet.dictionary_page_size_limit | 1048576 | (writing) Sets best effort maximum dictionary page size, in bytes |
 | datafusion.execution.parquet.statistics_enabled | page | (writing) Sets if statistics are enabled for any column Valid values are: "none", "chunk", and "page" These values are not case sensitive. If NULL, uses default parquet writer setting |
