Parquet: Avoid page size exceeds i32::MAX #8263

@mapleFU

Describe the bug

In Parquet, a page size cannot exceed i32::MAX, because the page header is serialized with Thrift and stores uncompressed_page_size and compressed_page_size as i32 fields.
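To make the limit concrete, here is a minimal sketch of a checked conversion from a byte count to the `i32` used by the page header. The helper name `page_size_to_i32` is hypothetical and is not part of the parquet crate; it only illustrates where an oversized page could be detected.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not a parquet crate API): convert a page size in
/// bytes to the `i32` that the Thrift page header can hold, or return an
/// error describing the oversized page.
fn page_size_to_i32(size: usize) -> Result<i32, String> {
    // `i32::try_from` fails for any value above i32::MAX instead of
    // silently wrapping the way an `as i32` cast would.
    i32::try_from(size).map_err(|_| {
        format!("page size {size} bytes exceeds i32::MAX ({})", i32::MAX)
    })
}
```

A plain `size as i32` cast would wrap around to a negative number for large pages, which is one way a bad page header could end up in the file.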

See: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L802

This is unlikely to happen in practice, since arrow-rs sets the page size limit to 1 MiB by default. However, it becomes likely when the batch size and the page size limit are both enlarged.

To Reproduce

Try to write a huge blob to Parquet.

Expected behavior

In order of preference: switch to a smaller page boundary > throw an error > leave a bad Parquet page.
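The first two options can be sketched together as follows. This is only an illustration under simplifying assumptions: `plan_pages` is a hypothetical function, not arrow-rs code, and it counts raw value bytes only, ignoring encoding overhead, repetition and definition levels, and compression.

```rust
/// Hypothetical sketch: group value lengths (in bytes) into pages of at most
/// `limit` bytes each. Returns the number of values in each page, or an error
/// when a single value is larger than `limit` and cannot fit in any page.
fn plan_pages(value_sizes: &[usize], limit: usize) -> Result<Vec<usize>, String> {
    let mut pages = Vec::new();
    let mut count = 0usize; // values in the current page
    let mut bytes = 0usize; // bytes in the current page (always <= limit)
    for (i, &size) in value_sizes.iter().enumerate() {
        // Option 2: throw an error rather than write a bad page.
        if size > limit {
            return Err(format!(
                "value {i} is {size} bytes, over the {limit}-byte page limit"
            ));
        }
        // Option 1: cut the page at a smaller boundary if this value
        // would push the current page past the limit.
        if count > 0 && size > limit - bytes {
            pages.push(count);
            count = 0;
            bytes = 0;
        }
        count += 1;
        bytes += size;
    }
    if count > 0 {
        pages.push(count);
    }
    Ok(pages)
}
```

For the case in this issue, `limit` would be at most `i32::MAX as usize`. A value that is itself larger than that has no valid page boundary, so an error is the only safe outcome.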

Additional context

No


Labels: bug, parquet (Changes to the parquet crate)
