ListingTable provider does not prune partitions when no filters are supplied #17957

@peasee

Describe the bug

When the ListingTable provider performs a scan, it does not prune any partitions when there are no filters supplied.

If the table location contains paths that do not match the partition scheme, the scan returns them anyway, which can cause query errors due to missing partition values. For example, I encountered this while reading a Delta Lake table that contained a _delta_log directory. The _delta_log directory was not pruned:

DataSourceExec: file_groups={1 group: [[peasee-hive-test/_delta_log/0000.checkpoint.parquet, peasee-hive-test/pid=1/data.parquet, peasee-hive-test/pid=2/data.parquet]]}

This results in an "Invalid partitioning found on disk" error at execution time when the query retrieves the partition column.

To Reproduce

Set up a Hive-partitioned object store with a table partition column. Add an extra directory whose name is not a valid partition key (such as the _delta_log directory above), then perform a scan with no filters.
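The layout for these steps could be created on a local filesystem with something like the sketch below. It uses only the Rust standard library. The function name create_repro_layout and the choice of a local directory are assumptions made for illustration, and the files it writes are empty placeholders, not valid Parquet, so they would need to be replaced with real data files before running an actual scan.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: creates a Hive-partitioned layout under `root`
/// with one extra directory that is not a valid partition key.
/// The files are empty placeholders, not valid Parquet.
fn create_repro_layout(root: &Path) -> io::Result<Vec<PathBuf>> {
    let relative_files = [
        // Valid partitions for the `pid` partition column.
        "pid=1/data.parquet",
        "pid=2/data.parquet",
        // Extra directory that does not follow the `pid=<value>` scheme.
        "_delta_log/0000.checkpoint.parquet",
    ];

    let mut created = Vec::new();
    for rel in relative_files {
        let path = root.join(rel);
        // Create the partition (or extra) directory before the file.
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::File::create(&path)?;
        created.push(path);
    }
    Ok(created)
}
```

Pointing a ListingTable with a pid partition column at the resulting root, and scanning without filters, should then list all three files as in the plan shown earlier.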

Expected behavior

The ListingTable provider should prune files that do not match the partition scheme even when no filters are defined. This already seems to be implied by a note on ListingOptions:

Files that don't follow this partitioning scheme will be
ignored.
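To make the expected behavior concrete, here is a minimal sketch of the kind of check that would drop such files, written against plain strings with only the Rust standard library. It is not DataFusion's actual implementation and not the fix mentioned below. The names parse_partition_values and prune_non_partitioned are hypothetical, and the sketch assumes paths use "/" as the separator and that partition directories take the form col=value in the order of the partition columns.

```rust
/// Hypothetical helper, not DataFusion's implementation.
/// Returns the partition values encoded in `file_path`, or `None` when the
/// path does not follow the `col=value` scheme for every partition column.
fn parse_partition_values<'a>(
    table_root: &str,
    file_path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // Path of the file relative to the table root.
    let relative = file_path
        .strip_prefix(table_root.trim_end_matches('/'))?
        .strip_prefix('/')?;

    let mut segments = relative.split('/');
    let mut values = Vec::with_capacity(partition_cols.len());
    for col in partition_cols {
        // Each partition column must appear as a `col=value` directory.
        let segment = segments.next()?;
        let (key, value) = segment.split_once('=')?;
        if key != *col {
            return None;
        }
        values.push(value);
    }

    // At least a file name must remain after the partition directories.
    segments.next()?;
    Some(values)
}

/// Keeps only the files whose paths match the partition scheme,
/// regardless of whether any query filters were supplied.
fn prune_non_partitioned<'a>(
    table_root: &str,
    files: &[&'a str],
    partition_cols: &[&str],
) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|&f| parse_partition_values(table_root, f, partition_cols).is_some())
        .collect()
}
```

Applied to the three paths from the plan above with the partition column pid, this sketch keeps the two pid=... files and drops the _delta_log checkpoint, because _delta_log contains no "=" and so cannot be parsed as a partition directory.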

Additional context

I have already created a fix which I will be raising shortly.

Labels: bug
