Description
Describe the bug
When the ListingTable provider performs a scan, it does not prune any partitions if no filters are supplied.
If the table location contains directories that do not match the partitioning scheme, their files are returned from the scan. This can cause query errors because those files have no partition values. For example, I encountered this while reading a Delta Lake table that contained a _delta_log directory, which was not pruned:
DataSourceExec: file_groups={1 group: [[peasee-hive-test/_delta_log/0000.checkpoint.parquet, peasee-hive-test/pid=1/data.parquet, peasee-hive-test/pid=2/data.parquet]]}
When the query is executed and retrieves the partition column, this results in an "Invalid partitioning found on disk" error.
To Reproduce
Set up a Hive-partitioned object store with a table partition column. Add an extra folder whose name is not a valid partition key (for example, _delta_log), and perform a scan with no filters.
Expected behavior
The ListingTable provider should still prune partitions that do not match the partitioning scheme when no filters are defined. This already seems to be implied by a note on ListingOptions:
Files that don't follow this partitioning scheme will be ignored.
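To illustrate the expected behavior, here is a minimal Rust sketch of pruning by partition scheme alone, with no filters involved. It is not DataFusion's actual implementation: matches_partition_scheme and prune_files are hypothetical helpers, and the sketch assumes paths are relative to the table root with exactly one `key=value` directory per partition column, in order.

```rust
/// Hypothetical helper (not a DataFusion API): returns true if `path`,
/// relative to the table root, follows the Hive layout `col1=v1/.../file`
/// for exactly the expected partition columns, in order.
fn matches_partition_scheme(path: &str, partition_cols: &[&str]) -> bool {
    let segments: Vec<&str> = path.split('/').collect();
    // Assumption: one directory per partition column, plus the file name.
    if segments.len() != partition_cols.len() + 1 {
        return false;
    }
    // Every directory segment must be `expected_col=non_empty_value`.
    segments
        .iter()
        .zip(partition_cols)
        .all(|(segment, col)| match segment.split_once('=') {
            Some((key, value)) => key == *col && !value.is_empty(),
            None => false,
        })
}

/// Hypothetical helper: keeps only files that match the partition scheme,
/// even when the scan has no filters at all.
fn prune_files<'a>(paths: &[&'a str], partition_cols: &[&str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| matches_partition_scheme(p, partition_cols))
        .collect()
}

fn main() {
    // The file list from the plan above, relative to the table root.
    let files = [
        "_delta_log/0000.checkpoint.parquet",
        "pid=1/data.parquet",
        "pid=2/data.parquet",
    ];
    let kept = prune_files(&files, &["pid"]);
    // prints ["pid=1/data.parquet", "pid=2/data.parquet"]
    println!("{:?}", kept);
}
```

With this check applied unconditionally, the _delta_log checkpoint is dropped before execution, so no file without a pid value ever reaches the partition-column projection.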
Additional context
I have already created a fix which I will be raising shortly.