HIVE-29637: Incorrect Results for NULL Predicate on Partition Column by deniskuzZ · Pull Request #6515 · apache/hive

deniskuzZ · 2026-05-29T11:28:19Z

What changes were proposed in this pull request?

IcebergTableUtil.makeSpecFromName() — a new helper that builds a Hive-compatible partition spec map from Iceberg's PartitionData. When a partition field value is null, it maps it to an actual NULL instead of the literal string "null".

Why are the changes needed?

Queries with WHERE <partition_col> IS NULL on Iceberg tables returned incorrect results (empty result set).

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -Dtest=TestIcebergCliDriver -Dqfile=iceberg_isnull_partition_pruning.q

Copilot

Pull request overview

Fixes incorrect results for WHERE <partition_col> IS NULL on Iceberg tables by ensuring null partition values are represented as actual Java null during partition predicate evaluation, and adds a regression qtest to cover the scenario.

Changes:

Add IcebergTableUtil.makeSpecFromName(...) to build a partition spec map from an Iceberg partition path while translating Iceberg null partition values to Java null.
Update HiveIcebergStorageHandler#getPartitionsByExpr to use the new helper when constructing DummyPartition specs for expression evaluation.
Add a new positive qtest (iceberg_isnull_partition_pruning.q + expected output) validating IS NULL partition pruning behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java` | Adds a helper for parsing Iceberg partition paths into a spec map with correct null handling. |
| `iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java` | Switches partition spec construction in `getPartitionsByExpr` to the new helper to fix `IS NULL` results. |
| `iceberg/iceberg-handler/src/test/queries/positive/iceberg_isnull_partition_pruning.q` | Adds a regression query file covering `WHERE str_col IS NULL` on an Iceberg-partitioned table. |
| `iceberg/iceberg-handler/src/test/results/positive/iceberg_isnull_partition_pruning.q.out` | Adds the expected output for the new regression test. |

zabetak

Left some high-level comments/questions because I am not very familiar with the modified APIs.

zabetak · 2026-06-01T16:59:23Z

            String partName = spec.partitionToPath(partitionData);

-            Map<String, String> partSpecMap = Maps.newLinkedHashMap();
-            Warehouse.makeSpecFromName(partSpecMap, new Path(partName), null);


There are more calls to Warehouse#makeSpecFromName methods in IcebergTableUtil#convertNameToMetastorePartition, IcebergQueryCompactor, and potentially other places as well. Do we need to update them as well?

The other call sites (e.g. IcebergTableUtil#convertNameToMetastorePartition) don't have access to partitionData — they only receive the partition path string. Without the underlying partition data, there's no way to disambiguate whether "null" in the path originated from a true NULL value or from a literal string "null". The fix here works precisely because we have partitionData available to check the actual value. If those other paths turn out to be affected, they'd need a different approach to obtain that context.

Note: IcebergQueryCompactor should be fine, since it deals with compaction, where NULL filtering semantics aren't in play.

zabetak · 2026-06-01T17:02:41Z

  }

+  /**
+   * Parses an Iceberg partition path into a Hive-compatible spec map.


The method somewhat "overrides" the default behavior of Warehouse.makeSpecFromName but it's unclear why this specificity is required for Iceberg especially since we want to create a "Hive-compatible" spec map.

The issue is in how the partition path string is generated and then parsed back. spec.partitionToPath(partitionData) uses Iceberg's toHumanString() to convert partition values to path segments. For NULL values, toHumanString() produces the string "null". So the path looks like col=null. When Warehouse.makeSpecFromName parses this path back into a spec map, it just sees the literal string "null" as the value — it has no way to know whether this came from an actual NULL or from a legitimate string value "null". It puts "null" (the string) into the spec map rather than NULL.

The fix uses partitionData to check whether the value is actually NULL at the Iceberg level, and if so, puts NULL into the spec map directly. We need this because toHumanString() is lossy — it conflates NULL and the string "null" into the same path representation.
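To make the lossy round trip concrete, here is a minimal, self-contained sketch in plain Java. It does not call the Iceberg or Hive APIs: `toPath`, `specFromPath`, and `specFromValues` are hypothetical stand-ins for `spec.partitionToPath(...)`, `Warehouse.makeSpecFromName(...)`, and the new helper respectively, and the real signatures and escaping rules may differ.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullPartitionSketch {

  // Hypothetical stand-in for path rendering: a null value becomes the text "null".
  public static String toPath(List<String> names, List<Object> values) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < names.size(); i++) {
      if (i > 0) {
        sb.append('/');
      }
      sb.append(names.get(i)).append('=').append(String.valueOf(values.get(i)));
    }
    return sb.toString();
  }

  // Hypothetical stand-in for parsing the path back: only the text is available,
  // so "null" is stored as a plain string.
  public static Map<String, String> specFromPath(String path) {
    Map<String, String> spec = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      int eq = segment.indexOf('=');
      spec.put(segment.substring(0, eq), segment.substring(eq + 1));
    }
    return spec;
  }

  // Sketch of the fix: consult the typed values, so a real null stays null.
  public static Map<String, String> specFromValues(List<String> names, List<Object> values) {
    Map<String, String> spec = new LinkedHashMap<>();
    for (int i = 0; i < names.size(); i++) {
      Object value = values.get(i);
      spec.put(names.get(i), value == null ? null : value.toString());
    }
    return spec;
  }

  public static void main(String[] args) {
    List<String> names = Collections.singletonList("str_col");
    List<Object> realNull = Collections.singletonList(null);
    List<Object> nullText = Collections.<Object>singletonList("null");

    // Both values render to the same path segment.
    System.out.println(toPath(names, realNull).equals(toPath(names, nullText))); // true

    // Parsing the path cannot recover the null.
    System.out.println(specFromPath(toPath(names, realNull)).get("str_col") == null); // false

    // Building the spec from the values keeps the distinction.
    System.out.println(specFromValues(names, realNull).get("str_col") == null); // true
    System.out.println(specFromValues(names, nullText).get("str_col") == null); // false
  }
}
```

An `IS NULL` check against the map built by `specFromPath` can never match, which is consistent with the empty result set described in the PR.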

Thanks for the additional details. I have two small follow-up questions:

How do we avoid this problem in non-Iceberg tables?

Do we have test coverage for partitioning where data contain both the "null" string and actual NULL?

This is an Iceberg-specific problem.

Added.

For non-Iceberg tables I assume that actual NULL values go into the default partition (hive.exec.default.partition.name). I am trying to understand whether, when we build the spec, we should use the value of hive.exec.default.partition.name or null, and whether it makes any difference.

Mapping null to __HIVE_DEFAULT_PARTITION__ didn't work in PCR: the filter was replaced with false. Unlike prunePartitionNames, which converts the default partition name to null:

    if (partitionValue.equals(defaultPartitionName)) {
      convertedValues.add(null); // Null for default partition.
    }

evalExprWithPart doesn't have null-awareness.
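The difference between the two code paths can be sketched in plain Java. This is an illustration only, not Hive's implementation: `toNullAwareValues` and `isNullPredicate` are hypothetical names, and the only detail taken from the thread is the quoted check that maps the default partition name to null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DefaultPartitionNullSketch {

  // Default value of hive.exec.default.partition.name.
  public static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

  // Null-aware conversion, modelled on the quoted prunePartitionNames snippet:
  // the default partition name is turned into a real null before evaluation.
  public static List<String> toNullAwareValues(List<String> partitionValues,
                                               String defaultPartitionName) {
    List<String> converted = new ArrayList<>();
    for (String partitionValue : partitionValues) {
      if (partitionValue.equals(defaultPartitionName)) {
        converted.add(null); // Null for default partition.
      } else {
        converted.add(partitionValue);
      }
    }
    return converted;
  }

  // Hypothetical stand-in for evaluating "col IS NULL" against one partition value.
  public static boolean isNullPredicate(String value) {
    return value == null;
  }

  public static void main(String[] args) {
    List<String> raw = Arrays.asList("2000-04-08", "null", DEFAULT_PARTITION_NAME);

    // Without the conversion, no partition value satisfies IS NULL.
    for (String value : raw) {
      System.out.println(value + " -> " + isNullPredicate(value));
    }

    // With the conversion, only the default partition satisfies IS NULL.
    for (String value : toNullAwareValues(raw, DEFAULT_PARTITION_NAME)) {
      System.out.println(value + " -> " + isNullPredicate(value));
    }
  }
}
```

Under these assumptions, an evaluator that receives the raw partition name sees a non-null string and rejects the partition, while one that receives the converted value accepts it.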

I assume that you refer to org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils#evalExprWithPart. If the latter does not have null-awareness then it means that the wrong result issue affects all partitioned tables and not only Iceberg.

I tried the following test case on master and it seems to confirm my hypothesis.

    create table pcr_t1 (key string, value string) partitioned by (ds string);
    INSERT INTO pcr_t1 PARTITION (ds) SELECT 'A', 'V1', '2000-04-08';
    INSERT INTO pcr_t1 PARTITION (ds) SELECT 'B', 'V2', 'null';
    INSERT INTO pcr_t1 PARTITION (ds) SELECT 'C', 'V3', null;
    select key, value, ds from pcr_t1 where ds is null;

Should we treat this as a separate issue?

fixed in 45662e4, @zabetak please re-check

sonarqubecloud · 2026-06-03T15:03:06Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

asf-ci-hive added tests pending tests unstable and removed tests pending labels May 29, 2026

HIVE-29637: Iceberg: Incorrect Results for NULL Predicate on Partitio…

23b0f2a

…n Column

deniskuzZ force-pushed the HIVE-29637 branch from 4bef91c to 23b0f2a Compare May 29, 2026 16:50

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels May 29, 2026

deniskuzZ requested a review from Copilot June 1, 2026 05:59

Copilot started reviewing on behalf of deniskuzZ June 1, 2026 05:59

Copilot AI reviewed Jun 1, 2026

aturoczy approved these changes Jun 1, 2026

rubenada approved these changes Jun 1, 2026

zabetak reviewed Jun 1, 2026

asf-ci-hive added tests pending and removed tests passed labels Jun 2, 2026

review comments #2

d82a04d

deniskuzZ force-pushed the HIVE-29637 branch from 44ad713 to d82a04d Compare June 2, 2026 11:34

asf-ci-hive added tests passed tests pending and removed tests pending tests passed labels Jun 2, 2026

deniskuzZ changed the title ~~HIVE-29637: Iceberg: Incorrect Results for NULL Predicate on Partitio…~~ HIVE-29637: Incorrect Results for NULL Predicate on Partitio… Jun 2, 2026

deniskuzZ changed the title ~~HIVE-29637: Incorrect Results for NULL Predicate on Partitio…~~ HIVE-29637: Incorrect Results for NULL Predicate on Partition Column Jun 2, 2026

asf-ci-hive removed the tests pending label Jun 2, 2026

asf-ci-hive added the tests unstable label Jun 2, 2026

deniskuzZ force-pushed the HIVE-29637 branch from 6d5e293 to 98024bb Compare June 3, 2026 08:53

asf-ci-hive added tests pending and removed tests unstable labels Jun 3, 2026

review comments #3

45662e4

deniskuzZ force-pushed the HIVE-29637 branch from 98024bb to 45662e4 Compare June 3, 2026 10:54

asf-ci-hive added tests unstable tests pending and removed tests pending tests unstable labels Jun 3, 2026

asf-ci-hive added tests unstable and removed tests pending labels Jun 3, 2026

6 participants
