
fix(connector): Enable NDV stats collection for Iceberg in native mode#27207

Merged
tdcmeehan merged 1 commit into prestodb:master from
nmahadevuni:enable_iceberg_ndv_stats_collection
Feb 26, 2026

@nmahadevuni
Member

@nmahadevuni nmahadevuni commented Feb 25, 2026

Description

This change enables the collection of NDV (number of distinct values) statistics for Iceberg tables in Prestissimo.

Motivation and Context

We had disabled the collection of NDV stats for Iceberg tables in Prestissimo because the theta sketch functions were not yet implemented there. Those functions have since been implemented in PR #25685.
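For context on why the sketch functions were a prerequisite: Iceberg's Puffin statistics format defines an Apache DataSketches theta-sketch blob for distinct-value estimates. The Java block below is a minimal K-Minimum-Values (KMV) sketch, the idea theta sketches build on. It is an illustrative toy under stated assumptions, not Presto's theta sketch functions and not the DataSketches binary format; the class names, the hash (FNV-1a followed by a splitmix64 finalizer) and the choice of k are hypothetical picks for the example.

```java
import java.util.TreeSet;

public class KmvSketchDemo {

    /** Hypothetical toy sketch: keeps the k smallest distinct hash values seen so far. */
    static final class KmvSketch {
        private final int k;                                 // number of minimum hashes retained
        private final TreeSet<Long> mins = new TreeSet<>();  // the k smallest 53-bit hashes

        KmvSketch(int k) {
            if (k < 2) {
                throw new IllegalArgumentException("k must be at least 2");
            }
            this.k = k;
        }

        static long hash64(String s) {
            long h = 0xcbf29ce484222325L;        // FNV-1a 64-bit offset basis
            for (int i = 0; i < s.length(); i++) {
                h ^= s.charAt(i);
                h *= 0x100000001b3L;             // FNV-1a 64-bit prime
            }
            h ^= h >>> 30;                       // splitmix64 finalizer spreads the bits
            h *= 0xbf58476d1ce4e5b9L;
            h ^= h >>> 27;
            h *= 0x94d049bb133111ebL;
            h ^= h >>> 31;
            return h;
        }

        void update(String value) {
            offer(hash64(value) >>> 11);         // top 53 bits: non-negative, maps exactly onto [0, 1)
        }

        private void offer(long h) {
            if (mins.size() == k && h >= mins.last()) {
                return;                          // larger than (or equal to) the largest retained hash
            }
            mins.add(h);                         // the set ignores duplicates, so repeats are not recounted
            if (mins.size() > k) {
                mins.pollLast();                 // evict the largest to keep exactly k values
            }
        }

        double estimate() {
            if (mins.size() < k) {
                return mins.size();              // fewer than k distinct hashes: exact count (ignoring collisions)
            }
            double theta = mins.last() * 0x1.0p-53; // k-th smallest hash scaled into (0, 1)
            return (k - 1) / theta;                 // standard KMV estimator
        }

        static KmvSketch union(KmvSketch a, KmvSketch b) {
            KmvSketch merged = new KmvSketch(Math.min(a.k, b.k));
            for (long h : a.mins) {
                merged.offer(h);
            }
            for (long h : b.mins) {
                merged.offer(h);
            }
            return merged;
        }
    }

    public static void main(String[] args) {
        KmvSketch sketch = new KmvSketch(256);
        for (int i = 0; i < 10_000; i++) {
            sketch.update("v" + (i % 1000));     // 1,000 distinct values, each seen 10 times
        }
        System.out.printf("estimated NDV: %.1f%n", sketch.estimate());
    }
}
```

Because only the k smallest hashes are kept, two sketches can be merged without rescanning the data, and the merged sketch is identical to one built over the combined input. That mergeability is the usual reason sketches are preferred for distributed NDV collection.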

Impact

No impact

Test Plan

This is covered by an existing test in TestPrestoNativeIcebergGeneralQueries.java.

== NO RELEASE NOTE ==

Summary by Sourcery

Enhancements:

  • Allow Iceberg connector to collect NDV-related column statistics in native execution mode alongside existing Hive statistics.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Feb 25, 2026
@prestodb-ci prestodb-ci requested review from a team, Mariamalmesfer and jkhaliqi and removed request for a team February 25, 2026 17:17
@sourcery-ai
Contributor

sourcery-ai bot commented Feb 25, 2026


Reviewer's Guide

Re-enables collection of all supported column statistics, including NDV, for Iceberg tables in native execution by aligning Iceberg metadata behavior with the base statistics metadata provider.

Sequence diagram for statistics collection in Iceberg native execution

sequenceDiagram
    actor QueryEngine
    participant IcebergHiveMetadata
    participant ConnectorSystemConfig
    participant BaseStatisticsMetadata as BaseStatisticsMetadata
    participant TableStatisticsMetadata

    QueryEngine->>IcebergHiveMetadata: getStatisticsCollectionMetadata(session, tableMetadata)
    IcebergHiveMetadata->>ConnectorSystemConfig: isNativeExecution()
    ConnectorSystemConfig-->>IcebergHiveMetadata: nativeExecutionFlag
    Note over IcebergHiveMetadata: Previously skipped base stats when nativeExecutionFlag was true
    IcebergHiveMetadata->>BaseStatisticsMetadata: getStatisticsCollectionMetadata(session, tableMetadata)
    BaseStatisticsMetadata-->>IcebergHiveMetadata: baseColumnStatistics
    IcebergHiveMetadata-->>QueryEngine: TableStatisticsMetadata(supportedStatistics, tableStatistics)

Updated class diagram for IcebergHiveMetadata statistics metadata

classDiagram
    class IcebergHiveMetadata {
        +TableStatisticsMetadata getStatisticsCollectionMetadata(ConnectorSession session, TableMetadata tableMetadata)
    }

    class ConnectorSession
    class TableMetadata
    class ColumnStatisticMetadata
    class TableStatisticType
    class TableStatisticsMetadata {
        +TableStatisticsMetadata(Set~ColumnStatisticMetadata~ columnStatistics, Set~TableStatisticType~ tableStatistics, List~String~ groupingColumns)
    }
    class ConnectorSystemConfig {
        +boolean isNativeExecution()
    }

    IcebergHiveMetadata --> ConnectorSystemConfig : uses
    IcebergHiveMetadata --> TableStatisticsMetadata : creates
    IcebergHiveMetadata --> ColumnStatisticMetadata : aggregates
    IcebergHiveMetadata --> TableStatisticType : aggregates
    IcebergHiveMetadata --> ConnectorSession : parameter
    IcebergHiveMetadata --> TableMetadata : parameter

    TableStatisticsMetadata --> ColumnStatisticMetadata : contains
    TableStatisticsMetadata --> TableStatisticType : contains

File-Level Changes

Change: Always include the base connector's column statistics for Iceberg tables, even in native execution mode, so that NDV statistics are collected.
Details:
  • Removed the conditional that excluded base statistics when native execution is enabled.
  • Unconditionally added the column statistics from the superclass's statistics metadata to the Iceberg-specific set of supported statistics.
  • Kept table-level statistics limited to ROW_COUNT while expanding the set of column statistics.
Files: presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergHiveMetadata.java
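The behavioral change in IcebergHiveMetadata can be pictured with the simplified Java model below. It is a sketch, not the Presto source: ColumnStat, TableStat, StatsMetadata, before and after are hypothetical stand-ins for ColumnStatisticMetadata, TableStatisticType, TableStatisticsMetadata and the two versions of getStatisticsCollectionMetadata, and the statistic names in each set are assumed for illustration only.

```java
import java.util.EnumSet;
import java.util.Set;

public class StatsCollectionSketch {

    // Hypothetical stand-ins for column and table statistic kinds (names illustrative).
    enum ColumnStat { MIN_VALUE, MAX_VALUE, NUMBER_OF_DISTINCT_VALUES, TOTAL_SIZE_IN_BYTES }

    enum TableStat { ROW_COUNT }

    // Hypothetical stand-in for TableStatisticsMetadata (grouping columns omitted).
    record StatsMetadata(Set<ColumnStat> columnStatistics, Set<TableStat> tableStatistics) {}

    // Before the fix: the base (superclass) column statistics, including NDV, were skipped in native mode.
    static StatsMetadata before(boolean nativeExecution, Set<ColumnStat> icebergStats, Set<ColumnStat> baseStats) {
        Set<ColumnStat> supported = EnumSet.noneOf(ColumnStat.class);
        supported.addAll(icebergStats);
        if (!nativeExecution) {
            supported.addAll(baseStats);
        }
        return new StatsMetadata(supported, EnumSet.of(TableStat.ROW_COUNT));
    }

    // After the fix: the base column statistics are always added; table statistics stay ROW_COUNT only.
    static StatsMetadata after(Set<ColumnStat> icebergStats, Set<ColumnStat> baseStats) {
        Set<ColumnStat> supported = EnumSet.noneOf(ColumnStat.class);
        supported.addAll(icebergStats);
        supported.addAll(baseStats);
        return new StatsMetadata(supported, EnumSet.of(TableStat.ROW_COUNT));
    }

    public static void main(String[] args) {
        Set<ColumnStat> iceberg = EnumSet.of(ColumnStat.TOTAL_SIZE_IN_BYTES);
        Set<ColumnStat> base = EnumSet.of(ColumnStat.MIN_VALUE, ColumnStat.MAX_VALUE, ColumnStat.NUMBER_OF_DISTINCT_VALUES);
        System.out.println("before (native): " + before(true, iceberg, base).columnStatistics());
        System.out.println("after  (native): " + after(iceberg, base).columnStatistics());
    }
}
```

With nativeExecution set to true, before(...) returns only the Iceberg-specific statistics, while after(...) returns the union of both sets; in both versions the table-level statistics remain ROW_COUNT only, matching the third bullet above.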


@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • Since NDV stats are now enabled for native execution, check whether connectorSystemConfig.isNativeExecution() (and any related branching) is still used elsewhere in IcebergHiveMetadata and remove any dead code or config flags that are no longer needed.


@nmahadevuni nmahadevuni force-pushed the enable_iceberg_ndv_stats_collection branch from e586e89 to f13bba8 February 26, 2026 05:28
@nmahadevuni nmahadevuni changed the title Enable NDV stats collection for Iceberg in native mode fix(Iceberg): Enable NDV stats collection in native mode Feb 26, 2026
@nmahadevuni nmahadevuni changed the title fix(Iceberg): Enable NDV stats collection in native mode fix(connector): Enable NDV stats collection for Iceberg in native mode Feb 26, 2026
@tdcmeehan tdcmeehan merged commit 1fc8fad into prestodb:master Feb 26, 2026
83 of 86 checks passed

Labels

from:IBM PR from IBM

3 participants