Conversation

@pramodsatya pramodsatya commented Jan 30, 2026

Description

Changes the type of index in DEREFERENCE expression from BIGINT to INTEGER to match the Presto RowExpression type.

Motivation and Context

Resolves #27037.

Impact

SQL-invoked functions that contain a DEREFERENCE expression no longer fail when the native expression optimizer is enabled.

Test Plan

Added e2e tests.

== NO RELEASE NOTE ==

Summary by Sourcery

Align DEREFERENCE expression index type with Presto RowExpression and add coverage for native optimizer queries using dereference.

Bug Fixes:

  • Change DEREFERENCE expression index type from BIGINT to INTEGER in VeloxToPrestoExprConverter to match Presto RowExpression expectations and avoid failures in SQL-invoked functions when the native optimizer is enabled.

Tests:

  • Extend native optimizer end-to-end tests to cover dereference expressions via array_least_frequent usage.

@pramodsatya pramodsatya requested review from a team and pdabre12 as code owners January 30, 2026 02:58
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 30, 2026
@prestodb-ci prestodb-ci requested a review from a team January 30, 2026 02:58

sourcery-ai bot commented Jan 30, 2026

Reviewer's Guide

Adjusts the Velox-to-Presto dereference expression index type to use INTEGER instead of BIGINT and adds end-to-end tests to cover dereference usage via array_least_frequent under the native optimizer.

Sequence diagram for DEREFERENCE conversion with INTEGER index

sequenceDiagram
  participant NativeOptimizer
  participant VeloxToPrestoExprConverter as Converter
  participant velox_core_FieldAccessTypedExpr as FieldAccessExpr
  participant velox_core_ConstantTypedExpr as ConstantExpr
  participant protocol_RowExpression as RowExpr

  NativeOptimizer->>Converter: getDereferenceExpression(FieldAccessExpr)
  Converter->>FieldAccessExpr: inputs()
  FieldAccessExpr-->>Converter: std_vector_TypedExprPtr
  Converter->>FieldAccessExpr: index()
  FieldAccessExpr-->>Converter: index_int32
  Converter->>ConstantExpr: new ConstantTypedExpr(velox_INTEGER, index_int32)
  Converter->>Converter: build dereferenceInputs[0] = base_expr
  Converter->>Converter: build dereferenceInputs[1] = ConstantExpr
  loop for each input in dereferenceInputs
    Converter->>Converter: getRowExpression(input)
    Converter-->>Converter: RowExpr
  end
  Converter-->>NativeOptimizer: RowExpr representing DEREFERENCE with INTEGER index
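
The flow in the diagram above can be sketched in a few lines of standalone C++. This is a minimal illustration under stated assumptions, not the code in VeloxToPrestoExpr.cpp: `Expr`, `ConstantExpr`, `RowInputExpr`, `TypeKind`, and `makeDereferenceInputs` are hypothetical stand-ins invented here so the example compiles without Velox or the Presto protocol headers. The point it shows is only that the second DEREFERENCE input is a constant tagged INTEGER rather than BIGINT.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the Velox or
// Presto protocol types.
enum class TypeKind { INTEGER, BIGINT, ROW };

struct Expr {
  virtual ~Expr() = default;
};
using ExprPtr = std::shared_ptr<const Expr>;

// A typed constant. The kind is what a serializer would emit as the
// constant's type.
struct ConstantExpr : Expr {
  ConstantExpr(TypeKind kind, int64_t value) : kind(kind), value(value) {}
  TypeKind kind;
  int64_t value;
};

// Stand-in for the row-typed expression being dereferenced.
struct RowInputExpr : Expr {
  explicit RowInputExpr(std::string name) : name(std::move(name)) {}
  std::string name;
};

// Builds the two DEREFERENCE inputs: [0] is the base expression and [1] is
// the field index wrapped in an INTEGER-typed constant.
std::vector<ExprPtr> makeDereferenceInputs(ExprPtr base, int32_t index) {
  std::vector<ExprPtr> inputs;
  inputs.push_back(std::move(base));
  inputs.push_back(std::make_shared<ConstantExpr>(TypeKind::INTEGER, index));
  return inputs;
}
```

Calling `makeDereferenceInputs(std::make_shared<RowInputExpr>("row_col"), 1)` yields two inputs, and the second one carries `TypeKind::INTEGER`. In the real converter each input would then go through `getRowExpression`, as the loop in the diagram indicates.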

Class diagram for updated dereference conversion in VeloxToPrestoExprConverter

classDiagram
  class VeloxToPrestoExprConverter {
    +getDereferenceExpression(dereferenceExpr : velox_core_FieldAccessTypedExprPtr) SpecialFormExpressionPtr
    +getRowExpression(input : velox_core_TypedExprPtr) protocol_RowExpressionPtr
  }

  class velox_core_FieldAccessTypedExpr {
    +inputs() std_vector_velox_core_TypedExprPtr
    +index() int32_t
  }

  class velox_core_TypedExpr {
  }

  class velox_core_ConstantTypedExpr {
    +ConstantTypedExpr(type : velox_TypePtr, value : int32_t)
  }

  class velox_Type {
  }

  class protocol_RowExpression {
  }

  VeloxToPrestoExprConverter --> velox_core_FieldAccessTypedExpr : uses
  VeloxToPrestoExprConverter --> velox_core_TypedExpr : builds_dereferenceInputs
  VeloxToPrestoExprConverter --> velox_core_ConstantTypedExpr : creates_index_constant
  velox_core_ConstantTypedExpr --> velox_Type : has_type
  VeloxToPrestoExprConverter --> protocol_RowExpression : returns

File-Level Changes

Change: Align dereference expression index type with Presto RowExpression expectations.
Details:
  • Change the constant index expression type from BIGINT to INTEGER when building dereference expressions in VeloxToPrestoExprConverter, including narrowing the cast from int64_t to int32_t.
  • Ensure the serialized JSON RowExpression for dereference uses an INTEGER index to match Presto's type system.
Files: presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp

Change: Add native optimizer e2e coverage for dereference expressions used by array_least_frequent.
Details:
  • Extend TestNativeSidecarPlugin native optimizer tests to run queries invoking array_least_frequent over a row field (orders_ex.quantities).
  • Add a second array_least_frequent query using split(comment, '') with an integer argument to exercise dereference behavior under different input shapes.
Files: presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java

Assessment against linked issues

Issue: #27037
Objective: Fix the native expression optimizer crash when running queries using array_least_frequent (e.g., on orders_ex.quantities and split(comment, '') from nation) by correcting the handling of the relevant expressions.

Issue: #27037
Objective: Add regression tests to ensure queries using array_least_frequent with native expression optimizer enabled execute successfully without crashing.

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The change from BIGINT to INTEGER introduces a narrowing cast on dereferenceExpr->index(); consider adding an assertion or explicit range check (or a comment explaining why the index is guaranteed to fit in int32) to make the assumption about index bounds clear.
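
One way to make the index-bounds assumption explicit is a small checked narrowing helper, sketched here in standalone C++. The names `fitsInInt32` and `checkedIndexToInt32` are hypothetical and do not exist in Velox or Presto, and rejecting negative values is an assumption made for this sketch on the grounds that a field index should not be negative.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: true when `value` is representable as int32_t.
constexpr bool fitsInInt32(int64_t value) {
  return value >= std::numeric_limits<int32_t>::min() &&
      value <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: narrows a field index to int32_t and throws instead
// of silently truncating. Negative indexes are rejected as an assumption of
// this sketch.
inline int32_t checkedIndexToInt32(int64_t index) {
  if (index < 0 || !fitsInInt32(index)) {
    throw std::out_of_range(
        "dereference index out of INTEGER range: " + std::to_string(index));
  }
  return static_cast<int32_t>(index);
}
```

A call such as `checkedIndexToInt32(3)` returns 3, while a value above `std::numeric_limits<int32_t>::max()` throws `std::out_of_range`. Whether the real code should throw, assert, or just carry a comment depends on what the index accessor is guaranteed to return, which this thread does not settle.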

@pramodsatya

@aditi-pandit, could you please help review this fix?

@pdabre12 pdabre12 left a comment

@pramodsatya Changes look good.
It seems the test case is failing because of a known varchar(N) issue. Can you instead add another test case without a varchar column?
Optionally, you can keep this test case with an assertQueryFails message, whichever you prefer.

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya. Please add more tests.

assertQuerySucceeds(session, "SELECT row_number() OVER (PARTITION BY orderdate ORDER BY orderdate) FROM orders");

// Test dereference expression with array_least_frequent function
assertQuerySucceeds(session, "SELECT array_least_frequent(quantities) from orders_ex");

@pramodsatya: Can you add tests with other DEREFERENCE expressions as well?

Labels

from:IBM PR from IBM

Development

Successfully merging this pull request may close these issues.

[native] Native expression optimizer crashes worker while optimizing array_least_frequent

4 participants