
fix: Replace regex-based json_parse safety wrapper with AST-level rewriter #27202

Open
han-yan01 wants to merge 1 commit into prestodb:master from han-yan01:export-D94175149

@han-yan01 han-yan01 commented Feb 24, 2026

testing infra fix:

com.facebook.presto.verifier.framework.PrestoQueryException: com.facebook.presto.spi.PrestoException: Cannot convert '' to JSON

The previous applyJsonParseSafetyWrapper() used a SqlFormatter.formatSql() -> regex ->
sqlParser.createStatement() round trip that silently failed on complex queries with
lambdas and $internal$try(BIND(...)) patterns. Those failures left bare json_parse()
calls in place, which crash on empty strings during verifier replay.
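
To illustrate why a text-level rewrite is fragile in general, here is a minimal sketch. The pattern below is hypothetical (the old implementation's actual regex is not shown in this PR), and `RegexWrapDemo` and `wrapWithRegex` are names made up for the example. It is not necessarily the failure mode the old code hit, only one way a regex over SQL text can go wrong.

```java
import java.util.regex.Pattern;

final class RegexWrapDemo {
    // Hypothetical text-level pattern: captures everything up to the first ')'.
    private static final Pattern JSON_PARSE =
            Pattern.compile("json_parse\\(([^)]*)\\)");

    static String wrapWithRegex(String sql) {
        return JSON_PARSE.matcher(sql).replaceAll("TRY(json_parse($1))");
    }

    public static void main(String[] args) {
        // Simple argument: the rewrite looks right.
        System.out.println(wrapWithRegex("SELECT json_parse(b) FROM t"));
        // Nested calls: the match stops at the first ')', so arguments get regrouped.
        System.out.println(wrapWithRegex("SELECT json_parse(concat(lower(a), b)) FROM t"));
        // Already wrapped: the text-level rewrite wraps it a second time.
        System.out.println(wrapWithRegex("SELECT TRY(json_parse(b)) FROM t"));
    }
}
```

The second call produces `SELECT TRY(json_parse(concat(lower(a)), b)) FROM t`: the parentheses still balance, but `b` has moved from `concat` to `json_parse`. The third call produces `TRY(TRY(json_parse(b)))`. Neither problem is visible to a regex, because it has no notion of expression nesting.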

Replace with AST-level JsonParseTryWrapper using DefaultTreeRewriter +
ExpressionTreeRewriter (same pattern as FunctionCallRewriter). Wraps
json_parse FunctionCall nodes directly in TryExpression on the AST, eliminating the
fragile format/re-parse cycle.
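
The idea can be sketched on a toy expression tree. This is not Presto's AST or the PR's code: `Expr`, `TryWrapSketch`, and `rewrite` are hypothetical stand-ins, and TRY is modelled as an ordinary call named `TRY`. The handling of the inside-TRY flag (set only for the direct child of a TRY, reset for function arguments) is an assumption based on the reviewer's guide sequence diagram.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TryWrapSketch {
    // Toy expression node: either a column reference or a named call with arguments.
    static final class Expr {
        final String name;
        final boolean isCall;
        final List<Expr> args;

        Expr(String name, boolean isCall, List<Expr> args) {
            this.name = name;
            this.isCall = isCall;
            this.args = args;
        }

        static Expr column(String name) {
            return new Expr(name, false, new ArrayList<Expr>());
        }

        static Expr call(String name, Expr... args) {
            return new Expr(name, true, Arrays.asList(args));
        }

        @Override
        public String toString() {
            if (!isCall) {
                return name;
            }
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(args.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Rewrites children first, then wraps json_parse in TRY unless its parent is a TRY.
    static Expr rewrite(Expr node, boolean insideTry) {
        if (!node.isCall) {
            return node;
        }
        boolean isTry = node.name.equalsIgnoreCase("TRY");
        List<Expr> newArgs = new ArrayList<Expr>();
        for (Expr arg : node.args) {
            // Only the direct child of a TRY sees insideTry = true.
            newArgs.add(rewrite(arg, isTry));
        }
        Expr rebuilt = new Expr(node.name, true, newArgs);
        if (node.name.equalsIgnoreCase("json_parse") && !insideTry) {
            return Expr.call("TRY", rebuilt);
        }
        return rebuilt;
    }
}
```

Under these assumptions, `json_parse(b)` becomes `TRY(json_parse(b))`, an existing `TRY(json_parse(b))` is left as is, and `json_extract(json_parse(b), p)` becomes `json_extract(TRY(json_parse(b)), p)`. Because the rewrite operates on nodes rather than text, nesting depth and argument boundaries are never guessed.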

Release Notes

== NO RELEASE NOTE ==

Differential Revision: D94175149

Summary by Sourcery

Apply an AST-level rewriter to wrap json_parse() calls in TRY() during query rewriting, improving robustness over the previous regex-based approach and ensuring it works with function substitution and complex queries.

Bug Fixes:

  • Ensure json_parse() calls are reliably wrapped in TRY() to prevent verifier failures on malformed or empty JSON, including in complex queries with lambdas, subqueries, and function substitutions.

Enhancements:

  • Replace the SQL format/regex/re-parse json_parse safety wrapper with an AST-based JsonParseTryWrapper integrated into the query rewriting pipeline and applied consistently to CREATE TABLE AS, INSERT, and SELECT queries.

Tests:

  • Add comprehensive verifier rewrite tests covering json_parse wrapping behavior across simple selects, expressions, lambdas, subqueries, WHERE clauses, function substitution, and multiple/mixed json_parse usages.

@han-yan01 han-yan01 requested a review from a team as a code owner February 24, 2026 17:17
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Feb 24, 2026
sourcery-ai bot commented Feb 24, 2026

Reviewer's Guide

Replaces the previous regex/format-based json_parse safety wrapper with an AST-level rewriter in QueryRewriter, and adds comprehensive tests to ensure json_parse calls are wrapped in TRY expressions across complex query shapes and after function substitution.

Sequence diagram for AST-level json_parse TRY wrapping during query rewrite

sequenceDiagram
    participant Client
    participant QueryRewriter
    participant JsonParseTryWrapper
    participant ExprTreeRewriter as ExpressionTreeRewriter
    participant JsonParseRewriter as ExpressionRewriter
    participant Query
    participant FunctionCall
    participant TryExpression

    Client->>QueryRewriter: rewriteQuery(sql, configuration, clusterType)
    QueryRewriter->>QueryRewriter: parse sql to Query
    QueryRewriter->>QueryRewriter: apply FunctionCallRewriter
    QueryRewriter->>JsonParseTryWrapper: applyJsonParseSafetyWrapper(query)
    JsonParseTryWrapper->>JsonParseTryWrapper: process(query, null)
    JsonParseTryWrapper->>ExprTreeRewriter: rewriteWith(JsonParseRewriter, expressionRoot, false)

    loop traverse expressions
        ExprTreeRewriter->>JsonParseRewriter: rewriteFunctionCall(functionCall, insideTry=false, treeRewriter)
        JsonParseRewriter->>ExprTreeRewriter: defaultRewrite(functionCall, false)
        ExprTreeRewriter-->>JsonParseRewriter: defaultRewrite result as FunctionCall
        alt function name is json_parse and not inside TRY
            JsonParseRewriter->>TryExpression: new TryExpression(functionCall)
            JsonParseRewriter-->>ExprTreeRewriter: TryExpression
        else other function or inside TRY
            JsonParseRewriter-->>ExprTreeRewriter: rewritten FunctionCall
        end

        ExprTreeRewriter->>JsonParseRewriter: rewriteTryExpression(tryExpression, insideTry=false, treeRewriter)
        JsonParseRewriter->>ExprTreeRewriter: rewrite(innerExpression, true)
        ExprTreeRewriter-->>JsonParseRewriter: possibly rewritten inner expression
        alt inner changed
            JsonParseRewriter->>TryExpression: new TryExpression(rewrittenInner)
            JsonParseRewriter-->>ExprTreeRewriter: new TryExpression
        else inner unchanged
            JsonParseRewriter-->>ExprTreeRewriter: original TryExpression
        end

        ExprTreeRewriter->>JsonParseRewriter: rewriteSubqueryExpression(subqueryExpression, insideTry, treeRewriter)
        JsonParseRewriter->>JsonParseTryWrapper: process(subqueryQuery, null)
        JsonParseTryWrapper-->>JsonParseRewriter: rewritten subquery Query
        alt query changed
            JsonParseRewriter-->>ExprTreeRewriter: new SubqueryExpression(rewrittenQuery)
        else query unchanged
            JsonParseRewriter-->>ExprTreeRewriter: original SubqueryExpression
        end
    end

    ExprTreeRewriter-->>JsonParseTryWrapper: rewritten root Expression
    JsonParseTryWrapper-->>QueryRewriter: rewritten Query
    QueryRewriter-->>Client: QueryObjectBundle with safe json_parse wrapped in TRY

Class diagram for AST-level json_parse TRY wrapper in QueryRewriter

classDiagram
    class QueryRewriter {
        - SqlParser sqlParser
        - TypeManager typeManager
        - BlockEncodingSerde blockEncodingSerde
        + QueryObjectBundle rewriteQuery(String query, QueryConfiguration queryConfiguration, ClusterType clusterType)
        - static Query applyJsonParseSafetyWrapper(Query query)
    }

    class JsonParseTryWrapper {
        + Query process(Query query, Void context)
        + Node visitExpression(Expression node, Void context)
    }

    class DefaultTreeRewriter~T~ {
        + Node process(Node node, T context)
        + Node visitExpression(Expression node, T context)
    }

    class ExpressionTreeRewriter~T~ {
        + static Expression rewriteWith(ExpressionRewriter~T~ rewriter, Expression node, T context)
        + Expression defaultRewrite(Expression node, T context)
    }

    class ExpressionRewriter~T~ {
        + Expression rewriteFunctionCall(FunctionCall original, T context, ExpressionTreeRewriter~T~ treeRewriter)
        + Expression rewriteTryExpression(TryExpression original, T context, ExpressionTreeRewriter~T~ treeRewriter)
        + Expression rewriteSubqueryExpression(SubqueryExpression expression, T context, ExpressionTreeRewriter~T~ treeRewriter)
    }

    class Query {
    }

    class Node {
    }

    class Expression {
    }

    class FunctionCall {
        + QualifiedName getName()
    }

    class TryExpression {
        + TryExpression(Expression innerExpression)
        + Expression getInnerExpression()
    }

    class SubqueryExpression {
        + Query getQuery()
    }

    class QualifiedName {
        + String getSuffix()
    }

    QueryRewriter ..> JsonParseTryWrapper : uses
    JsonParseTryWrapper --|> DefaultTreeRewriter~Void~
    JsonParseTryWrapper ..> ExpressionTreeRewriter~Boolean~ : uses
    JsonParseTryWrapper ..> ExpressionRewriter~Boolean~ : anonymous
    ExpressionTreeRewriter~Boolean~ ..> ExpressionRewriter~Boolean~ : collaborates

    Node <|-- Query
    Node <|-- Expression
    Expression <|-- FunctionCall
    Expression <|-- TryExpression
    Expression <|-- SubqueryExpression
    FunctionCall ..> QualifiedName : returns

    QueryRewriter ..> Query : rewrites
    JsonParseTryWrapper ..> Query : rewrites
    JsonParseTryWrapper ..> FunctionCall : wraps json_parse
    JsonParseTryWrapper ..> TryExpression : creates
    JsonParseTryWrapper ..> SubqueryExpression : rewrites subqueries

File-Level Changes

Change: Replace SqlFormatter/regex-based json_parse safety wrapper with AST-level TryExpression wrapper in QueryRewriter.
Details:
  • Remove logging-based, best-effort applyJsonParseSafetyWrapper implementation that re-formatted SQL and reparsed it
  • Introduce a static applyJsonParseSafetyWrapper that applies a JsonParseTryWrapper over the Query AST using DefaultTreeRewriter and ExpressionTreeRewriter
  • Implement JsonParseTryWrapper to wrap json_parse FunctionCall nodes in TryExpression unless already under a TRY
  • Ensure TryExpression children are rewritten with an inside-TRY flag to prevent double wrapping and to allow nested json_parse transformation
  • Handle SubqueryExpression specially so json_parse calls inside subqueries are also rewritten by recursively processing inner Query nodes
  • Apply the json_parse safety wrapper unconditionally after the FunctionCallRewriter step for CREATE TABLE AS, INSERT, and SELECT rewrites so that it runs after function substitution
Files: presto-verifier/src/main/java/com/facebook/presto/verifier/rewrite/QueryRewriter.java

Change: Add tests covering AST-level json_parse TRY-wrapping behavior, including interaction with function substitution.
Details:
  • Add testJsonParseSafetyWrapper to validate wrapping of bare json_parse calls in SELECT, WHERE, subqueries, lambdas, nested in json_extract, and multiple occurrences, while preserving existing TRY(json_parse()) calls and queries without json_parse
  • Add testJsonParseSafetyWrapperWithFunctionSubstitutes to verify that function substitutions (e.g., ARBITRARY -> MIN) occur before json_parse wrapping, and that the wrapped query matches expectations
  • Use assertCreateTableAs with QueryRewriter.rewriteQuery to assert the exact rewritten SQL strings for various scenarios
Files: presto-verifier/src/test/java/com/facebook/presto/verifier/rewrite/TestQueryRewriter.java
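
Why the wrapper has to run after function substitution can be shown with a deliberately tiny model. Everything here is hypothetical: expressions are reduced to a chain of unary calls over one column, and the `legacy_parse` -> `json_parse` substitution is invented for illustration (the PR's own test uses ARBITRARY -> MIN).

```java
import java.util.ArrayList;
import java.util.List;

final class RewriteOrderSketch {
    // Toy model: an expression is a chain of unary calls, outermost first.

    // Replaces every call named `from` with `to`.
    static List<String> substitute(List<String> calls, String from, String to) {
        List<String> out = new ArrayList<String>();
        for (String name : calls) {
            out.add(name.equalsIgnoreCase(from) ? to : name);
        }
        return out;
    }

    // Inserts TRY in front of every json_parse whose parent is not already TRY.
    static List<String> wrapJsonParse(List<String> calls) {
        List<String> out = new ArrayList<String>();
        for (String name : calls) {
            boolean parentIsTry = !out.isEmpty()
                    && out.get(out.size() - 1).equalsIgnoreCase("TRY");
            if (name.equalsIgnoreCase("json_parse") && !parentIsTry) {
                out.add("TRY");
            }
            out.add(name);
        }
        return out;
    }

    // Renders the chain as nested calls around a column name.
    static String render(List<String> calls, String column) {
        StringBuilder sb = new StringBuilder();
        for (String name : calls) {
            sb.append(name).append('(');
        }
        sb.append(column);
        for (int i = 0; i < calls.size(); i++) {
            sb.append(')');
        }
        return sb.toString();
    }
}
```

Starting from `legacy_parse(b)`, substituting first and wrapping second yields `TRY(json_parse(b))`. Reversing the order yields a bare `json_parse(b)`, because the wrapper ran before the call it needed to protect existed.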

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • Consider making JsonParseTryWrapper a static singleton or reusing an instance instead of new JsonParseTryWrapper() on every call to applyJsonParseSafetyWrapper, since it is stateless and used frequently in the rewrite path.
  • In JsonParseTryWrapper.rewriteSubqueryExpression, you construct a new SubqueryExpression without preserving any existing properties (e.g., location/labels); if those are meaningful elsewhere, it may be safer to copy them from the original node when only the query changes.
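
Both suggestions can be sketched on a toy node type. This is a hypothetical illustration, not Presto's `SubqueryExpression` API: `Subquery`, its string-valued `location`, and `PreserveLocationSketch` are made up for the example.

```java
import java.util.Optional;

final class PreserveLocationSketch {
    // Toy node carrying an optional source location, e.g. "3:7".
    static final class Subquery {
        final Optional<String> location;
        final String query;

        Subquery(Optional<String> location, String query) {
            this.location = location;
            this.query = query;
        }
    }

    // The rewriter holds no state, so one shared instance can serve every call.
    static final PreserveLocationSketch INSTANCE = new PreserveLocationSketch();

    private PreserveLocationSketch() {}

    Subquery rewrite(Subquery original, String rewrittenQuery) {
        if (rewrittenQuery.equals(original.query)) {
            // Nothing changed: return the original node untouched.
            return original;
        }
        // Only the query changed: carry the original location over to the new node.
        return new Subquery(original.location, rewrittenQuery);
    }
}
```

Whether location metadata matters downstream in the verifier is not established in this thread, so the copy is a defensive choice rather than a known requirement.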

Comment on lines +671 to +672
@Test
public void testJsonParseSafetyWrapper()

suggestion (testing): Add tests for case-insensitive and qualified json_parse function names

The rewriter compares function names case-insensitively, but the tests only cover the lower-case, unqualified json_parse. Please add cases like SELECT JSON_PARSE(b) and a fully qualified call (e.g., some_catalog.some_schema.json_parse(b)) and assert they’re wrapped in TRY(...) as well, to validate behavior across case and qualification variants.
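
As a rough sketch of the matching behavior these tests would pin down: the helper below is hypothetical and works on a plain list of name parts rather than Presto's `QualifiedName`. That the PR matches on the last name part is an inference from `getSuffix()` appearing in the class diagram, not something confirmed in this thread.

```java
import java.util.List;
import java.util.Locale;

final class FunctionNameMatchSketch {
    // Returns true when the last part of a (possibly qualified) function name
    // is json_parse, ignoring case.
    static boolean isJsonParse(List<String> nameParts) {
        if (nameParts.isEmpty()) {
            return false;
        }
        String suffix = nameParts.get(nameParts.size() - 1);
        return suffix.toLowerCase(Locale.ENGLISH).equals("json_parse");
    }
}
```

With this rule, `JSON_PARSE` and `some_catalog.some_schema.json_parse` both match, while `json_extract` does not.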

Comment on lines +711 to +716
// json_parse in WHERE clause
assertCreateTableAs(
queryRewriter.rewriteQuery(
"SELECT a FROM test_table WHERE json_parse(b) IS NOT NULL",
CONFIGURATION, CONTROL).getQuery(),
"SELECT a FROM test_table WHERE TRY(json_parse(b)) IS NOT NULL");

suggestion (testing): Consider covering json_parse usages in ORDER BY / GROUP BY / HAVING as well

You already cover json_parse in SELECT projections, lambdas, subqueries, and WHERE clauses. Please add at least one test where json_parse appears in ORDER BY, GROUP BY, or HAVING (e.g., GROUP BY json_parse(b) or HAVING json_parse(b) IS NOT NULL) and verify those occurrences are also wrapped in TRY(...) to confirm the rewriter applies consistently across clauses.

han-yan01 added a commit to han-yan01/presto that referenced this pull request Feb 24, 2026
…riter (prestodb#27202)

Summary:

The previous applyJsonParseSafetyWrapper() used SqlFormatter.formatSql() -> regex ->
sqlParser.createStatement() round-trip that silently failed on complex queries with
lambdas and $internal$try(BIND(...)) patterns, leaving bare json_parse() calls that
crash on empty strings during verifier replay.

Replace with AST-level JsonParseTryWrapper using DefaultTreeRewriter +
ExpressionTreeRewriter<Boolean> (same pattern as FunctionCallRewriter). Wraps
json_parse FunctionCall nodes directly in TryExpression on the AST, eliminating the
fragile format/re-parse cycle.

# Release Notes
```
== NO RELEASE NOTE ==
```

Differential Revision: D94175149
@shrinidhijoshi

@feilong-liu Can you help review this change? Thanks!

@han-yan01 han-yan01 changed the title from "[presto][verifier] Replace regex-based json_parse safety wrapper with AST-level rewriter" to "fix: Replace regex-based json_parse safety wrapper with AST-level rewriter" on Feb 24, 2026

@shrinidhijoshi shrinidhijoshi left a comment


Took a first pass. Looks good. One main comment

han-yan01 added a commit to han-yan01/presto that referenced this pull request Feb 25, 2026
… AST-level rewriter (prestodb#27202)

Summary:

testing infra fix:
```
com.facebook.presto.verifier.framework.PrestoQueryException: com.facebook.presto.spi.PrestoException: Cannot convert '' to JSON
```
The previous applyJsonParseSafetyWrapper() used SqlFormatter.formatSql() -> regex ->
sqlParser.createStatement() round-trip that silently failed on complex queries with
lambdas and $internal$try(BIND(...)) patterns, leaving bare json_parse() calls that
crash on empty strings during verifier replay.

Replace with AST-level JsonParseTryWrapper using DefaultTreeRewriter +
ExpressionTreeRewriter<Boolean> (same pattern as FunctionCallRewriter). Wraps
json_parse FunctionCall nodes directly in TryExpression on the AST, eliminating the
fragile format/re-parse cycle.

The feature is gated behind a `json-parse-safety-wrapper-enabled` config flag
(default false). Enabled in sapphire velox shadow testing via run-shadow-test.sh.
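
A minimal sketch of the gating, assuming a plain `java.util.Properties` lookup as a stand-in for the verifier's real configuration binding (which is not shown here). Only the property name and its default of false come from the commit message; `SafetyWrapperFlagSketch` and `isEnabled` are hypothetical.

```java
import java.util.Properties;

final class SafetyWrapperFlagSketch {
    static final String FLAG = "json-parse-safety-wrapper-enabled";

    // The wrapper stays off unless the flag is explicitly set to true.
    static boolean isEnabled(Properties config) {
        return Boolean.parseBoolean(config.getProperty(FLAG, "false"));
    }
}
```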

# Release Notes
```
== NO RELEASE NOTE ==
```

Differential Revision: D94175149