fix: Replace regex-based json_parse safety wrapper with AST-level rewriter#27202
fix: Replace regex-based json_parse safety wrapper with AST-level rewriter#27202han-yan01 wants to merge 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideReplaces the previous regex/format-based json_parse safety wrapper with an AST-level rewriter in QueryRewriter, and adds comprehensive tests to ensure json_parse calls are wrapped in TRY expressions across complex query shapes and after function substitution. Sequence diagram for AST-level json_parse TRY wrapping during query rewritesequenceDiagram
participant Client
participant QueryRewriter
participant JsonParseTryWrapper
participant ExpressionTreeRewriter as ExprTreeRewriter
participant ExpressionRewriter as JsonParseRewriter
participant Query
participant FunctionCall
participant TryExpression
Client->>QueryRewriter: rewriteQuery(sql, configuration, clusterType)
QueryRewriter->>QueryRewriter: parse sql to Query
QueryRewriter->>QueryRewriter: apply FunctionCallRewriter
QueryRewriter->>JsonParseTryWrapper: applyJsonParseSafetyWrapper(query)
JsonParseTryWrapper->>JsonParseTryWrapper: process(query, null)
JsonParseTryWrapper->>ExprTreeRewriter: rewriteWith(JsonParseRewriter, expressionRoot, false)
loop traverse expressions
ExprTreeRewriter->>JsonParseRewriter: rewriteFunctionCall(functionCall, insideTry=false, treeRewriter)
JsonParseRewriter->>ExprTreeRewriter: defaultRewrite(functionCall, false)
ExprTreeRewriter-->>JsonParseRewriter: defaultRewrite result as FunctionCall
alt function name is json_parse and not inside TRY
JsonParseRewriter->>TryExpression: new TryExpression(functionCall)
JsonParseRewriter-->>ExprTreeRewriter: TryExpression
else other function or inside TRY
JsonParseRewriter-->>ExprTreeRewriter: rewritten FunctionCall
end
ExprTreeRewriter->>JsonParseRewriter: rewriteTryExpression(tryExpression, insideTry=false, treeRewriter)
JsonParseRewriter->>ExprTreeRewriter: rewrite(innerExpression, true)
ExprTreeRewriter-->>JsonParseRewriter: possibly rewritten inner expression
alt inner changed
JsonParseRewriter->>TryExpression: new TryExpression(rewrittenInner)
JsonParseRewriter-->>ExprTreeRewriter: new TryExpression
else inner unchanged
JsonParseRewriter-->>ExprTreeRewriter: original TryExpression
end
ExprTreeRewriter->>JsonParseRewriter: rewriteSubqueryExpression(subqueryExpression, insideTry, treeRewriter)
JsonParseRewriter->>JsonParseTryWrapper: process(subqueryQuery, null)
JsonParseTryWrapper-->>JsonParseRewriter: rewritten subquery Query
alt query changed
JsonParseRewriter-->>ExprTreeRewriter: new SubqueryExpression(rewrittenQuery)
else query unchanged
JsonParseRewriter-->>ExprTreeRewriter: original SubqueryExpression
end
end
ExprTreeRewriter-->>JsonParseTryWrapper: rewritten root Expression
JsonParseTryWrapper-->>QueryRewriter: rewritten Query
QueryRewriter-->>Client: QueryObjectBundle with safe json_parse wrapped in TRY
Class diagram for AST-level json_parse TRY wrapper in QueryRewriterclassDiagram
class QueryRewriter {
- SqlParser sqlParser
- TypeManager typeManager
- BlockEncodingSerde blockEncodingSerde
+ QueryObjectBundle rewriteQuery(String query, QueryConfiguration queryConfiguration, ClusterType clusterType)
- static Query applyJsonParseSafetyWrapper(Query query)
}
class JsonParseTryWrapper {
+ Query process(Query query, Void context)
+ Node visitExpression(Expression node, Void context)
}
class DefaultTreeRewriter~T~ {
+ Node process(Node node, T context)
+ Node visitExpression(Expression node, T context)
}
class ExpressionTreeRewriter~T~ {
+ static Expression rewriteWith(ExpressionRewriter~T~ rewriter, Expression node, T context)
+ Expression defaultRewrite(Expression node, T context)
}
class ExpressionRewriter~T~ {
+ Expression rewriteFunctionCall(FunctionCall original, T context, ExpressionTreeRewriter~T~ treeRewriter)
+ Expression rewriteTryExpression(TryExpression original, T context, ExpressionTreeRewriter~T~ treeRewriter)
+ Expression rewriteSubqueryExpression(SubqueryExpression expression, T context, ExpressionTreeRewriter~T~ treeRewriter)
}
class Query {
}
class Node {
}
class Expression {
}
class FunctionCall {
+ QualifiedName getName()
}
class TryExpression {
+ TryExpression(Expression innerExpression)
+ Expression getInnerExpression()
}
class SubqueryExpression {
+ Query getQuery()
}
class QualifiedName {
+ String getSuffix()
}
QueryRewriter ..> JsonParseTryWrapper : uses
JsonParseTryWrapper --|> DefaultTreeRewriter~Void~
JsonParseTryWrapper ..> ExpressionTreeRewriter~Boolean~ : uses
JsonParseTryWrapper ..> ExpressionRewriter~Boolean~ : anonymous
ExpressionTreeRewriter~Boolean~ ..> ExpressionRewriter~Boolean~ : collaborates
Query <|-- Node
Expression <|-- Node
FunctionCall <|-- Expression
TryExpression <|-- Expression
SubqueryExpression <|-- Expression
QualifiedName ..> FunctionCall : returned by
QueryRewriter ..> Query : rewrites
JsonParseTryWrapper ..> Query : rewrites
JsonParseTryWrapper ..> FunctionCall : wraps json_parse
JsonParseTryWrapper ..> TryExpression : creates
JsonParseTryWrapper ..> SubqueryExpression : rewrites subqueries
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- Consider making
JsonParseTryWrappera static singleton or reusing an instance instead ofnew JsonParseTryWrapper()on every call toapplyJsonParseSafetyWrapper, since it is stateless and used frequently in the rewrite path. - In
JsonParseTryWrapper.rewriteSubqueryExpression, you construct a newSubqueryExpressionwithout preserving any existing properties (e.g., location/labels); if those are meaningful elsewhere, it may be safer to copy them from the original node when only the query changes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider making `JsonParseTryWrapper` a static singleton or reusing an instance instead of `new JsonParseTryWrapper()` on every call to `applyJsonParseSafetyWrapper`, since it is stateless and used frequently in the rewrite path.
- In `JsonParseTryWrapper.rewriteSubqueryExpression`, you construct a new `SubqueryExpression` without preserving any existing properties (e.g., location/labels); if those are meaningful elsewhere, it may be safer to copy them from the original node when only the query changes.
## Individual Comments
### Comment 1
<location path="presto-verifier/src/test/java/com/facebook/presto/verifier/rewrite/TestQueryRewriter.java" line_range="671-672" />
<code_context>
assertEquals(actualQueries, expectedQueries);
}
+ @Test
+ public void testJsonParseSafetyWrapper()
+ {
+ QueryRewriter queryRewriter = getQueryRewriter();
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for case-insensitive and qualified `json_parse` function names
The rewriter compares function names case-insensitively, but the tests only cover the lower-case, unqualified `json_parse`. Please add cases like `SELECT JSON_PARSE(b)` and a fully qualified call (e.g., `some_catalog.some_schema.json_parse(b)`) and assert they’re wrapped in `TRY(...)` as well, to validate behavior across case and qualification variants.
</issue_to_address>
### Comment 2
<location path="presto-verifier/src/test/java/com/facebook/presto/verifier/rewrite/TestQueryRewriter.java" line_range="711-716" />
<code_context>
+ CONFIGURATION, CONTROL).getQuery(),
+ "SELECT * FROM (SELECT TRY(json_parse(b)) AS parsed FROM test_table) t");
+
+ // json_parse in WHERE clause
+ assertCreateTableAs(
+ queryRewriter.rewriteQuery(
+ "SELECT a FROM test_table WHERE json_parse(b) IS NOT NULL",
+ CONFIGURATION, CONTROL).getQuery(),
+ "SELECT a FROM test_table WHERE TRY(json_parse(b)) IS NOT NULL");
+
+ // json_parse nested inside json_extract (common pattern from function substitution)
</code_context>
<issue_to_address>
**suggestion (testing):** Consider covering json_parse usages in ORDER BY / GROUP BY / HAVING as well
You already cover `json_parse` in SELECT projections, lambdas, subqueries, and WHERE clauses. Please add at least one test where `json_parse` appears in ORDER BY, GROUP BY, or HAVING (e.g., `GROUP BY json_parse(b)` or `HAVING json_parse(b) IS NOT NULL`) and verify those occurrences are also wrapped in `TRY(...)` to confirm the rewriter applies consistently across clauses.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| public void testJsonParseSafetyWrapper() |
There was a problem hiding this comment.
suggestion (testing): Add tests for case-insensitive and qualified json_parse function names
The rewriter compares function names case-insensitively, but the tests only cover the lower-case, unqualified json_parse. Please add cases like SELECT JSON_PARSE(b) and a fully qualified call (e.g., some_catalog.some_schema.json_parse(b)) and assert they’re wrapped in TRY(...) as well, to validate behavior across case and qualification variants.
| // json_parse in WHERE clause | ||
| assertCreateTableAs( | ||
| queryRewriter.rewriteQuery( | ||
| "SELECT a FROM test_table WHERE json_parse(b) IS NOT NULL", | ||
| CONFIGURATION, CONTROL).getQuery(), | ||
| "SELECT a FROM test_table WHERE TRY(json_parse(b)) IS NOT NULL"); |
There was a problem hiding this comment.
suggestion (testing): Consider covering json_parse usages in ORDER BY / GROUP BY / HAVING as well
You already cover json_parse in SELECT projections, lambdas, subqueries, and WHERE clauses. Please add at least one test where json_parse appears in ORDER BY, GROUP BY, or HAVING (e.g., GROUP BY json_parse(b) or HAVING json_parse(b) IS NOT NULL) and verify those occurrences are also wrapped in TRY(...) to confirm the rewriter applies consistently across clauses.
…riter (prestodb#27202) Summary: The previous applyJsonParseSafetyWrapper() used SqlFormatter.formatSql() -> regex -> sqlParser.createStatement() round-trip that silently failed on complex queries with lambdas and $internal$try(BIND(...)) patterns, leaving bare json_parse() calls that crash on empty strings during verifier replay. Replace with AST-level JsonParseTryWrapper using DefaultTreeRewriter + ExpressionTreeRewriter<Boolean> (same pattern as FunctionCallRewriter). Wraps json_parse FunctionCall nodes directly in TryExpression on the AST, eliminating the fragile format/re-parse cycle. # Releas Notes ``` == NO RELEASE NOTE == ``` Differential Revision: D94175149
135d169 to
e953311
Compare
|
@feilong-liu Can you help review this change ? Thanks! |
shrinidhijoshi
left a comment
There was a problem hiding this comment.
Took a first pass. Looks good. One main comment
presto-verifier/src/main/java/com/facebook/presto/verifier/rewrite/QueryRewriter.java
Show resolved
Hide resolved
… AST-level rewriter (prestodb#27202) Summary: testing infra fix: ``` com.facebook.presto.verifier.framework.PrestoQueryException: com.facebook.presto.spi.PrestoException: Cannot convert '' to JSON ``` The previous applyJsonParseSafetyWrapper() used SqlFormatter.formatSql() -> regex -> sqlParser.createStatement() round-trip that silently failed on complex queries with lambdas and $internal$try(BIND(...)) patterns, leaving bare json_parse() calls that crash on empty strings during verifier replay. Replace with AST-level JsonParseTryWrapper using DefaultTreeRewriter + ExpressionTreeRewriter<Boolean> (same pattern as FunctionCallRewriter). Wraps json_parse FunctionCall nodes directly in TryExpression on the AST, eliminating the fragile format/re-parse cycle. The feature is gated behind a `json-parse-safety-wrapper-enabled` config flag (default false). Enabled in sapphire velox shadow testing via run-shadow-test.sh. # Release Notes ``` == NO RELEASE NOTE == ``` Differential Revision: D94175149
e953311 to
6451211
Compare
… AST-level rewriter (prestodb#27202) Summary: testing infra fix: ``` com.facebook.presto.verifier.framework.PrestoQueryException: com.facebook.presto.spi.PrestoException: Cannot convert '' to JSON ``` The previous applyJsonParseSafetyWrapper() used SqlFormatter.formatSql() -> regex -> sqlParser.createStatement() round-trip that silently failed on complex queries with lambdas and $internal$try(BIND(...)) patterns, leaving bare json_parse() calls that crash on empty strings during verifier replay. Replace with AST-level JsonParseTryWrapper using DefaultTreeRewriter + ExpressionTreeRewriter<Boolean> (same pattern as FunctionCallRewriter). Wraps json_parse FunctionCall nodes directly in TryExpression on the AST, eliminating the fragile format/re-parse cycle. The feature is gated behind a `json-parse-safety-wrapper-enabled` config flag (default false). Enabled in sapphire velox shadow testing via run-shadow-test.sh. # Release Notes ``` == NO RELEASE NOTE == ``` Differential Revision: D94175149
6451211 to
89ab7d6
Compare
testing infra fix:
The previous applyJsonParseSafetyWrapper() used SqlFormatter.formatSql() -> regex ->
sqlParser.createStatement() round-trip that silently failed on complex queries with
lambdas and $internal$try(BIND(...)) patterns, leaving bare json_parse() calls that
crash on empty strings during verifier replay.
Replace with AST-level JsonParseTryWrapper using DefaultTreeRewriter +
ExpressionTreeRewriter (same pattern as FunctionCallRewriter). Wraps
json_parse FunctionCall nodes directly in TryExpression on the AST, eliminating the
fragile format/re-parse cycle.
Releas Notes
Differential Revision: D94175149
Summary by Sourcery
Apply an AST-level rewriter to wrap json_parse() calls in TRY() during query rewriting, improving robustness over the previous regex-based approach and ensuring it works with function substitution and complex queries.
Bug Fixes:
Enhancements:
Tests: