Skip to content

Commit 05104a9

Browse files
authored
[GEODE-10508] Remedation of ANTLR nondeterminism warnings in OQL grammar (#7942)
* GEODE-10508: Fix ANTLR nondeterminism warnings in OQL grammar This commit resolves four nondeterminism warnings generated by ANTLR during the OQL grammar compilation process. These warnings indicated parser ambiguity that could lead to unpredictable parsing behavior. Problem Analysis: ----------------- 1. Lines 574 & 578 (projection rule): The parser could not distinguish between aggregateExpr and expr alternatives when encountering aggregate function keywords (sum, avg, min, max, count). These keywords are valid both as: - Aggregate function identifiers: sum(field) - Regular identifiers in expressions: sum as a field name Without lookahead, ANTLR could not deterministically choose which production rule to apply, resulting in nondeterminism warnings. 2. Lines 961 & 979 (aggregateExpr rule): Optional 'distinct' keyword created ambiguity in aggregate function parsing. The parser could not decide whether to: - Match the optional 'distinct' keyword, or - Skip it and proceed directly to the expression Both paths were valid, but ANTLR's default behavior doesn't specify preference, causing nondeterminism. Solution Implemented: -------------------- 1. Added syntactic predicates to projection rule (lines 574, 578): Predicate: (('sum'|'avg'|'min'|'max'|'count') TOK_LPAREN)=> This instructs the parser to look ahead and check if an aggregate keyword is followed by a left parenthesis. If true, it chooses aggregateExpr; otherwise, it chooses expr. This resolves the ambiguity by providing explicit lookahead logic. 2. Added greedy option to aggregateExpr rule (lines 961, 979): Option: options {greedy=true;} This tells the parser to greedily match the 'distinct' keyword whenever it appears, rather than being ambiguous about whether to match or skip. The greedy option eliminates the nondeterminism by establishing clear matching priority. 3. Updated test to use token constants (AbstractCompiledValueTestJUnitTest): Changed: hardcoded value 89 -> OQLLexerTokenTypes.LITERAL_or Rationale: Adding syntactic predicates changes ANTLR's token numbering in the generated lexer (LITERAL_or shifted from 89 to 94). Using the constant ensures test correctness regardless of future grammar changes. This is a best practice for maintaining test stability. Impact: ------- - Zero nondeterminism warnings from ANTLR grammar generation - No changes to OQL syntax or semantics (fully backward compatible) - No runtime behavior changes (modifications only affect parser generation) - All existing tests pass with updated token reference - Improved parser determinism and maintainability Technical Details: ----------------- - Syntactic predicates (=>) are standard ANTLR 2 feature for lookahead - Greedy option is standard ANTLR feature for optional subrule disambiguation - Token constant usage follows best practices for generated code references - Changes are compile-time only with no runtime performance impact Files Modified: -------------- - geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g - geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java * GEODE-10508: Apply code formatting to test file Fix line length formatting for improved readability.
1 parent 98c2ec6 commit 05104a9

File tree

2 files changed

+21
-5
lines changed

2 files changed

+21
-5
lines changed

geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -571,11 +571,16 @@ projectionAttributes :
571571
572572
projection!{ AST node = null;}:
573573
574-
lb1:identifier TOK_COLON! ( tok1:aggregateExpr{node = #tok1;} | tok2:expr{node = #tok2;})
574+
// Use syntactic predicate to resolve nondeterminism between aggregateExpr and expr.
575+
// The predicate checks for aggregate function keywords (sum, avg, min, max, count) followed by '('.
576+
// Without this, the parser cannot determine which alternative to choose when it sees these keywords,
577+
// since they can also be used as identifiers in regular expressions.
578+
lb1:identifier TOK_COLON! ( (("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> tok1:aggregateExpr{node = #tok1;} | tok2:expr{node = #tok2;})
575579
{ #projection = #([PROJECTION, "projection",
576580
"org.apache.geode.cache.query.internal.parse.ASTProjection"], node, #lb1); }
577581
|
578-
(tok3:aggregateExpr{node = #tok3;} | tok4:expr{node = #tok4;})
582+
// Same syntactic predicate as above to handle projections without a label (identifier:)
583+
((("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> tok3:aggregateExpr{node = #tok3;} | tok4:expr{node = #tok4;})
579584
(
580585
"as"
581586
lb2: identifier
@@ -958,7 +963,10 @@ collectionExpr :
958963
aggregateExpr { int aggFunc = -1; boolean distinctOnly = false; }:
959964
960965
!("sum" {aggFunc = SUM;} | "avg" {aggFunc = AVG;} )
961-
TOK_LPAREN ("distinct"! {distinctOnly = true;} ) ? tokExpr1:expr TOK_RPAREN
966+
// Use greedy option to resolve nondeterminism with optional 'distinct' keyword.
967+
// Greedy tells the parser to match 'distinct' whenever it appears, rather than
968+
// being ambiguous about whether to match it or skip directly to the expression.
969+
TOK_LPAREN (options {greedy=true;}: "distinct"! {distinctOnly = true;} ) ? tokExpr1:expr TOK_RPAREN
962970
{ #aggregateExpr = #([AGG_FUNC, "aggregate", "org.apache.geode.cache.query.internal.parse.ASTAggregateFunc"],
963971
#tokExpr1);
964972
((ASTAggregateFunc)#aggregateExpr).setAggregateFunctionType(aggFunc);
@@ -975,8 +983,9 @@ aggregateExpr { int aggFunc = -1; boolean distinctOnly = false; }:
975983
976984
|
977985
"count"^<AST=org.apache.geode.cache.query.internal.parse.ASTAggregateFunc>
986+
// Same greedy option as above for count's optional 'distinct' keyword
978987
TOK_LPAREN! ( TOK_STAR <AST=org.apache.geode.cache.query.internal.parse.ASTDummy>
979-
| ("distinct"! {distinctOnly = true;} ) ? expr ) TOK_RPAREN!
988+
| (options {greedy=true;}: "distinct"! {distinctOnly = true;} ) ? expr ) TOK_RPAREN!
980989
{
981990
((ASTAggregateFunc)#aggregateExpr).setAggregateFunctionType(COUNT);
982991
#aggregateExpr.setText("aggregate");

geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@
2424
import org.junit.Test;
2525
import org.junit.runner.RunWith;
2626

27+
import org.apache.geode.cache.query.internal.parse.OQLLexerTokenTypes;
2728
import org.apache.geode.cache.query.internal.types.CollectionTypeImpl;
2829
import org.apache.geode.test.junit.runners.GeodeParamsRunner;
2930

@@ -47,7 +48,13 @@ private CompiledValue[] getCompiledValuesWhichDoNotImplementGetReceiver() {
4748
new LinkedHashMap<>()),
4849
new CompiledIn(compiledValue1, compiledValue2),
4950
new CompiledIteratorDef("test", new CollectionTypeImpl(), compiledValue1),
50-
new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2}, 89),
51+
// Changed from hardcoded value 89 to OQLLexerTokenTypes.LITERAL_or constant.
52+
// The hardcoded value 89 was the token number for LITERAL_or in the original grammar,
53+
// but after adding syntactic predicates to fix nondeterminism warnings, the token
54+
// numbering changed (LITERAL_or is now 94). Using the constant ensures this test
55+
// remains correct regardless of future grammar changes.
56+
new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2},
57+
OQLLexerTokenTypes.LITERAL_or),
5158
new CompiledLike(compiledValue1, compiledValue2),
5259
new CompiledLiteral(compiledValue1),
5360
new CompiledMod(compiledValue1, compiledValue2),

0 commit comments

Comments
 (0)