-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Improve error message for ( and [ #124177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Due to recent grammar changes made ( token to no longer be reported by its text rather by his internal token name. Due to the use of pushMode, the symbol is not treated as a literal rather as a symbol. To address this, the parser listener looks at the error message and changes the message before returning it to the user. Fix elastic#124145 Relates elastic#123085 elastic#121948
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/kibana-esql (ES|QL-ui) |
|
Hi @costin, I've created a changelog YAML for you. |
| Matcher m = REPLACE_LP.matcher(message); | ||
| message = m.replaceAll(BEFORE_REPLACEMENT + "(" + AFTER_REPLACEMENT); | ||
| m = REPLACE_LEFT_BRACKET.matcher(message); | ||
| message = m.replaceAll(BEFORE_REPLACEMENT + "[" + AFTER_REPLACEMENT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a chance we could introduce corresponding tokens to the mode back?
I think it is a bit better to generate correct message from antlr when possible rather than rewrite it in our code.
This also introduces a string usage of the token names outside of the grammar and one more place to update if that is ever changed or renamed.
| // this test checks that their are properly replaced in the error message | ||
| public void testPreserveParanthesis() { | ||
| // test for ( | ||
| expectError("row a = 1 not in", "line 1:17: mismatched input '<EOF>' expecting '('"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be row a = 1 | where a not in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping, when I seen the title of this PR, that it would help with the "rogue" closing paren also #119025. But it doesn't, e.g. row a = 1 not in] or FROM test | eval x = [1, 2, 3]] .
java.util.EmptyStackException
at org.antlr.v4.runtime.Lexer.popMode(Lexer.java:192)
at org.antlr.v4.runtime.atn.LexerPopModeAction.execute(LexerPopModeAction.java:58)
at org.antlr.v4.runtime.atn.LexerActionExecutor.execute(LexerActionExecutor.java:168)
...
I only mention it since the use of modes, and the limitations of handling errors in ANTLR, are related, which might influence the course of action here.
[ clearly, these are nonsensical queries, but we should handle the error more gracefully ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be row a = 1 | where a not in?
- not really because row points to a value expression which is a bool expression. As the error indicates
- I've added another test for it just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping, when I seen the title of this PR, that it would help with the "rogue" closing paren also #119025.
Sorry :) Go for it!
P.S. that should be a separate PR, since it needs backporting to the 8.x line vs this one that is targeting only 8.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. that should be a separate PR, since it needs backporting to the 8.x line vs this one that is targeting only 8.x.
... this one that is targeting only 8.x 9.x. Ah, yes, of course. Let's keep that separate then.
alex-spies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like Ievgen, I think a solution based on regexes will be more brittle than somehow going back to when the parser/lexer referred to parens/brackets by their native symbols; but I didn't investigate how feasible that would be, so this is fine IMO.
I think we should at the very least be symmetric, however, and not specialize to opening brackets; the error messages for their closing counterparts at least need tests (and probably an analogous fix).
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlParser.java
Outdated
Show resolved
Hide resolved
| private final Pattern REPLACE_LP = Pattern.compile(BEFORE_REGEX + "LP" + AFTER_REGEX); | ||
| private final Pattern REPLACE_LEFT_BRACKET = Pattern.compile(BEFORE_REGEX + "OPENING_BRACKET" + AFTER_REGEX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For symmetry, shouldn't we also replace the token for right parens, and for the closing bracket?
Similarly, shouldn't we have a test for .. | WHERE a in ("foo", and an analogous test for ]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For symmetry, shouldn't we also replace the token for right parens, and for the closing bracket?
No because those are not lexed as symbols and are still kept as tokens (I think because they simply pop the stack instead pushing twice to a new mode).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation!
alex-spies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last minor comment (which I'll take care of myself) and this can be merged as a temporary hack until we find a better solution from my perspective.
| expectError("row a = 1 | where a not in 123", "line 1:28: missing '(' at '123'"); | ||
| // test for [ | ||
| expectError("explain", "line 1:8: mismatched input '<EOF>' expecting '['"); | ||
| expectError("explain ]", "line 1:9: token recognition error at: ']'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice but we don't cover the case of expecting a ] and raising an error message that may call this RIGHT_BRACKET of ].
I'll push a small test case for this.
ChrisHegarty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an incremental improvement over the existing behaviour. LGTM.
as pluggable as it could be yet much better)
|
Letting the comment here for the record - I couldn't find a way to change the symbol to lexer through the grammar and instead modified the underlying |
* main: (95 commits) Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testLifecycleAppliedToFailureStore elastic#124999 Merge template mappings properly during validation (elastic#124784) [Build] Rework internal build plugin plugin to work with Isolated Projects (elastic#123461) [Build] Require reason for usesDefaultDistribution (elastic#124707) Mute org.elasticsearch.packaging.test.DockerTests test011SecurityEnabledStatus elastic#124990 Mute org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT testRolloverAction elastic#124987 Mute org.elasticsearch.packaging.test.BootstrapCheckTests test10Install elastic#124957 Mute org.elasticsearch.integration.DataStreamLifecycleServiceRuntimeSecurityIT testRolloverLifecycleAndForceMergeAuthorized elastic#124978 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQuery elastic#124977 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672 Mention zero-window state in networking docs (elastic#124967) Remove remoteAddress field from TransportResponse (elastic#120016) Include failures in partial response (elastic#124929) Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0 (elastic#124732) Re-enable analysis stemmer test (elastic#124961) Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocalNoRemotes elastic#124959 ESQL: Catch parsing exception (elastic#124958) ESQL: Improve error message for ( and [ (elastic#124177) Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {lookup-join.MvJoinKeyFromRow SYNC} elastic#124951 Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testErrorRecordingOnRetention elastic#124950 ... # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java # server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
Due to recent grammar changes made ( token to no longer be reported by its text rather by his internal token name. Due to the use of pushMode, the symbol is not treated as a literal rather as a symbol. To address this, the parser listener looks at the error message and changes the message before returning it to the user. Replace hacky regex approach with Vocabulary substitution (not as pluggable as it could be yet much better) Fix elastic#124145 Relates elastic#123085 elastic#121948 Co-authored-by: Alexander Spies <[email protected]>
Due to recent grammar changes made ( token to no longer be reported by its text rather by his internal token name. Due to the use of pushMode, the symbol is not treated as a literal rather as a symbol. To address this, the parser listener looks at the error message and changes the message before returning it to the user. Fix elastic#124145 Relates elastic#123085 elastic#121948
Due to recent grammar changes made ( token to no longer be reported by its text rather by his internal token name. Due to the use of pushMode, the symbol is not treated as a literal rather as a symbol. To address this, the parser listener looks at the error message and changes the message before returning it to the user. Fix #124145 Relates #123085 #121948
Due to recent grammar changes made ( token to no longer be reported by
its text rather by his internal token name. Due to the use of pushMode,
the symbol is not treated as a literal rather as a symbol.
To address this, the parser listener looks at the error message and
changes the message before returning it to the user.
Fix #124145
Relates #123085 #121948