Skip to content

Conversation

@costin
Copy link
Member

@costin costin commented Mar 6, 2025

Due to recent grammar changes made ( token to no longer be reported by
its text rather by his internal token name. Due to the use of pushMode,
the symbol is not treated as a literal rather as a symbol.
To address this, the parser listener looks at the error message and
changes the message before returning it to the user.

Fix #124145
Relates #123085 #121948

Due to recent grammar changes made ( token to no longer be reported by
 its text rather by his internal token name. Due to the use of pushMode,
 the symbol is not treated as a literal rather as a symbol.
To address this, the parser listener looks at the error message and
 changes the message before returning it to the user.

Fix elastic#124145
Relates elastic#123085 elastic#121948
@costin costin requested review from alex-spies and ioanatia March 6, 2025 03:16
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 6, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

@costin costin requested review from ChrisHegarty and bpintea March 6, 2025 03:16
Matcher m = REPLACE_LP.matcher(message);
message = m.replaceAll(BEFORE_REPLACEMENT + "(" + AFTER_REPLACEMENT);
m = REPLACE_LEFT_BRACKET.matcher(message);
message = m.replaceAll(BEFORE_REPLACEMENT + "[" + AFTER_REPLACEMENT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance we could introduce corresponding tokens to the mode back?

I think it is a bit better to generate correct message from antlr when possible rather than rewrite it in our code.
This also introduces a string usage of the token names outside of the grammar and one more place to update if that is ever changed or renamed.

// this test checks that their are properly replaced in the error message
public void testPreserveParanthesis() {
// test for (
expectError("row a = 1 not in", "line 1:17: mismatched input '<EOF>' expecting '('");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be row a = 1 | where a not in?

Copy link
Contributor

@ChrisHegarty ChrisHegarty Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping, when I seen the title of this PR, that it would help with the "rogue" closing paren also #119025. But it doesn't, e.g. row a = 1 not in] or FROM test | eval x = [1, 2, 3]] .

java.util.EmptyStackException
  at org.antlr.v4.runtime.Lexer.popMode(Lexer.java:192)
  at org.antlr.v4.runtime.atn.LexerPopModeAction.execute(LexerPopModeAction.java:58)
  at org.antlr.v4.runtime.atn.LexerActionExecutor.execute(LexerActionExecutor.java:168)
...

I only mention it since the use of modes, and the limitations of handling errors in ANTLR, are related, which might influence the course of action here.

[ clearly, these are nonsensical queries, but we should handle the error more gracefully ]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be row a = 1 | where a not in?

  1. not really because row points to a value expression which is a bool expression. As the error indicates
  2. I've added another test for it just in case.

Copy link
Member Author

@costin costin Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping, when I seen the title of this PR, that it would help with the "rogue" closing paren also #119025.

Sorry :) Go for it!
P.S. that should be a separate PR, since it needs backporting to the 8.x line vs this one that is targeting only 8.x.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. that should be a separate PR, since it needs backporting to the 8.x line vs this one that is targeting only 8.x.

... this one that is targeting only 8.x 9.x. Ah, yes, of course. Let's keep that separate then.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Ievgen, I think a solution based on regexes will be more brittle than somehow going back to when the parser/lexer referred to parens/brackets by their native symbols; but I didn't investigate how feasible that would be, so this is fine IMO.

I think we should at the very least be symmetric, however, and not specialize to opening brackets; the error messages for their closing counterparts at least need tests (and probably an analogous fix).

Comment on lines 136 to 137
private final Pattern REPLACE_LP = Pattern.compile(BEFORE_REGEX + "LP" + AFTER_REGEX);
private final Pattern REPLACE_LEFT_BRACKET = Pattern.compile(BEFORE_REGEX + "OPENING_BRACKET" + AFTER_REGEX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For symmetry, shouldn't we also replace the token for right parens, and for the closing bracket?

Similarly, shouldn't we have a test for .. | WHERE a in ("foo", and an analogous test for ]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For symmetry, shouldn't we also replace the token for right parens, and for the closing bracket?

No because those are not lexed as symbols and are still kept as tokens (I think because they simply pop the stack instead pushing twice to a new mode).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last minor comment (which I'll take care of myself) and this can be merged as a temporary hack until we find a better solution from my perspective.

expectError("row a = 1 | where a not in 123", "line 1:28: missing '(' at '123'");
// test for [
expectError("explain", "line 1:8: mismatched input '<EOF>' expecting '['");
expectError("explain ]", "line 1:9: token recognition error at: ']'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice but we don't cover the case of expecting a ] and raising an error message that may call this RIGHT_BRACKET of ].

I'll push a small test case for this.

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an incremental improvement over the existing behaviour. LGTM.

@costin costin requested review from a team as code owners March 15, 2025 21:57
 as pluggable as it could be yet much better)
@costin
Copy link
Member Author

costin commented Mar 15, 2025

Letting the comment here for the record - I couldn't find a way to change the symbol to lexer through the grammar and instead modified the underlying Vocabulary, the class ANTLR uses to get the display name for the token.
While it is available as the interface, its underlying implementation is statically initialized based on the constants resulting from the grammar however it does expose its underlying string arrays which I've been able to hook into per the comments.
Obviously this is language specific - no idea how easy it would be to replicate it elsewhere.
The upside is in the future, new symbols can be added by just adding them to the replacement map.

@costin costin merged commit e8533c1 into elastic:main Mar 15, 2025
17 checks passed
@costin costin deleted the esql/fix-124145 branch March 15, 2025 23:18
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Mar 17, 2025
* main: (95 commits)
  Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testLifecycleAppliedToFailureStore elastic#124999
  Merge template mappings properly during validation (elastic#124784)
  [Build] Rework internal build plugin plugin to work with Isolated Projects (elastic#123461)
  [Build] Require reason for usesDefaultDistribution (elastic#124707)
  Mute org.elasticsearch.packaging.test.DockerTests test011SecurityEnabledStatus elastic#124990
  Mute org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT testRolloverAction elastic#124987
  Mute org.elasticsearch.packaging.test.BootstrapCheckTests test10Install elastic#124957
  Mute org.elasticsearch.integration.DataStreamLifecycleServiceRuntimeSecurityIT testRolloverLifecycleAndForceMergeAuthorized elastic#124978
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQuery elastic#124977
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672
  Mention zero-window state in networking docs (elastic#124967)
  Remove remoteAddress field from TransportResponse (elastic#120016)
  Include failures in partial response (elastic#124929)
  Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0  (elastic#124732)
  Re-enable analysis stemmer test (elastic#124961)
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocalNoRemotes elastic#124959
  ESQL: Catch parsing exception (elastic#124958)
  ESQL: Improve error message for ( and [ (elastic#124177)
  Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {lookup-join.MvJoinKeyFromRow SYNC} elastic#124951
  Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testErrorRecordingOnRetention elastic#124950
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
Due to recent grammar changes made ( token to no longer be reported by
 its text rather by his internal token name. Due to the use of pushMode,
 the symbol is not treated as a literal rather as a symbol.
To address this, the parser listener looks at the error message and
 changes the message before returning it to the user.

Replace hacky regex approach with Vocabulary substitution (not
 as pluggable as it could be yet much better)

Fix elastic#124145
Relates elastic#123085 elastic#121948

Co-authored-by: Alexander Spies <[email protected]>
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 1, 2025
Due to recent grammar changes made ( token to no longer be reported by
its text rather by his internal token name. Due to the use of pushMode,
the symbol is not treated as a literal rather as a symbol.
To address this, the parser listener looks at the error message and
changes the message before returning it to the user.

Fix elastic#124145
Relates elastic#123085 elastic#121948
nik9000 added a commit that referenced this pull request May 1, 2025
Due to recent grammar changes made ( token to no longer be reported by
its text rather by his internal token name. Due to the use of pushMode,
the symbol is not treated as a literal rather as a symbol.
To address this, the parser listener looks at the error message and
changes the message before returning it to the user.

Fix #124145
Relates #123085 #121948
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ES|QL] restore better syntax error messages

5 participants