-
Notifications
You must be signed in to change notification settings - Fork 25.7k
ESQL: Qualifiers syntax #132925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Qualifiers syntax #132925
Conversation
Allow LOOKUP JOIN to define a qualifier.
All documented commands are tested now that can have qualifiers
Constructors of FieldAttribute and ReferenceAttribute without qualifiers should normally only be used in tests going forward.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
@stratoula , this and this is the grammar update we were talking about. We do change the ANTLr grammar here, but all changes are restricted to SNAPSHOT builds. Does this still mess up things on your end, or is it fine as long as it's SNAPSHOT-only? |
|
It doesn't make any difference if it is on snapshots or not. These changes when moved to kibana will make our ast parsing logic to fail and we will need to adapt. This week we have on week so no one will be able to evaluate further. Next week we will have the time to check the changes and understand how big change this is for us and what we should do to move forward. I wonder if this PR can wait till then. |
|
Thanks @stratoula , I'll hold the PR until we're sure we don't break Kibana. It'd be great if we could make sure quite soon, as this PR blocks subsequent work on qualifiers. |
|
@alex-spies from the kibana side you are ok to merge this PR. At the first stage we will get the antlr changes but not repreresent the qualifiers in our AST yet. We will do it on a second stage |
|
Thanks for having a look @stratoula ! |
| } | ||
|
|
||
| private void assertQualifiedAttributeInExpressions(String query, String qualifier, String name, int expectedCount) { | ||
| LogicalPlan plan = statement(query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we stopping at logical plan here? Can we go all the way to local physical planning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can still check at logical planning, but also check at local physical planning after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statement parser tests are focused on parsing, only.
This PR does not attempt to plug qualifiers into commands just yet, to keep the scope reasonable. That's the next step in #133068. In the next step, we definitely want to have tests that inspect the generated logical and physical plans more closely.
The original PoC plugs everything in end-to-end, which is a big change, and I'm (roughly) following the PoC by splitting out parts into smaller PRs that we can iterate on more easily.
| @@ -0,0 +1,463 @@ | |||
| /* | |||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed them but I did not see any tests with escaping. Can we add the following tests if not already there?
A field containing left bracket (first, middle, last) [field, fiel[d, field[
A field containing right bracket (first, middle, last) ]field, fiel]d, field]
An index name containing left bracket (first, middle, last)
An index name containing right bracket (first, middle, last)
An field name containing dot (first, middle, last) .field, fiel.d, field.
An index name containing dot (first, middle, last)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to address quoting, escaping, validating against weird characters etc. in a follow-up PR. This is subtle enough that it deserves a bunch of comments and some discussions.
I noted in #133068 that we want to disallow everything but very basic qualifiers at first.
I'd also really disallow dots in qualifier names for now unless really needed. Let's avoid doing the same mistake twice.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/QualifierTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/QualifierTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/QualifierTests.java
Show resolved
Hide resolved
...gin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/expression/FieldAttribute.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/expression/Attribute.java
Show resolved
Hide resolved
julian-elastic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great! It seems like there is one potential serious bug with CCS that we should probably address. Other than that mostly some ideas for additional tests.
fang-xing-esql
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for putting these together @alex-spies, that's a great first step for qualifiers ! I added some comments around the grammar and parser.
|
|
||
| qualifiedNamePattern | ||
| : identifierPattern (DOT identifierPattern)* | ||
| : {this.isDevVersion()}? OPENING_BRACKET qualifier=ID_PATTERN CLOSING_BRACKET DOT OPENING_BRACKET name=fieldNamePattern CLOSING_BRACKET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if simplifying the grammar, reducing the scope to allow only qualifier = UNQUOTED_IDENTIFIER for qualifiedNamePattern would make it a bit easier as the first step. If I understand the intention correctly here, we want to allow patterns in qualifiers, as we allow patterns on index/field/attribute names.
For example, allowing the query below to be valid
FROM test
LOOKUP JOIN test t1 on x
LOOKUP JOIN test t2 on y
DROP [t*].[z*]
As far as I know, SQL does not use * in qualifiers to represent multiple tables. As qualifier is new, and we have the control of its scope, expanding a scope is easier than shrinking it, as the later could be a breaking change potentially.
Alternatively, if we really want to allow wildcard in the qualifier, will qualifier=UNQUOTED_ID_PATTERN work? Does the qualifiers need to be quoted? As it looks like the qualifiers defined in Join is UNQUOTED_SOURCE/UNQUOTED_IDENTIFIER.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if simplifying the grammar, reducing the scope to allow only qualifier = UNQUOTED_IDENTIFIER
I wonder if simplifying the grammar, reducing the scope to allow only qualifier = UNQUOTED_IDENTIFIER
I wish the same! Unfortunately, when trying this I realized that you cannot have this here because he lexer cannot distinguish between UNQUOTED_IDENTIFIER and ID_PATTERN , and ID_PATTERN is required to make the rest of KEEP/DROP work correctly.
As qualifier is new, and we have the control of its scope, expanding a scope is easier than shrinking it, as the later could be a breaking change potentially.
Absolutely. The allowed characters for qualifiers have to be restricted, both in patterns and in non-patterns. But let's do that in a separate PR; it's already tracked in the meta issue #133068.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, if we really want to allow wildcard in the qualifier, will qualifier=UNQUOTED_ID_PATTERN work?
UNQUOTED_ID_PATTERN is a fragment. I didn't want to risk confusing the lexer and breaking KEEP/DROP by making it a token. I think making it a token would introduce ambiguity into the lexing of KEEP/DROP commands as it would conflict with ID_PATTERN.
|
|
||
| var src = source(ctx); | ||
| String qualifier = qualifiedCtx.qualifier != null ? qualifiedCtx.qualifier.getText() : null; | ||
| if (qualifier != null && qualifier.charAt(0) == '`') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the quoted pattern is not allowed in the grammar, perhaps this check can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above - I don't want to mess up the lexer.
| return visitList(this, contexts, Expression.class); | ||
| } | ||
|
|
||
| protected static ParsingException qualifiersUnsupportedInFieldDefinitions(Source source, String qualifiedName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative approach that I can think of is limiting the scope of qualifiers in the grammar where the qualifiers are not supported, where we throw these qualifiers unsupported exceptions - in the grammar, use fieldName and fieldNamePattern instead of qualifiedName and qualifiedNamePattern.
Both approaches need multiple lines of changes though, I'm not quite sure one is easier than the other. The advantage of this approach is that we can get more meaningful error messages in the ParsingException, the advantage of limiting it in the grammar is that it could help auto-completion to detect the unsupported places earlier from the grammar errors.
Just want to bring this up, so that we are aware of these options, I'm fine with either of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- in the grammar, use fieldName and fieldNamePattern instead of qualifiedName and qualifiedNamePattern.
Oh yes! Just using fieldName and fieldNamePattern in commands is much cleaner.
But I did not want to make a large scale change to the grammar while this is still so in flux. I'll add it to #133068, though, because this approach is less brittle.
| ; | ||
|
|
||
| qualifiedName | ||
| : {this.isDevVersion()}? OPENING_BRACKET qualifier=UNQUOTED_IDENTIFIER? CLOSING_BRACKET DOT OPENING_BRACKET name=fieldName CLOSING_BRACKET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a quiet verbose syntax that requires wrapping an identifier with brackets rather than just pre-pending it with qualifier when changing existing query.
I wonder about the following alternatives:
[qualifier].identifierqualifier -> identifier
They are a bit shorter. Could they be ambiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR's syntax may not be the final one. That said, I think verbose and unambiguous is fine because we plan to introduce shorthands when unambiguous: qualifier.identifier and just identifier.
I personally like [qualifier].identifier as well, but received feedback not to use -> in case we want to introduce lambdas into the language at some point in time.
| * 352b - remove EsIndex mapping serialization #119580 | ||
| * 350b - remove unused fields from FieldAttribute #127854 | ||
| * 351b - added time series field type to EsField #129649 | ||
| * 352b - added qualifier back to FieldAttribute #132925 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does back implies we had them at some point before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! They were removed in #127854.
julian-elastic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my concerns. I am OK with the end to end tests and quoting tests to be done in a separate PR so we can start working on those in parallel.
|
CI is green but somehow won't update the status. This is safe to merge. |
Relates #133068
Add initial syntax for qualifiers. This will enable referring to columns from LOOKUP JOIN via qualified names (in a follow-up PR):
(The
ASis optional).The syntax is likely still subject to change and is only available on SNAPSHOT builds. This is supposed to be a starting point so that we can start implementing the resolution of qualifiers and start looking into the myriad of edge cases that come with that.
The point of this PR is to provide an unambiguous syntax that is reserved for qualifiers, therefore we don't use just the dot as separator:
[lookup].[field]vs.lookup.field. Shorthand syntax likelookup.fieldor justfield(when unambiguous) can be added in later steps.