Skip to content

Conversation

@alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Aug 14, 2025

Relates #133068

Add initial syntax for qualifiers. This will enable referring to columns from LOOKUP JOIN via qualified names (in a follow-up PR):

...
| LOOKUP JOIN lookup_idx AS lookup ON key
| WHERE [lookup].[field] > 10 AND 2*([lookup].[field] - [lookup].[otherfield]) < 1

(The AS is optional).

The syntax is likely still subject to change and is only available on SNAPSHOT builds. This is supposed to be a starting point so that we can start implementing the resolution of qualifiers and start looking into the myriad of edge cases that come with that.

The point of this PR is to provide an unambiguous syntax that is reserved for qualifiers, therefore we don't use just the dot as separator: [lookup].[field] vs. lookup.field. Shorthand syntax like lookup.field or just field (when unambiguous) can be added in later steps.

Constructors of FieldAttribute and ReferenceAttribute without qualifiers
should normally only be used in tests going forward.
@alex-spies alex-spies marked this pull request as ready for review August 18, 2025 15:53
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 18, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies
Copy link
Contributor Author

alex-spies commented Aug 18, 2025

@stratoula , this and this is the grammar update we were talking about. We do change the ANTLr grammar here, but all changes are restricted to SNAPSHOT builds. Does this still mess up things on your end, or is it fine as long as it's SNAPSHOT-only?

@stratoula
Copy link

stratoula commented Aug 18, 2025

It doesn't make any difference if it is on snapshots or not. These changes when moved to kibana will make our ast parsing logic to fail and we will need to adapt.

This week we have on week so no one will be able to evaluate further. Next week we will have the time to check the changes and understand how big change this is for us and what we should do to move forward.

I wonder if this PR can wait till then.

@alex-spies
Copy link
Contributor Author

Thanks @stratoula , I'll hold the PR until we're sure we don't break Kibana. It'd be great if we could make sure quite soon, as this PR blocks subsequent work on qualifiers.

@stratoula
Copy link

@alex-spies from the kibana side you are ok to merge this PR. At the first stage we will get the antlr changes but not repreresent the qualifiers in our AST yet. We will do it on a second stage

@alex-spies
Copy link
Contributor Author

Thanks for having a look @stratoula !

}

private void assertQualifiedAttributeInExpressions(String query, String qualifier, String name, int expectedCount) {
LogicalPlan plan = statement(query);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we stopping at logical plan here? Can we go all the way to local physical planning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can still check at logical planning, but also check at local physical planning after that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statement parser tests are focused on parsing, only.

This PR does not attempt to plug qualifiers into commands just yet, to keep the scope reasonable. That's the next step in #133068. In the next step, we definitely want to have tests that inspect the generated logical and physical plans more closely.

The original PoC plugs everything in end-to-end, which is a big change, and I'm (roughly) following the PoC by splitting out parts into smaller PRs that we can iterate on more easily.

@@ -0,0 +1,463 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed them but I did not see any tests with escaping. Can we add the following tests if not already there?
A field containing left bracket (first, middle, last) [field, fiel[d, field[
A field containing right bracket (first, middle, last) ]field, fiel]d, field]
An index name containing left bracket (first, middle, last)
An index name containing right bracket (first, middle, last)
An field name containing dot (first, middle, last) .field, fiel.d, field.
An index name containing dot (first, middle, last)

Copy link
Contributor Author

@alex-spies alex-spies Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to address quoting, escaping, validating against weird characters etc. in a follow-up PR. This is subtle enough that it deserves a bunch of comments and some discussions.

I noted in #133068 that we want to disallow everything but very basic qualifiers at first.

I'd also really disallow dots in qualifier names for now unless really needed. Let's avoid doing the same mistake twice.

Copy link
Contributor

@julian-elastic julian-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great! It seems like there is one potential serious bug with CCS that we should probably address. Other than that mostly some ideas for additional tests.

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting these together @alex-spies, that's a great first step for qualifiers ! I added some comments around the grammar and parser.


qualifiedNamePattern
: identifierPattern (DOT identifierPattern)*
: {this.isDevVersion()}? OPENING_BRACKET qualifier=ID_PATTERN CLOSING_BRACKET DOT OPENING_BRACKET name=fieldNamePattern CLOSING_BRACKET
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if simplifying the grammar, reducing the scope to allow only qualifier = UNQUOTED_IDENTIFIER for qualifiedNamePattern would make it a bit easier as the first step. If I understand the intention correctly here, we want to allow patterns in qualifiers, as we allow patterns on index/field/attribute names.

For example, allowing the query below to be valid

FROM test
LOOKUP JOIN test t1 on x
LOOKUP JOIN test t2 on y
DROP [t*].[z*]

As far as I know, SQL does not use * in qualifiers to represent multiple tables. As qualifier is new, and we have the control of its scope, expanding a scope is easier than shrinking it, as the later could be a breaking change potentially.

Alternatively, if we really want to allow wildcard in the qualifier, will qualifier=UNQUOTED_ID_PATTERN work? Does the qualifiers need to be quoted? As it looks like the qualifiers defined in Join is UNQUOTED_SOURCE/UNQUOTED_IDENTIFIER.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if simplifying the grammar, reducing the scope to allow only qualifier = UNQUOTED_IDENTIFIER
I wonder if simplifying the grammar, reducing the scope to allow only qualifier = UNQUOTED_IDENTIFIER

I wish the same! Unfortunately, when trying this I realized that you cannot have this here because he lexer cannot distinguish between UNQUOTED_IDENTIFIER and ID_PATTERN , and ID_PATTERN is required to make the rest of KEEP/DROP work correctly.

As qualifier is new, and we have the control of its scope, expanding a scope is easier than shrinking it, as the later could be a breaking change potentially.

Absolutely. The allowed characters for qualifiers have to be restricted, both in patterns and in non-patterns. But let's do that in a separate PR; it's already tracked in the meta issue #133068.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if we really want to allow wildcard in the qualifier, will qualifier=UNQUOTED_ID_PATTERN work?

UNQUOTED_ID_PATTERN is a fragment. I didn't want to risk confusing the lexer and breaking KEEP/DROP by making it a token. I think making it a token would introduce ambiguity into the lexing of KEEP/DROP commands as it would conflict with ID_PATTERN.


var src = source(ctx);
String qualifier = qualifiedCtx.qualifier != null ? qualifiedCtx.qualifier.getText() : null;
if (qualifier != null && qualifier.charAt(0) == '`') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the quoted pattern is not allowed in the grammar, perhaps this check can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above - I don't want to mess up the lexer.

return visitList(this, contexts, Expression.class);
}

protected static ParsingException qualifiersUnsupportedInFieldDefinitions(Source source, String qualifiedName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative approach that I can think of is limiting the scope of qualifiers in the grammar where the qualifiers are not supported, where we throw these qualifiers unsupported exceptions - in the grammar, use fieldName and fieldNamePattern instead of qualifiedName and qualifiedNamePattern.

Both approaches need multiple lines of changes though, I'm not quite sure one is easier than the other. The advantage of this approach is that we can get more meaningful error messages in the ParsingException, the advantage of limiting it in the grammar is that it could help auto-completion to detect the unsupported places earlier from the grammar errors.

Just want to bring this up, so that we are aware of these options, I'm fine with either of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • in the grammar, use fieldName and fieldNamePattern instead of qualifiedName and qualifiedNamePattern.

Oh yes! Just using fieldName and fieldNamePattern in commands is much cleaner.

But I did not want to make a large scale change to the grammar while this is still so in flux. I'll add it to #133068, though, because this approach is less brittle.

;

qualifiedName
: {this.isDevVersion()}? OPENING_BRACKET qualifier=UNQUOTED_IDENTIFIER? CLOSING_BRACKET DOT OPENING_BRACKET name=fieldName CLOSING_BRACKET
Copy link
Contributor

@idegtiarenko idegtiarenko Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quiet verbose syntax that requires wrapping an identifier with brackets rather than just pre-pending it with qualifier when changing existing query.

I wonder about the following alternatives:

  • [qualifier].identifier
  • qualifier -> identifier

They are a bit shorter. Could they be ambiguous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR's syntax may not be the final one. That said, I think verbose and unambiguous is fine because we plan to introduce shorthands when unambiguous: qualifier.identifier and just identifier.

I personally like [qualifier].identifier as well, but received feedback not to use -> in case we want to introduce lambdas into the language at some point in time.

* 352b - remove EsIndex mapping serialization #119580
* 350b - remove unused fields from FieldAttribute #127854
* 351b - added time series field type to EsField #129649
* 352b - added qualifier back to FieldAttribute #132925
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does back implies we had them at some point before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! They were removed in #127854.

Copy link
Contributor

@julian-elastic julian-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my concerns. I am OK with the end to end tests and quoting tests to be done in a separate PR so we can start working on those in parallel.

@alex-spies
Copy link
Contributor Author

CI is green but somehow won't update the status. This is safe to merge.

@alex-spies alex-spies merged commit 99318f4 into elastic:main Aug 28, 2025
32 of 33 checks passed
@alex-spies alex-spies deleted the qualifiers-in-lu-join branch August 28, 2025 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants