Skip to content

Conversation

jbaiera
Copy link
Member

@jbaiera jbaiera commented Jan 22, 2025

Selector syntax (#118614) was introduced as part of the work to support implementing failure stores in data streams. This is a new feature of index patterns which allows a user to specify which indices inside of a data stream should be used in an action. Selectors are denoted by using the :: separator between a data stream name and the component of the data stream the user wants to target for an operation.

To search a data stream's backing indices only, the ::data selector is used:

FROM my-data-stream::data

To search a data stream's failure indices only, the ::failures selector is used:

FROM my-data-stream::failures

These selectors support being mixed with other index pattern syntax as well

FROM logs-*::data
FROM *::data,*::failures,-logs-private::failures
FROM <my-data-stream-dates-{now/M{yyyy.MM}}>::data

By default, when a data stream has no selector specified, the ::data selector is implied to maintain backwards compatibility with current search functionality. The ::data selector primarily exists as a way to explicitly select the backing indices, but is not required for normal usage.

This PR updates the ES|QL grammar to include the selector portion of an index pattern. Patterns are recombined before being sent to field caps. Field caps already supports this functionality, so this is primarily wiring it up where needed.

@jbaiera jbaiera marked this pull request as ready for review January 24, 2025 17:59
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jan 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick drive by comments - I'll let @fang-xing-esql drive this. Thanks!

@costin costin requested review from costin and removed request for costin January 24, 2025 19:07
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jbaiera, thank you for adding this to ES|QL.
I've left a minor comment regarding exception wrapping.
Also, does testing :: with a Security-enabled cluster have any particularities, or the selector should just work?

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for enabling selector in ES|QL @jbaiera! I added something that I can think of so far, they are mainly around where do we expect the selector to work.

The index pattern is allowed in multiple commands/places in the ES|QL statements, such as from, metrics, join etc. and plus CCS, -(exclusion), *(wildcard), there is a good amount of combinations of these factors. It will be nice to have tests to cover some more combinations of different scenarios where selector is expected to work or not work, so that we have a better idea of the scope of the change.


indexPattern
: (clusterString COLON)? indexString
| {this.isDevVersion()}? (clusterString COLON)? indexString (CAST_OP selectorString)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indexPattern is used in a few other places like joinCommand and metricsCommand etc. Do we expect the selector work in the other commands besides fromCommand? If we don't want to support it under the other command, we can put the restrictions in LogicalPlanBuilder, and it will be great to have negative/positive tests to cover the them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to say that this should work in the joinCommand, but I don't know if there are any functional restrictions on that command that might bar the pattern's usage. I'm not sure I know what the metricsCommand actually is and what the difference is between it and the fromCommand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both joinCommand and metricsCommand are under snapshot today.

Here are something that I can think of that might affect ::, that we need to validate, the best way to validate them is to test them.

  • * is not supported on the right hand side of the join, however it is a valid :: option
var rightPattern = visitIndexPattern(List.of(target.index));
        if (rightPattern.contains(WILDCARD)) {
            throw new ParsingException(source(target), "invalid index pattern [{}], * is not allowed in LOOKUP JOIN", rightPattern);
        }
  • metrics is mainly for tsdb, I'm not quite sure if :: is relevant to tsdb honestly, if it is not relevant we can error out early in parsing, otherwise we may also want to have tests to show how it works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason that wildcards are banned from the join command because they resolve to multiple indices? I can say for sure that a selector of ::* has a good chance of resolving to multiple indices. That said, putting any data stream name in the right hand side of the join operation will likely resolve to multiple backing indices, much like an alias pointing to multiple indices would. How do we handle aliases and data streams on the right hand side of the join today?

As for metrics and TSDB - the failure store doesn't surface any time series features like a time series data stream might. A time series data stream can have a failure store associated with it, but its failure store doesn't exhibit any of the TSDB functionality. If the expectation is that the metrics command is only ever used with TSDB, I don't know if it makes sense to allow selectors in it for the time being, since the only valid selectors are likely ::data in that case.

;

selectorString
: UNQUOTED_SOURCE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow quoting selector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to simply follow how the definition for the clusterString are laid out. I don't have too much of an opinion on quoting the selector or not. The selector contents can only ever be data, failures, or *. I don't know what quoting them separately from the rest of the index pattern might buy us.

@jbaiera jbaiera requested a review from fang-xing-esql March 18, 2025 06:46
Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jbaiera! I added some comments that I can think of so far.

var joinPattern = randomIndexPattern(without(CROSS_CLUSTER), without(WILDCARD_PATTERN), without(INDEX_SELECTOR));
joinPattern = maybeQuote(joinPattern + "::garbage");
expectError(
"FROM " + randomIndexPatterns() + " | JOIN " + joinPattern + " ON " + randomIdentifier(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message looks a bit unclear. I wonder if it is possible to get a message says that ""selector is not supported on the right hand side of a lookup join"? And why using without(INDEX_SELECTOR) with maybeQuote(joinPattern + "::garbage"), instead of just calling randomIndexPattern(without(CROSS_CLUSTER), without(WILDCARD_PATTERN), INDEX_SELECTOR)? And it seems the left hand side allows both clusterString and selectorString, which may add some variations of the error message? If it does, perhaps test the left hand side and right hand side of a lookup join separate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message looks a bit unclear. I wonder if it is possible to get a message says that ""selector is not supported on the right hand side of a lookup join"?

This seems to be something that is generated as part of the antlr parsing - I'm not sure how to override the error returned from antlr in this case with something more helpful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why using without(INDEX_SELECTOR) with maybeQuote(joinPattern + "::garbage"), instead of just calling randomIndexPattern(without(CROSS_CLUSTER), without(WILDCARD_PATTERN), INDEX_SELECTOR)?

I'll fix that up - I think that's left behind from when I was validating that incorrect selectors are rejected from join operations. Now all selectors are rejected from join operations.

And it seems the left hand side allows both clusterString and selectorString, which may add some variations of the error message? If it does, perhaps test the left hand side and right hand side of a lookup join separate.

I'll try and capture a couple of different scenarios for the left side of the query to make sure it's failing with expected messages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added additional cases and clean up with 6d5b049

if (EsqlCapabilities.Cap.INDEX_COMPONENT_SELECTORS.isEnabled()) {
assertStringAsIndexPattern("foo::data", command + " foo::data");
assertStringAsIndexPattern("foo::failures", command + " foo::failures");
assertStringAsIndexPattern("cluster:foo::failures", command + " cluster:\"foo::failures\"");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might miss something... should this fail, if selector does not work with remote cluster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the problem with the validation logic as laid out by #122497 and #122651 - Because the cluster is outside of the quotation marks we parse it as normal, but the content inside of the quotation marks is never parsed or checked because remote cluster resources are not validated locally.

validateClusterString(clusterString, c);
// Do not allow selectors on remote cluster expressions until they are supported
if (selectorString != null) {
InvalidIndexNameException ie = new InvalidIndexNameException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like this code path will be exercised, EsqlBaseParser.g4 does not allow clusterString coexist with selectorString

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean it should be removed?

clusterAndIndexAsIndexPattern(command, "cluster*:*");
clusterAndIndexAsIndexPattern(command, "*:index*");
clusterAndIndexAsIndexPattern(command, "*:*");
if (EsqlCapabilities.Cap.INDEX_COMPONENT_SELECTORS.isEnabled()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have tests with * as selector?

Copy link
Member Author

@jbaiera jbaiera Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* is no longer a supported selector value as of #121900

@jbaiera jbaiera requested a review from fang-xing-esql March 21, 2025 14:35
Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jbaiera, LGTM!

pattern = maybeQuote(cluster + ":" + pattern);
} else if (EsqlCapabilities.Cap.INDEX_COMPONENT_SELECTORS.isEnabled() && canAdd(Features.INDEX_SELECTOR, features)) {
var selector = ESTestCase.randomFrom(IndexComponentSelector.values());
pattern = maybeQuote(pattern + "::" + selector.getKey());
Copy link
Contributor

@idegtiarenko idegtiarenko Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above in the thread, cross cluster is independent from index selectors.
Could you please convert if else to if?

@jbaiera jbaiera merged commit 81ef7dd into elastic:main Mar 24, 2025
17 checks passed
@jbaiera jbaiera deleted the failure-store-esql-support branch March 24, 2025 18:08
jbaiera added a commit to jbaiera/elasticsearch that referenced this pull request Mar 24, 2025
This PR updates the ES|QL grammar to include the selector portion of an index pattern. Patterns
are recombined before being sent to field caps. Field caps already supports this functionality, so
this is primarily wiring it up where needed.
FROM_OPENING_BRACKET : OPENING_BRACKET -> type(OPENING_BRACKET);
FROM_CLOSING_BRACKET : CLOSING_BRACKET -> type(CLOSING_BRACKET);
FROM_COLON : COLON -> type(COLON);
FROM_SELECTOR : {this.isDevVersion()}? CAST_OP -> type(CAST_OP);
Copy link
Contributor

@idegtiarenko idegtiarenko Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unused. indexPattern appears to be relying on CAST_OP directly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser does use the cast op directly, but without this entry in the lexicon the parsing logic rejects the :: character with an error like line 1:10: extraneous input ':' expecting {QUOTED_STRING, UNQUOTED_SOURCE}

elasticsearchmachine pushed a commit that referenced this pull request Mar 28, 2025
* Support index pattern selector syntax in ES|QL (#120660)

This PR updates the ES|QL grammar to include the selector portion of an index pattern. Patterns
are recombined before being sent to field caps. Field caps already supports this functionality, so
this is primarily wiring it up where needed.

* Add validation step to quoted join query construction (#125731)

* Add validation step to quoted join query construction

* Fix double validation

* Add lexer entry for METRICS commands

* Simplifying test case

* regen
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
This PR updates the ES|QL grammar to include the selector portion of an index pattern. Patterns 
are recombined before being sent to field caps. Field caps already supports this functionality, so 
this is primarily wiring it up where needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL :Data Management/Data streams Data streams and their lifecycles >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Data Management Meta label for data/management team v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants