5 changes: 5 additions & 0 deletions docs/changelog/120291.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 120291
summary: ESQL - Allow full text functions disjunctions for non-full text functions
area: ES|QL
type: feature
issues: []
111 changes: 94 additions & 17 deletions docs/reference/esql/esql-limitations.asciidoc
Expand Up @@ -30,11 +30,11 @@ include::processing-commands/limit.asciidoc[tag=limitation]
** You can use `to_datetime` to cast to millisecond dates to use unsupported functions
* `double` (`float`, `half_float`, `scaled_float` are represented as `double`)
* `ip`
* `keyword` family including `keyword`, `constant_keyword`, and `wildcard`
* `keyword` <<keyword, family>> including `keyword`, `constant_keyword`, and `wildcard`
* `int` (`short` and `byte` are represented as `int`)
* `long`
* `null`
* `text`
* `text` <<text, family>> including `text`, `semantic_text` and `match_only_text`
* experimental:[] `unsigned_long`
* `version`
* Spatial types
Expand Down Expand Up @@ -112,33 +112,57 @@ it is necessary to use the search function, like <<esql-match>>, in a <<esql-whe
directly after the <<esql-from>> source command, or close enough to it.
Otherwise, the query will fail with a validation error.
Another limitation is that any <<esql-where>> command containing a full-text search function
cannot also use disjunctions (`OR`).
cannot use disjunctions (`OR`), unless:

Because of <<esql-limitations-text-fields,the way {esql} treats `text` values>>,
queries on `text` fields are like queries on `keyword` fields: they are
case-sensitive and need to match the full string.
* All functions used in the `OR` clauses are full-text functions themselves, or scoring is not used.

For example, this query is valid:

For example, after indexing a field of type `text` with the value `Elasticsearch
query language`, the following `WHERE` clause does not match because the `LIKE`
operator is case-sensitive:
[source,esql]
----
| WHERE field LIKE "elasticsearch query language"
FROM books
| WHERE MATCH(author, "Faulkner") AND MATCH(author, "Tolkien")
----

The following `WHERE` clause does not match either, because the `LIKE` operator
tries to match the whole string:
But this query will fail because the <<esql-stats-by, STATS>> command comes before the `WHERE` command that uses the full-text function:

[source,esql]
----
| WHERE field LIKE "Elasticsearch"
FROM books
| STATS AVG(price) BY author
| WHERE MATCH(author, "Faulkner")
----

As a workaround, use wildcards and regular expressions. For example:
This query, which uses a disjunction of two full-text functions, will succeed:

[source,esql]
----
| WHERE field RLIKE "[Ee]lasticsearch.*"
FROM books
| WHERE MATCH(author, "Faulkner") OR QSTR("author: Hemingway")
----

However, the following query fails when scoring is used, because one side of the disjunction is not a full-text function:

[source,esql]
----
FROM books METADATA _score
| WHERE MATCH(author, "Faulkner") OR author LIKE "Hemingway"
----

Scoring works in the following query because both sides of the `OR` are full-text functions:

[source,esql]
----
FROM books METADATA _score
| WHERE MATCH(author, "Faulkner") OR QSTR("author: Hemingway")
----
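
The rule stated above (a disjunction is accepted when every operand is a full-text function, or when scoring is not requested) can be sketched as a small predicate. This is only an illustration of the documented rule, not the validation code in {esql}; the `Operand` record and the `disjunctionAllowed` method are hypothetical names introduced for this sketch.

```java
import java.util.List;

public class DisjunctionRuleSketch {
    // Hypothetical model of one side of an OR in a WHERE clause.
    public record Operand(String text, boolean fullText) {}

    // Documented rule: allowed if all operands are full-text functions,
    // or if the query does not request scoring (no METADATA _score).
    public static boolean disjunctionAllowed(List<Operand> operands, boolean scoringRequested) {
        boolean allFullText = operands.stream().allMatch(Operand::fullText);
        return allFullText || !scoringRequested;
    }

    public static void main(String[] args) {
        List<Operand> mixed = List.of(
            new Operand("MATCH(author, \"Faulkner\")", true),
            new Operand("author LIKE \"Hemingway\"", false)
        );
        List<Operand> fullTextOnly = List.of(
            new Operand("MATCH(author, \"Faulkner\")", true),
            new Operand("QSTR(\"author: Hemingway\")", true)
        );
        System.out.println("mixed, with scoring: " + disjunctionAllowed(mixed, true));
        System.out.println("mixed, no scoring: " + disjunctionAllowed(mixed, false));
        System.out.println("full-text only, with scoring: " + disjunctionAllowed(fullTextOnly, true));
    }
}
```

The three calls in `main` correspond to the scoring example that fails, the same disjunction without `METADATA _score`, and the scoring example that succeeds.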


Note that, because of <<esql-limitations-text-fields,the way {esql} treats `text` values>>,
any queries on `text` fields that do not explicitly use the full-text functions,
<<esql-match>>, <<esql-qstr>> or <<esql-kql>>, will behave as if the fields are actually `keyword` fields:
they are case-sensitive and need to match the full string.

[discrete]
[[esql-limitations-text-fields]]
=== `text` fields behave like `keyword` fields
Expand All @@ -151,15 +175,68 @@ that. If it's not possible to retrieve a `keyword` subfield, {esql} will get the
string from a document's `_source`. If the `_source` cannot be retrieved, for
example when using synthetic source, `null` is returned.

Once a `text` field is retrieved, if the query touches it in any way, for example passing
it into a function, the type will be converted to `keyword`. In fact, functions that operate on both
`text` and `keyword` fields will perform as if the `text` field was a `keyword` field all along.

For example, the following query will return a column `greatest` of type `keyword` no matter
whether any or all of `field1`, `field2`, and `field3` are of type `text`:
[source,esql]
----
FROM index
| EVAL greatest = GREATEST(field1, field2, field3)
----

Note that {esql}'s retrieval of `keyword` subfields may have unexpected
consequences. An {esql} query on a `text` field is case-sensitive. Furthermore,
a subfield may have been mapped with a <<normalizer,normalizer>>, which can
consequences. Other than when explicitly using the full-text functions, <<esql-match>> and <<esql-qstr>>,
any {esql} query on a `text` field is case-sensitive.

For example, after indexing a field of type `text` with the value `Elasticsearch
query language`, the following `WHERE` clause does not match because the `LIKE`
operator is case-sensitive:
[source,esql]
----
| WHERE field LIKE "elasticsearch query language"
----

The following `WHERE` clause does not match either, because the `LIKE` operator
tries to match the whole string:
[source,esql]
----
| WHERE field LIKE "Elasticsearch"
----

As a workaround, use wildcards and regular expressions. For example:
[source,esql]
----
| WHERE field RLIKE "[Ee]lasticsearch.*"
----
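
The behavior described above can be reproduced outside {esql} with plain Java regular expressions, which also match case-sensitively and, with `Pattern.matches`, against the whole string. The sketch below is an analogy only, not how {esql} implements `LIKE` or `RLIKE`; the `like` and `rlike` helpers are hypothetical, and the `LIKE` translation handles just the `*` and `?` wildcards without escape sequences.

```java
import java.util.regex.Pattern;

public class KeywordLikeSketch {
    // Stand-in for RLIKE: case-sensitive regex that must match the whole value.
    public static boolean rlike(String value, String regex) {
        return Pattern.matches(regex, value);
    }

    // Stand-in for LIKE: '*' matches any sequence, '?' matches one character,
    // everything else is matched literally. Escapes are not handled here.
    public static boolean like(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), value);
    }

    public static void main(String[] args) {
        String value = "Elasticsearch query language";
        System.out.println(like(value, "elasticsearch query language")); // wrong case
        System.out.println(like(value, "Elasticsearch"));                // not the whole string
        System.out.println(rlike(value, "[Ee]lasticsearch.*"));          // the workaround
    }
}
```

The first two calls mirror the two `LIKE` clauses that do not match, and the third mirrors the `RLIKE` workaround.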

Furthermore, a subfield may have been mapped with a <<normalizer,normalizer>>, which can
transform the original string. Or it may have been mapped with <<ignore-above>>,
which can truncate the string. None of these mapping operations are applied to
an {esql} query, which may lead to false positives or negatives.

To avoid these issues, a best practice is to be explicit about the field that
you query, and query `keyword` sub-fields instead of `text` fields.
Or consider using one of the <<esql-search-functions,full-text search>> functions.

[discrete]
[[esql-multi-index-limitations]]
=== Using {esql} to query multiple indices

As discussed in more detail in <<esql-multi-index>>, {esql} can execute a single query across multiple indices,
data streams, or aliases. However, there are some limitations to be aware of:

* All underlying indexes and shards must be active. Using admin commands or UI,
it is possible to pause an index or shard, for example by disabling a frozen tier instance,
but then any {esql} query that includes that index or shard will fail, even if the query uses
<<esql-where>> to filter out the results from the paused index.
If you see an error of type `search_phase_execution_exception`,
with the message `Search rejected due to missing shards`, you likely have an index or shard in `UNASSIGNED` state.
* The same field must have the same type across all indexes. If the same field is mapped to different types
it is still possible to query the indexes,
but the field must be <<esql-multi-index-union-types,explicitly converted to a single type>>.

[discrete]
[[esql-tsdb]]
2 changes: 1 addition & 1 deletion x-pack/plugin/build.gradle
Expand Up @@ -217,5 +217,5 @@ tasks.named("yamlRestTestV7CompatTransform").configure({ task ->
task.skipTest("esql/190_lookup_join/alias-repeated-index", "LOOKUP JOIN does not support index aliases for now")
task.skipTest("esql/190_lookup_join/alias-pattern-multiple", "LOOKUP JOIN does not support index aliases for now")
task.skipTest("esql/190_lookup_join/alias-pattern-single", "LOOKUP JOIN does not support index aliases for now")

task.skipTest("esql/180_match_operator/match with disjunctions", "Disjunctions in full text functions work now")
})
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
import org.elasticsearch.compute.data.DocVector;
import org.elasticsearch.compute.data.IntVector;
import org.elasticsearch.compute.data.Page;
import org.elasticsearch.compute.operator.DriverContext;
import org.elasticsearch.compute.operator.EvalOperator;
import org.elasticsearch.core.Releasable;
import org.elasticsearch.core.Releasables;
Expand All @@ -44,19 +45,20 @@ public record ShardConfig(Query query, IndexSearcher searcher) {}

private final BlockFactory blockFactory;
private final ShardConfig[] shards;
private final int docChannel;

private ShardState[] perShardState = EMPTY_SHARD_STATES;

public LuceneQueryExpressionEvaluator(BlockFactory blockFactory, ShardConfig[] shards, int docChannel) {
public LuceneQueryExpressionEvaluator(BlockFactory blockFactory, ShardConfig[] shards) {
this.blockFactory = blockFactory;
this.shards = shards;
this.docChannel = docChannel;
}

@Override
public Block eval(Page page) {
DocVector docs = page.<DocBlock>getBlock(docChannel).asVector();
// Lucene based operators retrieve DocVectors as first block
Block block = page.getBlock(0);
assert block instanceof DocBlock : "LuceneQueryExpressionEvaluator expects DocBlock as input";
DocVector docs = (DocVector) block.asVector();
try {
if (docs.singleSegmentNonDecreasing()) {
return evalSingleSegmentNonDecreasing(docs).asBlock();
Expand Down Expand Up @@ -341,4 +343,17 @@ public void close() {
Releasables.closeExpectNoException(builder);
}
}

public static class Factory implements EvalOperator.ExpressionEvaluator.Factory {
private final ShardConfig[] shardConfigs;

public Factory(ShardConfig[] shardConfigs) {
this.shardConfigs = shardConfigs;
}

@Override
public EvalOperator.ExpressionEvaluator get(DriverContext context) {
return new LuceneQueryExpressionEvaluator(context.blockFactory(), shardConfigs);
}
}
}
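
As a rough mental model of what this change enables, a disjunction such as `match(title, "lord") or length(title) > 130` can be evaluated row by row once the full-text side is available as a per-document boolean. The sketch below is a simplified illustration under that assumption and does not use the real compute-engine types; `Row`, `evaluate`, and the `luceneMatches` set are hypothetical stand-ins for a page of documents and the result of running the Lucene query.

```java
import java.util.List;
import java.util.Set;

public class NonPushableDisjunctionSketch {
    // Hypothetical row: a document id plus the field the second predicate reads.
    public record Row(int docId, String title) {}

    // Evaluates "fullTextMatch OR length(title) > minLength" for each row.
    // luceneMatches stands in for the doc ids matched by the full-text query.
    public static boolean[] evaluate(List<Row> rows, Set<Integer> luceneMatches, int minLength) {
        boolean[] keep = new boolean[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            Row row = rows.get(i);
            boolean fullText = luceneMatches.contains(row.docId());
            boolean longTitle = row.title() != null && row.title().length() > minLength;
            keep[i] = fullText || longTitle;
        }
        return keep;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(1, "The Lord of the Rings"),
            new Row(2, "Short title"),
            new Row(3, "x".repeat(131))
        );
        boolean[] keep = evaluate(rows, Set.of(1), 130);
        for (int i = 0; i < keep.length; i++) {
            System.out.println("doc " + rows.get(i).docId() + " kept: " + keep[i]);
        }
    }
}
```

Neither side of the `OR` filters rows on its own here; each row is kept if either predicate holds, which is why the full-text result has to be available per document rather than only as a pushed-down filter.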
Original file line number Diff line number Diff line change
Expand Up @@ -183,8 +183,8 @@ private List<Page> runQuery(Set<String> values, Query query, boolean shuffleDocs
);
LuceneQueryExpressionEvaluator luceneQueryEvaluator = new LuceneQueryExpressionEvaluator(
blockFactory,
new LuceneQueryExpressionEvaluator.ShardConfig[] { shard },
0
new LuceneQueryExpressionEvaluator.ShardConfig[] { shard }

);

List<Operator> operators = new ArrayList<>();
Original file line number Diff line number Diff line change
Expand Up @@ -152,3 +152,40 @@ emp_no:integer | first_name:keyword | last_name:keyword
10053 | Sanjiv | Zschoche
10069 | Margareta | Bierman
;

testKqlWithNonPushableDisjunctions
required_capability: kql_function
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where kql("title:lord") or length(title) > 130
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2714
4023
7140
8678
;

testKqlWithNonPushableDisjunctionsOnComplexExpressions
required_capability: kql_function
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where (kql("title:lord") and ratings > 4.5) or (kql("author:dostoevsky") and length(title) > 50)
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2924
4023
1937
7140
2714
;
Original file line number Diff line number Diff line change
Expand Up @@ -718,3 +718,40 @@ from books
title:text
The Hobbit or There and Back Again
;

testMatchWithNonPushableDisjunctions
required_capability: match_function
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where match(title, "lord") or length(title) > 130
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2714
4023
7140
8678
;

testMatchWithNonPushableDisjunctionsOnComplexExpressions
required_capability: match_function
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where (match(title, "lord") and ratings > 4.5) or (match(author, "dostoevsky") and length(title) > 50)
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2924
4023
1937
7140
2714
;
Original file line number Diff line number Diff line change
Expand Up @@ -684,3 +684,40 @@ from semantic_text
host:keyword | semantic_text_field:text
"host1" | live long and prosper
;

testMatchWithNonPushableDisjunctions
required_capability: match_operator_colon
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where title:"lord" or length(title) > 130
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2714
4023
7140
8678
;

testMatchWithNonPushableDisjunctionsOnComplexExpressions
required_capability: match_operator_colon
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where (title:"lord" and ratings > 4.5) or (author:"dostoevsky" and length(title) > 50)
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2924
4023
1937
7140
2714
;
Original file line number Diff line number Diff line change
Expand Up @@ -152,3 +152,40 @@ emp_no:integer | first_name:keyword | last_name:keyword
10053 | Sanjiv | Zschoche
10069 | Margareta | Bierman
;

testQstrWithNonPushableDisjunctions
required_capability: qstr_function
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where qstr("title:lord") or length(title) > 130
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2714
4023
7140
8678
;

testQstrWithNonPushableDisjunctionsOnComplexExpressions
required_capability: qstr_function
required_capability: full_text_functions_disjunctions_compute_engine

from books
| where (qstr("title:lord") and ratings > 4.5) or (qstr("author:dostoevsky") and length(title) > 50)
| keep book_no
;
ignoreOrder: true

book_no:keyword
2675
2924
4023
1937
7140
2714
;