Commit 2db1d4f

ESQL - Allow full text functions disjunctions for non-full text functions (#120291) (#121026)
(cherry picked from commit a87bd7a)

# Conflicts:
#	docs/reference/esql/esql-limitations.asciidoc
1 parent 604c015 commit 2db1d4f

25 files changed: +541 -177 lines changed

docs/changelog/120291.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
+pr: 120291
+summary: ESQL - Allow full text functions disjunctions for non-full text functions
+area: ES|QL
+type: feature
+issues: []

docs/reference/esql/esql-limitations.asciidoc

Lines changed: 94 additions & 17 deletions
@@ -30,11 +30,11 @@ include::processing-commands/limit.asciidoc[tag=limitation]
 ** You can use `to_datetime` to cast to millisecond dates to use unsupported functions
 * `double` (`float`, `half_float`, `scaled_float` are represented as `double`)
 * `ip`
-* `keyword` family including `keyword`, `constant_keyword`, and `wildcard`
+* `keyword` <<keyword, family>> including `keyword`, `constant_keyword`, and `wildcard`
 * `int` (`short` and `byte` are represented as `int`)
 * `long`
 * `null`
-* `text`
+* `text` <<text, family>> including `text`, `semantic_text` and `match_only_text`
 * experimental:[] `unsigned_long`
 * `version`
 * Spatial types
@@ -112,33 +112,57 @@ it is necessary to use the search function, like <<esql-match>>, in a <<esql-whe
 directly after the <<esql-from>> source command, or close enough to it.
 Otherwise, the query will fail with a validation error.
 Another limitation is that any <<esql-where>> command containing a full-text search function
-cannot also use disjunctions (`OR`).
+cannot use disjunctions (`OR`), unless:
 
-Because of <<esql-limitations-text-fields,the way {esql} treats `text` values>>,
-queries on `text` fields are like queries on `keyword` fields: they are
-case-sensitive and need to match the full string.
+* All functions used in the OR clauses are full-text functions themselves, or scoring is not used
+
+For example, this query is valid:
 
-For example, after indexing a field of type `text` with the value `Elasticsearch
-query language`, the following `WHERE` clause does not match because the `LIKE`
-operator is case-sensitive:
 [source,esql]
 ----
-| WHERE field LIKE "elasticsearch query language"
+FROM books
+| WHERE MATCH(author, "Faulkner") AND MATCH(author, "Tolkien")
 ----
 
-The following `WHERE` clause does not match either, because the `LIKE` operator
-tries to match the whole string:
+But this query will fail due to the <<esql-stats-by, STATS>> command:
+
 [source,esql]
 ----
-| WHERE field LIKE "Elasticsearch"
+FROM books
+| STATS AVG(price) BY author
+| WHERE MATCH(author, "Faulkner")
 ----
 
-As a workaround, use wildcards and regular expressions. For example:
+And this query that uses a disjunction will succeed:
+
 [source,esql]
 ----
-| WHERE field RLIKE "[Ee]lasticsearch.*"
+FROM books
+| WHERE MATCH(author, "Faulkner") OR QSTR("author: Hemingway")
 ----
 
+However using scoring will fail because it uses a non full text function as part of the disjunction:
+
+[source,esql]
+----
+FROM books METADATA _score
+| WHERE MATCH(author, "Faulkner") OR author LIKE "Hemingway"
+----
+
+Scoring will work in the following query, as it uses full text functions on both `OR` clauses:
+
+[source,esql]
+----
+FROM books METADATA _score
+| WHERE MATCH(author, "Faulkner") OR QSTR("author: Hemingway")
+----
+
+
+Note that, because of <<esql-limitations-text-fields,the way {esql} treats `text` values>>,
+any queries on `text` fields that do not explicitly use the full-text functions,
+<<esql-match>>, <<esql-qstr>> or <<esql-kql>>, will behave as if the fields are actually `keyword` fields:
+they are case-sensitive and need to match the full string.
+
 [discrete]
 [[esql-limitations-text-fields]]
 === `text` fields behave like `keyword` fields
@@ -151,15 +175,68 @@ that. If it's not possible to retrieve a `keyword` subfield, {esql} will get the
 string from a document's `_source`. If the `_source` cannot be retrieved, for
 example when using synthetic source, `null` is returned.
 
+Once a `text` field is retrieved, if the query touches it in any way, for example passing
+it into a function, the type will be converted to `keyword`. In fact, functions that operate on both
+`text` and `keyword` fields will perform as if the `text` field was a `keyword` field all along.
+
+For example, the following query will return a column `greatest` of type `keyword` no matter
+whether any or all of `field1`, `field2`, and `field3` are of type `text`:
+[source,esql]
+----
+| FROM index
+| EVAL greatest = GREATEST(field1, field2, field3)
+----
+
 Note that {esql}'s retrieval of `keyword` subfields may have unexpected
-consequences. An {esql} query on a `text` field is case-sensitive. Furthermore,
-a subfield may have been mapped with a <<normalizer,normalizer>>, which can
+consequences. Other than when explicitly using the full-text functions, <<esql-match>> and <<esql-qstr>>,
+any {esql} query on a `text` field is case-sensitive.
+
+For example, after indexing a field of type `text` with the value `Elasticsearch
+query language`, the following `WHERE` clause does not match because the `LIKE`
+operator is case-sensitive:
+[source,esql]
+----
+| WHERE field LIKE "elasticsearch query language"
+----
+
+The following `WHERE` clause does not match either, because the `LIKE` operator
+tries to match the whole string:
+[source,esql]
+----
+| WHERE field LIKE "Elasticsearch"
+----
+
+As a workaround, use wildcards and regular expressions. For example:
+[source,esql]
+----
+| WHERE field RLIKE "[Ee]lasticsearch.*"
+----
+
+Furthermore, a subfield may have been mapped with a <<normalizer,normalizer>>, which can
 transform the original string. Or it may have been mapped with <<ignore-above>>,
 which can truncate the string. None of these mapping operations are applied to
 an {esql} query, which may lead to false positives or negatives.
 
 To avoid these issues, a best practice is to be explicit about the field that
 you query, and query `keyword` sub-fields instead of `text` fields.
+Or consider using one of the <<esql-search-functions,full-text search>> functions.
+
+[discrete]
+[[esql-multi-index-limitations]]
+=== Using {esql} to query multiple indices
+
+As discussed in more detail in <<esql-multi-index>>, {esql} can execute a single query across multiple indices,
+data streams, or aliases. However, there are some limitations to be aware of:
+
+* All underlying indexes and shards must be active. Using admin commands or UI,
+it is possible to pause an index or shard, for example by disabling a frozen tier instance,
+but then any {esql} query that includes that index or shard will fail, even if the query uses
+<<esql-where>> to filter out the results from the paused index.
+If you see an error of type `search_phase_execution_exception`,
+with the message `Search rejected due to missing shards`, you likely have an index or shard in `UNASSIGNED` state.
+* The same field must have the same type across all indexes. If the same field is mapped to different types
+it is still possible to query the indexes,
+but the field must be <<esql-multi-index-union-types,explicitly converted to a single type>>.
 
 [discrete]
 [[esql-tsdb]]
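
Reviewer note: the disjunction rule described in the documentation change above can be modelled in a few lines. The sketch below is only one reading of that rule, assuming that an `OR` mixing full-text and non-full-text functions is rejected when scoring is requested and accepted otherwise. `Cond` and `DisjunctionRuleSketch` are hypothetical names made up for illustration; they are not classes from the ES|QL planner, and the real verifier may apply further checks.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a WHERE condition; these are not the ES|QL planner's classes. */
final class Cond {
    enum Kind { FULL_TEXT, OTHER, AND, OR }

    final Kind kind;
    final List<Cond> children;

    private Cond(Kind kind, Cond... children) {
        this.kind = kind;
        this.children = Arrays.asList(children);
    }

    static Cond fullText() { return new Cond(Kind.FULL_TEXT); } // stands for MATCH, QSTR or KQL
    static Cond other() { return new Cond(Kind.OTHER); }        // stands for LIKE, LENGTH(...) > n, ...
    static Cond and(Cond left, Cond right) { return new Cond(Kind.AND, left, right); }
    static Cond or(Cond left, Cond right) { return new Cond(Kind.OR, left, right); }
}

final class DisjunctionRuleSketch {
    /** True if at least one leaf under c is a full-text function. */
    static boolean hasFullText(Cond c) {
        if (c.kind == Cond.Kind.FULL_TEXT) {
            return true;
        }
        for (Cond child : c.children) {
            if (hasFullText(child)) {
                return true;
            }
        }
        return false;
    }

    /** True if every leaf under c is a full-text function. */
    static boolean onlyFullText(Cond c) {
        if (c.kind == Cond.Kind.FULL_TEXT) {
            return true;
        }
        if (c.kind == Cond.Kind.OTHER) {
            return false;
        }
        for (Cond child : c.children) {
            if (!onlyFullText(child)) {
                return false;
            }
        }
        return true;
    }

    /** Assumed rule: with scoring, an OR containing a full-text function must contain nothing else. */
    static boolean isAllowed(Cond c, boolean scoring) {
        if (c.kind == Cond.Kind.OR && scoring && hasFullText(c) && !onlyFullText(c)) {
            return false;
        }
        for (Cond child : c.children) {
            if (!isAllowed(child, scoring)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Models: MATCH(author, "Faulkner") OR author LIKE "Hemingway"
        Cond mixed = Cond.or(Cond.fullText(), Cond.other());
        System.out.println(isAllowed(mixed, true));  // with METADATA _score
        System.out.println(isAllowed(mixed, false)); // without scoring
    }
}
```

Under this reading, the `MATCH ... OR QSTR ...` example from the docs passes with or without `METADATA _score`, while the `MATCH ... OR author LIKE ...` example passes only when scoring is not requested.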

x-pack/plugin/build.gradle

Lines changed: 1 addition & 1 deletion
@@ -217,5 +217,5 @@ tasks.named("yamlRestTestV7CompatTransform").configure({ task ->
   task.skipTest("esql/190_lookup_join/alias-repeated-index", "LOOKUP JOIN does not support index aliases for now")
   task.skipTest("esql/190_lookup_join/alias-pattern-multiple", "LOOKUP JOIN does not support index aliases for now")
   task.skipTest("esql/190_lookup_join/alias-pattern-single", "LOOKUP JOIN does not support index aliases for now")
-
+  task.skipTest("esql/180_match_operator/match with disjunctions", "Disjunctions in full text functions work now")
 })

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneQueryExpressionEvaluator.java

Lines changed: 19 additions & 4 deletions
@@ -25,6 +25,7 @@
 import org.elasticsearch.compute.data.DocVector;
 import org.elasticsearch.compute.data.IntVector;
 import org.elasticsearch.compute.data.Page;
+import org.elasticsearch.compute.operator.DriverContext;
 import org.elasticsearch.compute.operator.EvalOperator;
 import org.elasticsearch.core.Releasable;
 import org.elasticsearch.core.Releasables;
@@ -44,19 +45,20 @@ public record ShardConfig(Query query, IndexSearcher searcher) {}
 
     private final BlockFactory blockFactory;
     private final ShardConfig[] shards;
-    private final int docChannel;
 
     private ShardState[] perShardState = EMPTY_SHARD_STATES;
 
-    public LuceneQueryExpressionEvaluator(BlockFactory blockFactory, ShardConfig[] shards, int docChannel) {
+    public LuceneQueryExpressionEvaluator(BlockFactory blockFactory, ShardConfig[] shards) {
         this.blockFactory = blockFactory;
         this.shards = shards;
-        this.docChannel = docChannel;
     }
 
     @Override
     public Block eval(Page page) {
-        DocVector docs = page.<DocBlock>getBlock(docChannel).asVector();
+        // Lucene based operators retrieve DocVectors as first block
+        Block block = page.getBlock(0);
+        assert block instanceof DocBlock : "LuceneQueryExpressionEvaluator expects DocBlock as input";
+        DocVector docs = (DocVector) block.asVector();
         try {
             if (docs.singleSegmentNonDecreasing()) {
                 return evalSingleSegmentNonDecreasing(docs).asBlock();
@@ -341,4 +343,17 @@ public void close() {
             Releasables.closeExpectNoException(builder);
         }
     }
+
+    public static class Factory implements EvalOperator.ExpressionEvaluator.Factory {
+        private final ShardConfig[] shardConfigs;
+
+        public Factory(ShardConfig[] shardConfigs) {
+            this.shardConfigs = shardConfigs;
+        }
+
+        @Override
+        public EvalOperator.ExpressionEvaluator get(DriverContext context) {
+            return new LuceneQueryExpressionEvaluator(context.blockFactory(), shardConfigs);
+        }
+    }
 }
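
Reviewer note: the new `Factory` holds the shard configuration once and builds a fresh evaluator for each driver context, and `eval` now reads the doc block from position 0 instead of a configurable channel. The sketch below illustrates that shape with simplified stand-in types. `Evaluator`, `EvaluatorFactory`, `ShardQueryEvaluator`, and the use of `String` and `int[]` in place of `DriverContext`, `ShardConfig`, and `DocBlock` are all assumptions made for illustration, not the Elasticsearch compute API.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for an expression evaluator; not the Elasticsearch interface. */
interface Evaluator {
    int eval(List<Object> page);
}

/** Simplified stand-in for the evaluator factory; the String plays the role of a driver context. */
interface EvaluatorFactory {
    Evaluator get(String driverContext);
}

final class ShardQueryEvaluator implements Evaluator {
    final String driverContext;
    final String[] shardConfigs;

    ShardQueryEvaluator(String driverContext, String[] shardConfigs) {
        this.driverContext = driverContext;
        this.shardConfigs = shardConfigs;
    }

    @Override
    public int eval(List<Object> page) {
        // Same convention as the diff: the doc block is expected at position 0.
        // The diff uses an assert; this sketch throws so the check always runs.
        Object first = page.get(0);
        if (!(first instanceof int[])) {
            throw new IllegalArgumentException("expected doc ids as the first block");
        }
        return ((int[]) first).length; // number of docs that would be evaluated
    }
}

final class ShardQueryEvaluatorFactory implements EvaluatorFactory {
    private final String[] shardConfigs;

    ShardQueryEvaluatorFactory(String[] shardConfigs) {
        this.shardConfigs = shardConfigs;
    }

    @Override
    public Evaluator get(String driverContext) {
        // One evaluator per driver; the shard configuration is shared.
        return new ShardQueryEvaluator(driverContext, shardConfigs);
    }
}

final class FactorySketch {
    public static void main(String[] args) {
        EvaluatorFactory factory = new ShardQueryEvaluatorFactory(new String[] { "shard-0" });
        Evaluator first = factory.get("driver-1");
        Evaluator second = factory.get("driver-2");
        List<Object> page = Arrays.<Object>asList(new int[] { 3, 7, 9 }, "other block");
        System.out.println(first != second);
        System.out.println(first.eval(page));
    }
}
```

With the channel fixed at 0, a caller only has to supply the shard configuration, which matches the constructor change in the test file.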

x-pack/plugin/esql/compute/src/test/java/org/elasticsearch/compute/lucene/LuceneQueryExpressionEvaluatorTests.java

Lines changed: 2 additions & 2 deletions
@@ -183,8 +183,8 @@ private List<Page> runQuery(Set<String> values, Query query, boolean shuffleDocs
         );
         LuceneQueryExpressionEvaluator luceneQueryEvaluator = new LuceneQueryExpressionEvaluator(
             blockFactory,
-            new LuceneQueryExpressionEvaluator.ShardConfig[] { shard },
-            0
+            new LuceneQueryExpressionEvaluator.ShardConfig[] { shard }
+
         );
 
         List<Operator> operators = new ArrayList<>();

x-pack/plugin/esql/qa/testFixtures/src/main/resources/kql-function.csv-spec

Lines changed: 37 additions & 0 deletions
@@ -152,3 +152,40 @@ emp_no:integer | first_name:keyword | last_name:keyword
 10053 | Sanjiv | Zschoche
 10069 | Margareta | Bierman
 ;
+
+testKqlWithNonPushableDisjunctions
+required_capability: kql_function
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where kql("title:lord") or length(title) > 130
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2714
+4023
+7140
+8678
+;
+
+testKqlWithNonPushableDisjunctionsOnComplexExpressions
+required_capability: kql_function
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where (kql("title:lord") and ratings > 4.5) or (kql("author:dostoevsky") and length(title) > 50)
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2924
+4023
+1937
+7140
+2714
+;

x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-function.csv-spec

Lines changed: 37 additions & 0 deletions
@@ -718,3 +718,40 @@ from books
 title:text
 The Hobbit or There and Back Again
 ;
+
+testMatchWithNonPushableDisjunctions
+required_capability: match_function
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where match(title, "lord") or length(title) > 130
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2714
+4023
+7140
+8678
+;
+
+testMatchWithNonPushableDisjunctionsOnComplexExpressions
+required_capability: match_function
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where (match(title, "lord") and ratings > 4.5) or (match(author, "dostoevsky") and length(title) > 50)
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2924
+4023
+1937
+7140
+2714
+;

x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-operator.csv-spec

Lines changed: 37 additions & 0 deletions
@@ -684,3 +684,40 @@ from semantic_text
 host:keyword | semantic_text_field:text
 "host1" | live long and prosper
 ;
+
+testMatchWithNonPushableDisjunctions
+required_capability: match_operator_colon
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where title:"lord" or length(title) > 130
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2714
+4023
+7140
+8678
+;
+
+testMatchWithNonPushableDisjunctionsOnComplexExpressions
+required_capability: match_operator_colon
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where (title:"lord" and ratings > 4.5) or (author:"dostoevsky" and length(title) > 50)
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2924
+4023
+1937
+7140
+2714
+;

x-pack/plugin/esql/qa/testFixtures/src/main/resources/qstr-function.csv-spec

Lines changed: 37 additions & 0 deletions
@@ -152,3 +152,40 @@ emp_no:integer | first_name:keyword | last_name:keyword
 10053 | Sanjiv | Zschoche
 10069 | Margareta | Bierman
 ;
+
+testQstrWithNonPushableDisjunctions
+required_capability: qstr_function
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where qstr("title:lord") or length(title) > 130
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2714
+4023
+7140
+8678
+;
+
+testQstrWithNonPushableDisjunctionsOnComplexExpressions
+required_capability: qstr_function
+required_capability: full_text_functions_disjunctions_compute_engine
+
+from books
+| where (qstr("title:lord") and ratings > 4.5) or (qstr("author:dostoevsky") and length(title) > 50)
+| keep book_no
+;
+ignoreOrder: true
+
+book_no:keyword
+2675
+2924
+4023
+1937
+7140
+2714
+;
