Commit 2688d86

committed
Add clarifications to semantic text documentation
1 parent 4880245 commit 2688d86

3 files changed

+129
-49
lines changed

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 15 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,11 @@ service.
2828

2929
Using `semantic_text`, you won’t need to specify how to generate embeddings for
3030
your data, or how to index it. The {{infer}} endpoint automatically determines
31-
the embedding generation, indexing, and query to use.
31+
the embedding generation, indexing, and query to use. Newly created
32+
`semantic_text`
33+
indices in {{es}} with dense embeddings will be
34+
[quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
35+
to `bbq_hnsw` automatically.
3236

3337
If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up
3438
`semantic_text` with the following API request:
@@ -246,10 +250,15 @@ is not supported for querying the field data.
246250

247251
## Updates to `semantic_text` fields [update-script]
248252

249-
For indices containing `semantic_text` fields, updates that use scripts have the following behavior:
253+
For indices containing `semantic_text` fields, updates that use scripts have the
254+
following behavior:
250255

251-
* Are supported through the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update).
252-
* Are not supported through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk-1) and will fail. Even if the script targets non-`semantic_text` fields, the update will fail when the index contains a `semantic_text` field.
256+
* Are supported through
257+
the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update).
258+
* Are not supported through
259+
the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk-1)
260+
and will fail. Even if the script targets non-`semantic_text` fields, the
261+
update will fail when the index contains a `semantic_text` field.
253262

254263
## `copy_to` and multi-fields support [copy-to-support]
255264

@@ -311,4 +320,5 @@ PUT test-index
311320
of [nested fields](/reference/elasticsearch/mapping-reference/nested.md).
312321
* `semantic_text` fields can’t currently be set as part
313322
of [Dynamic templates](docs-content://manage-data/data-store/mapping/dynamic-templates.md).
314-
* `semantic_text` fields are not supported with Cross-Cluster Search (CCS) or Cross-Cluster Replication (CCR).
323+
* `semantic_text` fields are not supported with Cross-Cluster Search (CCS) or
324+
Cross-Cluster Replication (CCR).

docs/reference/query-languages/query-dsl/query-dsl-match-query.md

Lines changed: 113 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -6,12 +6,19 @@ mapped_pages:
66

77
# Match query [query-dsl-match-query]
88

9+
Returns documents that match a provided text, number, date or boolean value. The
10+
provided text is analyzed before matching.
911

10-
Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
12+
The `match` query is the standard query for performing a full-text search,
13+
including options for fuzzy matching.
1114

12-
The `match` query is the standard query for performing a full-text search, including options for fuzzy matching.
13-
14-
`Match` will also work against [semantic_text](/reference/elasticsearch/mapping-reference/semantic-text.md) fields, however when performing `match` queries against `semantic_text` fields options that specifically target lexical search such as `fuzziness` or `analyzer` will be ignored.
15+
`Match` will also work
16+
against [semantic_text](/reference/elasticsearch/mapping-reference/semantic-text.md)
17+
fields. As `semantic_text` does not support lexical text search,
18+
`match` queries against `semantic_text` fields will automatically perform the
19+
correct semantic search.
20+
Because of this, options that specifically target lexical search such as
21+
`fuzziness` or `analyzer` will be ignored.
1522

1623
## Example request [match-query-ex-request]
1724

@@ -28,89 +35,127 @@ GET /_search
2835
}
2936
```
3037

31-
3238
## Top-level parameters for `match` [match-top-level-params]
3339

3440
`<field>`
3541
: (Required, object) Field you wish to search.
3642

37-
3843
## Parameters for `<field>` [match-field-params]
3944

4045
`query`
41-
: (Required) Text, number, boolean value or date you wish to find in the provided `<field>`.
46+
: (Required) Text, number, boolean value or date you wish to find in the
47+
provided `<field>`.
4248

43-
The `match` query [analyzes](docs-content://manage-data/data-store/text-analysis.md) any provided text before performing a search. This means the `match` query can search [`text`](/reference/elasticsearch/mapping-reference/text.md) fields for analyzed tokens rather than an exact term.
49+
The `match`
50+
query [analyzes](docs-content://manage-data/data-store/text-analysis.md) any
51+
provided text before performing a search. This means the `match` query can
52+
search [`text`](/reference/elasticsearch/mapping-reference/text.md) fields for
53+
analyzed tokens rather than an exact term.
4454

4555

4656
`analyzer`
47-
: (Optional, string) [Analyzer](docs-content://manage-data/data-store/text-analysis.md) used to convert the text in the `query` value into tokens. Defaults to the [index-time analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-analyzer) mapped for the `<field>`. If no analyzer is mapped, the index’s default analyzer is used.
57+
: (Optional,
58+
string) [Analyzer](docs-content://manage-data/data-store/text-analysis.md) used
59+
to convert the text in the `query` value into tokens. Defaults to
60+
the [index-time analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-analyzer)
61+
mapped for the `<field>`. If no analyzer is mapped, the index’s default analyzer
62+
is used.
4863

4964
`auto_generate_synonyms_phrase_query`
50-
: (Optional, Boolean) If `true`, [match phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md) queries are automatically created for multi-term synonyms. Defaults to `true`.
65+
: (Optional, Boolean) If
66+
`true`, [match phrase](/reference/query-languages/query-dsl/query-dsl-match-query-phrase.md)
67+
queries are automatically created for multi-term synonyms. Defaults to `true`.
5168

52-
See [Use synonyms with match query](#query-dsl-match-query-synonyms) for an example.
69+
See [Use synonyms with match query](#query-dsl-match-query-synonyms) for an
70+
example.
5371

5472

5573
`boost`
56-
: (Optional, float) Floating point number used to decrease or increase the [relevance scores](/reference/query-languages/query-dsl/query-filter-context.md#relevance-scores) of the query. Defaults to `1.0`.
74+
: (Optional, float) Floating point number used to decrease or increase
75+
the [relevance scores](/reference/query-languages/query-dsl/query-filter-context.md#relevance-scores)
76+
of the query. Defaults to `1.0`.
5777

58-
Boost values are relative to the default value of `1.0`. A boost value between `0` and `1.0` decreases the relevance score. A value greater than `1.0` increases the relevance score.
78+
Boost values are relative to the default value of `1.0`. A boost value between
79+
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
80+
increases the relevance score.
5981

6082

6183
`fuzziness`
62-
: (Optional, string) Maximum edit distance allowed for matching. See [Fuzziness](/reference/elasticsearch/rest-apis/common-options.md#fuzziness) for valid values and more information. See [Fuzziness in the match query](#query-dsl-match-query-fuzziness) for an example.
84+
: (Optional, string) Maximum edit distance allowed for matching.
85+
See [Fuzziness](/reference/elasticsearch/rest-apis/common-options.md#fuzziness)
86+
for valid values and more information.
87+
See [Fuzziness in the match query](#query-dsl-match-query-fuzziness) for an
88+
example.
6389

6490
`max_expansions`
65-
: (Optional, integer) Maximum number of terms to which the query will expand. Defaults to `50`.
91+
: (Optional, integer) Maximum number of terms to which the query will expand.
92+
Defaults to `50`.
6693

6794
`prefix_length`
68-
: (Optional, integer) Number of beginning characters left unchanged for fuzzy matching. Defaults to `0`.
95+
: (Optional, integer) Number of beginning characters left unchanged for fuzzy
96+
matching. Defaults to `0`.
6997

7098
`fuzzy_transpositions`
71-
: (Optional, Boolean) If `true`, edits for fuzzy matching include transpositions of two adjacent characters (ab → ba). Defaults to `true`.
99+
: (Optional, Boolean) If `true`, edits for fuzzy matching include
100+
transpositions of two adjacent characters (ab → ba). Defaults to `true`.
72101

73102
`fuzzy_rewrite`
74-
: (Optional, string) Method used to rewrite the query. See the [`rewrite` parameter](/reference/query-languages/query-dsl/query-dsl-multi-term-rewrite.md) for valid values and more information.
103+
: (Optional, string) Method used to rewrite the query. See the [
104+
`rewrite` parameter](/reference/query-languages/query-dsl/query-dsl-multi-term-rewrite.md)
105+
for valid values and more information.
75106

76-
If the `fuzziness` parameter is not `0`, the `match` query uses a `fuzzy_rewrite` method of `top_terms_blended_freqs_${max_expansions}` by default.
107+
If the `fuzziness` parameter is not `0`, the `match` query uses a
108+
`fuzzy_rewrite` method of `top_terms_blended_freqs_${max_expansions}` by
109+
default.
77110

78111

79112
`lenient`
80-
: (Optional, Boolean) If `true`, format-based errors, such as providing a text `query` value for a [numeric](/reference/elasticsearch/mapping-reference/number.md) field, are ignored. Defaults to `false`.
113+
: (Optional, Boolean) If `true`, format-based errors, such as providing a text
114+
`query` value for
115+
a [numeric](/reference/elasticsearch/mapping-reference/number.md) field, are
116+
ignored. Defaults to `false`.
81117

82118
`operator`
83-
: (Optional, string) Boolean logic used to interpret text in the `query` value. Valid values are:
119+
: (Optional, string) Boolean logic used to interpret text in the `query`
120+
value. Valid values are:
84121

85122
`OR` (Default)
86-
: For example, a `query` value of `capital of Hungary` is interpreted as `capital OR of OR Hungary`.
123+
: For example, a `query` value of `capital of Hungary` is interpreted as
124+
`capital OR of OR Hungary`.
87125

88126
`AND`
89-
: For example, a `query` value of `capital of Hungary` is interpreted as `capital AND of AND Hungary`.
127+
: For example, a `query` value of `capital of Hungary` is interpreted as
128+
`capital AND of AND Hungary`.
90129

91130

92131
`minimum_should_match`
93-
: (Optional, string) Minimum number of clauses that must match for a document to be returned. See the [`minimum_should_match` parameter](/reference/query-languages/query-dsl/query-dsl-minimum-should-match.md) for valid values and more information.
132+
: (Optional, string) Minimum number of clauses that must match for a document
133+
to be returned. See the [
134+
`minimum_should_match` parameter](/reference/query-languages/query-dsl/query-dsl-minimum-should-match.md)
135+
for valid values and more information.
94136

95137

96138
`zero_terms_query`
97-
: (Optional, string) Indicates whether no documents are returned if the `analyzer` removes all tokens, such as when using a `stop` filter. Valid values are:
139+
: (Optional, string) Indicates whether no documents are returned if the
140+
`analyzer` removes all tokens, such as when using a `stop` filter. Valid values
141+
are:
98142

99143
`none` (Default)
100144
: No documents are returned if the `analyzer` removes all tokens.
101145

102146
`all`
103-
: Returns all documents, similar to a [`match_all`](/reference/query-languages/query-dsl/query-dsl-match-all-query.md) query.
147+
: Returns all documents, similar to a [
148+
`match_all`](/reference/query-languages/query-dsl/query-dsl-match-all-query.md)
149+
query.
104150

105151
See [Zero terms query](#query-dsl-match-query-zero) for an example.
106152

107-
108-
109153
## Notes [match-query-notes]
110154

111155
### Short request example [query-dsl-match-query-short-ex]
112156

113-
You can simplify the match query syntax by combining the `<field>` and `query` parameters. For example:
157+
You can simplify the match query syntax by combining the `<field>` and `query`
158+
parameters. For example:
114159

115160
```console
116161
GET /_search
@@ -123,10 +168,15 @@ GET /_search
123168
}
124169
```
125170

126-
127171
### How the match query works [query-dsl-match-query-boolean]
128172

129-
The `match` query is of type `boolean`. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text. The `operator` parameter can be set to `or` or `and` to control the boolean clauses (defaults to `or`). The minimum number of optional `should` clauses to match can be set using the [`minimum_should_match`](/reference/query-languages/query-dsl/query-dsl-minimum-should-match.md) parameter.
173+
The `match` query is of type `boolean`. It means that the text provided is
174+
analyzed and the analysis process constructs a boolean query from the provided
175+
text. The `operator` parameter can be set to `or` or `and` to control the
176+
boolean clauses (defaults to `or`). The minimum number of optional `should`
177+
clauses to match can be set using the [
178+
`minimum_should_match`](/reference/query-languages/query-dsl/query-dsl-minimum-should-match.md)
179+
parameter.
130180

131181
Here is an example with the `operator` parameter:
132182

@@ -144,24 +194,37 @@ GET /_search
144194
}
145195
```
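
As a rough illustration of the boolean interpretation described above, the following Python sketch joins query tokens with the chosen operator. Whitespace splitting stands in for real text analysis, and the function name `interpret_match` is hypothetical; it is not part of any {{es}} API.

```python
def interpret_match(query: str, operator: str = "or") -> str:
    """Render the boolean interpretation of a match query string.

    Assumption: whitespace splitting stands in for the field's analyzer.
    """
    op = operator.upper()
    if op not in ("OR", "AND"):
        raise ValueError(f"unsupported operator: {operator!r}")
    tokens = query.split()
    return f" {op} ".join(tokens)


print(interpret_match("capital of Hungary"))                  # capital OR of OR Hungary
print(interpret_match("capital of Hungary", operator="and"))  # capital AND of AND Hungary
```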
146196

147-
The `analyzer` can be set to control which analyzer will perform the analysis process on the text. It defaults to the field explicit mapping definition, or the default search analyzer.
148-
149-
The `lenient` parameter can be set to `true` to ignore exceptions caused by data-type mismatches, such as trying to query a numeric field with a text query string. Defaults to `false`.
197+
The `analyzer` can be set to control which analyzer will perform the analysis
198+
process on the text. It defaults to the field's explicit mapping definition, or
199+
the default search analyzer.
150200

201+
The `lenient` parameter can be set to `true` to ignore exceptions caused by
202+
data-type mismatches, such as trying to query a numeric field with a text query
203+
string. Defaults to `false`.
151204

152205
### Fuzziness in the match query [query-dsl-match-query-fuzziness]
153206

154-
`fuzziness` allows *fuzzy matching* based on the type of field being queried. See [Fuzziness](/reference/elasticsearch/rest-apis/common-options.md#fuzziness) for allowed settings.
207+
`fuzziness` allows *fuzzy matching* based on the type of field being queried.
208+
See [Fuzziness](/reference/elasticsearch/rest-apis/common-options.md#fuzziness)
209+
for allowed settings.
155210

156-
The `prefix_length` and `max_expansions` can be set in this case to control the fuzzy process. If the fuzzy option is set the query will use `top_terms_blended_freqs_${max_expansions}` as its [rewrite method](/reference/query-languages/query-dsl/query-dsl-multi-term-rewrite.md) the `fuzzy_rewrite` parameter allows to control how the query will get rewritten.
211+
The `prefix_length` and `max_expansions` can be set in this case to control the
212+
fuzzy process. If the fuzzy option is set, the query uses
213+
`top_terms_blended_freqs_${max_expansions}` as
214+
its [rewrite method](/reference/query-languages/query-dsl/query-dsl-multi-term-rewrite.md).
215+
The `fuzzy_rewrite` parameter controls how the query is
216+
rewritten.
157217

158-
Fuzzy transpositions (`ab` → `ba`) are allowed by default but can be disabled by setting `fuzzy_transpositions` to `false`.
218+
Fuzzy transpositions (`ab` → `ba`) are allowed by default but can be disabled by
219+
setting `fuzzy_transpositions` to `false`.
159220

160221
::::{note}
161-
Fuzzy matching is not applied to terms with synonyms or in cases where the analysis process produces multiple tokens at the same position. Under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.
222+
Fuzzy matching is not applied to terms with synonyms or in cases where the
223+
analysis process produces multiple tokens at the same position. Under the hood
224+
these terms are expanded to a special synonym query that blends term
225+
frequencies, which does not support fuzzy expansion.
162226
::::
163227

164-
165228
```console
166229
GET /_search
167230
{
@@ -176,10 +239,12 @@ GET /_search
176239
}
177240
```
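
To make the effect of `fuzzy_transpositions` concrete, here is a minimal Python sketch of an edit distance that optionally counts a swap of two adjacent characters as a single edit (the optimal string alignment variant). It only illustrates the metric; it is an assumption-level sketch and not how {{es}} evaluates fuzzy queries internally. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance between a and b; adjacent swaps cost 1 if enabled."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # swap of two adjacent characters (ab -> ba)
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("ab", "ba"))                        # 1
print(edit_distance("ab", "ba", transpositions=False))  # 2
```

With transpositions disabled, the same swap has to be expressed as two substitutions, so a term that is one edit away with the default setting becomes two edits away.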
178241

179-
180242
### Zero terms query [query-dsl-match-query-zero]
181243

182-
If the analyzer used removes all tokens in a query like a `stop` filter does, the default behavior is to match no documents at all. In order to change that the `zero_terms_query` option can be used, which accepts `none` (default) and `all` which corresponds to a `match_all` query.
244+
If the analyzer used removes all tokens in a query like a `stop` filter does,
245+
the default behavior is to match no documents at all. In order to change that
246+
the `zero_terms_query` option can be used, which accepts `none` (default) and
247+
`all` which corresponds to a `match_all` query.
183248

184249
```console
185250
GET /_search
@@ -196,10 +261,13 @@ GET /_search
196261
}
197262
```
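
The behavior described above can be sketched in Python as follows. The stop list, the whitespace-based `analyze` function, and the `match` helper are all hypothetical stand-ins for a real analyzer and query execution; the sketch only shows how `zero_terms_query` decides the outcome when analysis leaves no tokens.

```python
STOP_WORDS = {"to", "be", "or", "not"}  # hypothetical stop list


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def match(docs: list[str], query: str, zero_terms_query: str = "none") -> list[str]:
    """Return documents sharing at least one analyzed token with the query."""
    tokens = analyze(query)
    if not tokens:
        # Analysis removed every token: "all" behaves like match_all,
        # "none" (the default) returns nothing.
        return list(docs) if zero_terms_query == "all" else []
    return [d for d in docs if any(t in analyze(d) for t in tokens)]


docs = ["to be or not to be", "quick brown fox"]
print(match(docs, "to be or not to be"))                          # []
print(match(docs, "to be or not to be", zero_terms_query="all"))  # both documents
```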
198263

199-
200264
### Synonyms [query-dsl-match-query-synonyms]
201265

202-
The `match` query supports multi-terms synonym expansion with the [synonym_graph](/reference/text-analysis/analysis-synonym-graph-tokenfilter.md) token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms. For example, the following synonym: `"ny, new york"` would produce:
266+
The `match` query supports multi-terms synonym expansion with
267+
the [synonym_graph](/reference/text-analysis/analysis-synonym-graph-tokenfilter.md)
268+
token filter. When this filter is used, the parser creates a phrase query for
269+
each multi-term synonym. For example, the following synonym: `"ny, new york"`
270+
would produce:
203271

204272
`(ny OR ("new york"))`
205273

@@ -223,7 +291,8 @@ The example above creates a boolean query:
223291

224292
`(ny OR (new AND york)) city`
225293

226-
that matches documents with the term `ny` or the conjunction `new AND york`. By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
294+
that matches documents with the term `ny` or the conjunction `new AND york`. By
295+
default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
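
The difference between the phrase form `(ny OR ("new york"))` and the conjunction form `(ny OR (new AND york))` can be sketched in Python. This is a simplified illustration under stated assumptions: tokens come from whitespace splitting, and the helper names are hypothetical rather than anything {{es}} exposes.

```python
def contains_phrase(doc_tokens: list[str], phrase: list[str]) -> bool:
    """True if the phrase tokens appear adjacent and in order."""
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase
               for i in range(len(doc_tokens) - n + 1))


def contains_all(doc_tokens: list[str], terms: list[str]) -> bool:
    """True if every term appears somewhere, in any order."""
    return all(t in doc_tokens for t in terms)


def matches_ny_synonym(doc: str, phrase_query: bool = True) -> bool:
    """Evaluate the synonym "ny, new york" against a document.

    phrase_query=True  models (ny OR ("new york"))
    phrase_query=False models (ny OR (new AND york))
    """
    tokens = doc.lower().split()
    if "ny" in tokens:
        return True
    multi = ["new", "york"]
    return contains_phrase(tokens, multi) if phrase_query else contains_all(tokens, multi)


print(matches_ny_synonym("york is new"))                      # False
print(matches_ny_synonym("york is new", phrase_query=False))  # True
```

The phrase form is stricter: it requires the synonym's terms to be adjacent and in order, while the conjunction form accepts them anywhere in the document.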
227296

228297

229298

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/Match.java

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -142,6 +142,7 @@ public class Match extends FullTextFunction implements OptionalArgument, PostAna
142142
143143
Match can be used on fields from the text family like <<text, text>> and <<semantic-text, semantic_text>>,
144144
as well as other field types like keyword, boolean, dates, and numeric types.
145+
When Match is used on a <<semantic-text, semantic_text>> field, it will perform a semantic query on the field.
145146
146147
Match can use <<esql-function-named-params,function named parameters>> to specify additional options for the match query.
147148
All <<match-field-params,match query parameters>> are supported.
