docs/reference/mapping/types/semantic-text.asciidoc
Lines changed: 41 additions & 21 deletions
@@ -1,6 +1,7 @@
 [role="xpack"]
 [[semantic-text]]
 === Semantic text field type
+
 ++++
 <titleabbrev>Semantic text</titleabbrev>
 ++++
@@ -94,6 +95,35 @@ You can update this parameter by using the <<indices-put-mapping, Update mapping
 Use the <<put-inference-api>> to create the endpoint.
 If not specified, the {infer} endpoint defined by `inference_id` will be used at both index and query time.
 
+`chunking_settings`::
+(Optional, object) Settings for chunking text into smaller passages.
+If specified, these will override the chunking settings set in the {infer-cap} endpoint associated with `inference_id`.
+If chunking settings are updated, they will not be applied to existing documents until they are reindexed.
+
+.Valid values for `chunking_settings`
+[%collapsible%open]
+====
+`type`:::
+Indicates the type of chunking strategy to use.
+Valid values are `word` or `sentence`.
+Required.
+
+`max_chunk_size`:::
+The maximum number of words in a chunk.
+Required.
+
+`overlap`:::
+The number of overlapping words allowed in chunks.
+This cannot be defined as more than half of the `max_chunk_size`.
+Required for `word` type chunking settings.
+
+`sentence_overlap`:::
+The number of overlapping sentences allowed in chunks.
+Valid values are `0` or `1`.
+Required for `sentence` type chunking settings.
+
+====
+
 [discrete]
 [[infer-endpoint-validation]]
 ==== {infer-cap} endpoint validation
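The constraints that the hunk above documents for `chunking_settings` can be sketched as a client-side check. This is only an illustration under assumptions: the function name `validate_chunking_settings` is hypothetical and is not part of Elasticsearch or any client library, the parameter names are taken exactly as they appear in the added lines, and only the rules stated there are encoded (any server-side limits on `max_chunk_size` are not covered).

```python
def validate_chunking_settings(settings):
    """Return a list of problems; an empty list means these checks passed.

    Hypothetical helper: it mirrors only the rules in the documented
    parameter list, not Elasticsearch's own validation.
    """
    errors = []

    # `type` is required and must be one of the two documented strategies.
    kind = settings.get("type")
    if kind not in ("word", "sentence"):
        errors.append("type is required and must be 'word' or 'sentence'")

    # `max_chunk_size` is required (treated here as an integer word count).
    max_chunk_size = settings.get("max_chunk_size")
    if not isinstance(max_chunk_size, int):
        errors.append("max_chunk_size is required and must be an integer")

    if kind == "word":
        # `overlap` is required for word chunking and may not be more
        # than half of `max_chunk_size`.
        overlap = settings.get("overlap")
        if not isinstance(overlap, int):
            errors.append("overlap is required for word chunking")
        elif isinstance(max_chunk_size, int) and overlap * 2 > max_chunk_size:
            errors.append("overlap cannot be more than half of max_chunk_size")
    elif kind == "sentence":
        # `sentence_overlap` is required for sentence chunking: 0 or 1 only.
        if settings.get("sentence_overlap") not in (0, 1):
            errors.append("sentence_overlap is required and must be 0 or 1")

    return errors
```

For example, `{"type": "word", "max_chunk_size": 250, "overlap": 100}` passes these checks, while the same settings with `"overlap": 126` do not, because 126 is more than half of 250.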
@@ -104,7 +134,6 @@ When the first document is indexed, the `inference_id` will be used to generate
 WARNING: Removing an {infer} endpoint will cause ingestion of documents and semantic queries to fail on indices that define `semantic_text` fields with that {infer} endpoint as their `inference_id`.
 Trying to <<delete-inference-api,delete an {infer} endpoint>> that is used on a `semantic_text` field will result in an error.
 
-
 [discrete]
 [[auto-text-chunking]]
 ==== Text chunking
@@ -117,8 +146,7 @@ When querying, the individual passages will be automatically searched for each d
 
 For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.
 
-Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about
-semantic search using `semantic_text` and the `semantic` query.
+Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and the `semantic` query.
 <1> Specifies the maximum number of fragments to return.
-<2> Sorts highlighted fragments by score when set to `score`. By default, fragments will be output in the order they appear in the field (order: none).
+<2> Sorts highlighted fragments by score when set to `score`.
+By default, fragments will be output in the order they appear in the field (order: none).
 
 Highlighting is supported on fields other than semantic_text.
-However, if you want to restrict highlighting to the semantic highlighter and return no fragments when the field is not of type semantic_text,
-you can explicitly enforce the `semantic` highlighter in the query:
+However, if you want to restrict highlighting to the semantic highlighter and return no fragments when the field is not of type semantic_text, you can explicitly enforce the `semantic` highlighter in the query:
-`semantic_text` uses defaults for indexing data based on the {infer} endpoint
-specified. It enables you to quickstart your semantic search by providing
-automatic {infer} and a dedicated query so you don't need to provide further
-details.
+`semantic_text` uses defaults for indexing data based on the {infer} endpoint specified.
+It enables you to quickstart your semantic search by providing automatic {infer} and a dedicated query so you don't need to provide further details.
 
 In case you want to customize data indexing, use the
-<<sparse-vector,`sparse_vector`>> or <<dense-vector,`dense_vector`>> field
-types and create an ingest pipeline with an
+<<sparse-vector,`sparse_vector`>> or <<dense-vector,`dense_vector`>> field types and create an ingest pipeline with an
 <<inference-processor, {infer} processor>> to generate the embeddings.
-<<semantic-search-inference,This tutorial>> walks you through the process. In
-these cases - when you use `sparse_vector` or `dense_vector` field types instead
-of the `semantic_text` field type to customize indexing - using the
-<<query-dsl-semantic-query,`semantic_query`>> is not supported for querying the
-field data.
-
+<<semantic-search-inference,This tutorial>> walks you through the process.
+In these cases - when you use `sparse_vector` or `dense_vector` field types instead of the `semantic_text` field type to customize indexing - using the
+<<query-dsl-semantic-query,`semantic_query`>> is not supported for querying the field data.
 
 [discrete]
 [[update-script]]
@@ -203,13 +225,11 @@ field data.
 Updates that use scripts are not supported for an index that contains a `semantic_text` field.
 Even if the script targets non-`semantic_text` fields, the update will fail when the index contains a `semantic_text` field.
 
-
 [discrete]
 [[copy-to-support]]
 ==== `copy_to` and multi-fields support
 
-The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>,
-be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
+The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>, be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
 This means you can use a single field to collect the values of other fields for semantic search.
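As a rough illustration of collecting several fields into one `semantic_text` field through `copy_to`, the sketch below builds a mapping body as a plain Python dict. Everything named here is an assumption made for the example: the helper `build_copy_to_mapping`, the field names `title`, `body`, and `semantic_all`, and the endpoint id `my-inference-endpoint` are hypothetical, and the dict is not sent to any cluster.

```python
import json


def build_copy_to_mapping(source_fields, target="semantic_all",
                          inference_id="my-inference-endpoint"):
    """Build a mapping body where each source field copies into one
    semantic_text target field (all names are hypothetical)."""
    # Each source field is a plain text field that copies its value
    # into the shared target field.
    properties = {
        name: {"type": "text", "copy_to": target} for name in source_fields
    }
    # The target is the semantic_text field that receives the copied values.
    properties[target] = {"type": "semantic_text", "inference_id": inference_id}
    return {"mappings": {"properties": properties}}


mapping = build_copy_to_mapping(["title", "body"])
print(json.dumps(mapping, indent=2))
```

A semantic query would then target only the collecting field (`semantic_all` in this sketch) rather than each source field separately.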