
Commit cf74d28

[8.x] Docs: Update chunking_settings information for semantic_text field (elastic#126631)
* Update chunking_settings docs for 8.x
* Remove redundancy
1 parent 262d215 commit cf74d28

1 file changed

+41
-21
lines changed

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 41 additions & 21 deletions
@@ -1,6 +1,7 @@
11
[role="xpack"]
22
[[semantic-text]]
33
=== Semantic text field type
4+
45
++++
56
<titleabbrev>Semantic text</titleabbrev>
67
++++
@@ -94,6 +95,35 @@ You can update this parameter by using the <<indices-put-mapping, Update mapping
9495
Use the <<put-inference-api>> to create the endpoint.
9596
If not specified, the {infer} endpoint defined by `inference_id` will be used at both index and query time.
9697

98+
`chunking_settings`::
99+
(Optional, object) Settings for chunking text into smaller passages.
100+
If specified, these will override the chunking settings set in the {infer-cap} endpoint associated with `inference_id`.
101+
If chunking settings are updated, they will not be applied to existing documents until they are reindexed.
102+
103+
.Valid values for `chunking_settings`
104+
[%collapsible%open]
105+
====
106+
`type`:::
107+
Indicates the type of chunking strategy to use.
108+
Valid values are `word` or `sentence`.
109+
Required.
110+
111+
`max_chunk_size`:::
112+
The maximum number of words in a chunk.
113+
Required.
114+
115+
`overlap`:::
116+
The number of overlapping words allowed in chunks.
117+
This cannot be defined as more than half of the `max_chunk_size`.
118+
Required for `word` type chunking settings.
119+
120+
`sentence_overlap`:::
121+
The number of overlapping sentences allowed in chunks.
122+
Valid values are `0` or `1`.
123+
Required for `sentence` type chunking settings.
124+
125+
====
126+
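The constraints listed for `chunking_settings` can be checked before a mapping is sent. The following is a minimal sketch, not Elasticsearch code: `validate_chunking_settings` is a hypothetical helper, the key names are taken from the list above, and the rule that `overlap` may equal exactly half of `max_chunk_size` is an assumption based on the wording "cannot be defined as more than half".

```python
def validate_chunking_settings(settings):
    """Return a list of problems; an empty list means the settings pass.

    Hypothetical helper that mirrors the documented rules only.
    """
    errors = []

    # `type` is required and must be one of the two documented strategies.
    chunk_type = settings.get("type")
    if chunk_type not in ("word", "sentence"):
        errors.append("type must be 'word' or 'sentence'")

    # `max_chunk_size` is required for both strategies.
    max_chunk_size = settings.get("max_chunk_size")
    if max_chunk_size is None:
        errors.append("max_chunk_size is required")
        return errors

    if chunk_type == "word":
        # `overlap` is required and may not exceed half of max_chunk_size.
        overlap = settings.get("overlap")
        if overlap is None:
            errors.append("overlap is required for word chunking")
        elif overlap > max_chunk_size / 2:
            errors.append("overlap cannot be more than half of max_chunk_size")
    elif chunk_type == "sentence":
        # `sentence_overlap` is required and must be 0 or 1.
        sentence_overlap = settings.get("sentence_overlap")
        if sentence_overlap is None:
            errors.append("sentence_overlap is required for sentence chunking")
        elif sentence_overlap not in (0, 1):
            errors.append("sentence_overlap must be 0 or 1")

    return errors
```

For example, `validate_chunking_settings({"type": "word", "max_chunk_size": 100, "overlap": 51})` reports one problem, because 51 is more than half of 100.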
97127
[discrete]
98128
[[infer-endpoint-validation]]
99129
==== {infer-cap} endpoint validation
@@ -104,7 +134,6 @@ When the first document is indexed, the `inference_id` will be used to generate
104134
WARNING: Removing an {infer} endpoint will cause ingestion of documents and semantic queries to fail on indices that define `semantic_text` fields with that {infer} endpoint as their `inference_id`.
105135
Trying to <<delete-inference-api,delete an {infer} endpoint>> that is used on a `semantic_text` field will result in an error.
106136

107-
108137
[discrete]
109138
[[auto-text-chunking]]
110139
==== Text chunking
@@ -117,8 +146,7 @@ When querying, the individual passages will be automatically searched for each d
117146

118147
For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.
119148

120-
Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about
121-
semantic search using `semantic_text` and the `semantic` query.
149+
Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and the `semantic` query.
122150

123151
[discrete]
124152
[[semantic-text-highlighting]]
@@ -147,11 +175,11 @@ POST test-index/_search
147175
------------------------------------------------------------
148176
// TEST[skip:Requires inference endpoint]
149177
<1> Specifies the maximum number of fragments to return.
150-
<2> Sorts highlighted fragments by score when set to `score`. By default, fragments will be output in the order they appear in the field (order: none).
178+
<2> Sorts highlighted fragments by score when set to `score`.
179+
By default, fragments will be output in the order they appear in the field (order: none).
151180

152181
Highlighting is supported on fields other than semantic_text.
153-
However, if you want to restrict highlighting to the semantic highlighter and return no fragments when the field is not of type semantic_text,
154-
you can explicitly enforce the `semantic` highlighter in the query:
182+
However, if you want to restrict highlighting to the semantic highlighter and return no fragments when the field is not of type semantic_text, you can explicitly enforce the `semantic` highlighter in the query:
155183

156184
[source,console]
157185
------------------------------------------------------------
@@ -180,21 +208,15 @@ PUT test-index
180208
[[custom-indexing]]
181209
==== Customizing `semantic_text` indexing
182210

183-
`semantic_text` uses defaults for indexing data based on the {infer} endpoint
184-
specified. It enables you to quickstart your semantic search by providing
185-
automatic {infer} and a dedicated query so you don't need to provide further
186-
details.
211+
`semantic_text` uses defaults for indexing data based on the {infer} endpoint specified.
212+
It enables you to quickstart your semantic search by providing automatic {infer} and a dedicated query so you don't need to provide further details.
187213

188214
In case you want to customize data indexing, use the
189-
<<sparse-vector,`sparse_vector`>> or <<dense-vector,`dense_vector`>> field
190-
types and create an ingest pipeline with an
215+
<<sparse-vector,`sparse_vector`>> or <<dense-vector,`dense_vector`>> field types and create an ingest pipeline with an
191216
<<inference-processor, {infer} processor>> to generate the embeddings.
192-
<<semantic-search-inference,This tutorial>> walks you through the process. In
193-
these cases - when you use `sparse_vector` or `dense_vector` field types instead
194-
of the `semantic_text` field type to customize indexing - using the
195-
<<query-dsl-semantic-query,`semantic_query`>> is not supported for querying the
196-
field data.
197-
217+
<<semantic-search-inference,This tutorial>> walks you through the process.
218+
In these cases - when you use `sparse_vector` or `dense_vector` field types instead of the `semantic_text` field type to customize indexing - using the
219+
<<query-dsl-semantic-query,`semantic_query`>> is not supported for querying the field data.
198220

199221
[discrete]
200222
[[update-script]]
@@ -203,13 +225,11 @@ field data.
203225
Updates that use scripts are not supported for an index that contains a `semantic_text` field.
204226
Even if the script targets non-`semantic_text` fields, the update will fail when the index contains a `semantic_text` field.
205227

206-
207228
[discrete]
208229
[[copy-to-support]]
209230
==== `copy_to` and multi-fields support
210231

211-
The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>,
212-
be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
232+
The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>, be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
213233
This means you can use a single field to collect the values of other fields for semantic search.
214234

215235
For example, the following mapping:
