[DOCS] Documents configurable chunking (#115300)

szabosteve · davidkyle · szabosteve · commit dbf4c5d926e4 · 2024-10-25T15:37:05.000Z
Co-authored-by: David Kyle <david.kyle@elastic.co>
diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc
@@ -35,7 +35,6 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
 
-
 [discrete]
 [[default-enpoints]]
 === Default {infer} endpoints
@@ -53,6 +52,67 @@ For these models, the minimum number of allocations is `0`.
 If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
 
 
+[discrete]
+[[infer-chunking-config]]
+=== Configuring chunking
+
+{infer-cap} endpoints have a limit on the amount of text they can process at once, determined by the model's input capacity.
+Chunking is the process of splitting the input text into pieces that remain within these limits.
+It occurs when ingesting documents into <<semantic-text,`semantic_text` fields>>.
+Chunking also helps produce sections that are digestible for humans.
+Returning a long document in search results is less useful than providing the most relevant chunk of text.
+
+Each chunk will include the text subpassage and the corresponding embedding generated from it.
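+
+As a minimal sketch of where chunking takes effect, a `semantic_text` field can reference an {infer} endpoint in the index mapping.
+The index name `my-index`, the field name `content`, and the endpoint ID `my-elser-endpoint` are hypothetical placeholders, and the endpoint is assumed to exist already.
+
+[source,console]
+------------------------------------------------------------
+PUT my-index
+{
+  "mappings": {
+    "properties": {
+      "content": {
+        "type": "semantic_text",
+        "inference_id": "my-elser-endpoint"
+      }
+    }
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+
+Text indexed into `content` would then be chunked according to the chunking settings of the referenced endpoint.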
+
+By default, documents are split into sentences and grouped into sections of up to 250 words, with an overlap of one sentence so that each chunk shares a sentence with the previous chunk.
+Overlapping ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
+
+{es} uses the https://unicode-org.github.io/icu-docs/[ICU4J] library to detect word and sentence boundaries for chunking.
+https://unicode-org.github.io/icu/userguide/boundaryanalysis/#word-boundary[Word boundaries] are identified by following a series of rules, not just the presence of a whitespace character.
+For written languages that do not use whitespace, such as Chinese or Japanese, dictionary lookups are used to detect word boundaries.
+
+
+[discrete]
+==== Chunking strategies
+
+Two strategies are available for chunking: `sentence` and `word`.
+
+The `sentence` strategy splits the input text at sentence boundaries.
+Each chunk contains one or more complete sentences, which preserves sentence-level context.
+The exception is a sentence that causes a chunk to exceed `max_chunk_size` words, in which case the sentence is split across chunks.
+The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk, and can be either `0` or `1`.
+
+The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit.
+The `overlap` option is the number of words from the previous chunk to include in the current chunk.
+
+The default chunking strategy is `sentence`.
+
+NOTE: The default chunking strategy for {infer} endpoints created before 8.16 is `word`.
+
+
+[discrete]
+==== Example of configuring the chunking behavior
+
+The following example creates an {infer} endpoint with the `elasticsearch` service that deploys the ELSER model by default and configures the chunking behavior.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/small_chunk_size
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "sentence",
+    "max_chunk_size": 100,
+    "sentence_overlap": 0
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
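+
+As a further sketch, a similar request could select the `word` strategy instead.
+The endpoint name `word_chunks` and the values below are illustrative assumptions, chosen so that `overlap` does not exceed half of `max_chunk_size`.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/sparse_embedding/word_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "word",
+    "max_chunk_size": 120,
+    "overlap": 40
+  }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]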
+
+
 include::delete-inference.asciidoc[]
 include::get-inference.asciidoc[]
 include::post-inference.asciidoc[]
diff --git a/docs/reference/inference/inference-shared.asciidoc b/docs/reference/inference/inference-shared.asciidoc
@@ -31,4 +31,36 @@ end::task-settings[]
 
 tag::task-type[]
 The type of the {infer} task that the model will perform.
-end::task-type[]
+end::task-type[]
+
+tag::chunking-settings[]
+Chunking configuration object.
+Refer to <<infer-chunking-config>> to learn more about chunking.
+end::chunking-settings[]
+
+tag::chunking-settings-max-chunking-size[]
+Specifies the maximum size of a chunk in words.
+Defaults to `250`.
+This value cannot be higher than `300` or lower than `20` (for `sentence` strategy) or `10` (for `word` strategy). 
+end::chunking-settings-max-chunking-size[]
+
+tag::chunking-settings-overlap[]
+Only for `word` chunking strategy.
+Specifies the number of overlapping words for chunks.
+Defaults to `100`.
+This value cannot be higher than half of `max_chunk_size`.
+end::chunking-settings-overlap[]
+
+tag::chunking-settings-sentence-overlap[]
+Only for `sentence` chunking strategy.
+Specifies the number of overlapping sentences for chunks.
+It can be either `1` or `0`.
+Defaults to `1`.
+end::chunking-settings-sentence-overlap[]
+
+tag::chunking-settings-strategy[]
+Specifies the chunking strategy.
+It can be either `sentence` or `word`.
+end::chunking-settings-strategy[]
+
+
diff --git a/docs/reference/inference/service-alibabacloud-ai-search.asciidoc b/docs/reference/inference/service-alibabacloud-ai-search.asciidoc
@@ -34,6 +34,26 @@ Available task types:
 [[infer-service-alibabacloud-ai-search-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string) The type of service supported for the specified task type.
 In this case,
@@ -108,7 +128,6 @@ To modify this, set the `requests_per_minute` setting of this object in your ser
 include::inference-shared.asciidoc[tag=request-per-minute-example]
 --
 
-
 `task_settings`::
 (Optional, object)
 include::inference-shared.asciidoc[tag=task-settings]
diff --git a/docs/reference/inference/service-amazon-bedrock.asciidoc b/docs/reference/inference/service-amazon-bedrock.asciidoc
@@ -32,6 +32,26 @@ Available task types:
 [[infer-service-amazon-bedrock-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string) The type of service supported for the specified task type.
 In this case,
diff --git a/docs/reference/inference/service-anthropic.asciidoc b/docs/reference/inference/service-anthropic.asciidoc
@@ -32,6 +32,26 @@ Available task types:
 [[infer-service-anthropic-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case,
diff --git a/docs/reference/inference/service-azure-ai-studio.asciidoc b/docs/reference/inference/service-azure-ai-studio.asciidoc
@@ -33,6 +33,26 @@ Available task types:
 [[infer-service-azure-ai-studio-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case,
diff --git a/docs/reference/inference/service-azure-openai.asciidoc b/docs/reference/inference/service-azure-openai.asciidoc
@@ -33,6 +33,26 @@ Available task types:
 [[infer-service-azure-openai-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case, 
diff --git a/docs/reference/inference/service-cohere.asciidoc b/docs/reference/inference/service-cohere.asciidoc
@@ -34,6 +34,26 @@ Available task types:
 [[infer-service-cohere-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case, 
diff --git a/docs/reference/inference/service-elasticsearch.asciidoc b/docs/reference/inference/service-elasticsearch.asciidoc
@@ -36,6 +36,26 @@ Available task types:
 [[infer-service-elasticsearch-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case,
diff --git a/docs/reference/inference/service-elser.asciidoc b/docs/reference/inference/service-elser.asciidoc
@@ -36,6 +36,26 @@ Available task types:
 [[infer-service-elser-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case,
diff --git a/docs/reference/inference/service-google-ai-studio.asciidoc b/docs/reference/inference/service-google-ai-studio.asciidoc
@@ -33,6 +33,26 @@ Available task types:
 [[infer-service-google-ai-studio-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case, 
diff --git a/docs/reference/inference/service-google-vertex-ai.asciidoc b/docs/reference/inference/service-google-vertex-ai.asciidoc
@@ -33,6 +33,26 @@ Available task types:
 [[infer-service-google-vertex-ai-api-request-body]]
 ==== {api-request-body-title}
 
+`chunking_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=chunking-settings]
+
+`max_chunk_size`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]
+
+`overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-overlap]
+
+`sentence_overlap`:::
+(Optional, integer)
+include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]
+
+`strategy`:::
+(Optional, string)
+include::inference-shared.asciidoc[tag=chunking-settings-strategy]
+
 `service`::
 (Required, string)
 The type of service supported for the specified task type. In this case,
diff --git a/docs/reference/inference/service-hugging-face.asciidoc b/docs/reference/inference/service-hugging-face.asciidoc
diff --git a/docs/reference/inference/service-mistral.asciidoc b/docs/reference/inference/service-mistral.asciidoc
diff --git a/docs/reference/inference/service-openai.asciidoc b/docs/reference/inference/service-openai.asciidoc