diff --git a/explore-analyze/elastic-inference/inference-api.md b/explore-analyze/elastic-inference/inference-api.md
index 0ae5bffa0b..a185fe9d04 100644
--- a/explore-analyze/elastic-inference/inference-api.md
+++ b/explore-analyze/elastic-inference/inference-api.md
@@ -107,37 +107,156 @@ By default, documents are split into sentences and grouped in sections up to 250
 Several strategies are available for chunking:
 
-`sentence`
-: The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences ensuring that the integrity of sentence-level context is preserved, except when a sentence causes a chunk to exceed a word count of `max_chunk_size`, in which case it will be split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk which is either `0` or `1`.
+#### `sentence`
 
-`word`
-: The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
+The `sentence` strategy splits the input text at sentence boundaries. Each chunk contains one or more complete sentences, ensuring that sentence-level context is preserved, except when a sentence causes a chunk to exceed the `max_chunk_size` word count, in which case it is split across chunks. The `sentence_overlap` option defines the number of sentences from the previous chunk to include in the current chunk, and can be either `0` or `1`.
 
-`recursive`{applies_to}`stack: ga 9.1`
-: The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `sentence` strategy.
 
-`none` {applies_to}`stack: ga 9.1`
-
-: The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+```console
+PUT _inference/sparse_embedding/sentence_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "sentence",
+    "max_chunk_size": 100,
+    "sentence_overlap": 0
+  }
+}
+```
 
 The default chunking strategy is `sentence`.
 
-#### Example of configuring the chunking behavior
+#### `word`
+
+The `word` strategy splits the input text on individual words up to the `max_chunk_size` limit. The `overlap` option is the number of words from the previous chunk to include in the current chunk.
 
-The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model by default and configures the chunking behavior.
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures the chunking behavior with the `word` strategy, setting a maximum of 120 words per chunk and an overlap of 40 words between chunks.
 
 ```console
-PUT _inference/sparse_embedding/small_chunk_size
+PUT _inference/sparse_embedding/word_chunks
 {
   "service": "elasticsearch",
   "service_settings": {
+    "model_id": ".elser_model_2",
     "num_allocations": 1,
     "num_threads": 1
   },
   "chunking_settings": {
-    "strategy": "sentence",
-    "max_chunk_size": 100,
-    "sentence_overlap": 0
+    "strategy": "word",
+    "max_chunk_size": 120,
+    "overlap": 40
   }
 }
 ```
+
+#### `recursive`
+
+```{applies_to}
+stack: ga 9.1
+```
+
+The `recursive` strategy splits the input text based on a configurable list of separator patterns (for example, newlines or Markdown headers). The chunker applies these separators in order, recursively splitting any chunk that exceeds the `max_chunk_size` word limit. If no separator produces a small enough chunk, the strategy falls back to sentence-level splitting.
+
+##### Markdown separator group
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy using the `markdown` separator group and a maximum of 200 words per chunk.
+
+```console
+PUT _inference/sparse_embedding/recursive_markdown_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 200,
+    "separator_group": "markdown"
+  }
+}
+```
+
+##### Custom separator group
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and configures chunking with the `recursive` strategy. It uses a custom list of separators to split the input text into chunks of up to 180 words.
+
+```console
+PUT _inference/sparse_embedding/recursive_custom_chunks
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "recursive",
+    "max_chunk_size": 180,
+    "separators": [
+      "^(#{1,6})\\s",
+      "\\n\\n",
+      "\\n[-*]\\s",
+      "\\n\\d+\\.\\s",
+      "\\n"
+    ]
+  }
+}
+```
+
+#### `none`
+
+```{applies_to}
+stack: ga 9.1
+```
+
+The `none` strategy disables chunking and processes the entire input text as a single block, without any splitting or overlap. When using this strategy, you can instead [pre-chunk](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#auto-text-chunking) the input by providing an array of strings, where each element acts as a separate chunk to be sent directly to the inference service without further chunking.
+
+The following example creates an {{infer}} endpoint with the `elasticsearch` service that deploys the ELSER model and disables chunking by setting the strategy to `none`.
+
+```console
+PUT _inference/sparse_embedding/none_chunking
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_allocations": 1,
+    "num_threads": 1
+  },
+  "chunking_settings": {
+    "strategy": "none"
+  }
+}
+```
 
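+To pre-chunk the input, provide the text as an array of strings at ingest time. The following sketch assumes a hypothetical index named `test-index` with a `semantic_text` field named `my_field` that uses the `none_chunking` endpoint; each array element is stored as a separate chunk without further splitting.
+
+```console
+PUT test-index
+{
+  "mappings": {
+    "properties": {
+      "my_field": {
+        "type": "semantic_text",
+        "inference_id": "none_chunking" <1>
+      }
+    }
+  }
+}
+
+PUT test-index/_doc/1
+{
+  "my_field": [
+    "This is the first pre-chunked passage.",
+    "This is the second pre-chunked passage."
+  ]
+}
+```
+
+1. The index and field names here are placeholders; point `inference_id` at the {{infer}} endpoint whose chunking strategy is set to `none`.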