Merged

Changes from 4 commits
17 changes: 17 additions & 0 deletions specification/inference/_types/CommonTypes.ts
@@ -1306,6 +1306,23 @@ export class ElasticsearchServiceSettings {
* The maximum value is 32.
*/
num_threads: integer
/**
* Only for the `rerank` task type.

Member comment: A quick clarification. For 9.2, these two values are only configurable for rerank endpoints using the elastic reranker model.

* Controls the strategy used for processing long documents during inference.
*
* Possible values:
* - `truncate` (default): Processes only the beginning of each document.
* - `chunk`: Splits long documents into smaller parts (chunks) before inference.

Member comment: I'm not sure where it's best to clarify this, but with chunking enabled we return a single score per document (the same as we do for truncating), with the score corresponding to the highest score of any chunk. I just want to make it clear that the structure of the response to the user will not change, only the rerank relevance scores.

*
* To enable chunking, set this value to `chunk`.
*/
long_document_strategy?: string
/**
* Only for the `rerank` task type.
* Limits the number of chunks per document that are sent for inference when chunking is enabled.
* If not set, all chunks generated for the document are processed.
*/
max_chunks_per_doc?: integer
}

export class ElasticsearchTaskSettings {
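The new rerank-only fields above could be exercised with `service_settings` shaped like the following sketch. The interface mirrors the spec fields in this diff; the `model_id` value and the endpoint usage are illustrative assumptions, not part of the change:

```typescript
// Sketch of the new rerank service_settings fields added in this diff.
// Field names follow the spec; concrete values here are illustrative only.
interface ElasticsearchServiceSettingsSketch {
  model_id: string;
  num_allocations?: number;
  num_threads: number;
  long_document_strategy?: "truncate" | "chunk"; // spec types this as string
  max_chunks_per_doc?: number;                   // only meaningful when chunking
}

const rerankSettings: ElasticsearchServiceSettingsSketch = {
  model_id: ".rerank-v1",          // assumed elastic reranker model id
  num_threads: 1,
  long_document_strategy: "chunk", // enable chunking of long documents
  max_chunks_per_doc: 10,          // cap how many chunks go to inference
};

console.log(JSON.stringify(rerankSettings));
```

Per the reviewer note above, omitting `long_document_strategy` (or setting it to `truncate`) keeps the default behavior of processing only the beginning of each document.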
61 changes: 61 additions & 0 deletions specification/inference/_types/Services.ts
@@ -322,6 +322,67 @@ export class InferenceEndpointInfoWatsonx extends InferenceEndpoint {
task_type: TaskTypeWatsonx
}

/**
* Chunking configuration object
*/
export class ElasticsearchInferenceChunkingSettings {
/**
* The maximum size of a chunk in words.
* This value cannot be lower than `20` (for `sentence` strategy) or `10` (for `word` strategy).
* This value should not exceed the window size for the associated model.
* @server_default 250
*/
max_chunk_size?: integer
/**
* The number of overlapping words for chunks.
* It is applicable only to a `word` chunking strategy.
* This value cannot be higher than half the `max_chunk_size` value.
* @server_default 100
*/
overlap?: integer
/**
* The number of overlapping sentences for chunks.
* It is applicable only for a `sentence` chunking strategy.
* It can be either `1` or `0`.
* @server_default 1
*/
sentence_overlap?: integer
/**
* Only applicable to the `recursive` strategy and required when using it.
*
* Sets a predefined list of separators in the saved chunking settings based on the selected text type.
* Values can be `markdown` or `plaintext`.
*
* Using this parameter is an alternative to manually specifying a custom `separators` list.
*/
separator_group?: string
/**
* Only applicable to the `recursive` strategy and required when using it.
*
* A list of strings used as possible split points when chunking text.
*
* Each string can be a plain string or a regular expression (regex) pattern.
* The system tries each separator in order to split the text, starting from the first item in the list.
*
* After splitting, it attempts to recombine smaller pieces into larger chunks that stay within
* the `max_chunk_size` limit, to reduce the total number of chunks generated.
*/
separators?: string[]
/**
* The chunking strategy: `sentence`, `word`, `none` or `recursive`.
*
* * If `strategy` is set to `recursive`, you must also specify:
*
* - `max_chunk_size`
* - either `separators` or `separator_group`
*
* Learn more about different chunking strategies in the linked documentation.
* @server_default sentence
* @ext_doc_id chunking-strategies
*/
strategy?: string
}

/**
* Chunking configuration object
*/
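The constraints documented on `ElasticsearchInferenceChunkingSettings` can be summarized as a small validation sketch. The `validateChunking` helper below is hypothetical, not part of the spec; it only restates the rules from the doc comments above:

```typescript
// Hypothetical validator restating the documented chunking constraints.
interface ChunkingSettings {
  strategy?: "sentence" | "word" | "none" | "recursive"; // spec types this as string
  max_chunk_size?: number;
  overlap?: number;
  sentence_overlap?: number;
  separators?: string[];
  separator_group?: "markdown" | "plaintext";
}

function validateChunking(s: ChunkingSettings): string[] {
  const errors: string[] = [];
  const strategy = s.strategy ?? "sentence"; // @server_default sentence
  const maxSize = s.max_chunk_size ?? 250;   // @server_default 250

  // max_chunk_size floor: 20 for sentence, 10 for word.
  const floor = strategy === "word" ? 10 : 20;
  if (maxSize < floor) errors.push(`max_chunk_size must be >= ${floor}`);

  // overlap applies to the word strategy and is capped at half of max_chunk_size.
  if (s.overlap !== undefined && s.overlap > maxSize / 2) {
    errors.push("overlap cannot exceed half of max_chunk_size");
  }

  // sentence_overlap can only be 0 or 1.
  if (s.sentence_overlap !== undefined && ![0, 1].includes(s.sentence_overlap)) {
    errors.push("sentence_overlap must be 0 or 1");
  }

  // recursive requires max_chunk_size plus separators or separator_group.
  if (strategy === "recursive" && !s.separators && !s.separator_group) {
    errors.push("recursive strategy needs separators or separator_group");
  }
  return errors;
}

console.log(validateChunking({ strategy: "word", max_chunk_size: 100, overlap: 60 }));
```

The sketch treats the `@server_default` values as the effective settings when a field is omitted, which is an assumption about server-side behavior.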
@@ -26,7 +26,7 @@ import {
ElasticsearchTaskSettings,
ElasticsearchTaskType
} from '@inference/_types/CommonTypes'
import { InferenceChunkingSettings } from '@inference/_types/Services'
import { ElasticsearchInferenceChunkingSettings } from '@inference/_types/Services'

/**
* Create an Elasticsearch inference endpoint.
@@ -78,10 +78,10 @@ export interface Request extends RequestBase {
}
body: {
/**
* The chunking configuration object.
* The chunking configuration object. For the `rerank` task type, you can enable chunking by setting the `long_document_strategy` parameter to `chunk` in the `service_settings` object.

Member comment: I'm not sure if we need to be more specific about this anywhere, but for this new method of chunking the user cannot set `chunking_settings` the way that they would for embeddings. We handle building the chunking settings for them. If we want to clarify how we build the chunking settings somewhere, we can.

* @ext_doc_id inference-chunking
*/
chunking_settings?: InferenceChunkingSettings
chunking_settings?: ElasticsearchInferenceChunkingSettings
/**
* The type of service supported for the specified task type. In this case, `elasticsearch`.
*/
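For embedding task types, where `chunking_settings` remains user-configurable, a request body using the renamed `ElasticsearchInferenceChunkingSettings` shape might look like the following sketch (the `model_id` and all values are illustrative assumptions; for rerank endpoints, per the reviewer note above, chunking is instead enabled via `service_settings.long_document_strategy`):

```typescript
// Illustrative PUT-inference request body for an embedding endpoint
// using the recursive chunking strategy from the renamed settings class.
const putBody = {
  service: "elasticsearch",
  service_settings: {
    model_id: ".multilingual-e5-small", // assumed built-in model id
    num_threads: 1,
  },
  chunking_settings: {
    strategy: "recursive",
    max_chunk_size: 200,          // required alongside the recursive strategy
    separator_group: "plaintext", // alternative to a custom separators list
  },
};

console.log(JSON.stringify(putBody.chunking_settings));
```

Supplying a custom `separators` array instead of `separator_group` would satisfy the same requirement, since the spec accepts either one with the `recursive` strategy.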