5 changes: 2 additions & 3 deletions docs/reference/mapping/types/semantic-text.asciidoc
@@ -87,16 +87,15 @@ Trying to <<delete-inference-api,delete an {infer} endpoint>> that is used on a

[discrete]
[[auto-text-chunking]]
-==== Automatic text chunking
+==== Text chunking

{infer-cap} endpoints have a limit on the amount of text they can process.
To allow for large amounts of text to be used in semantic search, `semantic_text` automatically generates smaller passages if needed, called _chunks_.

Each chunk will include the text subpassage and the corresponding embedding generated from it.
When querying, the individual passages will be automatically searched for each document, and the most relevant passage will be used to compute a score.

-Documents are split into 250-word sections with a 100-word overlap so that each section shares 100 words with the previous section.
-This overlap ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
+For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.


[discrete]
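The fixed scheme in the previous text was "250-word sections with a 100-word overlap". It can be illustrated with a minimal sliding-window sketch. The sketch assumes plain whitespace tokenization, and `chunk_words` is a hypothetical helper, not Elasticsearch's chunker, which may detect word boundaries differently. Each window advances by `max_words - overlap` words, so consecutive chunks share exactly `overlap` words.

```python
def chunk_words(text: str, max_words: int = 250, overlap: int = 100) -> list[str]:
    """Hypothetical helper: split text into windows of at most max_words words,
    each sharing `overlap` words with the previous window.
    Assumes simple whitespace tokenization (not Elasticsearch's word-boundary logic)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    if not words:
        return []

    step = max_words - overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):  # last window reached the end of the text
            break
        start += step
    return chunks


# Usage: a 600-word input yields windows starting at words 0, 150, 300 and 450.
text = " ".join(f"w{i}" for i in range(600))
chunks = chunk_words(text)
print(len(chunks))             # 4
print(chunks[1].split()[0])    # w150
```

With the defaults, a 600-word document produces four chunks. The last chunk is shorter (150 words), and every pair of neighbouring chunks shares 100 words.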