
Commit 664405d

[ML] Add ML limitation for ingesting large documents (#2877) (#2882)
1 parent 5740148 commit 664405d

1 file changed: +7 −1

docs/en/stack/ml/nlp/ml-nlp-limitations.asciidoc

Lines changed: 7 additions & 1 deletion
@@ -9,6 +9,12 @@
 The following limitations and known problems apply to the {version} release of
 the Elastic {nlp} trained models feature.
 
+[discrete]
+[[ml-nlp-large-documents-limit-10k-10mb]]
+== Document size limitations when using `semantic_text` fields
+
+When you use `semantic_text` fields to ingest documents, chunking takes place automatically. The number of chunks per document is limited by the {ref}/mapping-settings-limit.html[`index.mapping.nested_objects.limit`] index setting, which defaults to 10,000. Documents that produce more chunks than this limit fail with an error during ingestion. To avoid this issue, split large documents into parts of roughly 1MB before ingestion.
+
 [discrete]
 [[ml-nlp-elser-v1-limit-512]]
 == ELSER semantic search is limited to 512 tokens per field that inference is applied to
@@ -17,4 +23,4 @@ When you use ELSER for semantic search, only the first 512 extracted tokens from
 each field of the ingested documents that ELSER is applied to are taken into
 account for the search process. If your data set contains long documents, divide
 them into smaller segments before ingestion if you need the full text to be
-searchable.
+searchable.
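
For context on the recommendation added above, here is a minimal sketch of pre-splitting a large document before ingestion, assuming the official `elasticsearch` Python client. The index name `my-index`, the `body` field, the paragraph-based splitting, and the input file path are illustrative assumptions, not part of this commit.

[source,python]
----
# Minimal sketch (not part of this commit): split an oversized text body into
# roughly 1MB parts before ingestion, so the automatic semantic_text chunking
# stays well below index.mapping.nested_objects.limit.
# Assumptions: UTF-8 text and hypothetical index/field names ("my-index", "body").

from elasticsearch import Elasticsearch

MAX_PART_BYTES = 1_000_000  # ~1MB per part, per the recommendation above


def split_into_parts(text: str, max_bytes: int = MAX_PART_BYTES) -> list[str]:
    """Split text on paragraph boundaries so each part stays under max_bytes.

    A single paragraph larger than max_bytes still becomes its own part.
    """
    parts: list[str] = []
    current: list[str] = []
    current_size = 0
    for paragraph in text.split("\n\n"):
        size = len(paragraph.encode("utf-8")) + 2  # +2 for the dropped separator
        if current and current_size + size > max_bytes:
            parts.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        parts.append("\n\n".join(current))
    return parts


es = Elasticsearch("http://localhost:9200")

with open("large_document.txt", encoding="utf-8") as f:
    large_document = f.read()

for i, part in enumerate(split_into_parts(large_document)):
    # Each part is indexed as its own document; the semantic_text field in the
    # index mapping chunks the part automatically at ingest time.
    es.index(index="my-index", id=f"doc-1-part-{i}", document={"body": part})
----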
