
Commit 664405d

[ML] Add ML limitation for ingesting large documents (#2877) (#2882)
1 parent 5740148 commit 664405d

1 file changed: +7 −1

docs/en/stack/ml/nlp/ml-nlp-limitations.asciidoc

Lines changed: 7 additions & 1 deletion
@@ -9,6 +9,12 @@
 The following limitations and known problems apply to the {version} release of
 the Elastic {nlp} trained models feature.
 
+[discrete]
+[[ml-nlp-large-documents-limit-10k-10mb]]
+== Document size limitations when using `semantic_text` fields
+
+When you use `semantic_text` fields to ingest documents, chunking takes place automatically. The number of chunks per document is limited by the {ref}/mapping-settings-limit.html[`index.mapping.nested_objects.limit`] index setting, which defaults to 10,000. Documents that produce more chunks than this limit fail with an error during ingestion. To avoid this issue, split large documents into parts of roughly 1MB before ingestion.
+
 [discrete]
 [[ml-nlp-elser-v1-limit-512]]
 == ELSER semantic search is limited to 512 tokens per field that inference is applied to
@@ -17,4 +23,4 @@ When you use ELSER for semantic search, only the first 512 extracted tokens from
 each field of the ingested documents that ELSER is applied to are taken into
 account for the search process. If your data set contains long documents, divide
 them into smaller segments before ingestion if you need the full text to be
-searchable.
+searchable.
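
For context on the recommendation added above, here is a minimal sketch of pre-splitting a large document before ingestion, assuming the official `elasticsearch` Python client. The index name `my-index`, the `body` field, the paragraph-based splitting, and the input file path are illustrative assumptions, not part of this commit.

[source,python]
----
# Minimal sketch (not part of this commit): split an oversized text body into
# roughly 1MB parts before ingestion, so the automatic semantic_text chunking
# stays well below index.mapping.nested_objects.limit.
# Assumptions: UTF-8 text and hypothetical index/field names ("my-index", "body").

from elasticsearch import Elasticsearch

MAX_PART_BYTES = 1_000_000  # ~1MB per part, per the recommendation above


def split_into_parts(text: str, max_bytes: int = MAX_PART_BYTES) -> list[str]:
    """Split text on paragraph boundaries so each part stays under max_bytes.

    A single paragraph larger than max_bytes still becomes its own part.
    """
    parts: list[str] = []
    current: list[str] = []
    current_size = 0
    for paragraph in text.split("\n\n"):
        size = len(paragraph.encode("utf-8")) + 2  # +2 for the dropped separator
        if current and current_size + size > max_bytes:
            parts.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        parts.append("\n\n".join(current))
    return parts


es = Elasticsearch("http://localhost:9200")

with open("large_document.txt", encoding="utf-8") as f:
    large_document = f.read()

for i, part in enumerate(split_into_parts(large_document)):
    # Each part is indexed as its own document; the semantic_text field in the
    # index mapping chunks the part automatically at ingest time.
    es.index(index="my-index", id=f"doc-1-part-{i}", document={"body": part})
----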
