docs/reference/inference/inference-apis.asciidoc
6 additions & 4 deletions
@@ -20,6 +20,7 @@ the following APIs to manage {infer} models and perform {infer}:
 * <<post-inference-api>>
 * <<put-inference-api>>
 * <<stream-inference-api>>
+* <<unified-inference-api>>
 * <<update-inference-api>>
 
 [[inference-landscape]]
@@ -28,9 +29,9 @@ image::images/inference-landscape.jpg[A representation of the Elastic inference
 
 An {infer} endpoint enables you to use the corresponding {ml} model without
 manual deployment and apply it to your data at ingestion time through
-<<semantic-search-semantic-text, semantic text>>.
+<<semantic-search-semantic-text, semantic text>>.
 
-Choose a model from your provider or use ELSER – a retrieval model trained by
+Choose a model from your provider or use ELSER – a retrieval model trained by
 Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
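The endpoint-creation step this hunk describes can be sketched with the <<put-inference-api>>. A minimal illustration, assuming an ELSER sparse-embedding endpoint; the endpoint name `my-elser-endpoint` and the allocation settings are illustrative assumptions, not part of this change:

```console
PUT _inference/sparse_embedding/my-elser-endpoint  <1>
{
  "service": "elser",  <2>
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```
<1> `my-elser-endpoint` is a hypothetical `inference_id`; the task type (`sparse_embedding`) is part of the path.
<2> The service and settings shown are one plausible configuration; consult the <<put-inference-api>> reference for the options your version supports.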
@@ -61,7 +62,7 @@ The following list contains the default {infer} endpoints listed by `inference_i
 
 Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
 The API call will automatically download and deploy the model which might take a couple of minutes.
 Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
-For these models, the minimum number of allocations is `0`.
+For these models, the minimum number of allocations is `0`.
 If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
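The adaptive-allocations behavior described in this hunk (scale-to-zero with a minimum of `0`) can also be configured on endpoints you create yourself. A hedged sketch; the endpoint name `my-scaling-endpoint` and the allocation limits are illustrative assumptions:

```console
PUT _inference/sparse_embedding/my-scaling-endpoint
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,  <1>
      "max_number_of_allocations": 4
    },
    "num_threads": 1
  }
}
```
<1> With a minimum of `0`, allocations can scale down to zero when the endpoint sees no {infer} activity, matching the behavior of the default endpoints described above.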
@@ -78,7 +79,7 @@ Returning a long document in search results is less useful than providing the mo
 Each chunk will include the text subpassage and the corresponding embedding generated from it.
 
 By default, documents are split into sentences and grouped in sections up to 250 words with 1 sentence overlap so that each chunk shares a sentence with the previous chunk.
-Overlapping ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
+Overlapping ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
 
 {es} uses the https://unicode-org.github.io/icu-docs/[ICU4J] library to detect word and sentence boundaries for chunking.
 https://unicode-org.github.io/icu/userguide/boundaryanalysis/#word-boundary[Word boundaries] are identified by following a series of rules, not just the presence of a whitespace character.
@@ -129,6 +130,7 @@ PUT _inference/sparse_embedding/small_chunk_size
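The body of the `small_chunk_size` request is not shown in this excerpt. A plausible sketch of such an endpoint, based on the chunking defaults described above (sentence-based splitting with sentence overlap); every value here is an illustrative assumption, not the actual content of the truncated hunk:

```console
PUT _inference/sparse_embedding/small_chunk_size
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {  <1>
    "strategy": "sentence",
    "max_chunk_size": 100,
    "sentence_overlap": 1
  }
}
```
<1> A `chunking_settings` object like this would override the default 250-word sections; the exact option names and supported strategies depend on the {es} version, so check the <<put-inference-api>> reference.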