docs/reference/inference/inference-apis.asciidoc
6 additions & 4 deletions
@@ -20,6 +20,7 @@ the following APIs to manage {infer} models and perform {infer}:
 * <<post-inference-api>>
 * <<put-inference-api>>
 * <<stream-inference-api>>
+* <<unified-inference-api>>
 * <<update-inference-api>>
 
 [[inference-landscape]]
@@ -28,9 +29,9 @@ image::images/inference-landscape.jpg[A representation of the Elastic inference
 
 An {infer} endpoint enables you to use the corresponding {ml} model without
 manual deployment and apply it to your data at ingestion time through
-<<semantic-search-semantic-text, semantic text>>.
+<<semantic-search-semantic-text, semantic text>>.
 
-Choose a model from your provider or use ELSER – a retrieval model trained by
+Choose a model from your provider or use ELSER – a retrieval model trained by
 Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.
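The endpoint-creation step this hunk describes can be sketched with the <<put-inference-api>>. A minimal illustration, assuming an ELSER sparse-embedding endpoint; the endpoint name `my-elser-endpoint` and the allocation settings are illustrative assumptions, not part of this change:

```console
PUT _inference/sparse_embedding/my-elser-endpoint  <1>
{
  "service": "elser",  <2>
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
```
<1> `my-elser-endpoint` is a hypothetical `inference_id`; the task type (`sparse_embedding`) is part of the path.
<2> The service and settings shown are one plausible configuration; consult the <<put-inference-api>> reference for the options your version supports.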
@@ -61,7 +62,7 @@ The following list contains the default {infer} endpoints listed by `inference_i
 
 Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
 The API call will automatically download and deploy the model which might take a couple of minutes.
 Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
-For these models, the minimum number of allocations is `0`.
+For these models, the minimum number of allocations is `0`.
 If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
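The adaptive-allocations behavior described in this hunk (scale-to-zero with a minimum of `0`) can also be configured on endpoints you create yourself. A hedged sketch; the endpoint name `my-scaling-endpoint` and the allocation limits are illustrative assumptions:

```console
PUT _inference/sparse_embedding/my-scaling-endpoint
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,  <1>
      "max_number_of_allocations": 4
    },
    "num_threads": 1
  }
}
```
<1> With a minimum of `0`, allocations can scale down to zero when the endpoint sees no {infer} activity, matching the behavior of the default endpoints described above.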
@@ -78,7 +79,7 @@ Returning a long document in search results is less useful than providing the mo
 Each chunk will include the text subpassage and the corresponding embedding generated from it.
 
 By default, documents are split into sentences and grouped in sections up to 250 words with 1 sentence overlap so that each chunk shares a sentence with the previous chunk.
-Overlapping ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
+Overlapping ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
 
 {es} uses the https://unicode-org.github.io/icu-docs/[ICU4J] library to detect word and sentence boundaries for chunking.
 https://unicode-org.github.io/icu/userguide/boundaryanalysis/#word-boundary[Word boundaries] are identified by following a series of rules, not just the presence of a whitespace character.
@@ -129,6 +130,7 @@ PUT _inference/sparse_embedding/small_chunk_size
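The body of the `small_chunk_size` request is not shown in this excerpt. A plausible sketch of such an endpoint, based on the chunking defaults described above (sentence-based splitting with sentence overlap); every value here is an illustrative assumption, not the actual content of the truncated hunk:

```console
PUT _inference/sparse_embedding/small_chunk_size
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {  <1>
    "strategy": "sentence",
    "max_chunk_size": 100,
    "sentence_overlap": 1
  }
}
```
<1> A `chunking_settings` object like this would override the default 250-word sections; the exact option names and supported strategies depend on the {es} version, so check the <<put-inference-api>> reference.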