Commit a73e972

szabosteve and prwhelan authored
[DOCS] Adds stream inference API docs (#115333) (#115623)
Co-authored-by: Pat Whelan <[email protected]>
1 parent 1db03c4 commit a73e972

File tree

2 files changed: +124 additions, -0 deletions

docs/reference/inference/inference-apis.asciidoc

Lines changed: 2 additions & 0 deletions

@@ -19,6 +19,7 @@ the following APIs to manage {infer} models and perform {infer}:
 * <<get-inference-api>>
 * <<post-inference-api>>
 * <<put-inference-api>>
+* <<stream-inference-api>>
 * <<update-inference-api>>
 
 [[inference-landscape]]
@@ -56,6 +57,7 @@ include::delete-inference.asciidoc[]
 include::get-inference.asciidoc[]
 include::post-inference.asciidoc[]
 include::put-inference.asciidoc[]
+include::stream-inference.asciidoc[]
 include::update-inference.asciidoc[]
 include::service-alibabacloud-ai-search.asciidoc[]
 include::service-amazon-bedrock.asciidoc[]
docs/reference/inference/stream-inference.asciidoc

Lines changed: 122 additions & 0 deletions

@@ -0,0 +1,122 @@
+[role="xpack"]
+[[stream-inference-api]]
+=== Stream inference API
+
+Streams a chat completion response.
+
+IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in {ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face.
+For built-in models and models uploaded through Eland, the {infer} APIs offer an alternative way to use and manage trained models.
+However, if you do not plan to use the {infer} APIs to use these models or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.
+
+
+[discrete]
+[[stream-inference-api-request]]
+==== {api-request-title}
+
+`POST /_inference/<inference_id>/_stream`
+
+`POST /_inference/<task_type>/<inference_id>/_stream`
+
+
+[discrete]
+[[stream-inference-api-prereqs]]
+==== {api-prereq-title}
+
+* Requires the `monitor_inference` <<privileges-list-cluster,cluster privilege>>
+(the built-in `inference_admin` and `inference_user` roles grant this privilege)
+* You must use a client that supports streaming.
+
+
+[discrete]
+[[stream-inference-api-desc]]
+==== {api-description-title}
+
+The stream {infer} API enables real-time responses for completion tasks by delivering answers incrementally, reducing response times during computation.
+It only works with the `completion` task type.
+
+
+[discrete]
+[[stream-inference-api-path-params]]
+==== {api-path-parms-title}
+
+`<inference_id>`::
+(Required, string)
+The unique identifier of the {infer} endpoint.
+
+
+`<task_type>`::
+(Optional, string)
+The type of {infer} task that the model performs.
+
+
+[discrete]
+[[stream-inference-api-request-body]]
+==== {api-request-body-title}
+
+`input`::
+(Required, string or array of strings)
+The text on which you want to perform the {infer} task.
+`input` can be a single string or an array.
++
+--
+[NOTE]
+====
+Inference endpoints for the `completion` task type currently only support a
+single string as input.
+====
+--
+
+
+[discrete]
+[[stream-inference-api-example]]
+==== {api-examples-title}
+
+The following example performs a completion on the example question with streaming.
+
+
+[source,console]
+------------------------------------------------------------
+POST _inference/completion/openai-completion/_stream
+{
+  "input": "What is Elastic?"
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+
+
+The API returns the following response:
+
+
+[source,txt]
+------------------------------------------------------------
+event: message
+data: {
+    "completion":[{
+        "delta":"Elastic"
+    }]
+}
+
+event: message
+data: {
+    "completion":[{
+        "delta":" is"
+    },
+    {
+        "delta":" a"
+    }
+]
+}
+
+event: message
+data: {
+    "completion":[{
+        "delta":" software"
+    },
+    {
+        "delta":" company"
+    }]
+}
+
+(...)
+------------------------------------------------------------
+// NOTCONSOLE
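As the sample response shows, each `message` event carries a `completion` array whose `delta` fragments concatenate, in order, into the full answer. A minimal client-side sketch (not part of this commit; the `assemble` helper and hard-coded payloads are illustrative) of reassembling `data:` payloads shaped like the ones above:

```python
import json

# `data:` payloads shaped like the sample streaming response above (illustrative).
payloads = [
    '{"completion":[{"delta":"Elastic"}]}',
    '{"completion":[{"delta":" is"},{"delta":" a"}]}',
    '{"completion":[{"delta":" software"},{"delta":" company"}]}',
]

def assemble(events):
    """Concatenate the `delta` fragments of each completion chunk, in order."""
    parts = []
    for payload in events:
        for chunk in json.loads(payload)["completion"]:
            parts.append(chunk["delta"])
    return "".join(parts)

print(assemble(payloads))  # Elastic is a software company
```

A real client would read these payloads from the server-sent event stream as they arrive rather than from a list.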
