[role="xpack"]
[[stream-inference-api]]
=== Stream inference API

Streams a chat completion response.

IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in {ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face.
For built-in models and models uploaded through Eland, the {infer} APIs offer an alternative way to use and manage trained models.
However, if you do not plan to use the {infer} APIs to use these models or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.


[discrete]
[[stream-inference-api-request]]
==== {api-request-title}

`POST /_inference/<inference_id>/_stream`

`POST /_inference/<task_type>/<inference_id>/_stream`


[discrete]
[[stream-inference-api-prereqs]]
==== {api-prereq-title}

* Requires the `monitor_inference` <<privileges-list-cluster,cluster privilege>>
(the built-in `inference_admin` and `inference_user` roles grant this privilege).
* You must use a client that supports streaming.


[discrete]
[[stream-inference-api-desc]]
==== {api-description-title}

The stream {infer} API enables real-time responses for completion tasks by delivering answers incrementally as they are generated, rather than waiting for the full response to be computed.
It only works with the `completion` task type.


[discrete]
[[stream-inference-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
The unique identifier of the {infer} endpoint.


`<task_type>`::
(Optional, string)
The type of {infer} task that the model performs.


[discrete]
[[stream-inference-api-request-body]]
==== {api-request-body-title}

`input`::
(Required, string or array of strings)
The text on which you want to perform the {infer} task.
`input` can be a single string or an array.
+
--
[NOTE]
====
Inference endpoints for the `completion` task type currently only support a
single string as input.
====
--


[discrete]
[[stream-inference-api-example]]
==== {api-examples-title}

The following example performs a completion on the example question with streaming.


[source,console]
------------------------------------------------------------
POST _inference/completion/openai-completion/_stream
{
  "input": "What is Elastic?"
}
------------------------------------------------------------
// TEST[skip:TBD]


The API returns the following response:


[source,txt]
------------------------------------------------------------
event: message
data: {
    "completion":[{
        "delta":"Elastic"
    }]
}
|
event: message
data: {
    "completion":[{
        "delta":" is"
    },
    {
        "delta":" a"
    }
    ]
}
|
event: message
data: {
    "completion":[{
        "delta":" software"
    },
    {
        "delta":" company"
    }]
}
|
(...)
------------------------------------------------------------
// NOTCONSOLE
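The response is a stream of server-sent events: events are separated by a blank line, and each `data:` payload carries one or more `delta` fragments of the completion text. As an illustration only (this helper is not part of the API, and it assumes each event's JSON payload arrives on a single `data:` line), a client could reassemble the full answer along these lines:

```python
import json

def collect_completion(sse_body: str) -> str:
    """Join the `delta` fragments from each `data:` event into one string.

    Minimal sketch: assumes each event's JSON payload fits on one
    `data:` line, as raw SSE responses typically deliver it.
    """
    fragments = []
    for block in sse_body.split("\n\n"):       # events are blank-line separated
        for line in block.splitlines():
            if not line.startswith("data:"):   # skip `event:` and other fields
                continue
            payload = json.loads(line[len("data:"):])
            for part in payload.get("completion", []):
                fragments.append(part.get("delta", ""))
    return "".join(fragments)

sample = (
    "event: message\n"
    'data: {"completion":[{"delta":"Elastic"}]}\n'
    "\n"
    "event: message\n"
    'data: {"completion":[{"delta":" is"},{"delta":" a"}]}\n'
)
print(collect_completion(sample))  # Elastic is a
```

In practice a streaming client would apply the same per-event parsing as each chunk arrives, instead of buffering the whole body first.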