diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc
index b291b464be498..c8550912f313e 100644
--- a/docs/reference/inference/inference-apis.asciidoc
+++ b/docs/reference/inference/inference-apis.asciidoc
@@ -19,6 +19,7 @@ the following APIs to manage {infer} models and perform {infer}:
 * <<get-inference-api>>
 * <<post-inference-api>>
 * <<put-inference-api>>
+* <<stream-inference-api>>
 * <<update-inference-api>>
 
 [[inference-landscape]]
@@ -38,6 +39,7 @@ include::delete-inference.asciidoc[]
 include::get-inference.asciidoc[]
 include::post-inference.asciidoc[]
 include::put-inference.asciidoc[]
+include::stream-inference.asciidoc[]
 include::update-inference.asciidoc[]
 include::service-alibabacloud-ai-search.asciidoc[]
 include::service-amazon-bedrock.asciidoc[]
diff --git a/docs/reference/inference/stream-inference.asciidoc b/docs/reference/inference/stream-inference.asciidoc
new file mode 100644
index 0000000000000..e66acd630cb3e
--- /dev/null
+++ b/docs/reference/inference/stream-inference.asciidoc
@@ -0,0 +1,154 @@
+[role="xpack"]
+[[stream-inference-api]]
+=== Stream inference API
+
+Streams a chat completion response.
+
+IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in {ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face.
+For built-in models and models uploaded through Eland, the {infer} APIs offer an alternative way to use and manage trained models.
+However, if you do not plan to use the {infer} APIs to use these models or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.
+
+
+[discrete]
+[[stream-inference-api-request]]
+==== {api-request-title}
+
+`POST /_inference/<inference_id>/_stream`
+
+`POST /_inference/<task_type>/<inference_id>/_stream`
+
+
+[discrete]
+[[stream-inference-api-prereqs]]
+==== {api-prereq-title}
+
+* Requires the `monitor_inference` <<privileges-list-cluster,cluster privilege>>
+(the built-in `inference_admin` and `inference_user` roles grant this privilege)
+* You must use a client that supports streaming.
+
+
+[discrete]
+[[stream-inference-api-desc]]
+==== {api-description-title}
+
+The stream {infer} API enables real-time responses for completion tasks by delivering the answer incrementally as it is generated, which reduces the perceived response time.
+It only works with the `completion` task type.
+
+
+[discrete]
+[[stream-inference-api-path-params]]
+==== {api-path-parms-title}
+
+`<inference_id>`::
+(Required, string)
+The unique identifier of the {infer} endpoint.
+
+
+`<task_type>`::
+(Optional, string)
+The type of {infer} task that the model performs.
+
+
+[discrete]
+[[stream-inference-api-request-body]]
+==== {api-request-body-title}
+
+`input`::
+(Required, string or array of strings)
+The text on which you want to perform the {infer} task.
+`input` can be a single string or an array.
++
+--
+[NOTE]
+====
+Inference endpoints for the `completion` task type currently only support a
+single string as input.
+====
+--
+
+
+[discrete]
+[[stream-inference-api-example]]
+==== {api-examples-title}

+The following example performs a completion on the example question and streams back the response.
+
+
+[source,console]
+------------------------------------------------------------
+POST _inference/completion/openai-completion/_stream
+{
+  "input": "What is Elastic?"
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+
+
+The API returns the following response:
+
+
+[source,txt]
+------------------------------------------------------------
+event: message
+data: {
+    "completion":[{
+        "delta":"Elastic"
+    }]
+}
+
+event: message
+data: {
+    "completion":[{
+        "delta":" is"
+    },
+    {
+        "delta":" a"
+    }]
+}
+
+event: message
+data: {
+    "completion":[{
+        "delta":" software"
+    },
+    {
+        "delta":" company"
+    }]
+}
+
+(...)
+------------------------------------------------------------
+// NOTCONSOLE
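+
+
+The response is delivered as server-sent events, so any HTTP client that can
+read the body incrementally can consume it. The following sketch is
+illustrative only, not part of the official {es} clients: it uses the
+third-party Python `requests` library and assumes {es} is reachable at
+`localhost:9200` with placeholder credentials, that the `openai-completion`
+endpoint from the example above exists, and that each event's JSON payload
+arrives on a single `data:` line (the response above is pretty-printed for
+readability).
+
+[source,python]
+------------------------------------------------------------
+import json
+
+import requests  # third-party HTTP client: python -m pip install requests
+
+response = requests.post(
+    "http://localhost:9200/_inference/completion/openai-completion/_stream",
+    json={"input": "What is Elastic?"},
+    auth=("elastic", "password"),  # placeholder credentials
+    stream=True,  # read the body incrementally instead of buffering it all
+)
+response.raise_for_status()
+
+for line in response.iter_lines():
+    # Server-sent events prefix each payload line with "data: ".
+    if line.startswith(b"data: "):
+        payload = json.loads(line[len(b"data: "):])
+        for chunk in payload.get("completion", []):
+            print(chunk["delta"], end="", flush=True)
+------------------------------------------------------------
+// NOTCONSOLE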