docs/reference/inference/chat-completion-inference.asciidoc (4 changes: 3 additions & 1 deletion)

@@ -34,9 +34,11 @@ However, if you do not plan to use the {infer} APIs to use these models or if yo
 The chat completion {infer} API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation.
 It only works with the `chat_completion` task type for `openai` and `elastic` {infer} services.

+
 [NOTE]
 ====
-The `chat_completion` task type is only available within the _unified API and only supports streaming.
+* The `chat_completion` task type is only available within the _unified API and only supports streaming.
+* The Chat completion {infer} API and the Stream {infer} API differ in their response structure. If you use the `openai` service or the `elastic` service, use the Chat completion {infer} API.
 ====

 [discrete]
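For context, a minimal sketch of what a `chat_completion` request could look like, assuming the `_unified` endpoint path mentioned in the note above, a hypothetical inference endpoint ID (`openai-chat`), and the OpenAI-style `messages` request body; the rendered API reference is the authoritative source for the exact request shape.

[source,console]
----
POST _inference/chat_completion/openai-chat/_unified
{
  "messages": [
    {
      "role": "user",
      "content": "Say this is a test"
    }
  ]
}
----

Because the `chat_completion` task type only supports streaming, the answer is delivered incrementally rather than as one complete response.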
docs/reference/inference/stream-inference.asciidoc (2 changes: 2 additions & 0 deletions)

@@ -40,6 +40,8 @@ However, if you do not plan to use the {infer} APIs to use these models or if yo
 The stream {infer} API enables real-time responses for completion tasks by delivering answers incrementally, reducing response times during computation.
 It only works with the `completion` and `chat_completion` task types.

+The Chat completion {infer} API and the Stream {infer} API differ in their response structure. If you use the `openai` service or the `elastic` service, use the Chat completion {infer} API.
+
 [NOTE]
 ====
 include::inference-shared.asciidoc[tag=chat-completion-docs]
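For comparison, a sketch of a Stream {infer} API call for a `completion` task, again with a placeholder inference endpoint ID (`openai-completion`) and placeholder input text:

[source,console]
----
POST _inference/completion/openai-completion/_stream
{
  "input": "What is Elastic?"
}
----

As the added paragraph explains, the two APIs return differently structured responses, which is why endpoints using the `openai` or `elastic` service should go through the Chat completion {infer} API instead.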