99 changes: 95 additions & 4 deletions docs/guides/OPENAI_CHAT_COMPLETION.md
@@ -3,6 +3,7 @@
## Table of Contents

- [Introduction](#introduction)
- [New User Interface (v1.4.0)](#new-user-interface-v140)
- [Prerequisites](#prerequisites)
- [Maven Dependencies](#maven-dependencies)
- [Usage](#usage)
@@ -18,6 +19,20 @@

This guide demonstrates how to use the SAP AI SDK for Java to perform chat completion tasks using OpenAI models deployed on SAP AI Core.

### New User Interface (v1.4.0)

We're excited to introduce a new user interface for OpenAI chat completions starting with **version 1.4.0**. This update is designed to improve the SDK by:

- **Decoupling Layers:** Separating the convenience layer from the model classes to deliver a more stable and maintainable experience.
- **Staying Current:** Making it easier for the SDK to adapt to the latest changes in the OpenAI API specification.
- **Consistent Design:** Aligning with the Orchestrator convenience API for a smoother transition and easier adoption.

**Please Note:**

- The new interface is gradually being rolled out across the SDK.
- We welcome your feedback to help us refine this interface.
- The existing interface (v1.0.0) remains available for compatibility.

## Prerequisites

Before using the AI Core module, ensure that you have met all the general requirements outlined in the [README.md](../../README.md#general-requirements).
@@ -109,6 +124,20 @@ OpenAiClient.withCustomDestination(destination);

## Message history

**Since v1.4.0**

```java
var request =
new OpenAiChatCompletionRequest(
OpenAiMessage.system("You are a helpful assistant"),
OpenAiMessage.user("Hello World! Why is this phrase so famous?"));

var response = OpenAiClient.forModel(GPT_4O).chatCompletion(request).getContent();
```

<details>
<summary><b>Since v1.0.0</b></summary>

```java
var systemMessage =
new OpenAiChatSystemMessage().setContent("You are a helpful assistant");
@@ -124,6 +153,8 @@ String resultMessage = result.getContent();

See [an example in our Spring Boot application](../../sample-code/spring-app/src/main/java/com/sap/ai/sdk/app/services/OpenAiService.java)

</details>

## Chat Completion with Specific Model Version

By default, when no version is specified, the system selects one of the available deployments of the specified model, regardless of its version.
@@ -149,7 +180,7 @@ Ensure that the custom model is deployed in SAP AI Core.

It's possible to pass a stream of chat completion delta elements, e.g. from the application backend to the frontend in real-time.

### Asynchronous Streaming
### Asynchronous Streaming - Blocking

This is a blocking example for streaming and printing directly to the console:

@@ -168,16 +199,58 @@ try (Stream<String> stream = client.streamChatCompletion(msg)) {
}
```

### Aggregating Total Output
### Asynchronous Streaming - Non-blocking

**Since v1.4.0**

The following example demonstrates how to use a concurrency-safe container (such as an `AtomicReference`) to "listen" for usage information in any incoming delta.

```java
String question = "Can you give me the first 100 numbers of the Fibonacci sequence?";
var userMessage = OpenAiMessage.user(question);
var request = new OpenAiChatCompletionRequest(userMessage);

OpenAiClient client = OpenAiClient.forModel(GPT_4O);
var usageRef = new AtomicReference<CompletionUsage>();

// Prepare the stream before starting the thread to handle any initialization exceptions
Stream<OpenAiChatCompletionDelta> stream = client.streamChatCompletionDeltas(request);

// Create a new thread for asynchronous, non-blocking processing
Thread streamProcessor =
new Thread(
() -> {
// try-with-resources ensures the stream is closed after processing
try (stream) {
stream.forEach(
delta -> {
usageRef.compareAndExchange(null, delta.getCompletionUsage());
System.out.println("Content: " + delta.getDeltaContent());
});
}
});

// Start the processing thread; the main thread remains free (non-blocking)
streamProcessor.start();
// Wait for the thread to finish (blocking)
streamProcessor.join();

// Access the usage information captured from the stream
Integer tokensUsed = usageRef.get().getCompletionTokens();
System.out.println("Tokens used: " + tokensUsed);
```
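The first-non-null capture via `compareAndExchange` is plain Java and can be tried independently of the SDK. The following self-contained sketch simulates a delta stream (the values are made up) and records the first non-null element it sees:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Stream;

public class FirstValueCapture {
  // Concurrency-safe holder, shared between the main and the processing thread
  static final AtomicReference<Integer> usageRef = new AtomicReference<>();

  public static void main(String[] args) throws InterruptedException {
    // Simulated deltas: usage arrives only in the last element, as with the streaming API
    Stream<Integer> deltas = Stream.of(null, null, 42);

    Thread processor =
        new Thread(
            () ->
                deltas.forEach(
                    // compareAndExchange stores the first non-null value; later values are ignored
                    usage -> usageRef.compareAndExchange(null, usage)));

    processor.start();
    processor.join(); // join() guarantees visibility of writes from the processing thread

    System.out.println("Captured usage: " + usageRef.get()); // prints "Captured usage: 42"
  }
}
```

The same happens-before guarantee of `Thread.join()` is what makes reading `usageRef` safe in the SDK example above.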

<details>
<summary><b>Since v1.0.0</b></summary>

The following example is non-blocking and demonstrates how to aggregate the complete response.
Any asynchronous library can be used, such as the classic Thread API.

```java
var message = "Can you give me the first 100 numbers of the Fibonacci sequence?";
var question = "Can you give me the first 100 numbers of the Fibonacci sequence?";

var userMessage =
new OpenAiChatMessage.OpenAiChatUserMessage().addText(message);
new OpenAiChatMessage.OpenAiChatUserMessage().addText(question);
var requestParameters =
new OpenAiChatCompletionParameters().addMessages(userMessage);

@@ -208,14 +281,32 @@ System.out.println("Tokens used: " + tokensUsed);
Please find [an example in our Spring Boot application](../../sample-code/spring-app/src/main/java/com/sap/ai/sdk/app/services/OpenAiService.java). It shows the usage of Spring
Boot's `ResponseBodyEmitter` to stream the chat completion delta messages to the frontend in real-time.
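Independent of Spring or the SDK, aggregating a streamed response reduces to appending each delta to a shared buffer and reading it after `join()`. A minimal, self-contained sketch with made-up delta strings:

```java
import java.util.stream.Stream;

public class AggregateStream {
  // Buffer shared with the processing thread; safe to read after join()
  static final StringBuilder total = new StringBuilder();

  public static void main(String[] args) throws InterruptedException {
    // Simulated delta contents as they would arrive from the streaming API
    Stream<String> deltas = Stream.of("Fib", "onacci: ", "0, 1, 1, 2, 3");

    Thread processor = new Thread(() -> deltas.forEach(total::append));
    processor.start();
    processor.join(); // join() establishes a happens-before edge before the read below

    System.out.println(total); // prints "Fibonacci: 0, 1, 1, 2, 3"
  }
}
```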

</details>

## Embedding

**Since v1.4.0**

Get the embedding of a text input as a list of float values:

```java
var request = new OpenAiEmbeddingRequest(List.of("Hello World"));

OpenAiEmbeddingResponse response = OpenAiClient.forModel(TEXT_EMBEDDING_ADA_002).embedding(request);
float[] embedding = response.getEmbeddings().get(0);
```
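Once obtained, embedding vectors are commonly compared with cosine similarity. This is a general-purpose, SDK-independent sketch; the sample vectors are made up for illustration:

```java
public class CosineSimilarity {
  // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]
  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    float[] v1 = {1f, 0f, 1f};
    float[] v2 = {0f, 1f, 0f};
    System.out.println(cosine(v1, v1)); // identical vectors, approximately 1.0
    System.out.println(cosine(v1, v2)); // orthogonal vectors, 0.0
  }
}
```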

<details>
<summary><b>Since v1.0.0</b></summary>

```java
var request = new OpenAiEmbeddingParameters().setInput("Hello World");

OpenAiEmbeddingOutput output = OpenAiClient.forModel(TEXT_EMBEDDING_ADA_002).embedding(request);

float[] embedding = output.getData().get(0).getEmbedding();
```

See [an example in our Spring Boot application](../../sample-code/spring-app/src/main/java/com/sap/ai/sdk/app/services/OpenAiService.java)

</details>
19 changes: 13 additions & 6 deletions docs/release-notes/release_notes.md
@@ -8,16 +8,23 @@

### 🔧 Compatibility Notes

- The constructors `UserMessage(MessageContent)` and `SystemMessage(MessageContent)` are removed. Use `Message.user(String)`, `Message.user(ImageItem)`, or `Message.system(String)` instead.
- [Orchestration] The constructors `UserMessage(MessageContent)` and `SystemMessage(MessageContent)` are removed. Use `Message.user(String)`, `Message.user(ImageItem)`, or `Message.system(String)` instead.
**Member Author:** It's time we have a format to clarify which module a change is about?

**Member:** I would say that it makes sense to put [module] in front if it is just one line. For multiple changes in the same line we should probably do what we did before:

  • Module
    - change 1
    - change 2

- Deprecate `getCustomField(String)` in favor of `toMap()` on generated model classes.
- `com.sap.ai.sdk.core.model.*`
- `com.sap.ai.sdk.orchestration.model.*`
- `com.sap.ai.sdk.core.model.*`
- `com.sap.ai.sdk.orchestration.model.*`

### ✨ New Functionality

- [Add Spring AI tool calling](../guides/SPRING_AI_INTEGRATION.md#tool-calling).
- [Add Document Grounding Client](https://github.com/SAP/ai-sdk-java/tree/main/docs/guides/GROUNDING.md)
- `com.sap.ai.sdk:document-grounding:1.4.0`
- [Orchestration] [Add Spring AI tool calling](../guides/SPRING_AI_INTEGRATION.md#tool-calling).
- [Document Grounding] [Add Document Grounding Client](https://github.com/SAP/ai-sdk-java/tree/main/docs/guides/GROUNDING.md)
- `com.sap.ai.sdk:document-grounding:1.4.0`
- [OpenAI]
  - New generated model classes introduced for _AzureOpenAI_ specification dated 2024-10-21.
  - Introducing a [new user interface](../guides/OPENAI_CHAT_COMPLETION.md/#new-user-interface-v140) for chat completion, wrapping the generated model classes.
    - `OpenAiChatCompletionRequest` and `OpenAiChatCompletionResponse` for high-level request and response handling.
    - `OpenAiUserMessage`, `OpenAiSystemMessage`, `OpenAiAssistantMessage` and `OpenAiToolMessage` for creating messages with different content types.
    - `OpenAiToolChoice` for configuring chat completion requests with a tool selection strategy.
  - Introducing a new user interface for embedding calls using `OpenAiEmbeddingRequest` and `OpenAiEmbeddingResponse`.

### 📈 Improvements

@@ -48,29 +48,27 @@ void streamChatCompletion() {
final var userMessage = OpenAiMessage.user("Who is the prettiest?");
final var prompt = new OpenAiChatCompletionRequest(userMessage);

final var totalOutput = new AtomicReference<CompletionUsage>();
final var usageRef = new AtomicReference<CompletionUsage>();
final var filledDeltaCount = new AtomicInteger(0);

OpenAiClient.forModel(GPT_35_TURBO)
.streamChatCompletionDeltas(prompt)
// foreach consumes all elements, closing the stream at the end
.forEach(
delta -> {
final var usage = delta.getCompletionUsage();
totalOutput.compareAndExchange(null, usage);
usageRef.compareAndExchange(null, delta.getCompletionUsage());
final String deltaContent = delta.getDeltaContent();
log.info("delta: {}", delta);
if (!deltaContent.isEmpty()) {
filledDeltaCount.incrementAndGet();
}
});

// the first two and the last delta don't have any content
// see OpenAiChatCompletionDelta#getDeltaContent
assertThat(filledDeltaCount.get()).isGreaterThan(0);

assertThat(totalOutput.get().getTotalTokens()).isGreaterThan(0);
assertThat(totalOutput.get().getPromptTokens()).isEqualTo(14);
assertThat(totalOutput.get().getCompletionTokens()).isGreaterThan(0);
assertThat(usageRef.get().getTotalTokens()).isGreaterThan(0);
assertThat(usageRef.get().getPromptTokens()).isEqualTo(14);
assertThat(usageRef.get().getCompletionTokens()).isGreaterThan(0);
}

@Test