99 changes: 95 additions & 4 deletions docs/guides/OPENAI_CHAT_COMPLETION.md
@@ -3,6 +3,7 @@
## Table of Contents

- [Introduction](#introduction)
- [New User Interface (v1.4.0)](#new-user-interface-v140)
- [Prerequisites](#prerequisites)
- [Maven Dependencies](#maven-dependencies)
- [Usage](#usage)
@@ -18,6 +19,20 @@

This guide demonstrates how to use the SAP AI SDK for Java to perform chat completion tasks using OpenAI models deployed on SAP AI Core.

### New User Interface (v1.4.0)

We're excited to introduce a new user interface for OpenAI chat completions starting with **version 1.4.0**. This update is designed to improve the SDK by:

- **Decoupling Layers:** Separating the convenience layer from the model classes to deliver a more stable and maintainable experience.
- **Staying Current:** Making it easier for the SDK to adapt to the latest changes in the OpenAI API specification.
- **Consistent Design:** Aligning with the Orchestrator convenience API for a smoother transition and easier adoption.

**Please Note:**

- The new interface is gradually being rolled out across the SDK.
- We welcome your feedback to help us refine this interface.
- The existing interface (v1.0.0) remains available for compatibility.

## Prerequisites

Before using the AI Core module, ensure that you have met all the general requirements outlined in the [README.md](../../README.md#general-requirements).
@@ -109,6 +124,20 @@ OpenAiClient.withCustomDestination(destination);

## Message history

**Since v1.4.0**

```java
var request =
new OpenAiChatCompletionRequest(
OpenAiMessage.system("You are a helpful assistant"),
OpenAiMessage.user("Hello World! Why is this phrase so famous?"));

var response = OpenAiClient.forModel(GPT_4O).chatCompletion(request).getContent();
```

<details>
<summary><b>Since v1.0.0</b></summary>

```java
var systemMessage =
new OpenAiChatSystemMessage().setContent("You are a helpful assistant");
@@ -124,6 +153,8 @@ String resultMessage = result.getContent();

See [an example in our Spring Boot application](../../sample-code/spring-app/src/main/java/com/sap/ai/sdk/app/services/OpenAiService.java)

</details>

## Chat Completion with Specific Model Version

By default, when no version is specified, the system selects one of the available deployments of the specified model, regardless of its version.
@@ -149,7 +180,7 @@ Ensure that the custom model is deployed in SAP AI Core.

It's possible to pass a stream of chat completion delta elements, e.g. from the application backend to the frontend in real-time.

### Asynchronous Streaming
### Asynchronous Streaming - Blocking

This is a blocking example for streaming and printing directly to the console:

@@ -168,16 +199,58 @@ try (Stream<String> stream = client.streamChatCompletion(msg)) {
}
```

### Aggregating Total Output
### Asynchronous Streaming - Non-blocking

**Since v1.4.0**

The following example demonstrates how to use a concurrency-safe container (such as an `AtomicReference`) to "listen" for usage information in any incoming delta.

```java
String question = "Can you give me the first 100 numbers of the Fibonacci sequence?";
var userMessage = OpenAiMessage.user(question);
var request = new OpenAiChatCompletionRequest(userMessage);

OpenAiClient client = OpenAiClient.forModel(GPT_4O);
var usageRef = new AtomicReference<CompletionUsage>();

// Prepare the stream before starting the thread to handle any initialization exceptions
Stream<OpenAiChatCompletionDelta> stream = client.streamChatCompletionDeltas(request);

// Create a new thread for asynchronous, non-blocking processing
Thread streamProcessor =
new Thread(
() -> {
// try-with-resources ensures the stream is closed after processing
try (stream) {
stream.forEach(
delta -> {
usageRef.compareAndExchange(null, delta.getCompletionUsage());
System.out.println("Content: " + delta.getDeltaContent());
});
}
});

// Start the processing thread; the main thread remains free (non-blocking)
streamProcessor.start();
// Wait for the thread to finish (blocking)
streamProcessor.join();

// Access the usage information captured from the stream
Integer tokensUsed = usageRef.get().getCompletionTokens();
System.out.println("Tokens used: " + tokensUsed);
```
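The first-non-null capture via `compareAndExchange` is plain Java and can be tried independently of the SDK. The following self-contained sketch simulates a delta stream (the values are made up) and records the first non-null element it sees:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Stream;

public class FirstValueCapture {
  // Concurrency-safe holder, shared between the main and the processing thread
  static final AtomicReference<Integer> usageRef = new AtomicReference<>();

  public static void main(String[] args) throws InterruptedException {
    // Simulated deltas: usage arrives only in the last element, as with the streaming API
    Stream<Integer> deltas = Stream.of(null, null, 42);

    Thread processor =
        new Thread(
            () ->
                deltas.forEach(
                    // compareAndExchange stores the first non-null value; later values are ignored
                    usage -> usageRef.compareAndExchange(null, usage)));

    processor.start();
    processor.join(); // join() guarantees visibility of writes from the processing thread

    System.out.println("Captured usage: " + usageRef.get()); // prints "Captured usage: 42"
  }
}
```

The same happens-before guarantee of `Thread.join()` is what makes reading `usageRef` safe in the SDK example above.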

<details>
<summary><b>Since v1.0.0</b></summary>

The following example is non-blocking and demonstrates how to aggregate the complete response.
Any asynchronous library can be used, such as the classic Thread API.

```java
var message = "Can you give me the first 100 numbers of the Fibonacci sequence?";
var question = "Can you give me the first 100 numbers of the Fibonacci sequence?";

var userMessage =
new OpenAiChatMessage.OpenAiChatUserMessage().addText(message);
new OpenAiChatMessage.OpenAiChatUserMessage().addText(question);
var requestParameters =
new OpenAiChatCompletionParameters().addMessages(userMessage);

@@ -208,14 +281,32 @@ System.out.println("Tokens used: " + tokensUsed);
Please find [an example in our Spring Boot application](../../sample-code/spring-app/src/main/java/com/sap/ai/sdk/app/services/OpenAiService.java). It shows the usage of Spring
Boot's `ResponseBodyEmitter` to stream the chat completion delta messages to the frontend in real-time.
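Independent of Spring or the SDK, aggregating a streamed response reduces to appending each delta to a shared buffer and reading it after `join()`. A minimal, self-contained sketch with made-up delta strings:

```java
import java.util.stream.Stream;

public class AggregateStream {
  // Buffer shared with the processing thread; safe to read after join()
  static final StringBuilder total = new StringBuilder();

  public static void main(String[] args) throws InterruptedException {
    // Simulated delta contents as they would arrive from the streaming API
    Stream<String> deltas = Stream.of("Fib", "onacci: ", "0, 1, 1, 2, 3");

    Thread processor = new Thread(() -> deltas.forEach(total::append));
    processor.start();
    processor.join(); // join() establishes a happens-before edge before the read below

    System.out.println(total); // prints "Fibonacci: 0, 1, 1, 2, 3"
  }
}
```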

</details>

## Embedding

**Since v1.4.0**

Get the embedding of a text input as a list of float values:

```java
var request = new OpenAiEmbeddingRequest(List.of("Hello World"));

OpenAiEmbeddingResponse response = OpenAiClient.forModel(TEXT_EMBEDDING_ADA_002).embedding(request);
float[] embedding = response.getEmbeddings().get(0);
```
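Once obtained, embedding vectors are commonly compared with cosine similarity. This is a general-purpose, SDK-independent sketch; the sample vectors are made up for illustration:

```java
public class CosineSimilarity {
  // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]
  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    float[] v1 = {1f, 0f, 1f};
    float[] v2 = {0f, 1f, 0f};
    System.out.println(cosine(v1, v1)); // identical vectors, approximately 1.0
    System.out.println(cosine(v1, v2)); // orthogonal vectors, 0.0
  }
}
```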

<details>
<summary><b>Since v1.0.0</b></summary>

```java
var request = new OpenAiEmbeddingParameters().setInput("Hello World");

OpenAiEmbeddingOutput output = OpenAiClient.forModel(TEXT_EMBEDDING_ADA_002).embedding(request);

float[] embedding = output.getData().get(0).getEmbedding();
```

See [an example in our Spring Boot application](../../sample-code/spring-app/src/main/java/com/sap/ai/sdk/app/services/OpenAiService.java)

</details>
19 changes: 13 additions & 6 deletions docs/release-notes/release_notes.md
@@ -8,16 +8,23 @@

### 🔧 Compatibility Notes

- The constructors `UserMessage(MessageContent)` and `SystemMessage(MessageContent)` are removed. Use `Message.user(String)`, `Message.user(ImageItem)`, or `Message.system(String)` instead.
- [Orchestration] The constructors `UserMessage(MessageContent)` and `SystemMessage(MessageContent)` are removed. Use `Message.user(String)`, `Message.user(ImageItem)`, or `Message.system(String)` instead.
**Member Author:** It's time we have a format to clarify which module a change is about?

**Member:** I would say that it makes sense to put [module] in front if it is just one line. For multiple changes in the same line we should probably do what we did before:

  • Module
    - change 1
    - change 2

- Deprecate `getCustomField(String)` in favor of `toMap()` on generated model classes.
- `com.sap.ai.sdk.core.model.*`
- `com.sap.ai.sdk.orchestration.model.*`
- `com.sap.ai.sdk.core.model.*`
- `com.sap.ai.sdk.orchestration.model.*`

### ✨ New Functionality

- [Add Spring AI tool calling](../guides/SPRING_AI_INTEGRATION.md#tool-calling).
- [Add Document Grounding Client](https://github.com/SAP/ai-sdk-java/tree/main/docs/guides/GROUNDING.md)
- `com.sap.ai.sdk:document-grounding:1.4.0`
- [Orchestration] [Add Spring AI tool calling](../guides/SPRING_AI_INTEGRATION.md#tool-calling).
- [Document Grounding] [Add Document Grounding Client](https://github.com/SAP/ai-sdk-java/tree/main/docs/guides/GROUNDING.md)
- `com.sap.ai.sdk:document-grounding:1.4.0`
- [OpenAI]
  - New generated model classes introduced for _AzureOpenAI_ specification dated 2024-10-21.
  - Introducing a [new user interface](../guides/OPENAI_CHAT_COMPLETION.md/#new-user-interface-v140) for chat completion, wrapping the generated model classes.
    - `OpenAiChatCompletionRequest` and `OpenAiChatCompletionResponse` for high-level request and response handling.
    - `OpenAiUserMessage`, `OpenAiSystemMessage`, `OpenAiAssistantMessage` and `OpenAiToolMessage` for creating messages with different content types.
    - `OpenAiToolChoice` for configuring chat completion requests with a tool selection strategy.
  - Introducing a new user interface for embedding calls using `OpenAiEmbeddingRequest` and `OpenAiEmbeddingResponse`.

### 📈 Improvements

@@ -48,29 +48,27 @@ void streamChatCompletion() {
final var userMessage = OpenAiMessage.user("Who is the prettiest?");
final var prompt = new OpenAiChatCompletionRequest(userMessage);

final var totalOutput = new AtomicReference<CompletionUsage>();
final var usageRef = new AtomicReference<CompletionUsage>();
final var filledDeltaCount = new AtomicInteger(0);

OpenAiClient.forModel(GPT_35_TURBO)
.streamChatCompletionDeltas(prompt)
// foreach consumes all elements, closing the stream at the end
.forEach(
delta -> {
final var usage = delta.getCompletionUsage();
totalOutput.compareAndExchange(null, usage);
usageRef.compareAndExchange(null, delta.getCompletionUsage());
final String deltaContent = delta.getDeltaContent();
log.info("delta: {}", delta);
if (!deltaContent.isEmpty()) {
filledDeltaCount.incrementAndGet();
}
});

// the first two and the last delta don't have any content
// see OpenAiChatCompletionDelta#getDeltaContent
assertThat(filledDeltaCount.get()).isGreaterThan(0);

assertThat(totalOutput.get().getTotalTokens()).isGreaterThan(0);
assertThat(totalOutput.get().getPromptTokens()).isEqualTo(14);
assertThat(totalOutput.get().getCompletionTokens()).isGreaterThan(0);
assertThat(usageRef.get().getTotalTokens()).isGreaterThan(0);
assertThat(usageRef.get().getPromptTokens()).isEqualTo(14);
assertThat(usageRef.get().getCompletionTokens()).isGreaterThan(0);
}

@Test