Bug description
(Note: this is all a local test for now, but the actual models being called are production-ready, so it is a representative example.)
When calling the code below from an API client such as Postman:

```java
@GetMapping(value = "/streamed-completions-get", produces = TEXT_EVENT_STREAM_VALUE)
public ResponseEntity<Flux<ChatResponse>> liveCompletionsStreamed(HttpServletResponse httpServletResponse) {
    httpServletResponse.setHeader("Content-Security-Policy",
            "default-src 'self'; connect-src 'self' http://localhost:9841");
    httpServletResponse.setStatus(HttpServletResponse.SC_OK);
    httpServletResponse.setContentType(TEXT_EVENT_STREAM_VALUE);
    return ResponseEntity.ok(((StreamingModel) chatModel)
            .stream(new Prompt("Explain me how to configure the server port in springboot with examples")));
}
```
the expected behavior is that the response is always streamed, i.e. chunks are received over time rather than all at once.
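Since the endpoint produces `TEXT_EVENT_STREAM_VALUE`, a streamed answer reaches the client as a sequence of server-sent events. As a rough illustration of the wire format (plain Java, not Spring AI code; the payload strings are made up), each chunk travels as a `data:` line and events are separated by a blank line:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of text/event-stream framing. With true streaming, a client
// observes these events arriving one at a time; with buffering, the very same
// payload arrives in a single burst.
public class SseFramingSketch {

    // Events are separated by a blank line; each "data:" line carries
    // one chunk of the model's answer.
    static List<String> parseDataEvents(String ssePayload) {
        List<String> chunks = new ArrayList<>();
        for (String event : ssePayload.split("\n\n")) {
            for (String line : event.split("\n")) {
                if (line.startsWith("data:")) {
                    chunks.add(line.substring("data:".length()).trim());
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String payload = "data: To configure\n\ndata: the server port\n\ndata: use server.port\n\n";
        System.out.println(parseDataEvents(payload)); // three separate chunks
    }
}
```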
This works as expected when connecting to Anthropic models:
```yaml
spring:
  ai:
    bedrock:
      anthropic3:
        chat:
          enabled: true
          model: anthropic.claude-3-haiku-20240307-v1:0
```
See the screenshot below for evidence that the response is streamed back:
However, when I simply stop the app and "swap" the credentials to use the Azure OpenAI chat client, the answer is instead "streamed" all at once:
I expected the answer to be streamed here as well, just as it is with the Anthropic models.
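For completeness, the Azure-side properties implied by the credential swap can be reconstructed from the `@Value` placeholder keys in the configuration class below. This is a hypothetical sketch, not my literal config; the values are placeholders:

```yaml
spring:
  ai:
    azure:
      openai:
        endpoint: https://<your-resource>.openai.azure.com
        api-key: <api-key>
        chat:
          options:
            deployment-name: <deployment-name>
```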
Environment
Using:
Spring AI: 1.0.0-SNAPSHOT (bleeding-edge build, includes yesterday's breaking changes)
Spring Boot: 3.2.4
Java: 21
Steps to reproduce
1. A streaming call with a ChatClient configured for Bedrock Anthropic3 models streams as expected.
2. The exact same code with an Azure OpenAI client does not stream in chunks; the full response arrives at once.
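To make the difference between the two behaviors measurable rather than eyeballed in Postman, one could record the arrival time of each chunk and check how far they are spread out. A minimal sketch (plain Java; the threshold parameter is an arbitrary assumption, not anything from Spring AI):

```java
import java.util.List;

// Heuristic to tell a truly streamed response from one that was buffered and
// delivered all at once: if every chunk lands within a tiny time window, the
// response was effectively not streamed.
public class StreamTimingCheck {

    // chunkArrivalMillis: arrival timestamp of each chunk, relative to the first.
    static boolean looksStreamed(List<Long> chunkArrivalMillis, long minSpreadMillis) {
        if (chunkArrivalMillis.size() < 2) {
            return false; // a single burst cannot count as streamed
        }
        long first = chunkArrivalMillis.get(0);
        long last = chunkArrivalMillis.get(chunkArrivalMillis.size() - 1);
        return (last - first) >= minSpreadMillis;
    }

    public static void main(String[] args) {
        // Anthropic-like behavior: chunks spread over ~1.5 s
        System.out.println(looksStreamed(List.of(0L, 400L, 900L, 1500L), 200)); // true
        // Azure-like symptom: all chunks within a few ms
        System.out.println(looksStreamed(List.of(0L, 2L, 5L), 200)); // false
    }
}
```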
Expected behavior
Essentially, switching models by tweaking only an environment variable (e.g. the model property) should leave the streaming mechanics working out of the box, with no need for "custom code".
Minimal Complete Reproducible example
The example should be easy to reproduce, but for further clarity, here are the beans I'm using in a configuration class:

```java
@Configuration
@Getter
@Setter
@EnableAutoConfiguration(exclude = {AzureOpenAiAutoConfiguration.class, BedrockAnthropic3ChatAutoConfiguration.class})
public class LargeLangModelConfiguration {

    private static final float TEMPERATURE = 0.0f;

    @Value(value = "${spring.ai.azure.openai.endpoint:#{null}}")
    private String endpoint;

    @Value(value = "${spring.ai.azure.openai.api-key:#{null}}")
    private String apiKey;

    @Value(value = "${spring.ai.azure.openai.chat.options.deployment-name:#{null}}")
    private String azureModel;

    @Value(value = "${spring.ai.bedrock.anthropic3.chat.model:#{null}}")
    private String model;

    @Value(value = "${spring.ai.bedrock.region}")
    private String region;

    @Bean
    @Primary
    @Qualifier("chatClientGpt")
    @ConditionalOnProperty("spring.ai.azure.openai.chat.options.deployment-name")
    public ChatModel azureOpenAiChatClientGpt() {
        return getAzureOpenAIModel(getAzureModel());
    }

    @Bean
    @Primary
    @Qualifier("chatClientAnthropic")
    @ConditionalOnProperty("spring.ai.bedrock.anthropic3.chat.model")
    public ChatModel chatClientAnthropic() {
        return getAnthropicModel(getModel());
    }

    private ChatModel getAnthropicModel(String model) {
        return new BedrockAnthropic3ChatModel(
                new Anthropic3ChatBedrockApi(
                        model,
                        DefaultCredentialsProvider.create(),
                        getRegion(),
                        new ObjectMapper(),
                        Duration.ofMillis(20000)),
                Anthropic3ChatOptions.builder()
                        .withTemperature(TEMPERATURE)
                        .withMaxTokens(1000)
                        .withAnthropicVersion(DEFAULT_ANTHROPIC_VERSION)
                        .build());
    }

    private ChatModel getAzureOpenAIModel(String model) {
        return new AzureOpenAiChatModel(
                new OpenAIClientBuilder()
                        .endpoint(endpoint)
                        .credential(new KeyCredential(apiKey))
                        .buildClient(),
                AzureOpenAiChatOptions.builder()
                        .withTemperature(TEMPERATURE)
                        .withDeploymentName(model)
                        .build());
    }
}
```
I wonder whether this is a bug in the streaming call of the underlying Azure OpenAI client, or a mistake in the configuration on my side.
Any pointers appreciated!