
Streaming chat client does not seem to work out of the box for Azure OpenAI, but does for Bedrock Anthropic3 #764

@bruno-oliveira

Description

Bug description

(Note: this is all a local test for now, but the actual models being called are "production-ready", so to speak, so it's a representative example.)

When calling the code below from an API client, such as Postman:

 @GetMapping(value = "/streamed-completions-get", produces = TEXT_EVENT_STREAM_VALUE)
    public ResponseEntity<Flux<ChatResponse>> liveCompletionsStreamed(HttpServletResponse httpServletResponse) {
        httpServletResponse.setHeader("Content-Security-Policy",
            "default-src 'self'; connect-src 'self' http://localhost:9841");
        httpServletResponse.setStatus(HttpServletResponse.SC_OK);
        httpServletResponse.setContentType(TEXT_EVENT_STREAM_VALUE);

        return ResponseEntity.ok(((StreamingModel)chatModel).stream(new Prompt("Explain me how to configure the server port in springboot with examples")));
    }

the expected behavior would be that the response is always streamed, meaning that chunks are received over time instead of all at once.
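To make the chunk timing easy to observe outside of Postman, here is a minimal client-side sketch (assuming the app runs on localhost:8080 and spring-webflux is on the classpath; both are assumptions) that prints the arrival time of every SSE chunk:

import java.time.LocalTime;

import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

public class StreamTimingCheck {

    public static void main(String[] args) {
        // Hypothetical base URL; adjust to wherever the app is running
        WebClient client = WebClient.create("http://localhost:8080");

        client.get()
            .uri("/streamed-completions-get")
            .accept(MediaType.TEXT_EVENT_STREAM)
            .retrieve()
            .bodyToFlux(String.class)
            // If the response is truly streamed, these timestamps spread out over time;
            // if it is "streamed at once", they are all (nearly) identical.
            .doOnNext(chunk -> System.out.println(LocalTime.now() + " -> " + chunk))
            .blockLast();
    }
}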

This works as expected when connecting to Anthropic models:

spring:
  ai:
    bedrock:
      anthropic3:
        chat:
          enabled: true
          model: anthropic.claude-3-haiku-20240307-v1:0

See the screenshot below for evidence that the response is streamed back:

[Screenshot 2024-05-24 at 17:15:08 — the response arrives in chunks over time]

However, if I then simply stop the app and "swap" the credentials to use the Azure OpenAI chat client, the answer is instead all "streamed at once":

[Screenshot 2024-05-24 at 17:17:33 — the full response arrives in a single chunk]

I expected the answer to be streamed here as well, just like it is when using the Anthropic models.
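For reference, the "swap" boils down to switching to the Azure OpenAI properties, roughly like this (the property names match the @Value bindings in the configuration class below; the values are placeholders):

spring:
  ai:
    azure:
      openai:
        endpoint: https://<your-resource>.openai.azure.com/
        api-key: <your-api-key>
        chat:
          options:
            deployment-name: <your-gpt-deployment>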

Environment

Using:
Spring AI: 1.0.0-SNAPSHOT (bleeding-edge build, includes yesterday's breaking changes)
Spring Boot: 3.2.4
Java: 21

Steps to reproduce
A streaming call with a ChatClient configured for Bedrock Anthropic3 models will stream.
The exact same code with an Azure OpenAI client will not stream in chunks; the response arrives all at once.

Expected behavior
Essentially, that by only tweaking an environment variable (the model being used, for example), the streaming mechanics work out of the box as expected, without any need for "custom code".

Minimal Complete Reproducible example
The example should be easy to reproduce, but for further clarity, here are the beans I'm using in a configuration class:

@Configuration
@Getter
@Setter
@EnableAutoConfiguration(exclude = {AzureOpenAiAutoConfiguration.class, BedrockAnthropic3ChatAutoConfiguration.class})
public class LargeLangModelConfiguration {
    private static final float TEMPERATURE = 0.0f;
    @Value(value = "${spring.ai.azure.openai.endpoint:#{null}}")
    private String endpoint;
    @Value(value = "${spring.ai.azure.openai.api-key:#{null}}")
    private String apiKey;

    @Value(value = "${spring.ai.azure.openai.chat.options.deployment-name:#{null}}")
    private String azureModel;
    @Value(value = "${spring.ai.bedrock.anthropic3.chat.model:#{null}}")
    private String model;
   
    @Value(value = "${spring.ai.bedrock.region}")
    private String region;

    @Bean
    @Primary
    @Qualifier("chatClientGpt")
    @ConditionalOnProperty("spring.ai.azure.openai.chat.options.deployment-name")
    public ChatModel azureOpenAiChatClientGpt() {
        return getAzureOpenAIModel(getAzureModel());
    }

    @Bean
    @Primary
    @Qualifier("chatClientAnthropic")
    @ConditionalOnProperty("spring.ai.bedrock.anthropic3.chat.model")
    public ChatModel chatClientAnthropic() {
        return getAnthropicModel(getModel());
    }

    private ChatModel getAnthropicModel(String model) {
        return new BedrockAnthropic3ChatModel(new Anthropic3ChatBedrockApi(
            model,
            DefaultCredentialsProvider.create(),
            getRegion(),
            new ObjectMapper(),
            Duration.ofMillis(20000)),
            Anthropic3ChatOptions.builder()
                .withTemperature(TEMPERATURE)
                .withMaxTokens(1000)
                .withAnthropicVersion(DEFAULT_ANTHROPIC_VERSION)
                .build());
    }

    private ChatModel getAzureOpenAIModel(String model) {
        return new AzureOpenAiChatModel(
            new OpenAIClientBuilder().endpoint(endpoint).credential(new KeyCredential(apiKey)).buildClient(),
            AzureOpenAiChatOptions.builder()
                .withTemperature(TEMPERATURE)
                .withDeploymentName(model)
                .build());
    }
}

I wonder if this is a bug in the stream call of the underlying Azure OpenAI client, or a mistake in my configuration.
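One way to narrow this down would be to bypass Spring AI and call the streaming API of the underlying Azure SDK directly. A minimal sketch (assuming the same azure-ai-openai client as in the configuration above; endpoint, key, and deployment name are placeholders):

import java.time.LocalTime;
import java.util.List;

import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.ai.openai.models.ChatCompletions;
import com.azure.ai.openai.models.ChatCompletionsOptions;
import com.azure.ai.openai.models.ChatRequestUserMessage;
import com.azure.core.credential.KeyCredential;

public class AzureStreamCheck {

    public static void main(String[] args) {
        OpenAIClient client = new OpenAIClientBuilder()
            .endpoint("<your-endpoint>")                 // placeholder
            .credential(new KeyCredential("<your-key>")) // placeholder
            .buildClient();

        ChatCompletionsOptions options = new ChatCompletionsOptions(
            List.of(new ChatRequestUserMessage("Explain how to configure the server port in Spring Boot")));

        // If the SDK itself streams correctly, these timestamps should spread out over time,
        // pointing the finger at the Spring AI integration rather than the Azure client.
        client.getChatCompletionsStream("<your-deployment>", options)
            .forEach(chunk -> System.out.println(LocalTime.now() + " chunk: " + describe(chunk)));
    }

    private static String describe(ChatCompletions chunk) {
        return chunk.getChoices().isEmpty()
            ? "(no choices yet)"
            : String.valueOf(chunk.getChoices().get(0).getDelta().getContent());
    }
}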

Any pointers appreciated!
