Bug description
(Note: this is all a local test for now, but the actual models being called are production-ready, so it is a representative example.)
When calling the code below from an API client such as Postman:

```java
@GetMapping(value = "/streamed-completions-get", produces = TEXT_EVENT_STREAM_VALUE)
public ResponseEntity<Flux<ChatResponse>> liveCompletionsStreamed(HttpServletResponse httpServletResponse) {
    httpServletResponse.setHeader("Content-Security-Policy",
            "default-src 'self'; connect-src 'self' http://localhost:9841");
    httpServletResponse.setStatus(HttpServletResponse.SC_OK);
    httpServletResponse.setContentType(TEXT_EVENT_STREAM_VALUE);
    return ResponseEntity.ok(((StreamingModel) chatModel)
            .stream(new Prompt("Explain me how to configure the server port in springboot with examples")));
}
```
the expected behavior is that the response is always streamed, i.e. chunks are received over time rather than all at once.
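Since the endpoint produces `TEXT_EVENT_STREAM_VALUE`, a streamed answer reaches the client as a sequence of server-sent events. As a rough illustration of the wire format (plain Java, not Spring AI code; the payload strings are made up), each chunk travels as a `data:` line and events are separated by a blank line:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of text/event-stream framing. With true streaming, a client
// observes these events arriving one at a time; with buffering, the very same
// payload arrives in a single burst.
public class SseFramingSketch {

    // Events are separated by a blank line; each "data:" line carries
    // one chunk of the model's answer.
    static List<String> parseDataEvents(String ssePayload) {
        List<String> chunks = new ArrayList<>();
        for (String event : ssePayload.split("\n\n")) {
            for (String line : event.split("\n")) {
                if (line.startsWith("data:")) {
                    chunks.add(line.substring("data:".length()).trim());
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String payload = "data: To configure\n\ndata: the server port\n\ndata: use server.port\n\n";
        System.out.println(parseDataEvents(payload)); // three separate chunks
    }
}
```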
This works as expected when connecting to Anthropic models:
```yaml
spring:
  ai:
    bedrock:
      anthropic3:
        chat:
          enabled: true
          model: anthropic.claude-3-haiku-20240307-v1:0
```
See the screenshot below for evidence that the response is streamed back:
However, when I simply stop the app and "swap" the credentials to use the Azure OpenAI chat client, the answer is instead "streamed" all at once:
I expected the answer to be streamed here as well, just as it is with the Anthropic models.
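For completeness, the Azure-side properties implied by the credential swap can be reconstructed from the `@Value` placeholder keys in the configuration class below. This is a hypothetical sketch, not my literal config; the values are placeholders:

```yaml
spring:
  ai:
    azure:
      openai:
        endpoint: https://<your-resource>.openai.azure.com
        api-key: <api-key>
        chat:
          options:
            deployment-name: <deployment-name>
```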
Environment
Using:
Spring AI: 1.0.0-SNAPSHOT (bleeding-edge build, includes yesterday's breaking changes)
Spring Boot: 3.2.4
Java: 21
Steps to reproduce
1. A streaming call with a ChatClient configured for Bedrock Anthropic3 models streams as expected.
2. The exact same code with an Azure OpenAI client does not stream in chunks; the full response arrives at once.
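To make the difference between the two behaviors measurable rather than eyeballed in Postman, one could record the arrival time of each chunk and check how far they are spread out. A minimal sketch (plain Java; the threshold parameter is an arbitrary assumption, not anything from Spring AI):

```java
import java.util.List;

// Heuristic to tell a truly streamed response from one that was buffered and
// delivered all at once: if every chunk lands within a tiny time window, the
// response was effectively not streamed.
public class StreamTimingCheck {

    // chunkArrivalMillis: arrival timestamp of each chunk, relative to the first.
    static boolean looksStreamed(List<Long> chunkArrivalMillis, long minSpreadMillis) {
        if (chunkArrivalMillis.size() < 2) {
            return false; // a single burst cannot count as streamed
        }
        long first = chunkArrivalMillis.get(0);
        long last = chunkArrivalMillis.get(chunkArrivalMillis.size() - 1);
        return (last - first) >= minSpreadMillis;
    }

    public static void main(String[] args) {
        // Anthropic-like behavior: chunks spread over ~1.5 s
        System.out.println(looksStreamed(List.of(0L, 400L, 900L, 1500L), 200)); // true
        // Azure-like symptom: all chunks within a few ms
        System.out.println(looksStreamed(List.of(0L, 2L, 5L), 200)); // false
    }
}
```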
Expected behavior
Essentially, switching models by tweaking only an environment variable (e.g. the model property) should leave the streaming mechanics working out of the box, with no need for "custom code".
Minimal Complete Reproducible example
The example should be easy to reproduce, but for further clarity, here are the beans I'm using in a configuration class:

```java
@Configuration
@Getter
@Setter
@EnableAutoConfiguration(exclude = {AzureOpenAiAutoConfiguration.class, BedrockAnthropic3ChatAutoConfiguration.class})
public class LargeLangModelConfiguration {

    private static final float TEMPERATURE = 0.0f;

    @Value(value = "${spring.ai.azure.openai.endpoint:#{null}}")
    private String endpoint;

    @Value(value = "${spring.ai.azure.openai.api-key:#{null}}")
    private String apiKey;

    @Value(value = "${spring.ai.azure.openai.chat.options.deployment-name:#{null}}")
    private String azureModel;

    @Value(value = "${spring.ai.bedrock.anthropic3.chat.model:#{null}}")
    private String model;

    @Value(value = "${spring.ai.bedrock.region}")
    private String region;

    @Bean
    @Primary
    @Qualifier("chatClientGpt")
    @ConditionalOnProperty("spring.ai.azure.openai.chat.options.deployment-name")
    public ChatModel azureOpenAiChatClientGpt() {
        return getAzureOpenAIModel(getAzureModel());
    }

    @Bean
    @Primary
    @Qualifier("chatClientAnthropic")
    @ConditionalOnProperty("spring.ai.bedrock.anthropic3.chat.model")
    public ChatModel chatClientAnthropic() {
        return getAnthropicModel(getModel());
    }

    private ChatModel getAnthropicModel(String model) {
        return new BedrockAnthropic3ChatModel(
                new Anthropic3ChatBedrockApi(
                        model,
                        DefaultCredentialsProvider.create(),
                        getRegion(),
                        new ObjectMapper(),
                        Duration.ofMillis(20000)),
                Anthropic3ChatOptions.builder()
                        .withTemperature(TEMPERATURE)
                        .withMaxTokens(1000)
                        .withAnthropicVersion(DEFAULT_ANTHROPIC_VERSION)
                        .build());
    }

    private ChatModel getAzureOpenAIModel(String model) {
        return new AzureOpenAiChatModel(
                new OpenAIClientBuilder()
                        .endpoint(endpoint)
                        .credential(new KeyCredential(apiKey))
                        .buildClient(),
                AzureOpenAiChatOptions.builder()
                        .withTemperature(TEMPERATURE)
                        .withDeploymentName(model)
                        .build());
    }
}
```
I wonder whether this is a bug in the streaming call of the underlying Azure OpenAI client, or a mistake in the configuration on my side.
Any pointers appreciated!