// spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chatclient.adoc
There are several scenarios where you might need to work with multiple chat models in your application:

* Providing users with a choice of models based on their preferences
* Combining specialized models (one for code generation, another for creative content, etc.)

By default, Spring AI autoconfigures a single `ChatClient.Builder` bean.
However, you may need to work with multiple chat models in your application.
Here's how to handle this scenario:

In all cases, you need to disable the `ChatClient.Builder` autoconfiguration by setting the property `spring.ai.chat.client.enabled=false`.
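As a concrete illustration, the following sketch builds one `ChatClient` per model. It assumes `spring.ai.chat.client.enabled=false` is set and that both the OpenAI and Anthropic starters are on the classpath; the bean names are illustrative, not prescribed by Spring AI.

[source,java]
----
@Configuration
public class ChatClientsConfig {

    // One ChatClient per autoconfigured ChatModel bean.
    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel chatModel) {
        return ChatClient.create(chatModel);
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}
----

Consumers can then pick a specific client, for example with `@Qualifier("openAiChatClient")`.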
==== Multiple OpenAI-Compatible API Endpoints

The `OpenAiApi` and `OpenAiChatModel` classes provide a `mutate()` method that allows you to create variations of existing instances with different properties.
This is particularly useful when you need to work with multiple OpenAI-compatible APIs.
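For instance, a minimal sketch (the base URLs and environment variable names are illustrative assumptions, not part of this documentation):

[source,java]
----
// Base client for the official OpenAI endpoint.
OpenAiApi baseApi = OpenAiApi.builder()
    .baseUrl("https://api.openai.com")
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .build();

// mutate() copies the existing instance and overrides selected properties,
// here pointing at a different OpenAI-compatible endpoint.
OpenAiApi otherApi = baseApi.mutate()
    .baseUrl("https://api.groq.com/openai")
    .apiKey(System.getenv("GROQ_API_KEY"))
    .build();
----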
It does *not* affect templates used internally by xref:api/retrieval-augmented-generation.adoc[Retrieval Augmented Generation] components.

If you'd rather use a different template engine, you can provide a custom implementation of the `TemplateRenderer` interface directly to the ChatClient. You can also keep using the default `StTemplateRenderer`, but with a custom configuration.

For example, by default, template variables are identified by the `{}` syntax.
If you're planning to include JSON in your prompt, you might want to use a different syntax to avoid conflicts with JSON syntax. For example, you can use the `<` and `>` delimiters.
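For example, a sketch of configuring the default `StTemplateRenderer` with `<` and `>` delimiters (the prompt text is illustrative):

[source,java]
----
ChatClient chatClient = ChatClient.builder(chatModel)
    // Use <var> instead of {var} so literal JSON braces in the prompt are left alone.
    .defaultTemplateRenderer(StTemplateRenderer.builder()
        .startDelimiterToken('<')
        .endDelimiterToken('>')
        .build())
    .build();

String answer = chatClient.prompt()
    .user(u -> u.text("Return a JSON object like {\"composer\": ...} for <composer>")
        .param("composer", "John Williams"))
    .call()
    .content();
----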
After specifying the `call()` method on `ChatClient`, there are a few different options for the response type.

* `String content()`: returns the String content of the response
* `ChatResponse chatResponse()`: returns the `ChatResponse` object that contains multiple generations and also metadata about the response, for example how many tokens were used to create the response.
* `ChatClientResponse chatClientResponse()`: returns a `ChatClientResponse` object that contains the `ChatResponse` object and the ChatClient execution context, giving you access to additional data used during the execution of advisors (e.g. the relevant documents retrieved in a RAG flow).
* `ResponseEntity<?> responseEntity()`: returns a `ResponseEntity` containing the full HTTP response, including status code, headers, and body.
This is useful when you need access to low-level HTTP details of the response.
* `entity()` to return a Java type
** `entity(ParameterizedTypeReference<T> type)`: used to return a `Collection` of entity types.
** `entity(Class<T> type)`: used to return a specific entity type.
** `entity(StructuredOutputConverter<T> structuredOutputConverter)`: used to specify an instance of a `StructuredOutputConverter` to convert a `String` to an entity type.

You can also invoke the `stream()` method instead of `call()`.

NOTE: Calling the `call()` method does not actually trigger the AI model execution. Instead, it only instructs Spring AI whether to use synchronous or streaming calls.
The actual AI model invocation occurs when methods such as `content()`, `chatResponse()`, and `responseEntity()` are called.
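To ground the `entity()` variants, a short sketch (the `ActorFilms` record and the prompts are illustrative):

[source,java]
----
// Illustrative target type for structured output.
record ActorFilms(String actor, List<String> movies) {}

// entity(Class<T>): map the response to a single entity.
ActorFilms actorFilms = chatClient.prompt()
    .user("Generate the filmography for a random actor.")
    .call()
    .entity(ActorFilms.class);

// entity(ParameterizedTypeReference<T>): map the response to a collection.
List<ActorFilms> films = chatClient.prompt()
    .user("Generate filmographies for 5 actors.")
    .call()
    .entity(new ParameterizedTypeReference<List<ActorFilms>>() {});
----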
At the `ChatClient.Builder` level, you can specify the default prompt configuration.

* `defaultOptions(ChatOptions chatOptions)`: Pass in either portable options defined in the `ChatOptions` class or model-specific options such as those in `OpenAiChatOptions`.
For more information on model-specific `ChatOptions` implementations, refer to the JavaDocs.

* `defaultFunction(String name, String description, java.util.function.Function<I, O> function)`: The `name` is used to refer to the function in user text.
The `description` explains the function's purpose and helps the AI model choose the correct function for an accurate response.
The `function` argument is a Java function instance that the model will execute when necessary.

* `defaultFunctions(String... functionNames)`: The bean names of `java.util.Function`s defined in the application context.

* `defaultUser(String text)`, `defaultUser(Resource text)`, `defaultUser(Consumer<UserSpec> userSpecConsumer)`: These methods let you define the user text.
The `Consumer<UserSpec>` allows you to use a lambda to specify the user text and any default parameters.

* `defaultAdvisors(Advisor... advisor)`: Advisors allow modification of the data used to create the `Prompt`.
The `QuestionAnswerAdvisor` implementation enables the pattern of `Retrieval Augmented Generation` by appending the prompt with context information related to the user text.

* `defaultAdvisors(Consumer<AdvisorSpec> advisorSpecConsumer)`: This method allows you to define a `Consumer` to configure multiple advisors using the `AdvisorSpec`. Advisors can modify the data used to create the final `Prompt`.
The `Consumer<AdvisorSpec>` lets you specify a lambda to add advisors, such as `QuestionAnswerAdvisor`, which supports `Retrieval Augmented Generation` by appending the prompt with relevant context information based on the user text.

You can override these defaults at runtime using the corresponding methods without the `default` prefix.
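A brief sketch of setting defaults and then overriding one at runtime (the system text and parameter names are illustrative):

[source,java]
----
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultSystem("You are a friendly assistant that answers in the voice of a {voice}")
    .defaultOptions(ChatOptions.builder().temperature(0.7).build())
    .build();

// Methods without the `default` prefix override the defaults per request.
String answer = chatClient.prompt()
    .system(sp -> sp.param("voice", "pirate"))
    .user("Tell me a joke")
    .call()
    .content();
----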
A common pattern when calling an AI model with user text is to append or augment the prompt with contextual data.

This contextual data can be of different types. Common types include:

* **Your own data**: This is data the AI model hasn't been trained on.
Even if the model has seen similar data, the appended contextual data takes precedence in generating the response.

* **Conversational history**: The chat model's API is stateless.
If you tell the AI model your name, it won't remember it in subsequent interactions.
Conversational history must be sent with each request to ensure previous interactions are considered when generating a response.
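As a sketch of resending history on each request (the message contents and the `chatClient` variable are illustrative):

[source,java]
----
// The model API is stateless: prior turns must accompany every request.
List<Message> history = List.of(
    new UserMessage("My name is Ada."),
    new AssistantMessage("Nice to meet you, Ada!"));

String answer = chatClient.prompt()
    .messages(history)
    .user("What is my name?")
    .call()
    .content();
----

In practice, a `ChatMemory` implementation (covered below) manages this history for you instead of assembling it by hand.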
=== Advisor Configuration in ChatClient

The ChatClient fluent API provides an `AdvisorSpec` interface for configuring advisors.
This interface offers methods to add parameters, set multiple parameters at once, and add one or more advisors to the chain.

[source,java]
----
interface AdvisorSpec {

    AdvisorSpec param(String k, Object v);

    AdvisorSpec params(Map<String, Object> p);

    AdvisorSpec advisors(Advisor... advisors);

    AdvisorSpec advisors(List<Advisor> advisors);
}
----
IMPORTANT: The order in which advisors are added to the chain is crucial, as it determines the sequence of their execution.
Each advisor modifies the prompt or the context in some way, and the changes made by one advisor are passed on to the next in the chain.
[source,java]
----
ChatClient.builder(chatModel)
    .build()
    .prompt()
    .advisors(
        MessageChatMemoryAdvisor.builder(chatMemory).build(),
        QuestionAnswerAdvisor.builder(vectorStore).build())
    .user(userText)
    .call()
    .content();
----

In this configuration, the `MessageChatMemoryAdvisor` will be executed first, adding the conversation history to the prompt.
Then, the `QuestionAnswerAdvisor` will perform its search based on the user's question and the added conversation history, potentially providing more relevant results.
xref:ROOT:api/retrieval-augmented-generation.adoc#_questionansweradvisor[Learn about Question Answer Advisor]

Refer to the xref:ROOT:api/retrieval-augmented-generation.adoc[Retrieval Augmented Generation] documentation for more information.

The `SimpleLoggerAdvisor` is an advisor that logs the `request` and `response` data of the `ChatClient`.
This can be useful for debugging and monitoring your AI interactions.

TIP: Spring AI supports observability for LLM and vector store interactions. Refer to the xref:observability/index.adoc[Observability] guide for more information.

To enable logging, add the `SimpleLoggerAdvisor` to the advisor chain when creating your ChatClient.
It's recommended to add it toward the end of the chain.
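For instance, a minimal sketch:

[source,java]
----
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(new SimpleLoggerAdvisor()) // placed last in the chain
    .build();
----

Note that the advisor only emits output if its logger is enabled, typically by setting `logging.level.org.springframework.ai.chat.client.advisor=DEBUG` in `application.properties`.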
TIP: Be cautious about logging sensitive information in production environments.

== Chat Memory

The interface `ChatMemory` represents a storage for chat conversation memory.
It provides methods to add messages to a conversation, retrieve messages from a conversation, and clear the conversation history.

There is currently one built-in implementation: `MessageWindowChatMemory`.

`MessageWindowChatMemory` is a chat memory implementation that maintains a window of messages up to a specified maximum size (default: 20 messages).
When the number of messages exceeds this limit, older messages are evicted, but system messages are preserved.
If a new system message is added, all previous system messages are removed from memory.
This ensures that the most recent context is always available for the conversation while keeping memory usage bounded.

`MessageWindowChatMemory` is backed by the `ChatMemoryRepository` abstraction, which provides storage implementations for the chat conversation memory.
Several implementations are available, including `InMemoryChatMemoryRepository`, `JdbcChatMemoryRepository`, `CassandraChatMemoryRepository`, and `Neo4jChatMemoryRepository`.

For more details and usage examples, see the xref:api/chat-memory.adoc[Chat Memory] documentation.
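Putting the pieces together, a hedged sketch (the repository choice and window size are illustrative):

[source,java]
----
// Keep the last 20 messages per conversation, stored in memory.
ChatMemory chatMemory = MessageWindowChatMemory.builder()
    .chatMemoryRepository(new InMemoryChatMemoryRepository())
    .maxMessages(20)
    .build();

// The advisor injects the stored history into each prompt.
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
    .build();
----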
Often an application will be either reactive or imperative, but not both.

[IMPORTANT]
====
Due to a bug in Spring Boot 3.4, the "spring.http.client.factory=jdk" property must be set.
Otherwise, it's set to "reactor" by default, which breaks certain AI workflows like the ImageModel.
====

* Streaming is only supported via the Reactive stack.
Imperative applications must include the Reactive stack for this reason (e.g. spring-boot-starter-webflux).
* Non-streaming is only supported via the Servlet stack.
Reactive applications must include the Servlet stack for this reason (e.g. spring-boot-starter-web) and expect some calls to be blocking.
* Tool calling is imperative, leading to blocking workflows.
This also results in partial/interrupted Micrometer observations (e.g. the ChatClient spans and the tool calling spans are not connected, with the first one remaining incomplete for that reason).
* The built-in advisors perform blocking operations for standard calls, and non-blocking operations for streaming calls.
The Reactor Scheduler used for the advisor streaming calls can be configured via the Builder on each Advisor class.