
Commit ab4f5d1

Add cache documentation
1 parent 774216e commit ab4f5d1

1 file changed

docs/modules/ROOT/pages/ai-services.adoc

Lines changed: 169 additions & 4 deletions
@@ -163,6 +163,174 @@ quarkus.langchain4j.openai.m1.api-key=sk-...
quarkus.langchain4j.huggingface.m2.api-key=sk-...
----

[#cache]
== Configuring the Cache

If needed, a semantic cache can be enabled to keep a fixed number of previously asked questions and their answers, reducing the number of calls to the LLM API.

The `@CacheResult` annotation enables semantic caching and can be used at the class or method level. When used at the class level, it indicates that all methods of the AiService will perform a cache lookup before calling the LLM. This is a convenient way to enable caching for every method of a `@RegisterAiService` interface.

[source,java]
----
@RegisterAiService
@CacheResult
@SystemMessage("...")
public interface LLMService {
    // Cache is enabled for all methods
    ...
}
----

On the other hand, using `@CacheResult` at the method level allows fine-grained control over where the cache is enabled.

[source,java]
----
@RegisterAiService
@SystemMessage("...")
public interface LLMService {

    @CacheResult
    @UserMessage("...")
    String method1(...); // Cache is enabled for this method

    @UserMessage("...")
    String method2(...); // Cache is not enabled for this method
}
----

[IMPORTANT]
====
Each method annotated with `@CacheResult` has its own cache, and that cache is shared by all users of the application.
====
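
For illustration, a cached AI service is invoked like any other CDI bean. The following sketch is hypothetical: the REST resource and the single `String` parameter of `method1` are assumptions, since the method signatures above are elided.

[source,java]
----
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@Path("/ask")
public class AskResource {

    // The AI service defined above, with @CacheResult on method1
    @Inject
    LLMService service;

    @GET
    public String ask(@QueryParam("q") String question) {
        // If a semantically similar question is already cached, the stored
        // answer is returned without a new LLM call; otherwise the LLM is
        // invoked and the question/answer pair is added to the cache.
        return service.method1(question);
    }
}
----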

=== Cache properties

The following properties can be used to customize the cache configuration:

- `quarkus.langchain4j.cache.threshold`: The similarity threshold used during semantic search to decide whether a cached result should be returned for a new query. (default: `1`)
- `quarkus.langchain4j.cache.max-size`: The maximum number of messages to cache. This property helps control memory usage by limiting the size of each cache. (default: `10`)
- `quarkus.langchain4j.cache.ttl`: The time-to-live of messages stored in the cache. Messages that exceed the TTL are automatically removed. (default: `5m`)
- `quarkus.langchain4j.cache.embedding.name`: The name of the embedding model to use.
- `quarkus.langchain4j.cache.embedding.query-prefix`: A prefix added to each "query" value before the embedding operation is performed.
- `quarkus.langchain4j.cache.embedding.response-prefix`: A prefix added to each "response" value before the embedding operation is performed.
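
For example, to loosen the similarity threshold, enlarge each cache, and shorten the TTL, the configuration might look as follows (the values are purely illustrative):

[source,properties]
----
# Return a cached answer for queries that are close, not only identical
quarkus.langchain4j.cache.threshold=0.9
# Keep at most 100 entries per cache
quarkus.langchain4j.cache.max-size=100
# Evict cached entries after two minutes
quarkus.langchain4j.cache.ttl=2m
----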

By default, the cache uses the default embedding model of the configured LLM provider. If there are multiple embedding providers, the `quarkus.langchain4j.cache.embedding.name` property can be used to choose which one to use.

In the following example, there are two different embedding providers:

`pom.xml`:

[source,xml,subs=attributes+]
----
...
<dependencies>
    <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-openai</artifactId>
        <version>{project-version}</version>
    </dependency>
    <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-watsonx</artifactId>
        <version>{project-version}</version>
    </dependency>
</dependencies>
...
----

`application.properties`:

[source,properties]
----
# OpenAI configuration
quarkus.langchain4j.service1.chat-model.provider=openai
quarkus.langchain4j.service1.embedding-model.provider=openai
quarkus.langchain4j.openai.service1.api-key=sk-...

# Watsonx configuration
quarkus.langchain4j.service2.chat-model.provider=watsonx
quarkus.langchain4j.service2.embedding-model.provider=watsonx
quarkus.langchain4j.watsonx.service2.base-url=...
quarkus.langchain4j.watsonx.service2.api-key=...
quarkus.langchain4j.watsonx.service2.project-id=...
quarkus.langchain4j.watsonx.service2.embedding-model.model-id=...

# The cache will use the embedding model provided by watsonx
quarkus.langchain4j.cache.embedding.name=service2
----

When an xref:in-process-embedding.adoc[in-process embedding model] must be used:

`pom.xml`:

[source,xml,subs=attributes+]
----
...
<dependencies>
    <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-openai</artifactId>
        <version>{project-version}</version>
    </dependency>
    <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-watsonx</artifactId>
        <version>{project-version}</version>
    </dependency>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
        <version>0.31.0</version>
        <exclusions>
            <exclusion>
                <groupId>dev.langchain4j</groupId>
                <artifactId>langchain4j-core</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>
...
----

`application.properties`:

[source,properties]
----
# OpenAI configuration
quarkus.langchain4j.service1.chat-model.provider=openai
quarkus.langchain4j.service1.embedding-model.provider=openai
quarkus.langchain4j.openai.service1.api-key=sk-...

# Watsonx configuration
quarkus.langchain4j.service2.chat-model.provider=watsonx
quarkus.langchain4j.service2.embedding-model.provider=watsonx
quarkus.langchain4j.watsonx.service2.base-url=...
quarkus.langchain4j.watsonx.service2.api-key=...
quarkus.langchain4j.watsonx.service2.project-id=...
quarkus.langchain4j.watsonx.service2.embedding-model.model-id=...

# The cache will use the in-process embedding model AllMiniLmL6V2EmbeddingModel
quarkus.langchain4j.embedding-model.provider=dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel
----

=== Advanced usage

The `cacheProviderSupplier` attribute of the `@RegisterAiService` annotation makes the `AiCacheProvider` configurable. The default value of this attribute is `RegisterAiService.BeanAiCacheProviderSupplier.class`, which means that the AiService will use whatever `AiCacheProvider` bean is configured by the application, or the default one provided by the extension.

The extension provides a default implementation of `AiCacheProvider` which does two things:

* It uses whatever `AiCacheStore` bean is configured as the cache store. The default implementation is `InMemoryAiCacheStore`.
** If the application provides its own `AiCacheStore` bean, that is used instead of the default `InMemoryAiCacheStore`.

* It leverages the available configuration options under `quarkus.langchain4j.cache` to construct the `AiCacheProvider`.
** The default configuration values result in the use of `FixedAiCache` with a size of ten.

A custom provider is plugged in via the annotation:

[source,java]
----
@RegisterAiService(cacheProviderSupplier = CustomAiCacheProvider.class)
----
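
As a rough sketch only: by analogy with the other `*Supplier` attributes of `@RegisterAiService`, the class referenced by `cacheProviderSupplier` is assumed here to be a `java.util.function.Supplier<AiCacheProvider>`; the exact contract should be checked against the extension's API.

[source,java]
----
import java.util.function.Supplier;

// Hypothetical sketch, not the extension's actual code: assumes that
// cacheProviderSupplier accepts a Supplier<AiCacheProvider>.
public class CustomAiCacheProvider implements Supplier<AiCacheProvider> {

    @Override
    public AiCacheProvider get() {
        // Return the application's AiCacheProvider implementation here,
        // for example one built around a custom AiCacheStore.
        throw new UnsupportedOperationException("plug in a real AiCacheProvider");
    }
}
----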

[#memory]
== Configuring the Context (Memory)

@@ -280,10 +448,7 @@ This guidance aims to cover all crucial aspects of designing AI services with Qu
By default, @RegisterAiService annotated interfaces don't moderate content. However, users can opt in to having the LLM moderate content by annotating the method with `@Moderate`.

-For moderation to work, the following criteria need to be met:
-
-* A CDI bean for `dev.langchain4j.model.moderation.ModerationModel` must be configured (the `quarkus-langchain4j-openai` and `quarkus-langchain4j-azure-openai` provide one out of the box)
-* The interface must be configured with `@RegisterAiService(moderationModelSupplier = RegisterAiService.BeanModerationModelSupplier.class)`
+For moderation to work, a CDI bean for `dev.langchain4j.model.moderation.ModerationModel` must be configured (the `quarkus-langchain4j-openai` and `quarkus-langchain4j-azure-openai` extensions provide one out of the box).

=== Advanced usage
An alternative to providing a CDI bean is to configure the interface with `@RegisterAiService(moderationModelSupplier = MyCustomSupplier.class)`.
