If necessary, a semantic cache can be enabled to keep a fixed number of questions previously asked to the LLM, together with their answers, thus reducing the number of API calls.

The `@CacheResult` annotation enables semantic caching and can be used at the class or method level. When used at the class level, it indicates that all methods of the AiService will perform a cache lookup before making a call to the LLM. This provides a convenient way to enable caching for all methods of a `@RegisterAiService` interface.

[source,java]
----
@RegisterAiService
@CacheResult
@SystemMessage("...")
public interface LLMService {
    // Cache is enabled for all methods
    ...
}
----

On the other hand, using `@CacheResult` at the method level allows fine-grained control over which methods are cached.

[source,java]
----
@RegisterAiService
@SystemMessage("...")
public interface LLMService {
    @CacheResult
    @UserMessage("...")
    public String method1(...); // Cache is enabled for this method

    @UserMessage("...")
    public String method2(...); // Cache is not enabled for this method
}
----
[IMPORTANT]
====
Each method annotated with `@CacheResult` has its own cache, and that cache is shared by all users of the application.
====
=== Cache properties
The following properties can be used to customize the cache configuration:
- `quarkus.langchain4j.cache.threshold`: Specifies the minimum similarity, computed during semantic search, that a new query must have with a cached entry for the cached result to be returned. (`default 1`)
- `quarkus.langchain4j.cache.max-size`: Sets the maximum number of messages to cache. This property helps control memory usage by limiting the size of each cache. (`default 10`)
- `quarkus.langchain4j.cache.ttl`: Defines the time-to-live for messages stored in the cache. Messages that exceed the TTL are automatically removed. (`default 5m`)
- `quarkus.langchain4j.cache.embedding.name`: Specifies the name of the embedding model to use.
- `quarkus.langchain4j.cache.embedding.query-prefix`: Adds a prefix to each "query" value before performing the embedding operation.
- `quarkus.langchain4j.cache.embedding.response-prefix`: Adds a prefix to each "response" value before performing the embedding operation.
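
For example, a purely illustrative configuration that relaxes the similarity threshold (so that semantically close, rather than identical, questions also hit the cache), enlarges each cache, and extends the TTL:

[source,properties]
----
quarkus.langchain4j.cache.threshold=0.9
quarkus.langchain4j.cache.max-size=100
quarkus.langchain4j.cache.ttl=10m
----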
By default, the cache uses the default embedding model provided by the configured LLM provider. If there are multiple embedding providers, the `quarkus.langchain4j.cache.embedding.name` property can be used to choose which one to use.
In the following example, there are two different embedding providers, and the cache is configured to use the second one. This is a hypothetical sketch: the model names `model1` and `model2` are illustrative, and the providers (`openai` and `ollama`) should be replaced by whichever extensions are installed:
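
[source,properties]
----
# Two named embedding models, each backed by a different provider
# (hypothetical names, following the quarkus.langchain4j.<model-name>.* pattern)
quarkus.langchain4j.model1.embedding-model.provider=openai
quarkus.langchain4j.model2.embedding-model.provider=ollama

# The semantic cache uses the embedding model named "model2"
quarkus.langchain4j.cache.embedding.name=model2
----
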
The `cacheProviderSupplier` attribute of the `@RegisterAiService` annotation enables configuring the `AiCacheProvider`. The default value of this attribute is `RegisterAiService.BeanAiCacheProviderSupplier.class`, which means that the AiService will use whatever `AiCacheProvider` bean is configured by the application, or the default one provided by the extension.
The extension provides a default implementation of `AiCacheProvider`, which does two things:
* It uses whatever `AiCacheStore` bean is configured as the cache store. The default implementation is `InMemoryAiCacheStore`.
** If the application provides its own `AiCacheStore` bean, that will be used instead of the default `InMemoryAiCacheStore`.
* It leverages the available configuration options under `quarkus.langchain4j.cache` to construct the `AiCacheProvider`.
** The default configuration values result in the usage of `FixedAiCache` with a size of ten.
By default, `@RegisterAiService` annotated interfaces don't moderate content. However, users can opt in to having the LLM moderate content by annotating the method with `@Moderate`.
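
For example, a minimal sketch (the interface, method, and message template are illustrative):

[source,java]
----
@RegisterAiService
public interface MyAiService {

    // If the moderation model flags the content, a
    // dev.langchain4j.service.ModerationException is thrown
    @Moderate
    @UserMessage("Answer the following question: {question}")
    String answer(String question);
}
----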
For moderation to work, a CDI bean for `dev.langchain4j.model.moderation.ModerationModel` must be configured (the `quarkus-langchain4j-openai` and `quarkus-langchain4j-azure-openai` provide one out of the box).
=== Advanced usage
An alternative to providing a CDI bean is to configure the interface with `@RegisterAiService(moderationModelSupplier = MyCustomSupplier.class)`.
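
A minimal sketch of such a supplier (the class name mirrors the annotation example above; the OpenAI model and environment variable are used purely for illustration):

[source,java]
----
import java.util.function.Supplier;

import dev.langchain4j.model.moderation.ModerationModel;
import dev.langchain4j.model.openai.OpenAiModerationModel;

public class MyCustomSupplier implements Supplier<ModerationModel> {

    @Override
    public ModerationModel get() {
        // Any ModerationModel implementation can be returned here
        return OpenAiModerationModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .build();
    }
}
----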