src/content/docs/autorag/configuration/cache.mdx
+7 -6 (7 additions & 6 deletions)
@@ -20,20 +20,21 @@ To see if a response came from the cache, check the `cf-aig-cache-status` header
## What to consider when using similarity cache
Consider these behaviors when using similarity caching:
+
- **Volatile Cache**: If two similar requests hit at the same time, the first might not cache in time for the second to use it, resulting in a `MISS`.
- **30-Day Cache**: Cached responses last 30 days, then expire automatically. No custom durations for now.
- **Data Dependency**: Cached responses are tied to specific document chunks. If those chunks change or get deleted, the cache clears to keep answers fresh.
## How similarity matching works
-Similarity caching in AutoRAG uses **MinHash with Locality-Sensitive Hashing (LSH)** to detect prompts that are lexically similar.
+AutoRAG’s similarity cache uses **MinHash and Locality-Sensitive Hashing (LSH)** to find and reuse responses for prompts that are worded similarly.
-When a new prompt is received:
+Here’s how it works when a new prompt comes in:
-1. The prompt is broken into overlapping token sequences (called _shingles_), typically 2–3 words each.
-2. These shingles are hashed into a compact fingerprint using the MinHash algorithm. Prompts with more overlapping shingles will have more similar fingerprints.
-3. Fingerprints are grouped into LSH buckets, which allow AutoRAG to quickly find past prompts that are likely to be similar without scanning every cached prompt.
-4. If a prompt in the same bucket meets the configured similarity threshold, its cached response is reused.
+1. The prompt is split into small overlapping chunks of words (called shingles), like “what’s the” or “the weather.”
+2. These shingles are turned into a “fingerprint” using MinHash. The more overlap two prompts have, the more similar their fingerprints will be.
+3. Fingerprints are placed into LSH buckets, which help AutoRAG quickly find similar prompts without comparing every single one.
+4. If a past prompt in the same bucket is similar enough (based on your configured threshold), AutoRAG reuses its cached response.
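
The four steps in this hunk can be illustrated with a minimal Python sketch. This is not AutoRAG's implementation, which the diff does not show. The names `shingles`, `minhash`, and `SimilarityCache`, the signature length, the band count, the two-word shingle size, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import hashlib
import re
from collections import defaultdict

# All constants below are illustrative assumptions, not AutoRAG's settings.
NUM_HASHES = 64    # length of the MinHash fingerprint
BANDS = 16         # number of LSH bands (NUM_HASHES // BANDS rows per band)
SHINGLE_SIZE = 2   # words per shingle


def shingles(prompt, size=SHINGLE_SIZE):
    """Step 1: split a prompt into overlapping word n-grams."""
    words = re.findall(r"[\w']+", prompt.lower())
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def _seeded_hash(seed, text):
    """One member of a hash family: BLAKE2b salted with the seed."""
    digest = hashlib.blake2b(
        text.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(shingle_set):
    """Step 2: keep the minimum hash per seed as the fingerprint."""
    return tuple(
        min(_seeded_hash(seed, s) for s in shingle_set)
        for seed in range(NUM_HASHES)
    )


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class SimilarityCache:
    """Hypothetical in-memory cache keyed by LSH buckets."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []                 # list of (fingerprint, response)
        self.buckets = defaultdict(list)  # (band index, band slice) -> entry ids

    def _band_keys(self, sig):
        rows = NUM_HASHES // BANDS
        return [(b, sig[b * rows:(b + 1) * rows]) for b in range(BANDS)]

    def put(self, prompt, response):
        """Step 3: store the fingerprint in one bucket per band."""
        sig = minhash(shingles(prompt))
        self.entries.append((sig, response))
        entry_id = len(self.entries) - 1
        for key in self._band_keys(sig):
            self.buckets[key].append(entry_id)

    def get(self, prompt):
        """Step 4: compare only against prompts sharing a bucket."""
        sig = minhash(shingles(prompt))
        candidates = {
            i for key in self._band_keys(sig) for i in self.buckets.get(key, ())
        }
        best_score, best_response = 0.0, None
        for i in candidates:
            cached_sig, response = self.entries[i]
            score = estimated_similarity(sig, cached_sig)
            if score >= self.threshold and score > best_score:
                best_score, best_response = score, response
        return best_response  # None corresponds to a cache miss


cache = SimilarityCache(threshold=0.8)
cache.put("What's the weather in Paris today?", "Sunny")
print(cache.get("what's the weather in paris today"))  # same shingles, so a hit
print(cache.get("How do I reset my password"))         # no shared shingles, so a miss
```

Banding is what lets the lookup skip most cached prompts: two fingerprints become candidates only if at least one band matches exactly, and the threshold check then runs on that short candidate list. More bands with fewer rows each admit looser matches at the cost of more candidates to compare.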