
Commit 8d6ada8

aninibread, ToriLindsay, irvinebroque, kathayl, and kodster28 authored
WIP Autorag (#20865)
* New autrag product section
* Added placeholder file to new autorag folder
* Added Overview and Get Started files with frontmatter
* made autorag not capitalized
* lowercase
* Delete src/content/docs/AutoRAG/autorag.mdx
* Delete src/content/docs/AutoRAG/index.mdx
* Update src/content/docs/autorag/autorag.mdx
* Update src/content/docs/autorag/autorag.mdx
* Rename autorag.mdx to get-started.mdx
* removed externals and algolia from yaml
* Update src/content/docs/autorag/index.mdx
* getting started + bindings
* Update src/content/docs/autorag/get-started.mdx
  Co-authored-by: Brendan Irvine-Broque <[email protected]>
* Update src/content/docs/autorag/get-started.mdx
  Co-authored-by: Brendan Irvine-Broque <[email protected]>
* progress
* progress 2
* progress 3
* completed content
* mostly there
* fix link
* deep link and small fixes
* added response structure
* small fix
* fix
* pricing
* Update how-autorag-works.mdx fixed spelling mistake
* fix general structure and small issues
* add references
* added image fix links
* Update src/content/docs/autorag/platform/release-note.mdx
  Co-authored-by: ToriLindsay <[email protected]>
* small fix
* remove extra link
* Apply suggestions from code review
  Co-authored-by: ToriLindsay <[email protected]>
  Co-authored-by: Kody Jackson <[email protected]>
* Apply suggestions from code review
  Co-authored-by: ToriLindsay <[email protected]>
  Co-authored-by: Kody Jackson <[email protected]>
* index edit
* small fixes
* Update src/content/docs/autorag/how-autorag-works.mdx
  Co-authored-by: ToriLindsay <[email protected]>
* Update how-autorag-works.mdx
* edits for new content / structure
* add tutorials
* Update src/content/docs/autorag/platform/limits-pricing.mdx
* binding fix and changelog addition
* fix doc recommendation
* better wording in cache
* autorag changelog
* final fixes

---------

Co-authored-by: ToriLindsay <[email protected]>
Co-authored-by: Brendan Irvine-Broque <[email protected]>
Co-authored-by: Kathy <[email protected]>
Co-authored-by: kodster28 <[email protected]>
1 parent 9bc1757 commit 8d6ada8


42 files changed (+1687, -242 lines)

Binary image assets added: src/assets/images/autorag/RAG.png (37.6 KB) and four other images (187 KB, 192 KB, 131 KB, 10.6 MB).
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
---
title: Create fully-managed RAG pipelines for your AI applications with AutoRAG
description: AutoRAG lets you create fully-managed, retrieval-augmented generation (RAG) pipelines that continuously update and scale on Cloudflare.
date: 2025-04-07T06:00:00Z
hidden: true
---

[AutoRAG](/autorag) is now in open beta, making it easy for you to build fully-managed retrieval-augmented generation (RAG) pipelines without managing infrastructure. Just upload your docs to [R2](/r2/get-started/), and AutoRAG handles the rest: embeddings, indexing, retrieval, and response generation via API.

![AutoRAG open beta demo](~/assets/images/changelog/autorag/autorag-open-beta.gif)

With AutoRAG, you can:

- **Customize your pipeline:** Choose from [Workers AI](/workers-ai) models, configure chunking strategies, edit system prompts, and more.
- **Instant setup:** AutoRAG provisions everything you need, from [Vectorize](/vectorize) and [AI Gateway](/ai-gateway) to pipeline logic, so you can go from zero to a working RAG pipeline in seconds.
- **Keep your index fresh:** AutoRAG continuously syncs your index with your data source to ensure responses stay accurate and up to date.
- **Ask questions:** Query your data and receive grounded responses via a [Workers binding](/autorag/usage/workers-binding/) or [API](/autorag/usage/rest-api/); a minimal binding sketch appears below.

Whether you're building internal tools, AI-powered search, or a support assistant, AutoRAG gets you from idea to deployment in minutes.

Get started in the [Cloudflare dashboard](https://dash.cloudflare.com/?to=/:account/ai/autorag) or check out the [guide](/autorag/get-started/) for instructions on how to build your RAG pipeline today.
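
If you want a feel for the Workers binding, here is a minimal sketch, assuming a Worker with an AI binding named `AI` and an AutoRAG instance named `my-rag` (both placeholder names):

```ts
// Minimal sketch, not a production example. The binding name `AI` and the
// instance name "my-rag" are placeholders for your own configuration.
export default {
	async fetch(request: Request, env: { AI: any }): Promise<Response> {
		const { query } = (await request.json()) as { query: string };
		// aiSearch retrieves relevant chunks and generates a grounded answer.
		const answer = await env.AI.autorag("my-rag").aiSearch({ query });
		return Response.json(answer);
	},
};
```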

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
---
pcx_content_type: concept
title: How AutoRAG works
sidebar:
  order: 2
---

AutoRAG sets up and manages your RAG pipeline for you. It connects the tools needed for indexing, retrieval, and generation, and keeps everything up to date by regularly syncing your data with the index. Once set up, AutoRAG indexes your content in the background and responds to queries in real time.

AutoRAG consists of two core processes:

- **Indexing:** An asynchronous background process that monitors your data source for changes and converts your data into vectors for search.
- **Querying:** A synchronous process triggered by user queries. It retrieves the most relevant content and generates context-aware responses.

## How indexing works

Indexing begins automatically when you create an AutoRAG instance and connect a data source.

Here is what happens during indexing (a rough sketch of steps 3 through 5 follows the list):

1. **Data ingestion:** AutoRAG reads from your connected data source.
2. **Markdown conversion:** AutoRAG uses [Workers AI’s Markdown Conversion](/workers-ai/markdown-conversion/) to convert [supported data types](/autorag/configuration/data-source/) into structured Markdown. This ensures consistency across diverse file types. For images, Workers AI performs object detection followed by vision-to-language transformation to convert them into Markdown text.
3. **Chunking:** The extracted text is [chunked](/autorag/configuration/chunking/) into smaller pieces to improve retrieval granularity.
4. **Embedding:** Each chunk is embedded using Workers AI’s embedding model to transform the content into vectors.
5. **Vector storage:** The resulting vectors, along with metadata like the file name, are stored in the [Vectorize](/vectorize/) database created on your Cloudflare account.

After the initial data set is indexed, AutoRAG regularly checks your data source for updates (additions, edits, or deletions) and indexes the changes to keep your vector database up to date.

![Indexing](~/assets/images/autorag/indexing.png)
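
For intuition, here is a rough sketch of the chunk, embed, and store loop (steps 3 through 5) written against the Workers AI and Vectorize bindings directly. It is illustrative only, not AutoRAG's implementation; the fixed-size chunking, the `@cf/baai/bge-base-en-v1.5` model, and the binding names are assumptions:

```ts
// Illustrative sketch of steps 3-5 only; this is not AutoRAG's actual
// implementation. Assumes a Worker with bindings `AI` (Workers AI) and
// `VECTORIZE` (a Vectorize index), and Markdown from step 2 in `markdown`.
const markdown = "…structured Markdown produced in step 2…";

// 3. Chunking: naive fixed-size chunks of up to 1,000 characters.
const chunks = markdown.match(/[\s\S]{1,1000}/g) ?? [];

// 4. Embedding: convert each chunk into a vector with an embedding model.
const { data: vectors } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
	text: chunks,
});

// 5. Vector storage: store each vector plus metadata in the Vectorize index.
await env.VECTORIZE.upsert(
	vectors.map((values: number[], i: number) => ({
		id: `example.md#chunk-${i}`,
		values,
		metadata: { file: "example.md" },
	})),
);
```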

## How querying works

Once indexing is complete, AutoRAG is ready to respond to end-user queries in real time.

Here is how the querying pipeline works:

1. **Receive query from AutoRAG API:** The query workflow begins when you send a request to either of AutoRAG’s [AI Search](/autorag/usage/rest-api/#ai-search) or [Search](/autorag/usage/rest-api/#search) endpoints (a short binding sketch follows below).
2. **Query rewriting (optional):** AutoRAG can [rewrite the input query](/autorag/configuration/query-rewriting/) using one of Workers AI’s LLMs, transforming the original query into a more effective search query to improve retrieval quality.
3. **Embedding the query:** The rewritten (or original) query is transformed into a vector via the same embedding model used to embed your data, so that it can be compared against your vectorized content to find the most relevant matches.
4. **Querying Vectorize index:** The query vector is [queried](/vectorize/best-practices/query-vectors/) against the stored vectors in the Vectorize database associated with your AutoRAG.
5. **Content retrieval:** Vectorize returns the metadata of the most relevant chunks, and the original content is retrieved from the R2 bucket. If you are using the Search endpoint, the content is returned at this point.
6. **Response generation:** If you are using the AI Search endpoint, a text-generation model from Workers AI generates a response using the retrieved content and the original user query, combined via a [system prompt](/autorag/configuration/system-prompt/). The model’s context-aware response is returned.

![Querying](~/assets/images/autorag/querying.png)
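
Via the Workers binding, the two endpoints map onto two methods. A minimal sketch, assuming a binding named `AI` and an instance named `my-rag`:

```ts
// Inside a Worker's fetch handler; `AI` and "my-rag" are placeholder names.
const rag = env.AI.autorag("my-rag");

// Search covers steps 1-5: it returns the retrieved content, no generation.
const results = await rag.search({ query: "How do I rotate my API keys?" });

// AI Search covers steps 1-6: it also runs a text-generation model over the
// retrieved content to return a grounded, context-aware answer.
const answer = await rag.aiSearch({ query: "How do I rotate my API keys?" });
```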

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
---
pcx_content_type: navigation
title: Concepts
sidebar:
  order: 3
  group:
    hideIndex: true
---

import { DirectoryListing } from "~/components";

<DirectoryListing />

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
---
pcx_content_type: concept
title: What is RAG
sidebar:
  order: 1
---

Retrieval-Augmented Generation (RAG) is a way to use your own data with a large language model (LLM). Instead of relying only on what the model was trained on, RAG searches for relevant information from your data source and uses it to help answer questions.

## How RAG works

Here’s a simplified overview of the RAG pipeline:

1. **Indexing:** Your content (e.g. docs, wikis, product information) is split into smaller chunks and converted into vectors using an embedding model. These vectors are stored in a vector database.
2. **Retrieval:** When a user asks a question, it is also embedded into a vector and used to find the most relevant chunks in the vector database.
3. **Generation:** The retrieved content and the user’s original question are combined into a single prompt. An LLM uses that prompt to generate a response (see the sketch below).

The resulting response should be accurate, relevant, and grounded in your own data.

![What is RAG](~/assets/images/autorag/RAG.png)
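
To make the generation step concrete, here is a hedged sketch: the retrieved chunks and the question are assembled into one prompt that the LLM answers from. Everything named here (`retrievedChunks`, the model) is an illustrative assumption:

```ts
// Sketch of step 3 only; `retrievedChunks` would come from step 2, and the
// model name is just an example of a Workers AI text-generation model.
const retrievedChunks = [{ text: "Pro plan limit: 100 requests per minute." }];
const userQuestion = "What is my rate limit?";

// Combine retrieved content and the question into a single grounded prompt.
const prompt = [
	"Answer using only the context below.",
	"Context:",
	...retrievedChunks.map((chunk) => `- ${chunk.text}`),
	`Question: ${userQuestion}`,
].join("\n");

const { response } = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
	prompt,
});
```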

:::note[How does AutoRAG work]
To learn more about how AutoRAG uses RAG under the hood, refer to [How AutoRAG works](/autorag/concepts/how-autorag-works/).
:::

## Why use RAG?

RAG lets you bring your own data into LLM generation without retraining or fine-tuning a model. It improves both accuracy and trust by retrieving relevant content at query time and using that as the basis for a response.

Benefits of using RAG:

- **Accurate and current answers:** Responses are based on your latest content, not outdated training data.
- **Control over information sources:** You define the knowledge base, so answers come from content you trust.
- **Fewer hallucinations:** Responses are grounded in real, retrieved data, reducing made-up or misleading answers.
- **No model training required:** You can get high-quality results without building or fine-tuning your own LLM, which can be time-consuming and costly.

RAG is ideal for building AI-powered apps like:

- AI assistants for internal knowledge
- Support chatbots connected to your latest content
- Enterprise search across documentation and files

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
---
pcx_content_type: concept
title: Similarity cache
sidebar:
  order: 6
---

Similarity-based caching in AutoRAG lets you serve responses from Cloudflare’s cache for queries that are similar to previous requests, rather than generating a new, unique response for every request. This speeds up response times and cuts costs by reusing answers for questions that are close in meaning.

## How it works

Unlike basic caching, which requires an exact match of a previous request to serve a cached response, this is what happens when a request is received with similarity-based caching enabled:

1. AutoRAG checks whether a _similar_ prompt (based on your chosen threshold) has been answered before.
2. If a match is found, it returns the cached response instantly.
3. If no match is found, it generates a new response and caches it.

To see whether a response came from the cache, check the `cf-aig-cache-status` header: `HIT` for cached and `MISS` for new, as in the sketch below.
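
For example, a sketch of checking that header from TypeScript; the account ID, instance name, and token are placeholders, and the endpoint path should be confirmed against the REST API reference:

```ts
// Sketch: call the AI Search REST endpoint and inspect the cache header.
// ACCOUNT_ID, "my-rag", and API_TOKEN are placeholders for your own values.
const ACCOUNT_ID = "<account_id>";
const API_TOKEN = "<api_token>";

const res = await fetch(
	`https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/autorag/rags/my-rag/ai-search`,
	{
		method: "POST",
		headers: {
			Authorization: `Bearer ${API_TOKEN}`,
			"Content-Type": "application/json",
		},
		body: JSON.stringify({ query: "What’s the weather like today?" }),
	},
);

console.log(res.headers.get("cf-aig-cache-status")); // "HIT" or "MISS"
```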

## What to consider when using similarity cache

Consider these behaviors when using similarity caching:

- **Volatile cache:** If two similar requests arrive at the same time, the first might not be cached in time for the second to use it, resulting in a `MISS`.
- **30-day cache:** Cached responses last 30 days, then expire automatically. Custom durations are not supported for now.
- **Data dependency:** Cached responses are tied to specific document chunks. If those chunks change or are deleted, the cache clears to keep answers fresh.

## How similarity matching works

AutoRAG’s similarity cache uses **MinHash and Locality-Sensitive Hashing (LSH)** to find and reuse responses for prompts that are worded similarly.

Here’s how it works when a new prompt comes in (a toy sketch follows this list):

1. The prompt is split into small overlapping chunks of words (called shingles), like “what’s the” or “the weather.”
2. These shingles are turned into a “fingerprint” using MinHash. The more overlap two prompts have, the more similar their fingerprints will be.
3. Fingerprints are placed into LSH buckets, which help AutoRAG quickly find similar prompts without comparing every single one.
4. If a past prompt in the same bucket is similar enough (based on your configured threshold), AutoRAG reuses its cached response.
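
The toy sketch below shows the shingle-and-fingerprint idea; it is not AutoRAG's implementation, and the hash function and parameters are made up for illustration:

```ts
// Toy sketch of MinHash fingerprinting; not AutoRAG's implementation.

// Split a prompt into overlapping 2-word shingles.
function shingles(text: string, size = 2): string[] {
	const words = text.toLowerCase().split(/\s+/);
	const out: string[] = [];
	for (let i = 0; i + size <= words.length; i++) {
		out.push(words.slice(i, i + size).join(" "));
	}
	return out;
}

// One cheap FNV-style string hash per seed; k seeds stand in for k hash functions.
function hash(s: string, seed: number): number {
	let h = (0x811c9dc5 ^ seed) >>> 0;
	for (const ch of s) {
		h = Math.imul(h ^ ch.charCodeAt(0), 0x01000193) >>> 0;
	}
	return h;
}

// The fingerprint is, for each seed, the minimum hash over all shingles.
function minhash(text: string, k = 16): number[] {
	const sh = shingles(text);
	return Array.from({ length: k }, (_, seed) =>
		sh.reduce((min, s) => Math.min(min, hash(s, seed)), Number.MAX_SAFE_INTEGER),
	);
}

// Similarly worded prompts agree on many fingerprint positions; the fraction
// of matching positions estimates their overlap (Jaccard similarity).
```

LSH then buckets fingerprints by bands of positions, so a new prompt is only compared against past prompts that share a bucket.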

## Choosing a threshold

The similarity threshold decides how close two prompts need to be to reuse a cached response. Here are the available thresholds:

| Threshold        | Description                 | Example match                                                                    |
| ---------------- | --------------------------- | -------------------------------------------------------------------------------- |
| Exact            | Near-identical matches only | “What’s the weather like today?” matches with “What is the weather like today?”  |
| Strong (default) | High semantic similarity    | “What’s the weather like today?” matches with “How’s the weather today?”         |
| Broad            | Moderate match, more hits   | “What’s the weather like today?” matches with “Tell me today’s weather”          |
| Loose            | Low similarity, max reuse   | “What’s the weather like today?” matches with “Give me the forecast”             |

Test these values to see which works best with your application.
