Commit cb24263: completed content

1 parent 1d80199 commit cb24263

9 files changed, +330 -29 lines changed

src/content/docs/autorag/configuration/chunking.mdx

Lines changed: 51 additions & 0 deletions
---
pcx_content_type: concept
title: Chunking
sidebar:
  order: 6
---

Chunking is the process of splitting large data into smaller segments before embedding them for search. AutoRAG performs **fixed size chunking** during indexing to make your content retrievable at the right level of granularity.

## Chunking controls

AutoRAG exposes two parameters to help you control chunking behavior:

- **Chunk size**: The number of tokens per chunk.
  - Minimum: `64`
  - Maximum: `512`
- **Chunk overlap**: The percentage of overlapping tokens between adjacent chunks.
  - Minimum: `0%`
  - Maximum: `30%`

These settings apply during the indexing step, before your data is embedded and stored in Vectorize.

## Example

Let’s say your document is tokenized as: `[The, quick, brown, fox, jumps, over, the, lazy, dog, ...]`

With **chunk size = 5** and **chunk overlap = 40%** (that is, 2 overlapping tokens), your chunks will look like:

- Chunk 1: `[The, quick, brown, fox, jumps]`
- Chunk 2: `[fox, jumps, over, the, lazy]`
- Chunk 3: `[the, lazy, dog, ...]`
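
The windowing above is easy to express in code. The following is a minimal, illustrative TypeScript sketch of fixed-size chunking with overlap; it is not AutoRAG's implementation, and it assumes input that is already tokenized into an array:

```ts
// Illustrative sketch of fixed-size chunking with overlap (not AutoRAG's code).
// Assumes `tokens` is already tokenized; real chunking operates on model tokens.
function chunkTokens(tokens: string[], chunkSize: number, overlapPct: number): string[][] {
  const overlap = Math.floor(chunkSize * overlapPct);
  const step = chunkSize - overlap; // how far the window advances each iteration
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // final window reached the end
  }
  return chunks;
}

// chunkTokens(["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"], 5, 0.4)
// => [["The", "quick", "brown", "fox", "jumps"],
//     ["fox", "jumps", "over", "the", "lazy"],
//     ["the", "lazy", "dog"]]
```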

## Choosing chunk size and overlap

Chunking affects both how your content is retrieved and how much context is passed into the generation model.

For chunk size, consider how:

- **Smaller chunks** create more precise vector matches, but may split relevant ideas across multiple chunks.
- **Larger chunks** retain more context, but may dilute relevance and reduce retrieval precision.

For chunk overlap, consider how:

- **More overlap** helps preserve continuity across boundaries, especially in flowing or narrative content.
- **Less overlap** reduces indexing time and cost, but can miss context if key terms are split between chunks.

### Additional considerations

- **Vector index size:** Smaller chunk sizes produce more chunks and more total vectors. Refer to the [Vectorize limits](/vectorize/platform/limits/) to ensure your configuration stays within the maximum allowed vectors per index.
- **Generation model context window:** Generation models have a limited context window that must fit all retrieved chunks (`topK` × `chunk size`), the user query, and the model’s output. Be careful with large chunks or high `topK` values to avoid context overflows; see the worked example after this list.
- **Cost and performance:** Larger chunks and higher `topK` settings result in more tokens passed to the model, which can increase latency and cost. You can monitor this usage in [AI Gateway](/ai-gateway/).
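
As a rough budget check, you can multiply the settings out. This worked example uses assumed numbers; the context window and token counts are illustrative, not documented limits:

```ts
// Illustrative context-window budget check; all numbers are assumptions.
const contextWindow = 8192; // generation model's context window, in tokens
const chunkSize = 512;      // configured chunk size
const topK = 10;            // maximum number of results
const queryTokens = 100;    // rough size of the user query and system prompt
const outputBudget = 1024;  // tokens reserved for the model's answer

const retrievedTokens = topK * chunkSize;                   // 10 × 512 = 5,120
const total = retrievedTokens + queryTokens + outputBudget; // 6,244 tokens
console.log(total <= contextWindow);                        // true: this configuration fits
```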

src/content/docs/autorag/configuration/index.mdx

Lines changed: 22 additions & 18 deletions

The table below lists all available configuration options:

| Configuration | Editable after creation | Description |
| ------------- | ----------------------- | ----------- |
| [Data source](/autorag/configuration/data-source/) | no | The source where your knowledge base is stored (e.g. R2 bucket) |
| [Chunk size](/autorag/configuration/chunking/) | yes | Number of tokens per chunk |
| [Chunk overlap](/autorag/configuration/chunking/) | yes | Number of overlapping tokens between chunks |
| [Embedding model](/autorag/configuration/models/) | no | Model used to generate vector embeddings |
| [Query rewrite](/autorag/configuration/query-rewriting/) | yes | Enable or disable query rewriting before retrieval |
| [Query rewrite model](/autorag/configuration/models/) | yes | Model used for query rewriting |
| [Query rewrite system prompt](/autorag/configuration/system-prompt/) | yes | Custom system prompt to guide query rewriting behavior |
| [Match threshold](/autorag/configuration/retrieval-configuration/) | yes | Minimum similarity score required for a vector match |
| [Maximum number of results](/autorag/configuration/retrieval-configuration/) | yes | Maximum number of vector matches returned (`top_k`) |
| [Generation model](/autorag/configuration/models/) | yes | Model used to generate the final response |
| [Generation system prompt](/autorag/configuration/system-prompt/) | yes | Custom system prompt to guide response generation |
| [Similarity caching](/autorag/configuration/similarity-cache/) | yes | Enable or disable caching of responses for similar (not just exact) prompts |
| [Similarity caching threshold](/autorag/configuration/similarity-cache/) | yes | Controls how similar a new prompt must be to a previous one to reuse its cached response |
| [AI Gateway](/ai-gateway/) | yes | AI Gateway for monitoring and controlling model usage |
| AutoRAG name | no | Name of your AutoRAG instance |
| Service API token | yes | API token granted to AutoRAG to give it permission to configure resources on your account |

:::note[API token]
The Service API token is different from the AutoRAG API token that you can create to interact with your AutoRAG. The Service API token is only used by AutoRAG to get permissions to configure resources on your account.
:::
src/content/docs/autorag/configuration/models.mdx

Lines changed: 35 additions & 0 deletions

---
pcx_content_type: concept
title: Models
sidebar:
  order: 4
---

AutoRAG uses models at multiple steps of the RAG pipeline. You can configure which models are used, or let AutoRAG automatically select defaults optimized for general use.

## Where models are used

AutoRAG leverages Workers AI models in the following stages:

- **Image-to-Markdown conversion (if images are in the data source)**: Converts image content to Markdown using object detection and captioning models.
- **Embedding**: Transforms your documents and queries into vector representations for semantic search.
- **Query rewriting (optional)**: Reformulates the user’s query to improve retrieval accuracy.
- **Generation**: Produces the final response from retrieved context.

## Model providers

AutoRAG currently supports only **Workers AI** as the model provider. Usage of models through AutoRAG contributes to your Workers AI usage and is billed as part of your account.

If you've connected your project to [AI Gateway](/ai-gateway/), all model calls triggered by AutoRAG can be tracked in AI Gateway. This gives you full visibility into inputs, outputs, latency, and usage patterns.

## Choosing a model

When configuring your AutoRAG instance, you can specify the exact model to use for each step: embedding, rewriting, and generation. You can find the models available to AutoRAG in the **Settings** of your AutoRAG instance.

### Smart default

If you choose **Smart Default** in your model selection, AutoRAG will select a Cloudflare-recommended model. These defaults may change over time as Cloudflare evaluates and updates model choices. You can switch to explicit model configuration at any time in **Settings**.

### Per-request generation model override

While the generation model can be set globally at the AutoRAG instance level, you can also override it on a per-request basis in the [AI Search API](/autorag/use-autorag/rest-api/#ai-search). This is useful if your application requires dynamic selection of generation models based on context or user preferences.
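
For instance, a Worker could pass a different generation model for certain requests. This is a hedged sketch using the Workers binding: the binding name, instance name, and model ID are placeholders, and the `model` parameter follows the AI Search API described above:

```ts
// Sketch: per-request generation model override via the Workers binding.
// `AI` is an assumed AI binding; "my-autorag" and the model ID are placeholders.
export default {
  async fetch(request: Request, env: { AI: Ai }): Promise<Response> {
    const result = await env.AI.autorag("my-autorag").aiSearch({
      query: "How do I configure chunk size?",
      model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast", // override for this request only
    });
    return Response.json(result);
  },
};
```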
src/content/docs/autorag/configuration/query-rewriting.mdx

Lines changed: 40 additions & 0 deletions

---
pcx_content_type: concept
title: Query rewriting
sidebar:
  order: 5
---

Query rewriting is an optional step in the AutoRAG pipeline that improves retrieval quality by transforming the original user query into a more effective search query.

Instead of embedding the raw user input directly, AutoRAG can use a large language model (LLM) to rewrite the query based on a system prompt. The rewritten query is then used to perform the vector search.

## Why use query rewriting?

The wording of a user’s question may not match how your documents are written. Query rewriting helps bridge this gap by:

- Rephrasing informal or vague queries into precise, information-dense terms
- Adding synonyms or related keywords
- Removing filler words or irrelevant details
- Incorporating domain-specific terminology

This leads to more relevant vector matches, which in turn improves the accuracy of the final generated response.

## Example

**Original query:** `how do i make this work when my api call keeps failing?`

**Rewritten query:** `API call failure troubleshooting authentication headers rate limiting network timeout 500 error`

In this example, the original query is conversational and vague. The rewritten version extracts the core problem (API call failure) and expands it with relevant technical terms and likely causes. These terms are much more likely to appear in documentation or logs, improving semantic matching during vector search.

## How it works

If query rewriting is enabled, AutoRAG performs the following:

1. Sends the **original user query** and the **query rewrite system prompt** to the configured LLM
2. Receives the **rewritten query** from the model
3. Embeds the rewritten query using the selected embedding model
4. Performs vector search in your AutoRAG’s Vectorize index

For details on how to guide model behavior during this step, see the [system prompt](/autorag/configuration/system-prompt/) documentation.
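
To make the sequence concrete, here is a conceptual sketch of the four steps using Workers AI and Vectorize bindings directly. It is not AutoRAG's internal code; the binding names and model IDs are illustrative assumptions:

```ts
// Conceptual sketch of the rewrite, embed, and search flow (not AutoRAG internals).
interface Env {
  AI: Ai;                    // assumed Workers AI binding
  VECTORIZE: VectorizeIndex; // assumed Vectorize binding
}

async function retrieveWithRewrite(env: Env, userQuery: string, rewritePrompt: string) {
  // 1. Send the original query and the rewrite system prompt to the LLM.
  const rewritten = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: rewritePrompt },
      { role: "user", content: userQuery },
    ],
  });

  // 2. and 3. Receive the rewritten query and embed it with the embedding model.
  const embedded = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [rewritten.response],
  });

  // 4. Perform vector search in the Vectorize index.
  return env.VECTORIZE.query(embedded.data[0], { topK: 10 });
}
```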
src/content/docs/autorag/configuration/retrieval-configuration.mdx

Lines changed: 44 additions & 0 deletions

---
pcx_content_type: concept
title: Retrieval configuration
sidebar:
  order: 5
---

AutoRAG allows you to configure how content is retrieved from your vector index and used to generate a final response. Two options control this behavior:

- **Match threshold**: Minimum similarity score required for a vector match to be considered relevant.
- **Maximum number of results**: Maximum number of top-matching results to return (`top_k`).

AutoRAG uses the [`query()`](/vectorize/best-practices/query-vectors/) method from [Vectorize](/vectorize/) to perform semantic search. This method compares the embedded query vector against the stored vectors in your index and returns the most similar results.

## Match threshold

The `match_threshold` sets the minimum similarity score (e.g., cosine similarity) that a document chunk must meet to be included in the results. Threshold values range from `0` to `1`.

- A higher threshold means stricter filtering, returning only highly similar matches.
- A lower threshold allows broader matches, increasing recall but possibly reducing precision.

## Maximum number of results

This setting controls the number of top-matching chunks returned by Vectorize after filtering by similarity score. It corresponds to the `topK` parameter in `query()`. The maximum allowed value is 50.

- Use a higher value if you want to synthesize across multiple documents. However, providing more input to the model can increase latency and cost.
- Use a lower value if you prefer concise answers with minimal context.

## How they work together

AutoRAG's retrieval step follows this sequence:

1. Your query is embedded using the configured Workers AI model.
2. `query()` is called to search the Vectorize index, with `topK` set to the `maximum_number_of_results`.
3. Results are filtered using the `match_threshold`.
4. The filtered results are passed into the generation step as context.

If no results meet the threshold, AutoRAG will not generate a response.
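
The sequence can be sketched with the bindings directly. This is an illustrative sketch under assumed binding names, model IDs, and settings, not AutoRAG's internal code; it would run inside a Worker handler where `env` is available:

```ts
// Illustrative sketch of retrieval: embed, search with topK, filter by threshold.
const matchThreshold = 0.4; // assumed minimum similarity score
const maxNumResults = 10;   // assumed topK

// 1. Embed the query (model ID is illustrative).
const embedded = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["How do I rotate my API token?"],
});

// 2. Search the Vectorize index with topK set to the maximum number of results.
const results = await env.VECTORIZE.query(embedded.data[0], {
  topK: maxNumResults,
});

// 3. Filter matches by the match threshold.
const relevant = results.matches.filter((m) => m.score >= matchThreshold);

// 4. Pass `relevant` into generation; if it is empty, no response is generated.
```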

## Configuration

These values can be configured at the AutoRAG instance level or overridden on a per-request basis using the [REST API](/autorag/use-autorag/rest-api/) or the [Workers binding](/autorag/use-autorag/workers-binding/).

Use the parameters `match_threshold` and `max_num_results` to customize retrieval behavior per request.
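
As a sketch, a per-request override through the Workers binding might look like the following. The parameter names match this page (`match_threshold`, `max_num_results`), while the binding name and instance name are placeholders and the exact request shape may differ:

```ts
// Sketch: overriding retrieval settings for a single request.
// `AI` is an assumed binding; "my-autorag" is a placeholder instance name.
const answer = await env.AI.autorag("my-autorag").aiSearch({
  query: "What is chunk overlap?",
  match_threshold: 0.5, // stricter similarity filtering for this request
  max_num_results: 5,   // fewer, more focused chunks
});
```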

src/content/docs/autorag/configuration/similarity-cache.mdx

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 pcx_content_type: concept
 title: Similarity cache
 sidebar:
-  order: 4
+  order: 6
 ---

 Similarity-based caching in AutoRAG lets you serve responses from Cloudflare’s cache for queries that are _similar enough_ to previous requests, not just exact matches. This speeds up response times and cuts costs by reusing answers for questions that are close in meaning.
