Apply suggestions from code review

aninibread · ToriLindsay · kodster28 · web-flow · commit ef10ff98ffc4 · 2025-04-03T17:19:10.000-04:00
Co-authored-by: ToriLindsay <tgalatro@cloudflare.com>
Co-authored-by: Kody Jackson <kody@cloudflare.com>
diff --git a/src/content/docs/autorag/configuration/cache.mdx b/src/content/docs/autorag/configuration/cache.mdx
@@ -5,25 +5,26 @@ sidebar:
   order: 6
 ---
 
-Similarity-based caching in AutoRAG lets you serve responses from Cloudflare’s cache for queries that are _similar enough_ to previous requests, not just exact matches. This speeds up response times and cuts costs by reusing answers for questions that are close in meaning.
+Similarity-based caching in AutoRAG lets you serve responses from Cloudflare’s cache for queries that are similar to previous requests, rather than creating new, unique responses for every request. This speeds up response times and cuts costs by reusing answers for questions that are close in meaning.
 
 ## How It Works
 
-Unlike basic caching, which only works for identical requests to compare prompts based on their content. When a request comes in:
+Unlike basic caching, which only serves cached responses for identical requests, similarity-based caching handles each incoming request as follows:
 
 1. AutoRAG checks if a _similar_ prompt (based on your chosen threshold) has been answered before.
 2. If a match is found, it returns the cached response instantly.
 3. If no match is found, it generates a new response and caches it.
 
 To see if a response came from the cache, check the `cf-aig-cache-status` header: `HIT` for cached and `MISS` for new.
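
As a minimal sketch of the header check described above, the helper below inspects a response's headers for `cf-aig-cache-status`. The function name `served_from_cache` and the plain mapping of headers are assumptions for illustration; only the header name and the `HIT`/`MISS` values come from the text.

```python
from typing import Mapping


def served_from_cache(headers: Mapping[str, str]) -> bool:
    """Return True when the response headers report a similarity-cache HIT."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Hypothetical helper: anything other than HIT (including a missing header)
    # is treated as a freshly generated response.
    return normalized.get("cf-aig-cache-status", "").strip().upper() == "HIT"
```

Passing the headers of any HTTP response object that exposes them as a mapping should work, for example `served_from_cache(response.headers)`.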
 
-## Cache behavior
+## What to consider when using similarity cache
 
+Consider these behaviors when using similarity caching:
 - **Volatile Cache**: If two similar requests hit at the same time, the first might not cache in time for the second to use it, resulting in a `MISS`.
 - **30-Day Cache**: Cached responses last 30 days, then expire automatically. No custom durations for now.
 - **Data Dependency**: Cached responses are tied to specific document chunks. If those chunks change or get deleted, the cache clears to keep answers fresh.
 
-## How Similarity Matching Works
+## How similarity matching works
 
 Similarity caching in AutoRAG uses **MinHash with Locality-Sensitive Hashing (LSH)** to detect prompts that are lexically similar.
 
@@ -34,9 +35,9 @@ When a new prompt is received:
 3. Fingerprints are grouped into LSH buckets, which allow AutoRAG to quickly find past prompts that are likely to be similar without scanning every cached prompt.
 4. If a prompt in the same bucket meets the configured similarity threshold, its cached response is reused.
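
The general MinHash with LSH technique can be sketched as follows. This is an illustration of the idea only, not AutoRAG's implementation: the 3-word shingles, 64 hash functions, 16 bands, the BLAKE2b-based hashing, and all function names are assumptions chosen for the example.

```python
import hashlib


def shingles(text: str, size: int = 3) -> set[str]:
    """Split a prompt into overlapping word n-grams (shingle size is an assumption)."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def _seeded_hash(seed: int, shingle: str) -> int:
    """One member of a family of hash functions, selected by a seed used as salt."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text: str, num_hashes: int = 64) -> list[int]:
    """Fingerprint a prompt: keep the minimum hash of its shingles per hash function."""
    items = shingles(text)
    return [
        min((_seeded_hash(seed, s) for s in items), default=0)
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_buckets(signature: list[int], bands: int = 16) -> set[tuple[int, tuple[int, ...]]]:
    """Split the signature into bands; prompts sharing any band key are candidates."""
    rows = len(signature) // bands
    return {
        (band, tuple(signature[band * rows:(band + 1) * rows]))
        for band in range(bands)
    }
```

In this sketch, two prompts are compared only if their `lsh_buckets` sets intersect, and a cached response would be reused only when `estimated_similarity` meets the configured threshold.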
 
-## Choosing a Threshold
+## Choosing a threshold
 
-The similarity threshold decides how close two prompts need to be to reuse a cached response. Here’s what you can pick from:
+The similarity threshold decides how close two prompts need to be to reuse a cached response. Here are the available thresholds:
 
 | Threshold        | Description                 | Example Match                                                                   |
 | ---------------- | --------------------------- | ------------------------------------------------------------------------------- |
diff --git a/src/content/docs/autorag/configuration/chunking.mdx b/src/content/docs/autorag/configuration/chunking.mdx
@@ -18,7 +18,7 @@ AutoRAG exposes two parameters to help you control chunking behavior:
   - Minimum: `0%`
   - Maximum: `30%`
 
-These settings apply during the indexing step, before your data are embedded and stored in Vectorize.
+These settings apply during the indexing step, before your data is embedded and stored in Vectorize.
 
 ## Example
 
@@ -36,7 +36,7 @@ Chunking affects both how your content is retrieved and how much context is pass
 
 For chunk size, consider how:
 
-- **Smaller chunks** create more percise vector matches, but may split relevant ideas across multiple chunks.
+- **Smaller chunks** create more precise vector matches, but may split relevant ideas across multiple chunks.
 - **Larger chunks** retain more context, but may dilute relevance and reduce retrieval precision.
 
 For chunk overlap, consider how:
diff --git a/src/content/docs/autorag/configuration/data-source.mdx b/src/content/docs/autorag/configuration/data-source.mdx
@@ -11,16 +11,16 @@ AutoRAG currently supports Cloudflare R2 as the data source for storing your kno
 
 AutoRAG will automatically scan and process supported files stored in that bucket. Files that are unsupported or exceed the size limit will be skipped during indexing and logged as errors.
 
-## File Limit
+## File limits
 
 AutoRAG has different file size limits depending on the file type:
 
 - Up to **4 MB** for files that are already in plain text or Markdown.
 - Up to **1 MB** for files that need to be converted into Markdown (like PDFs or other rich formats).
 
-Files that exceed these limits won’t be indexed and will show up in the error logs.
+Files that exceed these limits will not be indexed and will show up in the error logs.
 
-## File Type
+## File types
 
 AutoRAG is powered by and accepts the same file types as [Markdown Conversion](/workers-ai/markdown-conversion/). The following table lists the supported formats:
 
diff --git a/src/content/docs/autorag/configuration/index.mdx b/src/content/docs/autorag/configuration/index.mdx
@@ -13,7 +13,7 @@ The table below lists all available configuration options:
 
 | Configuration                                                                | Editable after creation | Description                                                                                |
 | ---------------------------------------------------------------------------- | ----------------------- | ------------------------------------------------------------------------------------------ |
-| [Data source](/autorag/configuration/data-source/)                           | no                      | The source where your knowledge base is stored (e.g. R2 bucket)                            |
+| [Data source](/autorag/configuration/data-source/)                           | no                      | The source where your knowledge base is stored (for example, R2 bucket)                    |
 | [Chunk size](/autorag/configuration/chunking/)                               | yes                     | Number of tokens per chunk                                                                 |
 | [Chunk overlap](/autorag/configuration/chunking/)                            | yes                     | Number of overlapping tokens between chunks                                                |
 | [Embedding model](/autorag/configuration/models/)                            | no                      | Model used to generate vector embeddings                                                   |
@@ -31,5 +31,5 @@ The table below lists all available configuration options:
 | Service API token                                                            | yes                     | API token granted to AutoRAG to give it permission to configure resources on your account. |
 
 :::note[API token]
-Note that the Service API token is different from the AutoRAG API token that you can make to interact with your AutoRAG. The Service API token is only used by AutoRAG to get permissions to configure resources on your account.
+The Service API token is different from the AutoRAG API token that you can make to interact with your AutoRAG. The Service API token is only used by AutoRAG to get permissions to configure resources on your account.
 :::
diff --git a/src/content/docs/autorag/configuration/indexing.mdx b/src/content/docs/autorag/configuration/indexing.mdx
@@ -13,9 +13,9 @@ AutoRAG automatically monitors your data source for updates and reindexes your c
 
 ## Controls
 
-You can control indexing behavior through the following actions on the Dashboard:
+You can control indexing behavior through the following actions on the dashboard:
 
-- **Sync Index**: This forces AutoRAG to scan your data source for new or modified files and initiates an indexing job to update the associated Vectorize index. A new indexing job can be initiated **every 5 minutes**.
+- **Sync Index**: Force AutoRAG to scan your data source for new or modified files and initiate an indexing job to update the associated Vectorize index. A new indexing job can be initiated every 5 minutes.
 - **Pause Indexing**: Temporarily stop all scheduled indexing checks and reprocessing. Useful for debugging or freezing your knowledge base.
 
 ## Performance
@@ -25,10 +25,10 @@ AutoRAG processes files in parallel for efficient indexing. The total time to in
 Factors that affect performance include:
 
 - Total number of files and their sizes
-- File formats (e.g. images take longer than plain text)
+- File formats (for example, images take longer than plain text)
 - Latency of Workers AI models used for embedding and image processing
 
-## Best Practices
+## Best practices
 
 To ensure smooth and reliable indexing:
 
diff --git a/src/content/docs/autorag/configuration/models.mdx b/src/content/docs/autorag/configuration/models.mdx
@@ -18,17 +18,17 @@ AutoRAG leverages Workers AI models in the following stages:
 
 ## Model providers
 
-AutoRAG currently only supports **Workers AI** as the model provider. Usage of models through AutoRAG contributes to your Workers AI usage and is billed as part of your account.
+AutoRAG currently only supports [Workers AI](/workers-ai/) as the model provider. Usage of models through AutoRAG contributes to your Workers AI usage and is billed as part of your account.
 
-If you've connected your project to [AI Gateway](/ai-gateway), all model calls triggered by AutoRAG can be tracked in AI Gateway. This gives you full visibility into inputs, outputs, latency, and usage patterns.
+If you have connected your project to [AI Gateway](/ai-gateway), all model calls triggered by AutoRAG can be tracked in AI Gateway. This gives you full visibility into inputs, outputs, latency, and usage patterns.
 
 ## Choosing a model
 
-When configuring your AutoRAG instance, you can specify the exact model to use for each step of embedding, rewriting, and generation. You can find available model that can be used with AutoRAG in the **Settings** of your AutoRAG.
+When configuring your AutoRAG instance, you can specify the exact model to use for each step of embedding, rewriting, and generation. You can find available models that can be used with AutoRAG in the **Settings** of your AutoRAG.
 
 ### Smart default
 
-If you choose Smart Default in your model selection then AutoRAG will select a Cloudflare recommended model. These defaults may change over time as Cloudflare evaluates and updates model choices. You can switch to explicit model configuration at any time by visiting the Settings.
+If you choose **Smart Default** in your model selection, then AutoRAG will select a Cloudflare-recommended model. These defaults may change over time as Cloudflare evaluates and updates model choices. You can switch to explicit model configuration at any time by visiting **Settings**.
 
 ### Per-request generation model override
 
diff --git a/src/content/docs/autorag/configuration/query-rewriting.mdx b/src/content/docs/autorag/configuration/query-rewriting.mdx
@@ -18,7 +18,7 @@ The wording of a user’s question may not match how your documents are written.
 - Removing filler words or irrelevant details
 - Incorporating domain-specific terminology
 
-This leads to more relevant vector matches, which in turn improves the accuracy of the final generated response.
+This leads to more relevant vector matches, which improves the accuracy of the final generated response.
 
 ## Example
 
diff --git a/src/content/docs/autorag/configuration/retrieval-configuration.mdx b/src/content/docs/autorag/configuration/retrieval-configuration.mdx
@@ -14,7 +14,7 @@ AutoRAG uses the [`query()`](/vectorize/best-practices/query-vectors/) method fr
 
 ## Match threshold
 
-The `match_threshold` sets the minimum similarity score (e.g., cosine similarity) that a document chunk must meet to be included in the results. Threshold values range from `0` to `1`.
+The `match_threshold` sets the minimum similarity score (for example, cosine similarity) that a document chunk must meet to be included in the results. Threshold values range from `0` to `1`.
 
 - A higher threshold means stricter filtering, returning only highly similar matches.
 - A lower threshold allows broader matches, increasing recall but possibly reducing precision.
@@ -39,6 +39,6 @@ If no results meet the threshold, AutoRAG will not generate a response.
 
 ## Configuration
 
-These values can be configured at the AutoRAG instance level or overridden on a per-request basis using the [REST API](/autorag/usage/rest-api/) or the [Workers binding](/autorag/usage/workers-binding/).
+These values can be configured at the AutoRAG instance level or overridden on a per-request basis using the [REST API](/autorag/usage/rest-api/) or the [Workers Binding](/autorag/usage/workers-binding/).
 
 Use the parameters `match_threshold` and `max_num_results` to customize retrieval behavior per request.
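
To illustrate how the two parameters interact, here is a minimal sketch of the filtering semantics. It assumes `max_num_results` caps the number of chunks returned after thresholding; `ScoredChunk` and `select_chunks` are hypothetical names, and this is not the Vectorize or AutoRAG implementation.

```python
from typing import NamedTuple


class ScoredChunk(NamedTuple):
    text: str
    score: float  # similarity score in the range 0 to 1


def select_chunks(
    chunks: list[ScoredChunk], match_threshold: float, max_num_results: int
) -> list[ScoredChunk]:
    """Keep chunks scoring at or above the threshold, best first, capped in number."""
    if not 0 <= match_threshold <= 1:
        raise ValueError("match_threshold must be between 0 and 1")
    kept = [chunk for chunk in chunks if chunk.score >= match_threshold]
    kept.sort(key=lambda chunk: chunk.score, reverse=True)
    # Assumption: max_num_results limits how many chunks are passed on.
    return kept[:max_num_results]
```

An empty result from this sketch corresponds to the case where no results meet the threshold and no response is generated.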
diff --git a/src/content/docs/autorag/configuration/system-prompt.mdx b/src/content/docs/autorag/configuration/system-prompt.mdx
@@ -7,7 +7,7 @@ sidebar:
 
 System prompts allow you to guide the behavior of the text-generation models used by AutoRAG at query time. AutoRAG supports system prompt configuration in two steps:
 
-- **Query Rewriting**: Reformulates the original user query to improve semantic retrieval. A system prompt can guide how the model interprets and rewrites the query.
+- **Query rewriting**: Reformulates the original user query to improve semantic retrieval. A system prompt can guide how the model interprets and rewrites the query.
 - **Generation**: Generates the final response from retrieved context. A system prompt can help define how the model should format, filter, or prioritize information when constructing the answer.
 
 ## What is a system prompt?
@@ -23,12 +23,12 @@ System prompts are particularly useful for:
 
 ## Default system prompt
 
-When configuring your AutoRAG instance, you can provide your own system prompts. If you don’t provide a system prompt, AutoRAG will use the **default system prompt** provided by Cloudflare.
+When configuring your AutoRAG instance, you can provide your own system prompts. If you do not provide a system prompt, AutoRAG will use the **default system prompt** provided by Cloudflare.
 
 You can view the effective system prompt used for any AutoRAG's model call through AI Gateway logs, where model inputs and outputs are recorded.
 
 :::note
-The default system prompt can change and evolve over time to improve performance, and quality.
+The default system prompt can change and evolve over time to improve performance and quality.
 :::
 
 ## Query rewriting system prompt
@@ -98,6 +98,6 @@ If the available documents don't contain enough information to fully answer the
 Important:
 - Cite which document(s) you're drawing information from
 - Present information in order of relevance
-- If documents contradict each other, note this and explain your reasoning for the chosen answer`
-- Do not repeat the instructions;
+- If documents contradict each other, note this and explain your reasoning for the chosen answer
+- Do not repeat the instructions
 ```
diff --git a/src/content/docs/autorag/get-started.mdx b/src/content/docs/autorag/get-started.mdx
@@ -6,30 +6,30 @@ sidebar:
 head:
   - tag: title
     content: Get started with AutoRAG
-    Description: XX
+    Description: Get started creating fully managed, retrieval-augmented generation pipelines with Cloudflare AutoRAG.
 ---
 
 AutoRAG allows developers to create fully managed retrieval-augmented generation (RAG) pipelines to power AI applications with accurate and up-to-date information without needing to manage infrastructure.
 
 ## 1. Upload data or use existing data in R2
 
-AutoRAG integrates with R2 for data import. Create an R2 bucket if you don’t have one and upload your data.
+AutoRAG integrates with R2 for data import. Create an R2 bucket if you do not have one and upload your data.
 
 :::note
 Before you create your first bucket, you must purchase R2 from the Cloudflare dashboard.
 :::
 
-To create and upload objects to your bucket from the Cloudflare Dashboard:
+To create and upload objects to your bucket from the Cloudflare dashboard:
 
-1. Log in to the [Cloudflare Dashboard](https://dash.cloudflare.com/?to=/:account/r2) and select **R2**.
+1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/?to=/:account/r2) and select **R2**.
 2. Select Create bucket, name the bucket, and select **Create bucket**.
 3. Choose to either drag and drop your file into the upload area or **select from computer**.
 
 ## 2. Create an AutoRAG
 
 To create a new AutoRAG:
 
-1. Log in to the [Cloudflare Dashboard](https://dash.cloudflare.com/?to=/:account/ai/autorag) and select **AI** > **AutoRAG**.
+1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/?to=/:account/ai/autorag) and select **AI** > **AutoRAG**.
 2. Select **Create AutoRAG**, configure the AutoRAG, and complete the setup process.
 3. Select **Create**.
 
diff --git a/src/content/docs/autorag/how-autorag-works.mdx b/src/content/docs/autorag/how-autorag-works.mdx
@@ -12,7 +12,7 @@ AutoRAG consists of two core processes:
 - **Indexing:** An asynchronous background process that monitors your data source for changes and transforms your data into vector representation for search.
 - **Querying:** A synchronous process triggered by user queries. It retrieves the most relevant content and generates context-aware responses using a large language model (LLM).
 
-## Indexing
+## How indexing works
 
 Indexing begins automatically when you create an AutoRAG instance and connect a data source. It runs asynchronously in the background and checks for updates periodically, so new or updated data are automatically indexed in the vector index.
 
@@ -22,25 +22,22 @@ Here is what happens during indexing:
 2. **Markdown conversion:** AutoRAG uses [Workers AI’s Markdown Conversion](/workers-ai/markdown-conversion/) to convert all data into structured Markdown. This ensures consistency across diverse file types. For images, Workers AI is used to perform object detection followed by vision-to-language transformation to convert images into Markdown text.
 3. **Chunking:** The extracted text is chunked into smaller pieces to improve retrieval granularity.
 4. **Embedding:** Each chunk is embedded using Workers AI’s embedding model to transform the content into vectors.
-5. **Vector storage:** The resulting vectors, along with metadata like source location and file name, are stored in a Cloudflare’s Vectorize database created on your account.
+5. **Vector storage:** The resulting vectors, along with metadata like source location and file name, are stored in the Vectorize database created on your Cloudflare account.
 
 ![Indexing](~/assets/images/autorag/indexing.png)
 
-## Querying
+## How querying works
 
 Once indexing is complete, AutoRAG is ready to respond to end-user queries in real time.
 
-Here’s how the querying pipeline works:
+Here is how the querying pipeline works:
 
-1. **Receive query from AutoRAG API:** The query workflow begins when you send a request to either the AutoRAG’s AI Search or Search endpoint.
+1. **Receive query from AutoRAG API:** The query workflow begins when you send a request to either AutoRAG’s AI Search or search endpoint.
 2. **Query rewriting (optional):** AutoRAG provides the option to rewrite the input query using one of Workers AI’s LLMs to improve retrieval quality by transforming the original query into a more effective search query.
 3. **Embedding the query:** The rewritten (or original) query is transformed into a vector via the same embedding model used to embed your data so that it can be compared against your vectorized data to find the most relevant matches.
 4. **Querying Vectorize index:** The query vector is searched against stored vectors in the associated Vectorize database for your AutoRAG.
-5. **Content retrieval:** Vectorize returns the most relevant chunks and their metadata. And the original content is retrieved from the R2 bucket. These are passed to a text-generation model.
+5. **Content retrieval:** Vectorize returns the most relevant chunks and their metadata, and the original content is retrieved from the R2 bucket. These are passed to a text-generation model.
 6. **Response generation:** A text-generation model from Workers AI is used to generate a response using the retrieved content and the original user’s query.
 
 ![Querying](~/assets/images/autorag/querying.png)
 
-## Get Started
-
-Learn how to [get started](/autorag/get-started/) with AutoRAG.
diff --git a/src/content/docs/autorag/index.mdx b/src/content/docs/autorag/index.mdx
diff --git a/src/content/docs/autorag/usage/rest-api.mdx b/src/content/docs/autorag/usage/rest-api.mdx
diff --git a/src/content/docs/autorag/usage/workers-binding.mdx b/src/content/docs/autorag/usage/workers-binding.mdx