Merged
2 changes: 1 addition & 1 deletion src/content/docs/autorag/configuration/chunking.mdx
@@ -7,7 +7,7 @@ sidebar:

Chunking is the process of splitting large data into smaller segments before embedding them for search. AutoRAG uses **recursive chunking**, which breaks your content at natural boundaries (like paragraphs or sentences), and then further splits it if the chunks are too large.

## What is recurisve chunking
## What is recursive chunking

Recursive chunking tries to keep chunks meaningful by:

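The strategy described above can be sketched in a few lines. This is an illustrative reimplementation, not AutoRAG's actual chunker: the separator order is an assumption, and it measures characters for simplicity where AutoRAG's limits are in tokens.

```javascript
// Illustrative recursive chunker (not AutoRAG's implementation): split on the
// largest natural boundary first, recurse into pieces that are still too big,
// then greedily merge neighbors back together so chunks stay near the limit.
function recursiveChunk(text, maxLen, seps = ["\n\n", "\n", ". ", " "]) {
  if (text.length <= maxLen) return [text];
  const sep = seps.find((s) => text.includes(s));
  if (sep === undefined) {
    // No natural boundary left: fall back to a hard split.
    const out = [];
    for (let i = 0; i < text.length; i += maxLen) out.push(text.slice(i, i + maxLen));
    return out;
  }
  const rest = seps.slice(seps.indexOf(sep) + 1);
  const pieces = text
    .split(sep)
    .filter((p) => p.length > 0)
    .flatMap((p) => (p.length > maxLen ? recursiveChunk(p, maxLen, rest) : [p]));
  if (pieces.length === 0) return recursiveChunk(text, maxLen, rest);
  // Merge adjacent pieces while the combined chunk still fits within maxLen.
  const chunks = [pieces[0]];
  for (const piece of pieces.slice(1)) {
    const last = chunks[chunks.length - 1];
    if (last.length + sep.length + piece.length <= maxLen) {
      chunks[chunks.length - 1] = last + sep + piece;
    } else {
      chunks.push(piece);
    }
  }
  return chunks;
}
```

The merge pass is what keeps the output from degenerating into single sentences or words: small pieces are recombined up to the limit, so most chunks end up close to the target size while still breaking at natural boundaries.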
39 changes: 36 additions & 3 deletions src/content/docs/autorag/configuration/data-source.mdx
@@ -15,13 +15,46 @@ AutoRAG will automatically scan and process supported files stored in that bucke

AutoRAG has different file size limits depending on the file type:

- Up to **4 MB** for files that are already in plain text or Markdown.
- Up to **1 MB** for files that need to be converted into Markdown (like PDFs or other rich formats).
- **Plain text files:** Up to **4 MB**
- **Rich format files:** Up to **1 MB**

Files that exceed these limits will not be indexed and will show up in the error logs.
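The two limits above can be checked programmatically, for example to pre-screen files before uploading them to your bucket. This sketch is illustrative only: the helper names are hypothetical and the extension set is a small sample of the supported plain text types, not the full list.

```javascript
// Hypothetical pre-upload check: files AutoRAG can index as-is get the 4 MB
// limit; anything that must first be converted to Markdown gets 1 MB.
// This extension set is a small sample, not the full supported list.
const PLAIN_TEXT_EXTENSIONS = new Set([
  "txt", "log", "md", "markdown", "mdx", "json", "yaml", "yml", "css", "js", "py",
]);

const PLAIN_TEXT_LIMIT = 4 * 1024 * 1024; // 4 MB
const RICH_FORMAT_LIMIT = 1 * 1024 * 1024; // 1 MB

function maxIndexableBytes(filename) {
  const ext = filename.split(".").pop().toLowerCase();
  return PLAIN_TEXT_EXTENSIONS.has(ext) ? PLAIN_TEXT_LIMIT : RICH_FORMAT_LIMIT;
}

function willBeIndexed(filename, sizeBytes) {
  return sizeBytes <= maxIndexableBytes(filename);
}
```

Screening oversized files client-side avoids uploading content that would only surface later as entries in the error logs.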

## File types

AutoRAG is powered by and accepts the same file types as [Markdown Conversion](/workers-ai/markdown-conversion/). The following table lists the supported formats:
AutoRAG can ingest a variety of file types to power your RAG pipeline. Both plain text and rich format files are supported.

### Plain text file types

AutoRAG supports the following plain text file types:

| Format     | File extensions                                                                | MIME type                                                              |
| ---------- | ------------------------------------------------------------------------------ | --------------------------------------------------------------------- |
| Text | `.txt` | `text/plain` |
| Log | `.log` | `text/plain` |
| Config | `.ini`, `.conf`, `.env`, `.properties`, `.gitignore`, `.editorconfig`, `.toml` | `text/plain`, `text/toml` |
| Markdown | `.markdown`, `.md`, `.mdx` | `text/markdown` |
| LaTeX | `.tex`, `.latex` | `application/x-tex`, `application/x-latex` |
| Script     | `.sh`, `.bat`, `.ps1`                                                          | `application/x-sh`, `application/x-msdos-batch`, `text/x-powershell`   |
| SGML | `.sgml` | `text/sgml` |
| JSON | `.json` | `application/json` |
| YAML | `.yaml`, `.yml` | `application/x-yaml` |
| CSS | `.css` | `text/css` |
| JavaScript | `.js` | `application/javascript` |
| PHP | `.php` | `application/x-httpd-php` |
| Python | `.py` | `text/x-python` |
| Ruby | `.rb` | `text/x-ruby` |
| Java | `.java` | `text/x-java-source` |
| C | `.c` | `text/x-c` |
| C++ | `.cpp`, `.cxx` | `text/x-c++` |
| C Header | `.h`, `.hpp` | `text/x-c-header` |
| Go | `.go` | `text/x-go` |
| Rust | `.rs` | `text/rust` |
| Swift | `.swift` | `text/swift` |
| Dart | `.dart` | `text/dart` |

### Rich format file types

AutoRAG uses [Markdown Conversion](/workers-ai/markdown-conversion/) to convert rich format files to Markdown before indexing. The following table lists the supported formats:

<Render file="markdown-conversion-support" product="workers-ai" />
2 changes: 1 addition & 1 deletion src/content/docs/autorag/configuration/index.mdx
@@ -13,7 +13,7 @@ The table below lists all available configuration options:

| Configuration | Editable after creation | Description |
| ---------------------------------------------------------------------------- | ----------------------- | ------------------------------------------------------------------------------------------ |
| [Data source](/autorag/configuration/data-source/) | no | The source where your knowledge base is stored (for example, R2 bucket) |
| [Data source](/autorag/configuration/data-source/) | no | The source where your knowledge base is stored |
| [Chunk size](/autorag/configuration/chunking/) | yes | Number of tokens per chunk |
| [Chunk overlap](/autorag/configuration/chunking/) | yes | Number of overlapping tokens between chunks |
| [Embedding model](/autorag/configuration/models/) | no | Model used to generate vector embeddings |
5 changes: 2 additions & 3 deletions src/content/docs/autorag/index.mdx
@@ -30,9 +30,8 @@ AutoRAG lets you create fully-managed, retrieval-augmented generation (RAG) pipe

You can use AutoRAG to build:

- **Support chatbots:** Answer customer questions using your own product content.
- **Internal tools:** Help teams quickly find the information they need using internal documentation.
- **Enterprise knowledge search:** Make documentation easy to search and use.
- **Product chatbot:** Answer customer questions using your own product content.
- **Docs search:** Make documentation easy to search and use.

<div>
<LinkButton href="/autorag/get-started">Get started</LinkButton>
16 changes: 7 additions & 9 deletions src/content/docs/autorag/platform/limits-pricing.mdx
@@ -7,11 +7,9 @@ sidebar:

## Pricing

During the open beta, AutoRAG is **free to enable**. Compute operations for indexing, retrieval, and augmentation incur no additional cost during this phase.
During the open beta, AutoRAG is **free to enable**. When you create an AutoRAG instance, it provisions and runs on top of Cloudflare services in your account. These resources are **billed as part of your Cloudflare usage** and include:

When you create an AutoRAG instance, it provisions and runs on top of Cloudflare services provisioned within your own account. You retain full visibility and control over these resources, and they are billed as part of your existing Cloudflare usage. These services include:

| Service | Description |
| Service & Pricing | Description |
| ------------------------------------------------ | ----------------------------------------------------------------------------------------- |
| [**R2**](/r2/pricing/) | Stores your source data |
| [**Vectorize**](/vectorize/platform/pricing/) | Stores vector embeddings and powers semantic search |
@@ -24,10 +24,10 @@ For more information about how each resource is used within AutoRAG, reference [

The following limits currently apply to AutoRAG during the open beta:

| Limit | Value |
| --------------------------------- | ------------------------------------------------------- |
| Max AutoRAG instances per account | 10 |
| Max files per AutoRAG | 100,000 |
| Max file size | 4 MB (plain text or Markdown) / 1 MB (other file types) |
| Limit | Value |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Max AutoRAG instances per account | 10 |
| Max files per AutoRAG | 100,000 |
| Max file size | 4 MB ([Plain text](/autorag/configuration/data-source/#plain-text-file-types)) / 1 MB ([Rich format](/autorag/configuration/data-source/#rich-format-file-types)) |

These limits are subject to change as AutoRAG evolves beyond open beta.
78 changes: 75 additions & 3 deletions src/content/docs/autorag/usage/workers-binding.mdx
@@ -42,7 +42,6 @@ const answer = await env.AI.autorag("my-autorag").aiSearch({
ranking_options: {
score_threshold: 0.7,
},
stream: false,
});
```

@@ -54,7 +53,44 @@

This is the response structure without `stream` enabled.

<Render file="ai-search-response" product="autorag" />
```json output
{
  "object": "vector_store.search_results.page",
  "search_query": "How do I train a llama to deliver coffee?",
  "response": "To train a llama to deliver coffee:\n\n1. **Build trust** — Llamas appreciate patience (and decaf).\n2. **Know limits** — Max 3 cups per llama, per `llama-logistics.md`.\n3. **Use voice commands** — Start with \"Espresso Express!\"\n4.",
  "data": [
    {
      "file_id": "llama001",
      "filename": "docs/llama-logistics.md",
      "score": 0.98,
      "attributes": {},
      "content": [
        {
          "id": "llama001",
          "type": "text",
          "text": "Llamas can carry 3 drinks max."
        }
      ]
    },
    {
      "file_id": "llama042",
      "filename": "docs/llama-commands.md",
      "score": 0.95,
      "attributes": {},
      "content": [
        {
          "id": "llama042",
          "type": "text",
          "text": "Start with basic commands like 'Espresso Express!' Llamas love alliteration."
        }
      ]
    }
  ],
  "has_more": false,
  "next_page": null
}
```
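When consuming this response in a Worker, it is often useful to pair the generated answer with a deduplicated list of the files it was grounded in. The helper below is a hypothetical sketch based on the field names shown above, not part of the AutoRAG API:

```javascript
// Hypothetical helper: extract the generated answer plus a score-ordered,
// deduplicated list of source files from an aiSearch()-style response.
function summarizeAiSearch(response) {
  const byFile = new Map();
  // Sort matches by descending score, then keep the best score per file.
  for (const match of [...response.data].sort((a, b) => b.score - a.score)) {
    if (!byFile.has(match.filename)) byFile.set(match.filename, match.score);
  }
  const sources = [...byFile].map(([filename, score]) => ({ filename, score }));
  return { answer: response.response, sources };
}
```

This kind of post-processing is handy for rendering "Sources" footnotes under a chatbot answer without repeating a file that matched multiple chunks.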

## `search()`

@@ -77,7 +113,43 @@ const answer = await env.AI.autorag("my-autorag").search({

### Response

<Render file="search-response" product="autorag" />
```json output
{
  "object": "vector_store.search_results.page",
  "search_query": "How do I train a llama to deliver coffee?",
  "data": [
    {
      "file_id": "llama001",
      "filename": "docs/llama-logistics.md",
      "score": 0.98,
      "attributes": {},
      "content": [
        {
          "id": "llama001",
          "type": "text",
          "text": "Llamas can carry 3 drinks max."
        }
      ]
    },
    {
      "file_id": "llama042",
      "filename": "docs/llama-commands.md",
      "score": 0.95,
      "attributes": {},
      "content": [
        {
          "id": "llama042",
          "type": "text",
          "text": "Start with basic commands like 'Espresso Express!' Llamas love alliteration."
        }
      ]
    }
  ],
  "has_more": false,
  "next_page": null
}
```
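Unlike `aiSearch()`, `search()` leaves generation to you, so the retrieved chunks are typically flattened into context for a model call. A hedged sketch follows: the helper name, score filter, and context template are all illustrative, not part of the API.

```javascript
// Illustrative only: flatten retrieved chunks into a context string that a
// downstream model call could consume. The filter and template are assumptions.
function buildContext(searchResponse, minScore = 0) {
  return searchResponse.data
    .filter((match) => match.score >= minScore)
    .map((match) => {
      // Each match may carry multiple content parts; join their text.
      const text = match.content.map((part) => part.text).join("\n");
      return `[${match.filename}]\n${text}`;
    })
    .join("\n\n");
}
```

The resulting string can be prepended to a prompt for any text-generation model, with `minScore` acting as a simple relevance cutoff on top of the retrieval's own `score_threshold`.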

## Local development
