websites and 50 max results

aninibread · aninibread · commit fc27070946f5 · 2025-08-20T20:13:54.000-04:00
diff --git a/src/content/docs/autorag/configuration/data-source/index.mdx b/src/content/docs/autorag/configuration/data-source/index.mdx
@@ -0,0 +1,13 @@
+---
+title: Data source
+pcx_content_type: how-to
+sidebar:
+  order: 2
+---
+
+You can have AutoRAG ingest data directly from the following sources:
+
+| Data Source   | Description |
+|---------------|-------------|
+| [Website](/autorag/configuration/data-source/website/)   | Connect a domain you own to index website pages. |
+| [R2 Bucket](/autorag/configuration/data-source/r2/) | Connect a Cloudflare R2 bucket to index stored documents. |
diff --git a/src/content/docs/autorag/configuration/data-source/r2.mdx b/src/content/docs/autorag/configuration/data-source/r2.mdx
@@ -1,13 +1,13 @@
 ---
-title: Data source
+title: R2
 pcx_content_type: how-to
 sidebar:
   order: 2
 ---
 
 import { Render } from "~/components";
 
-AutoRAG currently supports Cloudflare R2 as the data source for storing your knowledge base. To get started, [configure an R2 bucket](/r2/get-started/) containing your data.
+You can use Cloudflare R2 to store data for indexing. To get started, [configure an R2 bucket](/r2/get-started/) containing your data.
 
 AutoRAG will automatically scan and process supported files stored in that bucket. Files that are unsupported or exceed the size limit will be skipped during indexing and logged as errors.
 
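The R2 rule above (unsupported or oversized files are skipped during indexing and logged as errors) can be pictured as a filter over the bucket listing. This is a minimal sketch, not AutoRAG's implementation: `StoredObject` and `select_for_indexing` are hypothetical names, and the supported extensions and size limit are passed in as parameters because their real values are documented elsewhere.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class StoredObject:
    """Hypothetical stand-in for one object in an R2 bucket listing."""
    key: str
    size: int  # bytes


def select_for_indexing(objects, supported_extensions, max_size_bytes):
    """Split objects into (indexable keys, error messages).

    Mirrors the documented behavior: unsupported or oversized files are
    skipped and logged as errors instead of being indexed.
    """
    indexable, errors = [], []
    for obj in objects:
        # Extension of the last path segment, e.g. "docs/guide.MD" -> ".md".
        ext = posixpath.splitext(obj.key)[1].lower()
        if ext not in supported_extensions:
            errors.append(f"{obj.key}: unsupported file type")
        elif obj.size > max_size_bytes:
            errors.append(f"{obj.key}: exceeds size limit")
        else:
            indexable.append(obj.key)
    return indexable, errors


# Illustrative values only; use the file types and limits documented for AutoRAG.
objects = [
    StoredObject("docs/guide.md", 2_000),
    StoredObject("img/logo.psd", 500),
    StoredObject("reports/big.pdf", 10_000_000),
]
ok, errs = select_for_indexing(objects, {".md", ".pdf"}, max_size_bytes=1_000_000)
```

Here `ok` holds only `docs/guide.md`, while the other two keys end up in `errs` with the reason they were skipped.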
diff --git a/src/content/docs/autorag/configuration/data-source/website.mdx b/src/content/docs/autorag/configuration/data-source/website.mdx
@@ -0,0 +1,34 @@
+---
+title: Website
+pcx_content_type: how-to
+sidebar:
+  order: 2
+---
+
+The Website data source allows you to connect a domain you own so its pages can be crawled, stored, and indexed. You can only crawl domains that are part of the **same Cloudflare account** as your AutoRAG instance.
+
+## How website crawling works
+When you connect a domain, the crawler looks for your site’s sitemap to determine which pages to visit:  
+
+1. The crawler first checks for a sitemap at `/sitemap.xml`.  
+2. If no sitemap is found, it checks `robots.txt` for listed sitemaps.  
+3. If no sitemap is available, the domain cannot be crawled.  
+
+Pages are visited in the order defined by your sitemap.  
+
+## Parsing options
+You can choose how pages are parsed during crawling:  
+
+- **Static sites**: Downloads the raw HTML for each page.  
+- **Rendered sites**: Loads pages with a headless browser and downloads the fully rendered version, including dynamic JavaScript content. [Browser Rendering](/browser-rendering/platform/pricing/) limits and billing apply to this option.
+
+## Storage
+During setup, AutoRAG creates a dedicated R2 bucket in your account to store the pages that have been crawled and downloaded as HTML files. This bucket is automatically managed and is used only for content discovered by the crawler. Any files or objects that you add directly to this bucket will not be indexed.  
+
+## Sync and updates
+During scheduled or manual [sync jobs](/autorag/configuration/indexing/), the crawler checks your website for changes. If a page has changed, the updated version is stored in the R2 bucket and reindexed automatically, so your search results reflect the latest content as of the most recent sync.
+
+## Limits
+The regular AutoRAG [limits](/autorag/platform/limits-pricing/) apply when using the Website data source. 
+
+The crawler downloads and indexes pages only up to the maximum number of objects supported for an AutoRAG instance. Because pages are visited in sitemap order, the crawler processes the first pages listed in your sitemap and stops once that limit is reached. Any files that are downloaded but exceed the file size limit are not indexed.
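The three-step sitemap lookup and the sitemap-ordered object limit described in the Website page can be sketched as follows. This is an illustration under stated assumptions, not the crawler's code: `discover_sitemaps` and `pages_to_crawl` are hypothetical, the `fetch` callable (URL in, body or `None` out) stands in for real HTTP requests, sitemaps in `robots.txt` are assumed to appear as standard `Sitemap:` lines, and only a flat `<urlset>` sitemap is handled (sitemap index files are not expanded).

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Hypothetical fetcher: returns the response body, or None if the URL is missing.
Fetcher = Callable[[str], Optional[str]]

# Namespace used by sitemaps.org <urlset> documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def discover_sitemaps(origin: str, fetch: Fetcher) -> List[str]:
    """Return sitemap URLs following the documented order of checks."""
    # Step 1: the conventional /sitemap.xml location.
    default = f"{origin}/sitemap.xml"
    if fetch(default) is not None:
        return [default]
    # Step 2: sitemaps listed in robots.txt (assumed "Sitemap: <url>" lines).
    robots = fetch(f"{origin}/robots.txt")
    if robots is not None:
        listed = [
            line.split(":", 1)[1].strip()
            for line in (raw.strip() for raw in robots.splitlines())
            if line.lower().startswith("sitemap:")
        ]
        if listed:
            return listed
    # Step 3: no sitemap anywhere, so the domain cannot be crawled.
    raise LookupError(f"no sitemap found for {origin}")


def pages_to_crawl(sitemap_xml: str, max_objects: int) -> List[str]:
    """Keep page URLs in sitemap order, stopping at the object limit."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return locs[:max_objects]


# Usage with an in-memory "site" that has no /sitemap.xml, only robots.txt.
site = {"https://example.com/robots.txt": "User-agent: *\nSitemap: https://example.com/pages.xml\n"}
sitemaps = discover_sitemaps("https://example.com", site.get)
```

Truncating with `locs[:max_objects]` reflects why pages listed earliest in the sitemap are the ones kept when an instance reaches its object limit.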
diff --git a/src/content/docs/autorag/configuration/indexing.mdx b/src/content/docs/autorag/configuration/indexing.mdx
@@ -9,7 +9,7 @@ AutoRAG automatically indexes your data into vector embeddings optimized for sem
 
 ## Jobs
 
-AutoRAG automatically monitors your data source for updates and reindexes your content every few hours. During each cycle, new or modified files are reprocessed to keep your Vectorize index up to date.
+AutoRAG automatically monitors your data source for updates and reindexes your content every **6 hours**. During each cycle, new or modified files are reprocessed to keep your Vectorize index up to date.
 
 You can monitor the status and history of all indexing activity in the Jobs tab, including real-time logs for each job to help you troubleshoot and verify successful syncs.
 
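Each indexing cycle only needs to touch what changed since the previous one. A minimal sketch of that bookkeeping, assuming each file is identified by its key and versioned by an ETag-like string (an assumption; the docs do not say how AutoRAG detects changes): `plan_sync` and `next_sync` are hypothetical helpers. Deletions are included because deleted files are removed from the index on the next sync.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Documented cadence for automatic reindexing.
SYNC_INTERVAL = timedelta(hours=6)


def next_sync(last_run: datetime) -> datetime:
    """When the next automatic cycle is due after `last_run`."""
    return last_run + SYNC_INTERVAL


def plan_sync(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare {key: version} snapshots from two cycles.

    New or modified keys are reindexed; keys that disappeared are removed.
    """
    added = sorted(k for k in current if k not in previous)
    modified = sorted(k for k in current if k in previous and current[k] != previous[k])
    deleted = sorted(k for k in previous if k not in current)
    return {"reindex": added + modified, "remove": deleted}


plan = plan_sync(
    previous={"a.md": "v1", "b.md": "v1", "c.md": "v1"},
    current={"a.md": "v1", "b.md": "v2", "d.md": "v1"},
)
```

Unchanged files such as `a.md` appear in neither list, which is what keeps a cycle cheap when little has changed.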
diff --git a/src/content/docs/autorag/platform/limits-pricing.mdx b/src/content/docs/autorag/platform/limits-pricing.mdx
@@ -15,6 +15,7 @@ During the open beta, AutoRAG is **free to enable**. When you create an AutoRAG
 | [**Vectorize**](/vectorize/platform/pricing/)    | Stores vector embeddings and powers semantic search                                       |
 | [**Workers AI**](/workers-ai/platform/pricing/)  | Handles image-to-Markdown conversion, embedding, query rewriting, and response generation |
 | [**AI Gateway**](/ai-gateway/reference/pricing/) | Monitors and controls model usage                                                         |
+| [**Browser Rendering**](/browser-rendering/platform/pricing/)     | Loads dynamic JavaScript content during [website](/autorag/configuration/data-source/website/) crawling when the **Rendered sites** parsing option is selected |
 
 For more information about how each resource is used within AutoRAG, reference [How AutoRAG works](/autorag/concepts/how-autorag-works/).
 
diff --git a/src/content/partials/autorag/ai-search-api-params.mdx b/src/content/partials/autorag/ai-search-api-params.mdx
@@ -18,7 +18,7 @@ Rewrites the original query into a search optimized query to improve retrieval a
 
 `max_num_results` <Type text="number" /> <MetaInfo text="optional" />
 
-The maximum number of results that can be returned from the Vectorize database. Defaults to `10`. Must be between `1` and `20`.
+The maximum number of results that can be returned from the Vectorize database. Defaults to `10`. Must be between `1` and `50`.
 
 `ranking_options` <Type text="object" /> <MetaInfo text="optional" />
 
diff --git a/src/content/partials/autorag/search-api-params.mdx b/src/content/partials/autorag/search-api-params.mdx
@@ -14,7 +14,7 @@ Rewrites the original query into a search optimized query to improve retrieval a
 
 `max_num_results` <Type text="number" /> <MetaInfo text="optional" />
 
-The maximum number of results that can be returned from the Vectorize database. Defaults to `10`. Must be between `1` and `20`.
+The maximum number of results that can be returned from the Vectorize database. Defaults to `10`. Must be between `1` and `50`.
 
 `ranking_options` <Type text="object" /> <MetaInfo text="optional" />
 
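The new `max_num_results` range can be mirrored in a client-side check before a request is sent. A minimal sketch: `build_search_params` is a hypothetical helper, not part of the AutoRAG API. It applies the documented default of `10` and the `1` to `50` range, and it assumes an integer value, which the `number` type does not strictly require.

```python
from typing import Optional

# Documented values for `max_num_results`.
DEFAULT_MAX_NUM_RESULTS = 10
MIN_NUM_RESULTS, MAX_NUM_RESULTS = 1, 50


def build_search_params(query: str, max_num_results: Optional[int] = None) -> dict:
    """Hypothetical helper: build a request body and validate max_num_results."""
    if max_num_results is None:
        max_num_results = DEFAULT_MAX_NUM_RESULTS
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_num_results, bool) or not isinstance(max_num_results, int):
        raise TypeError("max_num_results must be an integer")
    if not MIN_NUM_RESULTS <= max_num_results <= MAX_NUM_RESULTS:
        raise ValueError(
            f"max_num_results must be between {MIN_NUM_RESULTS} and {MAX_NUM_RESULTS}"
        )
    return {"query": query, "max_num_results": max_num_results}
```

With the raised cap, passing `50` succeeds, while `51` still raises `ValueError`.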
diff --git a/src/content/release-notes/autorag.yaml b/src/content/release-notes/autorag.yaml
@@ -3,6 +3,10 @@ link: "/autorag/platform/release-note/"
 productName: AutoRAG
 productLink: "/autorag/"
 entries:
+  - publish_date: "2025-08-20"
+    title: Increased maximum query results to 50
+    description: |-
+      The maximum number of results returned from a query has been increased from **20** to **50**. This allows you to surface more relevant matches in a single request.  
   - publish_date: "2025-07-16"
     title: Deleted files now removed from index on next sync
     description: |-