changelog + docs updated

aninibread · aninibread · commit d76ebdd27e11 · 2025-06-15T15:26:48.000-04:00
diff --git a/public/__redirects b/public/__redirects
@@ -214,6 +214,7 @@
 
 #autorag
 /autorag/usage/recipes/ /autorag/how-to/ 301
+/autorag/configuration/metadata-filtering/ /autorag/configuration/metadata/ 301
 
 # bots
 /bots/about/plans/ /bots/plans/ 301
diff --git a/src/content/changelog/autorag/2025-06-16-autorag-custom-metadata-and-context.mdx b/src/content/changelog/autorag/2025-06-16-autorag-custom-metadata-and-context.mdx
@@ -0,0 +1,47 @@
+---
+title: View custom metadata in responses and guide AI-search with context in AutoRAG
+description: You can now view custom metadata in AutoRAG search responses and use a context field to provide additional guidance to AI-generated answers.
+products:
+  - autorag
+date: 2025-06-16T06:10:00Z
+---
+
+In [AutoRAG](/autorag/), you can now view your object's custom metadata in the response from [`/search`](/autorag/usage/workers-binding/) and [`/ai-search`](/autorag/usage/workers-binding/), and optionally add a `context` field in the custom metadata of an object to provide additional guidance for AI-generated answers.
+
+You can add [custom metadata](/r2/api/workers/workers-api-reference/#r2putoptions) to an object when uploading it to your R2 bucket.
+
+# Object's custom metadata in search responses
+
+When you run a search, AutoRAG now returns any custom metadata associated with the object. This metadata appears in the response under `attributes.file` and can be used for downstream processing.
+
+For example, the `attributes` section of your search response may look like:
+
+```json
+{
+  "attributes": {
+    "timestamp": 1750001460000,
+    "folder": "docs/",
+    "filename": "product-launch-checklist.md",
+    "file": {
+      "url": "https://wiki.company.com/docs/product-launch-checklist",
+      "context": "A checklist for internal launch readiness, including legal, engineering, and marketing steps."
+    }
+  }
+}
+```
+
+# Add a `context` field to guide LLM answers
+
+When you include a custom metadata field named `context`, AutoRAG attaches that value to each chunk of the file. When you run an `/ai-search` query, this `context` is passed to the LLM and can be used as additional input when generating an answer.
+
+We recommend using the `context` field to describe supplemental information you want the LLM to consider, such as a summary of the document or a source URL.
+
+For example:
+
+```json
+{ "context": "summary: 'Checklist for internal product launch readiness, including legal, engineering, and marketing steps.'; url: 'https://wiki.company.com/docs/product-launch-checklist'" }
+```
+
+This gives you more control over how your content is interpreted, without requiring you to modify the original contents of the file.
+
+Learn more in AutoRAG's [metadata documentation](/autorag/configuration/metadata/).
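Sketch for the changelog entry above, not part of the patch: one possible way to assemble a `context` value in the `summary: '...'; url: '...'` shape and to read custom metadata back from a retrieved item. The helper names `buildContext` and `getFileMetadata`, the sample item, and the `env.MY_BUCKET` binding name are assumptions made for illustration.

```javascript
// Hypothetical helper: joins labeled parts into one `context` string,
// mirroring the "summary: '...'; url: '...'" shape used in the example.
function buildContext(parts) {
  return Object.entries(parts)
    .map(([label, value]) => `${label}: '${value}'`)
    .join("; ");
}

// Hypothetical helper: returns the custom metadata of one retrieved item,
// or an empty object when the item carries none.
function getFileMetadata(item) {
  return item?.attributes?.file ?? {};
}

const context = buildContext({
  summary: "Checklist for internal product launch readiness.",
  url: "https://wiki.company.com/docs/product-launch-checklist",
});

// Upload with the R2 Workers binding (binding name is an assumption):
// await env.MY_BUCKET.put("docs/product-launch-checklist.md", body, {
//   customMetadata: { context },
// });

// Sample item shaped like the `attributes` example in the changelog.
const item = {
  attributes: {
    folder: "docs/",
    filename: "product-launch-checklist.md",
    file: { context },
  },
};
console.log(getFileMetadata(item).context);
```

Values that contain single quotes would need escaping before being joined this way; the sketch leaves that out.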
diff --git a/src/content/changelog/autorag/2025-06-16-autorag-filename-filter.mdx b/src/content/changelog/autorag/2025-06-16-autorag-filename-filter.mdx
@@ -0,0 +1,28 @@
+---
+title: Filter your AutoRAG search by file name
+description: You can now filter AutoRAG search queries by file name, allowing you to control which files can be retrieved for a given query.
+products:
+  - autorag
+date: 2025-06-16T06:00:00Z
+---
+
+In [AutoRAG](/autorag/), you can now [filter](/autorag/configuration/metadata/) by an object's file name using the `filename` attribute, giving you more control over which files are searched for a given query.
+
+This is useful when your application has already determined which files should be searched. For example, you might query a PostgreSQL database to get a list of files a user has access to based on their permissions, and then use that list to limit what AutoRAG retrieves.
+
+For example, your search query may look like:
+
+```js
+const response = await env.AI.autorag("my-autorag").search({
+  query: "what is the project deadline?",
+  filters: {
+    type: "eq",
+    key: "filename",
+    value: "project-alpha-roadmap.md",
+  },
+});
+```
+
+This allows you to connect your application logic with AutoRAG's retrieval process, making it easy to control what gets searched without needing to reindex or modify your data.
+
+Learn more in AutoRAG's [metadata documentation](/autorag/configuration/metadata/).
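Sketch for the permissions use case described in this changelog entry, not part of the patch: turning a list of permitted file names into a search filter. The helper name `buildFilenameFilter` is hypothetical, and the compound shape `{ type: "or", filters: [...] }` is an assumption about the filter schema. It does respect the limits stated in the metadata docs for compound operators (only `eq`, all conditions on the same key).

```javascript
// Hypothetical helper: builds a filter that limits retrieval to the given file names.
function buildFilenameFilter(filenames) {
  if (filenames.length === 0) {
    throw new Error("at least one file name is required");
  }
  const comparisons = filenames.map((name) => ({
    type: "eq",
    key: "filename",
    value: name,
  }));
  // A single name needs no compound wrapper.
  return comparisons.length === 1
    ? comparisons[0]
    : { type: "or", filters: comparisons }; // assumed compound shape
}

// File names the application has already decided the user may search.
const allowed = ["project-alpha-roadmap.md", "project-alpha-budget.md"];
const filters = buildFilenameFilter(allowed);

// Usage with the Workers binding shown in the changelog:
// const response = await env.AI.autorag("my-autorag").search({
//   query: "what is the project deadline?",
//   filters,
// });
console.log(JSON.stringify(filters, null, 2));
```

Rejecting an empty list keeps a user with no permitted files from accidentally running an unfiltered search.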
diff --git a/src/content/docs/autorag/autorag-api.mdx b/src/content/docs/autorag/autorag-api.mdx
@@ -0,0 +1,7 @@
+---
+pcx_content_type: navigation
+title: REST API
+external_link: /api/resources/autorag/
+sidebar:
+  order: 9
+---
diff --git a/src/content/docs/autorag/configuration/data-source.mdx b/src/content/docs/autorag/configuration/data-source.mdx
@@ -16,7 +16,7 @@ AutoRAG will automatically scan and process supported files stored in that bucke
 AutoRAG has different file size limits depending on the file type:
 
 - **Plain text files:** Up to **4 MB**
-- **Rich format files:** Up to **1 MB**
+- **Rich format files:** Up to **4 MB**
 
 Files that exceed these limits will not be indexed and will show up in the error logs.
 
@@ -30,7 +30,7 @@ AutoRAG supports the following plain text file types:
 
 | Format     | File extensions                                                                | Mime Type                                                             |
 | ---------- | ------------------------------------------------------------------------------ | --------------------------------------------------------------------- |
-| Text       | `.txt`                                                                         | `text/plain`                                                          |
+| Text       | `.txt`, `.rst`                                                                 | `text/plain`                                                          |
 | Log        | `.log`                                                                         | `text/plain`                                                          |
 | Config     | `.ini`, `.conf`, `.env`, `.properties`, `.gitignore`, `.editorconfig`, `.toml` | `text/plain`, `text/toml`                                             |
 | Markdown   | `.markdown`, `.md`, `.mdx`                                                     | `text/markdown`                                                       |
diff --git a/src/content/docs/autorag/configuration/metadata.mdx b/src/content/docs/autorag/configuration/metadata.mdx
@@ -1,12 +1,13 @@
 ---
 pcx_content_type: concept
-title: Metadata filtering
+title: Metadata
 sidebar:
   order: 6
 ---
 
 import { FileTree } from "~/components"
 
+## Metadata filtering
 Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter narrows down results prior to retrieval, so that you only query the scope of documents that matter.
 
 Here is an example of metadata filtering using [Workers Binding](/autorag/usage/workers-binding/) but it can be easily adapted to use the [REST API](/autorag/usage/rest-api/) instead.
@@ -32,25 +33,19 @@ const answer = await env.AI.autorag("my-autorag").search({
 });
 ```
 
-## Metadata attributes
+### Metadata attributes
 
-You can currently filter by the `folder` and `timestamp` of an R2 object. Currently, custom metadata attributes are not supported.
+| Attribute | Description | Example |
+| --- | --- | --- |
+| `filename` | The name of the file. | `dog.png` or `animals/mammals/cat.png` |
+| `folder` | The prefix (directory) of the object. | For the object `animals/mammals/cat.png`, the folder is `animals/mammals/` |
+| `timestamp` | The timestamp for when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values are rounded down to 10 digits (seconds). | The timestamp `2025-01-01 00:00:00.999 UTC` is `1735689600999` and is rounded down to `1735689600000`, corresponding to `2025-01-01 00:00:00 UTC` |
 
-### Folder
-
-The directory to the object. For example, the `folder` of the object at `llama/logistics/llama-logistics.mdx` is `llama/logistics/`. Note that the `folder` does not include a leading `/`.
-
-Note that `folder` filter only includes files exactly in that folder, so files in subdirectories are not included. For example, specifying `folder: "llama/"` will match files in `llama/` but does not match files in `llama/logistics`.
-
-### Timestamp
-
-The timestamp indicating when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values will be rounded to 10 digits (seconds). For example, `1735689600999` or `2025-01-01 00:00:00.999 UTC` will be rounded down to `1735689600000`, corresponding to `2025-01-01 00:00:00 UTC`.
-
-## Filter schema
+### Filter schema
 
 You can create simple comparison filters or an array of comparison filters using a compound filter.
 
-### Comparison filter
+#### Comparison filter
 
 You can compare a metadata attribute (for example, `folder` or `timestamp`) with a target value using a comparison filter.
 
@@ -73,7 +68,7 @@ The available operators for the comparison are:
 | `lt`     | Less than                 |
 | `lte`    | Less than or equals to    |
 
-### Compound filter
+#### Compound filter
 
 You can use a compound filter to combine multiple comparison filters with a logical operator.
 
@@ -93,7 +88,7 @@ Note the following limitations with the compound operators:
   - Only the `eq` operator is allowed.
   - All conditions must filter on the **same key** (for example, all on `folder`)
 
-### "Starts with" filter for folders
+#### "Starts with" filter for folders
 
 You can use "starts with" filtering on the `folder` metadata attribute to search for all files and subfolders within a specific path.
 
@@ -137,6 +132,25 @@ This filter identifies paths starting with `customer-a/` by using:
 
 Together, these conditions effectively select paths that begin with the provided path value.
 
+## Add a `context` field to guide AI Search
+You can optionally include a custom metadata field named `context` when uploading an object to your R2 bucket.
+
+The `context` field is attached to each chunk and passed to the LLM during an `/ai-search` query. It does not affect retrieval but helps the LLM interpret and frame the answer.
+
+The field can be used for providing document summaries, source links, or custom instructions without modifying the file content.
+
+You can add [custom metadata](/r2/api/workers/workers-api-reference/#r2putoptions) to an object in the `PUT` operation when uploading the object to your R2 bucket. For example, if you are using the [Workers binding with R2](/r2/api/workers/workers-api-usage/):
+
+```javascript
+await env.MY_BUCKET.put("cat.png", file, {
+  customMetadata: {
+    context: "This is a picture of Joe's cat. His name is Max."
+  }
+});
+```
+
+During `/ai-search`, this context appears in the response under `attributes.file.context`, and is included in the data passed to the LLM for generating a response.
+
 ## Response
 
 You can see the metadata attributes of your retrieved data in the response under the property `attributes` for each retrieved chunk. For example:
@@ -150,6 +164,10 @@ You can see the metadata attributes of your retrieved data in the response under
     "attributes": {
       "timestamp": 1735689600000,   // unix timestamp for 2025-01-01
       "folder": "llama/logistics/",
+      "file": {
+        "url": "www.llamasarethebest.com/logistics",
+        "context": "This file contains information about how llamas can logistically deliver coffee."
+      }
     },
     "content": [
       {
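Sketch for the `timestamp` attribute described in this file, not part of the patch: a minimal illustration of how a 13-digit millisecond timestamp is rounded down to whole seconds, and how the rounded value could feed a comparison filter. The helper names `roundToSeconds` and `modifiedBefore` are hypothetical, and passing the value as a number rather than a string is an assumption.

```javascript
// Rounds a millisecond Unix timestamp down to whole seconds, keeping 13 digits.
function roundToSeconds(timestampMs) {
  return Math.floor(timestampMs / 1000) * 1000;
}

// Hypothetical helper: comparison filter matching objects last modified before `date`.
function modifiedBefore(date) {
  return {
    type: "lt",
    key: "timestamp",
    value: roundToSeconds(date.getTime()), // numeric value is an assumption
  };
}

console.log(roundToSeconds(1735689600999)); // prints 1735689600000
console.log(modifiedBefore(new Date("2025-01-01T00:00:00.999Z")));
```

Rounding on the client side makes the effective comparison value explicit, so a filter built from `Date.now()` does not silently differ from what is compared against the index.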
diff --git a/src/content/docs/autorag/platform/limits-pricing.mdx b/src/content/docs/autorag/platform/limits-pricing.mdx
@@ -26,6 +26,6 @@ The following limits currently apply to AutoRAG during the open beta:
 | --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | Max AutoRAG instances per account | 10                                                                                                                                                                |
 | Max files per AutoRAG             | 100,000                                                                                                                                                           |
-| Max file size                     | 4 MB ([Plain text](/autorag/configuration/data-source/#plain-text-file-types)) / 1 MB ([Rich format](/autorag/configuration/data-source/#rich-format-file-types)) |
+| Max file size                     | 4 MB |
 
 These limits are subject to change as AutoRAG evolves beyond open beta.
diff --git a/src/content/release-notes/autorag.yaml b/src/content/release-notes/autorag.yaml
@@ -5,10 +5,27 @@ productLink: "/autorag/"
 productArea: Developer platform
 productAreaLink: /workers/platform/changelog/platform/
 entries:
+  - publish_date: "2025-06-16"
+    title: Rich format file size limit increased to 4 MB
+    description: |-
+      You can now index rich format files (e.g., PDF) up to 4 MB in size, up from the previous 1 MB limit.
+  - publish_date: "2025-06-12"
+    title: Index processing status
+    description: |-
+      The dashboard now includes a new “Processing” step for the indexing pipeline that displays the files currently being processed.
+  - publish_date: "2025-06-12"
+    title: Sync AutoRAG REST API published
+    description: |-
+      You can now trigger a sync job for an AutoRAG using the [Sync REST API](/api/resources/autorag/subresources/rags/methods/sync/). This scans your data source for changes and queues updated or previously errored files for indexing.
+  - publish_date: "2025-06-10"
+    title: Files modified in the data source will now be updated
+    description: |-
+      Files modified in your source R2 bucket will now be updated in the AutoRAG index during the next sync. For example, if you upload a new version of an existing file, the changes will be reflected in the index after the subsequent sync job.
+      Please note that deleted files are not yet removed from the index. We are actively working on this functionality.
   - publish_date: "2025-05-31"
     title: Errored files will now be retried in next sync
     description: |-
-      Files that fail to index will now be automatically retried in the next indexing job. For instance, if a file initially failed because it was oversized but was then corrected (e.g. replaced with a file of the same name/key within the size limit), it will be re-attempted during the next scheduled sync.
+      Files that failed to index will now be automatically retried in the next indexing job. For instance, if a file initially failed because it was oversized but was then corrected (e.g. replaced with a file of the same name/key within the size limit), it will be re-attempted during the next scheduled sync.
   - publish_date: "2025-05-31"
     title: Fixed character cutoff in recursive chunking
     description: |-