Commit 4bd4b2f

Anni/autorag changelog 2 (cloudflare#23036)
* changelog + docs updated
* test
* add metadata description
* update the release note message
* fix links
* metadata filtering link fixed
* metadata filtering link removal
* Update src/content/docs/autorag/how-to/multitenancy.mdx Co-authored-by: Greg Brimble <[email protected]>
* Update src/content/docs/autorag/how-to/multitenancy.mdx Co-authored-by: Greg Brimble <[email protected]>
* Update src/content/docs/autorag/configuration/metadata.mdx Co-authored-by: Greg Brimble <[email protected]>
* Update src/content/changelog/autorag/2025-06-16-autorag-custom-metadata-and-context.mdx Co-authored-by: Greg Brimble <[email protected]>
* Update src/content/changelog/autorag/2025-06-16-autorag-custom-metadata-and-context.mdx Co-authored-by: Greg Brimble <[email protected]>
* small fixes
* Update src/content/changelog/autorag/2025-06-16-autorag-custom-metadata-and-context.mdx Co-authored-by: Greg Brimble <[email protected]>
* update the dates
* Update src/content/docs/autorag/configuration/metadata.mdx Co-authored-by: Greg Brimble <[email protected]>

---------

Co-authored-by: Greg Brimble <[email protected]>
1 parent f2d9965 commit 4bd4b2f

12 files changed: +149 -27 lines changed


public/__redirects

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -219,6 +219,7 @@
219219

220220
#autorag
221221
/autorag/usage/recipes/ /autorag/how-to/ 301
222+
/autorag/configuration/metadata-filtering/ /autorag/configuration/metadata/ 301
222223

223224
# bots
224225
/bots/about/plans/ /bots/plans/ 301

src/content/changelog/autorag/2025-04-23-autorag-metadata-filtering.mdx

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ products:
66
date: 2025-04-23T6:00:00Z
77
---
88

9-
You can now filter [AutoRAG](/autorag) search results by `folder` and `timestamp` using [metadata filtering](/autorag/configuration/metadata-filtering/) to narrow down the scope of your query.
9+
You can now filter [AutoRAG](/autorag) search results by `folder` and `timestamp` using [metadata filtering](/autorag/configuration/metadata) to narrow down the scope of your query.
1010

1111
This makes it easy to build [multitenant experiences](/autorag/how-to/multitenancy/) where each user can only access their own data. By organizing your content into per-tenant folders and applying a `folder` filter at query time, you ensure that each tenant retrieves only their own documents.
1212

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
---
2+
title: View custom metadata in responses and guide AI-search with context in AutoRAG
3+
description: You can now view custom metadata in AutoRAG search responses and use a context field to provide additional guidance to AI-generated answers.
4+
products:
5+
- autorag
6+
date: 2025-06-19T6:10:00Z
7+
---
8+
9+
In [AutoRAG](/autorag/), you can now view your object's custom metadata in the response from [`/search`](/autorag/usage/workers-binding/) and [`/ai-search`](/autorag/usage/workers-binding/), and optionally add a `context` field in the custom metadata of an object to provide additional guidance for AI-generated answers.
10+
11+
You can add [custom metadata](/r2/api/workers/workers-api-reference/#r2putoptions) to an object when uploading it to your R2 bucket.
12+
13+
# Object's custom metadata in search responses
14+
15+
When you run a search, AutoRAG now returns any custom metadata associated with the object. This metadata appears in the response under `attributes`, then `file`, and can be used for downstream processing.
16+
17+
For example, the `attributes` section of your search response may look like:
18+
19+
```json
20+
{
21+
"attributes": {
22+
"timestamp": 1750001460000,
23+
"folder": "docs/",
24+
"filename": "launch-checklist.md",
25+
"file": {
26+
"url": "https://wiki.company.com/docs/launch-checklist",
27+
"context": "A checklist for internal launch readiness, including legal, engineering, and marketing steps."
28+
}
29+
}
30+
}
31+
```
32+
33+
# Add a `context` field to guide LLM answers
34+
35+
When you include a custom metadata field named `context`, AutoRAG attaches that value to each chunk of the file. When you run an `/ai-search` query, this `context` is passed to the LLM and can be used as additional input when generating an answer.
36+
37+
We recommend using the `context` field to describe supplemental information you want the LLM to consider, such as a summary of the document or a source URL. If you have several different metadata attributes, you can join them together however you choose within the `context` string.
38+
39+
For example:
40+
41+
```json
42+
{
43+
"context": "summary: 'Checklist for internal product launch readiness, including legal, engineering, and marketing steps.'; url: 'https://wiki.company.com/docs/launch-checklist'"
44+
}
45+
```
46+
47+
This gives you more control over how your content is interpreted, without requiring you to modify the original contents of the file.
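
As a minimal sketch of joining several metadata attributes into one `context` string, the snippet below mirrors the `key: 'value'; key: 'value'` layout of the example above. The helper name `buildContext` and the separator are assumptions made for illustration; AutoRAG does not prescribe a format for the string.

```javascript
// Hypothetical helper (not part of AutoRAG): joins metadata attributes into a
// single `context` string, skipping attributes that have no value.
function buildContext(attributes) {
  return Object.entries(attributes)
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    .map(([key, value]) => `${key}: '${value}'`)
    .join("; ");
}

const context = buildContext({
  summary: "Checklist for internal product launch readiness.",
  url: "https://wiki.company.com/docs/launch-checklist",
});
```

The resulting string can then be stored as the `context` custom metadata value when the object is uploaded.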
48+
49+
Learn more in AutoRAG's [metadata filtering documentation](/autorag/configuration/metadata).
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
---
2+
title: Filter your AutoRAG search by file name
3+
description: You can now filter AutoRAG search queries by file name, allowing you to control which files can be retrieved for a given query.
4+
products:
5+
- autorag
6+
date: 2025-06-19T6:00:00Z
7+
---
8+
9+
In [AutoRAG](/autorag/), you can now [filter](/autorag/configuration/metadata/) by an object's file name using the `filename` attribute, giving you more control over which files are searched for a given query.
10+
11+
This is useful when your application has already determined which files should be searched. For example, you might query a PostgreSQL database to get a list of files a user has access to based on their permissions, and then use that list to limit what AutoRAG retrieves.
12+
13+
For example, your search query may look like:
14+
15+
```js
16+
const response = await env.AI.autorag("my-autorag").search({
17+
query: "what is the project deadline?",
18+
filters: {
19+
type: "eq",
20+
key: "filename",
21+
value: "project-alpha-roadmap.md",
22+
},
23+
});
24+
```
25+
26+
This allows you to connect your application logic with AutoRAG's retrieval process, making it easy to control what gets searched without needing to reindex or modify your data.
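
When the application has a list of permitted files rather than a single one, the list can be turned into a filter before calling `search`. The sketch below assumes the compound filter shape `{ type: "or", filters: [...] }`, which is not shown in this changelog entry, and uses only `eq` comparisons on the same key. `filenameFilter` and `searchAllowedFiles` are hypothetical names.

```javascript
// Hypothetical helper: builds a filter from the file names a user may access.
// Assumes the compound filter shape { type: "or", filters: [...] }.
function filenameFilter(filenames) {
  if (filenames.length === 0) {
    throw new Error("at least one file name is required");
  }
  const comparisons = filenames.map((name) => ({
    type: "eq",
    key: "filename",
    value: name,
  }));
  // One file needs only a comparison filter; several are combined with `or`.
  return comparisons.length === 1
    ? comparisons[0]
    : { type: "or", filters: comparisons };
}

// Usage inside a Worker, assuming an AutoRAG instance named "my-autorag".
async function searchAllowedFiles(env, query, allowedFiles) {
  return env.AI.autorag("my-autorag").search({
    query,
    filters: filenameFilter(allowedFiles),
  });
}
```

Building the filter in a separate pure function keeps the permission logic testable without a Workers runtime.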
27+
28+
Learn more in AutoRAG's [metadata filtering documentation](/autorag/configuration/metadata/).
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
---
2+
pcx_content_type: navigation
3+
title: REST API
4+
external_link: /api/resources/autorag/
5+
sidebar:
6+
order: 9
7+
---

src/content/docs/autorag/configuration/data-source.mdx

Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,7 @@ AutoRAG will automatically scan and process supported files stored in that bucke
1616
AutoRAG has different file size limits depending on the file type:
1717

1818
- **Plain text files:** Up to **4 MB**
19-
- **Rich format files:** Up to **1 MB**
19+
- **Rich format files:** Up to **4 MB**
2020

2121
Files that exceed these limits will not be indexed and will show up in the error logs.
2222

@@ -30,7 +30,7 @@ AutoRAG supports the following plain text file types:
3030

3131
| Format | File extensions | Mime Type |
3232
| ---------- | ------------------------------------------------------------------------------ | --------------------------------------------------------------------- |
33-
| Text | `.txt` | `text/plain` |
33+
| Text | `.txt`, `.rst` | `text/plain` |
3434
| Log | `.log` | `text/plain` |
3535
| Config | `.ini`, `.conf`, `.env`, `.properties`, `.gitignore`, `.editorconfig`, `.toml` | `text/plain`, `text/toml` |
3636
| Markdown | `.markdown`, `.md`, `.mdx` | `text/markdown` |

src/content/docs/autorag/configuration/metadata-filtering.mdx renamed to src/content/docs/autorag/configuration/metadata.mdx

Lines changed: 37 additions & 17 deletions
@@ -1,12 +1,15 @@
11
---
22
pcx_content_type: concept
3-
title: Metadata filtering
3+
title: Metadata
44
sidebar:
55
order: 6
66
---
77

88
import { FileTree } from "~/components"
99

10+
Use metadata to filter documents before retrieval and provide context to guide AI responses. This page covers how to apply filters and attach optional context metadata to your files.
11+
12+
## Metadata filtering
1013
Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter narrows down results prior to retrieval, so that you only query the scope of documents that matter.
1114

1215
Here is an example of metadata filtering using [Workers Binding](/autorag/usage/workers-binding/) but it can be easily adapted to use the [REST API](/autorag/usage/rest-api/) instead.
@@ -32,25 +35,19 @@ const answer = await env.AI.autorag("my-autorag").search({
3235
});
3336
```
3437

35-
## Metadata attributes
36-
37-
You can currently filter by the `folder` and `timestamp` of an R2 object. Currently, custom metadata attributes are not supported.
38-
39-
### Folder
40-
41-
The directory to the object. For example, the `folder` of the object at `llama/logistics/llama-logistics.mdx` is `llama/logistics/`. Note that the `folder` does not include a leading `/`.
38+
### Metadata attributes
4239

43-
Note that `folder` filter only includes files exactly in that folder, so files in subdirectories are not included. For example, specifying `folder: "llama/"` will match files in `llama/` but does not match files in `llama/logistics`.
40+
| Attribute | Description | Example |
41+
| --- | --- | --- |
42+
| `filename` | The name of the file. | `dog.png` or `animals/mammals/cat.png` |
43+
| `folder` | The folder or prefix to the object. | For the object `animals/mammals/cat.png`, the folder is `animals/mammals/` |
44+
| `timestamp` | The timestamp for when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values will be rounded down to 10 digits (seconds). | The timestamp `2025-01-01 00:00:00.999 UTC` is `1735689600999` and it will be rounded down to `1735689600000`, corresponding to `2025-01-01 00:00:00 UTC` |
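
The rounding described in the `timestamp` row can be reproduced in application code, for instance to predict which value a comparison will actually be evaluated against. This is a sketch of the arithmetic only; `floorToSecond` is a hypothetical helper and does not call AutoRAG.

```javascript
// Rounds a 13-digit millisecond timestamp down to whole seconds,
// keeping the result expressed in milliseconds.
function floorToSecond(timestampMs) {
  return Math.floor(timestampMs / 1000) * 1000;
}

// 2025-01-01 00:00:00.999 UTC
const lastModified = Date.UTC(2025, 0, 1, 0, 0, 0, 999); // 1735689600999
const comparable = floorToSecond(lastModified); // 1735689600000
```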
4445

45-
### Timestamp
46-
47-
The timestamp indicating when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values will be rounded to 10 digits (seconds). For example, `1735689600999` or `2025-01-01 00:00:00.999 UTC` will be rounded down to `1735689600000`, corresponding to `2025-01-01 00:00:00 UTC`.
48-
49-
## Filter schema
46+
### Filter schema
5047

5148
You can create simple comparison filters or an array of comparison filters using a compound filter.
5249

53-
### Comparison filter
50+
#### Comparison filter
5451

5552
You can compare a metadata attribute (for example, `folder` or `timestamp`) with a target value using a comparison filter.
5653

@@ -73,7 +70,7 @@ The available operators for the comparison are:
7370
| `lt` | Less than |
7471
| `lte` | Less than or equals to |
7572

76-
### Compound filter
73+
#### Compound filter
7774

7875
You can use a compound filter to combine multiple comparison filters with a logical operator.
7976

@@ -93,7 +90,7 @@ Note the following limitations with the compound operators:
9390
- Only the `eq` operator is allowed.
9491
- All conditions must filter on the **same key** (for example, all on `folder`)
9592

96-
### "Starts with" filter for folders
93+
#### "Starts with" filter for folders
9794

9895
You can use "starts with" filtering on the `folder` metadata attribute to search for all files and subfolders within a specific path.
9996

@@ -137,6 +134,25 @@ This filter identifies paths starting with `customer-a/` by using:
137134

138135
Together, these conditions effectively select paths that begin with the provided path value.
139136

137+
## Add `context` field to guide AI Search
138+
You can optionally include a custom metadata field named `context` when uploading an object to your R2 bucket.
139+
140+
The `context` field is attached to each chunk and passed to the LLM during an `/ai-search` query. It does not affect retrieval but helps the LLM interpret and frame the answer.
141+
142+
The field can be used for providing document summaries, source links, or custom instructions without modifying the file content.
143+
144+
You can add [custom metadata](/r2/api/workers/workers-api-reference/#r2putoptions) to an object in the `/PUT` operation when uploading the object to your R2 bucket. For example, if you are using the [Workers binding with R2](/r2/api/workers/workers-api-usage/):
145+
146+
```javascript
147+
await env.MY_BUCKET.put("cat.png", file, {
148+
customMetadata: {
149+
context: "This is a picture of Joe's cat. His name is Max."
150+
}
151+
});
152+
```
153+
154+
During `/ai-search`, this context appears in the response under `attributes.file.context`, and is included in the data passed to the LLM for generating a response.
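
For downstream processing, the `context` values can be read back from the retrieved items. The sketch below assumes the items are available as an array of objects that each carry an `attributes` property, as in the response example in this page; `collectContexts` and the sample data are illustrative.

```javascript
// Hypothetical helper: collects the `context` of every retrieved item that
// has one, reading it from `attributes.file.context`.
function collectContexts(results) {
  return results
    .map((item) => item.attributes?.file?.context)
    .filter((context) => typeof context === "string" && context.length > 0);
}

// Sample data shaped like the `attributes` section of a response.
const sampleResults = [
  {
    attributes: {
      folder: "llama/logistics/",
      file: { context: "This is a picture of Joe's cat. His name is Max." },
    },
  },
  { attributes: { folder: "llama/" } }, // no custom metadata on this object
];

const contexts = collectContexts(sampleResults);
```

Items without custom metadata are skipped rather than producing `undefined` entries.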
155+
140156
## Response
141157

142158
You can see the metadata attributes of your retrieved data in the response under the property `attributes` for each retrieved chunk. For example:
@@ -150,6 +166,10 @@ You can see the metadata attributes of your retrieved data in the response under
150166
"attributes": {
151167
"timestamp": 1735689600000, // unix timestamp for 2025-01-01
152168
"folder": "llama/logistics/",
169+
"file": {
170+
"url": "www.llamasarethebest.com/logistics",
171+
"context": "This file contains information about how llamas can logistically deliver coffee."
172+
}
153173
},
154174
"content": [
155175
{

src/content/docs/autorag/how-to/multitenancy.mdx

Lines changed: 3 additions & 3 deletions
@@ -7,7 +7,7 @@ sidebar:
77

88
import { FileTree } from "~/components"
99

10-
AutoRAG supports multitenancy by letting you segment content by tenant, so each user, customer, or workspace can only access their own data. This is typically done by organizing documents into per-tenant folders and applying [metadata filters](/autorag/configuration/metadata-filtering/) at query time.
10+
AutoRAG supports multitenancy by letting you segment content by tenant, so each user, customer, or workspace can only access their own data. This is typically done by organizing documents into per-tenant folders and applying [metadata filters](/autorag/configuration/metadata/) at query time.
1111

1212
## 1. Organize Content by Tenant
1313

@@ -42,7 +42,7 @@ const response = await env.AI.autorag("my-autorag").search({
4242
});
4343
```
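
One way to keep the per-tenant filter consistent is to derive it from the tenant identifier in a single place. The sketch below assumes each tenant's files live directly under a top-level folder named after the tenant, with no leading `/`. The helper name and the identifier rule are assumptions, not AutoRAG requirements.

```javascript
// Hypothetical helper: builds the `folder` filter for one tenant.
// Rejects identifiers that could escape the tenant's own folder.
function tenantFolderFilter(tenantId) {
  if (!/^[A-Za-z0-9_-]+$/.test(tenantId)) {
    throw new Error(`invalid tenant id: ${tenantId}`);
  }
  return { type: "eq", key: "folder", value: `${tenantId}/` };
}

const filters = tenantFolderFilter("customer-a");
```

Because an `eq` filter matches only files exactly in that folder, files in subfolders need the "starts with" approach instead.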
4444

45-
To filter across multiple folders, or to add date-based filtering, you can use a compound filter with an array of [comparison filters](/autorag/configuration/metadata-filtering/#compound-filter).
45+
To filter across multiple folders, or to add date-based filtering, you can use a compound filter with an array of [comparison filters](/autorag/configuration/metadata/#compound-filter).
4646

4747
## Tip: Use "Starts with" filter
4848

@@ -56,7 +56,7 @@ While an `eq` filter targets files at the specific folder, you'll often want to
5656
- contract-1.pdf
5757
</FileTree>
5858

59-
To achieve this [starts with](/autorag/configuration/metadata-filtering/#starts-with-filter-for-folders) behavior, use a compound filter like:
59+
To achieve this [starts with](/autorag/configuration/metadata/#starts-with-filter-for-folders) behavior, use a compound filter like:
6060

6161
```js
6262
filters: {

src/content/docs/autorag/platform/limits-pricing.mdx

Lines changed: 1 addition & 1 deletion
@@ -26,6 +26,6 @@ The following limits currently apply to AutoRAG during the open beta:
2626
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
2727
| Max AutoRAG instances per account | 10 |
2828
| Max files per AutoRAG | 100,000 |
29-
| Max file size | 4 MB ([Plain text](/autorag/configuration/data-source/#plain-text-file-types)) / 1 MB ([Rich format](/autorag/configuration/data-source/#rich-format-file-types)) |
29+
| Max file size | 4 MB |
3030

3131
These limits are subject to change as AutoRAG evolves beyond open beta.

src/content/partials/autorag/ai-search-api-params.mdx

Lines changed: 1 addition & 1 deletion
@@ -33,4 +33,4 @@ Returns a stream of results as they are available. Defaults to `false`.
3333

3434
`filters` <Type text="object" /> <MetaInfo text="optional" />
3535

36-
Narrow down search results based on metadata, like folder and date, so only relevant content is retrieved. For more details, refer to [Metadata filtering](/autorag/configuration/metadata-filtering).
36+
Narrow down search results based on metadata, like folder and date, so only relevant content is retrieved. For more details, refer to [Metadata filtering](/autorag/configuration/metadata/).
