Commit 4ca70ca

aninibread, Oxyjun
authored and committed
starts with filter instructions added (cloudflare#22819)
* starts with filter instructions added
* added explanation
* Using FileTree component and minor fixes

---------

Co-authored-by: Jun Lee <[email protected]>
1 parent db4124d commit 4ca70ca

2 files changed

+97
-7
lines changed

src/content/docs/autorag/configuration/metadata-filtering.mdx

Lines changed: 48 additions & 2 deletions
@@ -5,6 +5,8 @@ sidebar:
55
order: 6
66
---
77

8+
import { FileTree } from "~/components"
9+
810
Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter narrows down results prior to retrieval, so that you only query the scope of documents that matter.
911

1012
Here is an example of metadata filtering using [Workers Binding](/autorag/usage/workers-binding/) but it can be easily adapted to use the [REST API](/autorag/usage/rest-api/) instead.
@@ -34,13 +36,13 @@ const answer = await env.AI.autorag("my-autorag").search({
3436

3537
You can currently filter by the `folder` and `timestamp` of an R2 object. Custom metadata attributes are not yet supported.
3638

37-
### `folder`
39+
### Folder
3840

3941
The directory that contains the object. For example, the `folder` of the object at `llama/logistics/llama-logistics.mdx` is `llama/logistics/`. Note that the `folder` does not include a leading `/`.
4042

4143
Note that the `folder` filter matches only files directly in that folder; files in subdirectories are not included. For example, specifying `folder: "llama/"` matches files in `llama/` but not files in `llama/logistics/`.
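The folder rules above can be sketched in a few lines. This is only an illustration: `folderOf` and `matchesFolderEq` are hypothetical helpers, not part of AutoRAG, and the empty-string result for objects at the bucket root is an assumption the text does not confirm.

```js
// Hypothetical helper (not part of AutoRAG): derive the `folder` value
// for an R2 object key, following the rules described above.
function folderOf(objectKey) {
  const lastSlash = objectKey.lastIndexOf("/");
  // Assumption: an object at the bucket root has an empty folder.
  return lastSlash === -1 ? "" : objectKey.slice(0, lastSlash + 1);
}

// Mimics an `eq` comparison on `folder`: only an exact folder value matches,
// so objects in subdirectories are excluded.
function matchesFolderEq(objectKey, folder) {
  return folderOf(objectKey) === folder;
}

console.log(folderOf("llama/logistics/llama-logistics.mdx")); // "llama/logistics/"
console.log(matchesFolderEq("llama/logistics/llama-logistics.mdx", "llama/")); // false
```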
4244

43-
### `timestamp`
45+
### Timestamp
4446

4547
The timestamp indicating when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values will be rounded to 10 digits (seconds). For example, `1735689600999` or `2025-01-01 00:00:00.999 UTC` will be rounded down to `1735689600000`, corresponding to `2025-01-01 00:00:00 UTC`.
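As a rough sketch of the rounding described above, the helper below truncates a 13-digit millisecond timestamp to second precision. The name `roundToSeconds` is hypothetical; it only mirrors the arithmetic in the example and is not an AutoRAG API.

```js
// Hypothetical helper: truncate a millisecond Unix timestamp to whole seconds,
// keeping the 13-digit (millisecond) representation.
function roundToSeconds(timestampMs) {
  return Math.floor(timestampMs / 1000) * 1000;
}

const rounded = roundToSeconds(1735689600999);
console.log(rounded); // 1735689600000
console.log(new Date(rounded).toISOString()); // 2025-01-01T00:00:00.000Z
```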
4648

@@ -91,6 +93,50 @@ Note the following limitations with the compound operators:
9193
- Only the `eq` operator is allowed.
9294
- All conditions must filter on the **same key** (for example, all on `folder`)
9395

96+
### "Starts with" filter for folders
97+
98+
You can use "starts with" filtering on the `folder` metadata attribute to search for all files and subfolders within a specific path.
99+
100+
For example, consider this file structure:
101+
102+
<FileTree>
103+
- customer-a
104+
  - profile.md
105+
  - contracts
106+
    - property
107+
      - contract-1.pdf
108+
</FileTree>
109+
110+
If you were to filter using an `eq` (equals) operator with `value: "customer-a/"`, it would only match files directly within that folder, like `profile.md`. It would not include files in subfolders like `customer-a/contracts/`.
111+
112+
To recursively filter for all items starting with the path `customer-a/`, you can use the following compound filter:
113+
114+
```js
115+
filters: {
116+
  type: "and",
117+
  filters: [
118+
    {
119+
      type: "gt",
120+
      key: "folder",
121+
      value: "customer-a//",
122+
    },
123+
    {
124+
      type: "lte",
125+
      key: "folder",
126+
      value: "customer-a/z",
127+
    },
128+
  ],
129+
},
130+
```
131+
132+
This filter identifies paths starting with `customer-a/` by using:
133+
134+
- The `and` condition to combine the effects of the `gt` and `lte` conditions.
135+
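- The `gt` condition to include paths that sort after `customer-a//`, that is, the prefix followed by the `/` ASCII character.
136+
- The `lte` condition to include paths that sort up to and including `customer-a/z`, that is, the prefix followed by the lowercase `z` ASCII character.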
137+
138+
Together, these conditions effectively select paths that begin with the provided path value.
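If you need this filter for more than one path, it can be generated from the prefix. The sketch below is a minimal example: `startsWithFolderFilter` is a hypothetical helper name, and it simply appends `/` and `z` to the prefix to reproduce the two bounds shown above.

```js
// Hypothetical helper: build the "starts with" compound filter for a folder prefix.
// The prefix is expected to end with "/", for example "customer-a/".
function startsWithFolderFilter(folderPrefix) {
  if (!folderPrefix.endsWith("/")) {
    throw new Error(`Folder prefix must end with "/": ${folderPrefix}`);
  }
  return {
    type: "and",
    filters: [
      // Lower bound: the prefix followed by "/".
      { type: "gt", key: "folder", value: `${folderPrefix}/` },
      // Upper bound: the prefix followed by "z".
      { type: "lte", key: "folder", value: `${folderPrefix}z` },
    ],
  };
}

console.log(JSON.stringify(startsWithFolderFilter("customer-a/"), null, 2));
```

Passing `"customer-a/"` produces the same `gt` and `lte` values as the filter above (`customer-a//` and `customer-a/z`).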
139+
94140
## Response
95141

96142
You can see the metadata attributes of your retrieved data in the response under the property `attributes` for each retrieved chunk. For example:

src/content/docs/autorag/how-to/multitenancy.mdx

Lines changed: 49 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,8 @@ sidebar:
55
order: 5
66
---
77

8+
import { FileTree } from "~/components"
9+
810
AutoRAG supports multitenancy by letting you segment content by tenant, so each user, customer, or workspace can only access their own data. This is typically done by organizing documents into per-tenant folders and applying [metadata filters](/autorag/configuration/metadata-filtering/) at query time.
911

1012
## 1. Organize Content by Tenant
@@ -13,11 +15,13 @@ When uploading files to R2, structure your content by tenant using unique folder
1315

1416
Example folder structure:
1517

16-
```bash
17-
customer-a/logs/
18-
customer-a/contracts/
19-
customer-b/contracts/
20-
```
18+
<FileTree>
19+
- customer-a
20+
  - logs/
21+
  - contracts/
22+
- customer-b
23+
  - contracts/
24+
</FileTree>
2125

2226
When indexing, AutoRAG will automatically store the folder path as metadata under the `folder` attribute. It is recommended to enforce folder separation during upload or indexing to prevent accidental data access across tenants.
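One way to enforce folder separation at upload time is to build every object key through a single function. The sketch below is an assumption-laden example: `tenantObjectKey` is a hypothetical helper, and the allowed tenant ID characters are an arbitrary choice for illustration.

```js
// Hypothetical helper: build an R2 object key that always lives under the tenant's folder.
function tenantObjectKey(tenantId, relativePath) {
  // Assumption: tenant IDs are limited to letters, digits, "_" and "-",
  // so an ID can never contain "/" or "..".
  if (!/^[A-Za-z0-9_-]+$/.test(tenantId)) {
    throw new Error(`Invalid tenant ID: ${tenantId}`);
  }
  // Drop empty segments (leading or doubled slashes) and reject path traversal.
  const segments = relativePath.split("/").filter((s) => s.length > 0);
  if (segments.length === 0 || segments.some((s) => s === "." || s === "..")) {
    throw new Error(`Invalid path: ${relativePath}`);
  }
  return `${tenantId}/${segments.join("/")}`;
}

console.log(tenantObjectKey("customer-a", "contracts/contract-1.pdf"));
// customer-a/contracts/contract-1.pdf
```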
2327

@@ -39,3 +43,43 @@ const response = await env.AI.autorag("my-autorag").search({
3943
```
4044

4145
To filter across multiple folders, or to add date-based filtering, you can use a compound filter with an array of [comparison filters](/autorag/configuration/metadata-filtering/#compound-filter).
46+
47+
## Tip: Use "Starts with" filter
48+
49+
While an `eq` filter matches only files directly in the specified folder, you'll often want to retrieve all documents belonging to a tenant, including files in its subfolders. For example, all files in `customer-a/` with a structure like:
50+
51+
<FileTree>
52+
- customer-a
53+
  - profile.md
54+
  - contracts
55+
    - property
56+
      - contract-1.pdf
57+
</FileTree>
58+
59+
To achieve this [starts with](/autorag/configuration/metadata-filtering/#starts-with-filter-for-folders) behavior, use a compound filter like:
60+
61+
```js
62+
filters: {
63+
  type: "and",
64+
  filters: [
65+
    {
66+
      type: "gt",
67+
      key: "folder",
68+
      value: "customer-a//",
69+
    },
70+
    {
71+
      type: "lte",
72+
      key: "folder",
73+
      value: "customer-a/z",
74+
    },
75+
  ],
76+
},
77+
```
78+
79+
This filter identifies paths starting with `customer-a/` by using:
80+
81+
- The `and` condition to combine the effects of the `gt` and `lte` conditions.
82+
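- The `gt` condition to include paths that sort after `customer-a//`, that is, the prefix followed by the `/` ASCII character.
83+
- The `lte` condition to include paths that sort up to and including `customer-a/z`, that is, the prefix followed by the lowercase `z` ASCII character.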
84+
85+
This filter captures both `profile.md` and `contract-1.pdf`.
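In a multitenant Worker, the tenant ID usually comes from the authenticated request, so the filter can be derived from it rather than hard-coded. The following is a minimal sketch: `tenantSearchParams` is a hypothetical helper, the `query` field name is assumed from typical AutoRAG search calls, and the tenant ID is assumed to be validated already.

```js
// Hypothetical helper: build search parameters scoped to one tenant's folder tree.
// Assumes `tenantId` has already been validated and contains no "/".
function tenantSearchParams(tenantId, query) {
  const prefix = `${tenantId}/`;
  return {
    query, // assumed name of the search text field
    filters: {
      type: "and",
      filters: [
        { type: "gt", key: "folder", value: `${prefix}/` },
        { type: "lte", key: "folder", value: `${prefix}z` },
      ],
    },
  };
}

// Possible usage inside a Worker (not executed here):
// const response = await env.AI.autorag("my-autorag").search(
//   tenantSearchParams("customer-a", "When does the property contract expire?")
// );
console.log(JSON.stringify(tenantSearchParams("customer-a", "contract terms")));
```

Deriving the filter from the tenant ID keeps each tenant's search limited to its own folder prefix.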
