---
pcx_content_type: concept
title: Metadata filtering
sidebar:
  order: 6
---

Metadata filtering narrows search results based on metadata so that only relevant content is retrieved. The filter is applied before retrieval, so only the documents within the scope you specify are queried.

The following example applies metadata filtering with the [Workers Binding](/autorag/usage/workers-binding/). It can be adapted to use the [REST API](/autorag/usage/rest-api/) instead.

```js
const answer = await env.AI.autorag("my-autorag").search({
  query: "How do I train a llama to deliver coffee?",
  filters: {
    type: "and",
    filters: [
      {
        type: "eq",
        key: "folder",
        value: "llama/logistics/",
      },
      {
        type: "gte",
        key: "modified_date",
        value: "1735689600000", // Unix timestamp in milliseconds for 2025-01-01 00:00:00 UTC
      },
    ],
  },
});
```
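
Assuming the REST API accepts the same `query` and `filters` fields in its JSON body, adapting the example mostly means moving the filter object into an HTTP request. The sketch below only builds the request. The endpoint path, the bearer-token header, and the `buildSearchRequest` helper are assumptions for illustration; confirm them on the REST API page before use.

```js
// Sketch only: builds, but does not send, a REST API search request that
// carries the same filters as the Workers Binding example.
// ASSUMPTIONS: the endpoint path, the bearer-token header, and the request
// body fields (`query`, `filters`); confirm them on the REST API page.
function buildSearchRequest(accountId, autoragName, apiToken, query, filters) {
  const url =
    `https://api.cloudflare.com/client/v4/accounts/${accountId}` +
    `/autorag/rags/${autoragName}/search`;
  return new Request(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, filters }),
  });
}

// Same filter object as the Workers Binding example.
const filters = {
  type: "and",
  filters: [
    { type: "eq", key: "folder", value: "llama/logistics/" },
    { type: "gte", key: "modified_date", value: "1735689600000" },
  ],
};

const request = buildSearchRequest(
  "YOUR_ACCOUNT_ID", // hypothetical placeholder
  "my-autorag",
  "YOUR_API_TOKEN", // hypothetical placeholder
  "How do I train a llama to deliver coffee?",
  filters,
);
// To send it from a Worker or Node.js 18+: const response = await fetch(request);
```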

## Metadata attributes

You can filter by the `folder` and `modified_date` attributes of an R2 object. Custom metadata attributes are not currently supported.

### `folder`

The directory path of the object, including the trailing `/` but without a leading `/`. For example, the `folder` of the object at `llama/logistics/llama-logistics.mdx` is `llama/logistics/`.

A `folder` filter matches only files located directly in that folder; files in subdirectories are excluded. For example, an `eq` filter on `folder` with the value `llama/` matches files in `llama/` but not files in `llama/logistics/`.
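
The two `folder` rules above (no leading `/`, exact-folder matching) can be modeled locally to check which objects a filter would cover. `folderOf` and `matchesFolderEq` are hypothetical helpers for illustration only; they are not part of AutoRAG, and the service applies the filter itself.

```js
// Hypothetical helpers that model the documented `folder` rules locally.

// Returns the object key up to and including its last "/", without a leading "/".
// Keys without a "/" return "" here; how AutoRAG labels objects at the bucket
// root is not covered on this page.
function folderOf(objectKey) {
  const key = objectKey.replace(/^\/+/, ""); // drop any leading slashes
  const lastSlash = key.lastIndexOf("/");
  return lastSlash === -1 ? "" : key.slice(0, lastSlash + 1);
}

// An `eq` filter compares the whole folder string, so subdirectories do not match.
function matchesFolderEq(objectKey, folder) {
  return folderOf(objectKey) === folder;
}

console.log(folderOf("llama/logistics/llama-logistics.mdx")); // llama/logistics/
console.log(matchesFolderEq("llama/intro.mdx", "llama/")); // true
console.log(matchesFolderEq("llama/logistics/llama-logistics.mdx", "llama/")); // false
```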

### `modified_date`

The timestamp of when the object was last modified. Comparisons use a 13-digit Unix timestamp in milliseconds, but values are rounded down to whole-second precision. For example, `1735689600999` (`2025-01-01 00:00:00.999 UTC`) is rounded down to `1735689600000` (`2025-01-01 00:00:00 UTC`).
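
When a `modified_date` filter is built from a JavaScript `Date`, you can drop the sub-second part yourself so the value you send already has the precision used for comparisons. `toModifiedDateValue` is a hypothetical helper; it returns a string only because the example on this page passes the timestamp as a string.

```js
// Hypothetical helper: converts a Date into a millisecond Unix timestamp,
// rounded down to the whole second, as a string.
function toModifiedDateValue(date) {
  const ms = date.getTime(); // milliseconds since the Unix epoch
  const wholeSecondsMs = Math.floor(ms / 1000) * 1000; // round down to the second
  return String(wholeSecondsMs);
}

const value = toModifiedDateValue(new Date("2025-01-01T00:00:00.999Z"));
console.log(value); // 1735689600000

// Keep objects modified on or after 2025-01-01 00:00:00 UTC.
const recentFilter = { type: "gte", key: "modified_date", value };
```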

## Filter schema

You can use a single comparison filter, or combine several comparison filters with a compound filter.

### Comparison filter

A comparison filter compares a metadata attribute (for example, `folder` or `modified_date`) with a target value.

```js
filters: {
  type: "operator", // one of the comparison operators listed below
  key: "metadata_attribute", // for example, "folder" or "modified_date"
  value: "target_value",
}
```

The available comparison operators are:

| Operator | Description              |
| -------- | ------------------------ |
| `eq`     | Equals                   |
| `ne`     | Not equals               |
| `gt`     | Greater than             |
| `gte`    | Greater than or equal to |
| `lt`     | Less than                |
| `lte`    | Less than or equal to    |
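
Each operator reads as an ordinary comparison between a chunk's attribute and the target value. The sketch below is a hypothetical, local model of that reading, useful for reasoning about which chunks a filter keeps; AutoRAG evaluates filters on the service side, and this is not its implementation. It assumes `modified_date` is compared numerically, converting string targets such as `"1735689600000"` with `Number()`.

```js
// Hypothetical local model of the comparison operators.
const OPERATORS = {
  eq: (a, b) => a === b,
  ne: (a, b) => a !== b,
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
  lt: (a, b) => a < b,
  lte: (a, b) => a <= b,
};

// Returns true if `attributes` satisfies a single comparison filter.
function matchesComparison(attributes, filter) {
  const compare = OPERATORS[filter.type];
  if (!compare) throw new Error(`Unknown operator: ${filter.type}`);
  let actual = attributes[filter.key];
  let target = filter.value;
  if (filter.key === "modified_date") {
    // Assumption: timestamps compare as numbers even when passed as strings.
    actual = Number(actual);
    target = Number(target);
  }
  return compare(actual, target);
}

const attributes = { folder: "llama/logistics/", modified_date: 1735689600000 };
console.log(matchesComparison(attributes, { type: "eq", key: "folder", value: "llama/logistics/" })); // true
console.log(matchesComparison(attributes, { type: "lt", key: "modified_date", value: "1735689600000" })); // false
```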

### Compound filter

You can use a compound filter to combine multiple comparison filters with a logical operator.

```js
filters: {
  type: "compound_operator", // "and" or "or"
  filters: [...], // an array of comparison filters
}
```

The available compound operators are `and` and `or`.

Compound operators have the following limitations:

- Compound filters cannot be nested, so a filter can combine its conditions with a single `and` or a single `or`, but not a mix of both.
- When using `or`:
  - Only the `eq` operator is allowed.
  - All conditions must filter on the **same key** (for example, all on `folder`).
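
These rules can be checked on the client so that an invalid filter fails before a request is sent. The following is a minimal sketch that encodes only the limitations listed above; `validateFilters` and its error messages are hypothetical.

```js
// Hypothetical client-side check for the documented compound-filter limitations.
const COMPARISON_OPERATORS = new Set(["eq", "ne", "gt", "gte", "lt", "lte"]);

function validateFilters(filter) {
  if (COMPARISON_OPERATORS.has(filter.type)) return; // a single comparison filter
  if (filter.type !== "and" && filter.type !== "or") {
    throw new Error(`Unknown filter type: ${filter.type}`);
  }
  if (!Array.isArray(filter.filters)) {
    throw new Error("A compound filter needs a `filters` array");
  }
  for (const child of filter.filters) {
    // No nesting: every entry must be a comparison filter, not another compound filter.
    if (!COMPARISON_OPERATORS.has(child.type)) {
      throw new Error("Each entry in a compound filter must be a comparison filter (no nesting)");
    }
  }
  if (filter.type === "or") {
    if (filter.filters.some((child) => child.type !== "eq")) {
      throw new Error("`or` only allows the `eq` operator");
    }
    const keys = new Set(filter.filters.map((child) => child.key));
    if (keys.size > 1) {
      throw new Error("All `or` conditions must use the same key");
    }
  }
}

// Valid: two folders combined with `or` ("llama/training/" is a hypothetical folder).
validateFilters({
  type: "or",
  filters: [
    { type: "eq", key: "folder", value: "llama/logistics/" },
    { type: "eq", key: "folder", value: "llama/training/" },
  ],
});
```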

## Response

The response lists the metadata attributes of each retrieved chunk under its `attributes` property. For example:

```js
"data": [
  {
    "file_id": "llama001",
    "filename": "llama/logistics/llama-logistics.md",
    "score": 0.45,
    "attributes": {
      "modified_date": 1735689600000, // Unix timestamp in milliseconds for 2025-01-01 00:00:00 UTC
      "folder": "llama/logistics/",
    },
    "content": [
      {
        "id": "llama001",
        "type": "text",
        "text": "Llamas can carry 3 drinks max."
      }
    ]
  }
]
```
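
Because `modified_date` is a millisecond Unix timestamp, it can be passed straight to `new Date()` when you inspect results. The sketch below assumes a response object with the `data` shape shown above; `summarizeAttributes` is a hypothetical helper.

```js
// Hypothetical helper: lists each retrieved chunk with its folder and a readable date.
function summarizeAttributes(response) {
  return response.data.map((chunk) => ({
    filename: chunk.filename,
    folder: chunk.attributes.folder,
    // modified_date is in milliseconds, which is what the Date constructor expects.
    modified: new Date(chunk.attributes.modified_date).toISOString(),
  }));
}

// Sample response with the shape shown above.
const response = {
  data: [
    {
      file_id: "llama001",
      filename: "llama/logistics/llama-logistics.md",
      score: 0.45,
      attributes: { modified_date: 1735689600000, folder: "llama/logistics/" },
      content: [{ id: "llama001", type: "text", text: "Llamas can carry 3 drinks max." }],
    },
  ],
};

console.log(summarizeAttributes(response)[0].modified); // 2025-01-01T00:00:00.000Z
```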