Commit 917e713

[DOC] Metadata arrays docs (#6383)
## Description of changes

Adding docs for metadata arrays
1 parent 7cc289f commit 917e713

6 files changed: +674 additions, -88 deletions


docs/mintlify/cloud/schema/schema-basics.mdx

Lines changed: 5 additions & 1 deletion
````diff
@@ -252,7 +252,11 @@ schema.deleteIndex(undefined, "temporary_field");
 </CodeGroup>
 
 <Callout>
-**Note:** Not all indexes can be deleted. Vector and FTS indexes currently cannot be disabled
+**Note:** Not all indexes can be deleted. Vector indexes currently cannot be disabled.
+</Callout>
+
+<Callout>
+**Array metadata and indexes:** Array metadata (e.g. `[1, 2, 3]` or `["action", "comedy"]`) shares the same inverted index as its scalar counterpart. Disabling `IntInvertedIndexConfig` will also prevent `$contains` and `$not_contains` queries on integer arrays, and similarly for other types.
 </Callout>
 
 ## Method Chaining
````
docs/mintlify/cloud/search-api/filtering.mdx

Lines changed: 194 additions & 33 deletions
````diff
@@ -114,8 +114,8 @@ Chroma supports three data types for metadata: strings, numbers (int/float), and
 **Supported operators:**
 - `is_in()` - Value matches any in the list
 - `not_in()` - Value doesn't match any in the list
-- `contains()` - String contains substring (case-sensitive, currently K.DOCUMENT only)
-- `not_contains()` - String doesn't contain substring (currently K.DOCUMENT only)
+- `contains()` - On `K.DOCUMENT`: substring search (case-sensitive). On metadata fields: checks if an array contains a scalar value.
+- `not_contains()` - On `K.DOCUMENT`: excludes by substring. On metadata fields: checks that an array does not contain a scalar value.
 - `regex()` - String matches regex pattern (currently K.DOCUMENT only)
 - `not_regex()` - String doesn't match regex pattern (currently K.DOCUMENT only)
 
````
````diff
@@ -126,14 +126,19 @@ K.ID.is_in(["doc1", "doc2", "doc3"]) # Match any ID in list
 K("category").is_in(["tech", "science"]) # Match any category
 K("status").not_in(["draft", "deleted"]) # Exclude specific values
 
-# String content operators (currently K.DOCUMENT only)
+# String content operators (K.DOCUMENT only)
 K.DOCUMENT.contains("machine learning") # Substring search in document
 K.DOCUMENT.not_contains("deprecated") # Exclude documents with text
 K.DOCUMENT.regex(r"\bAPI\b") # Match whole word "API" in document
 
-# Note: String pattern matching on metadata fields not yet supported
-# K("title").contains("Python") # NOT YET SUPPORTED
-# K("email").regex(r".*@company\.com$") # NOT YET SUPPORTED
+# Array membership operators (metadata fields)
+K("tags").contains("action") # Array contains value
+K("tags").not_contains("draft") # Array does not contain value
+K("scores").contains(42) # Works with numbers
+K("flags").contains(True) # Works with booleans
+
+# Note: String pattern matching on metadata scalar fields not yet supported
+# K("title").regex(r".*Python.*") # NOT YET SUPPORTED
 ```
 
 ```typescript TypeScript
````
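After this change, `contains()` has two meanings depending on the key. The following plain-Python sketch models that documented behavior over an in-memory record. The `contains` helper and the record layout are assumptions for illustration and are not part of the chromadb client. So is the strict type comparison, which keeps `True` from matching `1`.

```python
def _same(a, b):
    # Strict comparison: Python treats True == 1 and 42 == 42.0, so compare types too.
    return type(a) is type(b) and a == b


def contains(record: dict, key: str, value) -> bool:
    """Hypothetical model of contains(): substring on the document, membership on arrays."""
    if key == "#document":
        # Case-sensitive substring search over the document text.
        return isinstance(value, str) and value in record["document"]
    field = record["metadata"].get(key)
    # Metadata fields: array membership only; scalar strings are not substring-searched.
    return isinstance(field, list) and any(_same(item, value) for item in field)


record = {
    "document": "An introduction to machine learning",
    "metadata": {"tags": ["action", "comedy"], "title": "Python tips", "flags": [True]},
}
print(contains(record, "#document", "machine learning"))  # True
print(contains(record, "tags", "action"))                 # True
print(contains(record, "title", "Python"))                # False: scalar field
```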
````diff
@@ -142,14 +147,19 @@ K.ID.isIn(["doc1", "doc2", "doc3"]); // Match any ID in list
 K("category").isIn(["tech", "science"]); // Match any category
 K("status").notIn(["draft", "deleted"]); // Exclude specific values
 
-// String content operators (currently K.DOCUMENT only)
+// String content operators (K.DOCUMENT only)
 K.DOCUMENT.contains("machine learning"); // Substring search in document
 K.DOCUMENT.notContains("deprecated"); // Exclude documents with text
 K.DOCUMENT.regex("\\bAPI\\b"); // Match whole word "API" in document
 
-// Note: String pattern matching on metadata fields not yet supported
-// K("title").contains("Python") // NOT YET SUPPORTED
-// K("email").regex(".*@company\\.com$") // NOT YET SUPPORTED
+// Array membership operators (metadata fields)
+K("tags").contains("action"); // Array contains value
+K("tags").notContains("draft"); // Array does not contain value
+K("scores").contains(42); // Works with numbers
+K("flags").contains(true); // Works with booleans
+
+// Note: String pattern matching on metadata scalar fields not yet supported
+// K("title").regex(".*Python.*") // NOT YET SUPPORTED
 ```
 
 ```rust Rust
````
````diff
@@ -161,13 +171,150 @@ Key::field("status").not_in(["draft", "deleted"]);
 Key::Document.contains("machine learning");
 Key::Document.not_contains("deprecated");
 Key::Document.regex(r"\bAPI\b");
+
+// Array membership operators (metadata fields)
+Key::field("tags").contains_value("action");
+Key::field("tags").not_contains_value("draft");
+Key::field("scores").contains_value(42);
+Key::field("flags").contains_value(true);
 ```
 </CodeGroup>
 
 <Callout>
-String operations like `contains()` and `regex()` are case-sensitive by default. The `is_in()` operator is efficient even with large lists.
+String operations like `contains()` and `regex()` on `K.DOCUMENT` are case-sensitive by default. When used on metadata fields, `contains()` checks array membership rather than substring matching. The `is_in()` operator is efficient even with large lists.
 </Callout>
 
+## Array Metadata
+
+Chroma supports storing arrays of values in metadata fields. You can use `contains()` / `not_contains()` (or `$contains` / `$not_contains` in dictionary syntax) to filter records based on whether an array includes a specific scalar value.
+
+### Storing Array Metadata
+
+Arrays can contain strings, numbers, or booleans. All elements in an array must be the same type. Empty arrays are not allowed.
+
+<CodeGroup>
+```python Python
+collection.add(
+    ids=["m1", "m2", "m3"],
+    embeddings=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
+    metadatas=[
+        {"genres": ["action", "comedy"], "year": 2020},
+        {"genres": ["drama"], "year": 2021},
+        {"genres": ["action", "thriller"], "year": 2022},
+    ],
+)
+```
+
+```typescript TypeScript
+await collection.add({
+  ids: ["m1", "m2", "m3"],
+  embeddings: [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
+  metadatas: [
+    { genres: ["action", "comedy"], year: 2020 },
+    { genres: ["drama"], year: 2021 },
+    { genres: ["action", "thriller"], year: 2022 },
+  ],
+});
+```
+
+```rust Rust
+use chroma::types::{Metadata, MetadataValue};
+
+let mut m = Metadata::new();
+m.insert(
+    "genres".into(),
+    MetadataValue::StringArray(vec!["action".to_string(), "comedy".to_string()]),
+);
+m.insert("year".into(), MetadataValue::Int(2020));
+
+// Also supports IntArray, FloatArray, and BoolArray
+let mut m2 = Metadata::new();
+m2.insert("scores".into(), MetadataValue::IntArray(vec![10, 20, 30]));
+```
+</CodeGroup>
+
+### Filtering Arrays
+
+Use `contains()` to check if a metadata array includes a value, and `not_contains()` to check that it does not.
+
+<CodeGroup>
+```python Python
+from chromadb import Search, K
+
+# Find all records where genres contains "action"
+search = Search().where(K("genres").contains("action"))
+
+# Exclude records with a specific tag
+search = Search().where(K("tags").not_contains("draft"))
+
+# Works with numbers and booleans too
+search = Search().where(K("scores").contains(42))
+
+# Combine with other filters
+search = Search().where(
+    K("genres").contains("action") &
+    (K("year") >= 2021)
+)
+```
+
+```typescript TypeScript
+import { Search, K } from 'chromadb';
+
+// Find all records where genres contains "action"
+const search1 = new Search().where(K("genres").contains("action"));
+
+// Exclude records with a specific tag
+const search2 = new Search().where(K("tags").notContains("draft"));
+
+// Works with numbers and booleans too
+const search3 = new Search().where(K("scores").contains(42));
+
+// Combine with other filters
+const search4 = new Search().where(
+  K("genres").contains("action")
+    .and(K("year").gte(2021))
+);
+```
+
+```rust Rust
+use chroma::types::{Key, SearchPayload};
+
+// Find all records where genres contains "action"
+let search = SearchPayload::default()
+    .r#where(Key::field("genres").contains_value("action"));
+
+// Exclude records with a specific tag
+let search = SearchPayload::default()
+    .r#where(Key::field("tags").not_contains_value("draft"));
+
+// Works with numbers and booleans too
+let search = SearchPayload::default()
+    .r#where(Key::field("scores").contains_value(42));
+
+// Combine with other filters
+let search = SearchPayload::default()
+    .r#where(
+        Key::field("genres").contains_value("action")
+            & Key::field("year").gte(2021i64),
+    );
+
+let results = collection.search(vec![search]).await?;
+```
+</CodeGroup>
+
+### Supported Array Types
+
+| Type | Python | TypeScript | Rust |
+|------|--------|------------|------|
+| String | `["a", "b"]` | `["a", "b"]` | `MetadataValue::StringArray(...)` |
+| Integer | `[1, 2, 3]` | `[1, 2, 3]` | `MetadataValue::IntArray(...)` |
+| Float | `[1.5, 2.5]` | `[1.5, 2.5]` | `MetadataValue::FloatArray(...)` |
+| Boolean | `[True, False]` | `[true, false]` | `MetadataValue::BoolArray(...)` |
+
+<Warning>
+The `$contains` value must be a scalar that matches the array's element type. All elements in an array must be the same type, and nested arrays are not supported.
+</Warning>
+
 ## Logical Operators
 
 **Supported operators:**
````
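The storage rules in the new Array Metadata section are one element type per array, no empty arrays, and no nesting. These rules could be checked on the client before calling `collection.add`. The sketch below uses a hypothetical helper, not a chromadb API. It treats `int` and `float` as distinct types to mirror the separate `IntArray` and `FloatArray` variants in the table. That is an assumption: the docs do not say how mixed numeric lists are handled.

```python
def validate_array_metadata(key: str, value) -> None:
    """Hypothetical pre-flight check for the documented array rules.

    Raises ValueError for empty arrays, nested arrays, mixed element types,
    or element types other than str, int, float, and bool. Scalars pass through.
    """
    if not isinstance(value, list):
        return
    if not value:
        raise ValueError(f"{key}: empty arrays are not allowed")
    if any(isinstance(item, list) for item in value):
        raise ValueError(f"{key}: nested arrays are not supported")
    # Use exact types: bool is a subclass of int, so isinstance() would accept [True, 1].
    kinds = {type(item) for item in value}
    if len(kinds) > 1:
        names = sorted(kind.__name__ for kind in kinds)
        raise ValueError(f"{key}: all elements must share one type, got {names}")
    kind = kinds.pop()
    if kind not in (str, int, float, bool):
        raise ValueError(f"{key}: unsupported element type {kind.__name__}")


validate_array_metadata("genres", ["action", "comedy"])  # accepted
validate_array_metadata("year", 2020)                     # scalar, accepted
for bad in ([], [[1, 2], [3]], [1, "a"], [True, 1]):
    try:
        validate_array_metadata("field", bad)
    except ValueError as err:
        print(err)
```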
````diff
@@ -243,8 +390,8 @@ You can also use dictionary syntax instead of K expressions. This is useful when
 - `$lte` - Less than or equal (numeric only)
 - `$in` - Value in list
 - `$nin` - Value not in list
-- `$contains` - String contains
-- `$not_contains` - String doesn't contain
+- `$contains` - On `#document`: substring search. On metadata fields: array contains value.
+- `$not_contains` - On `#document`: excludes by substring. On metadata fields: array does not contain value.
 - `$regex` - Regex match
 - `$not_regex` - Regex doesn't match
 - `$and` - Logical AND
````
````diff
@@ -268,12 +415,16 @@ You can also use dictionary syntax instead of K expressions. This is useful when
 {"category": {"$in": ["tech", "ai"]}} # Same as K("category").is_in(["tech", "ai"])
 {"status": {"$nin": ["draft", "deleted"]}} # Same as K("status").not_in(["draft", "deleted"])
 
-# String operators (currently K.DOCUMENT only)
+# String operators (K.DOCUMENT only)
 {"#document": {"$contains": "API"}} # Same as K.DOCUMENT.contains("API")
-# {"title": {"$not_contains": "draft"}} # Not yet supported - metadata fields
 # {"email": {"$regex": ".*@example\\.com"}} # Not yet supported - metadata fields
 # {"version": {"$not_regex": "^beta"}} # Not yet supported - metadata fields
 
+# Array membership operators (metadata fields)
+{"genres": {"$contains": "action"}} # Same as K("genres").contains("action")
+{"genres": {"$not_contains": "draft"}} # Same as K("genres").not_contains("draft")
+{"scores": {"$contains": 42}} # Works with numbers
+
 # Logical operators
 {"$and": [
     {"status": "published"},
````
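A tiny in-memory evaluator makes it easier to reason about the dictionary examples above. The `matches` function below is hypothetical. It supports only `$and`, `$eq`, `$gte`, `$contains`, and `$not_contains`, and it reuses the sample movies from the Array Metadata section. It also assumes that `$not_contains` matches records with no array under the key, which the docs do not specify.

```python
def _same(a, b):
    # Compare types as well as values, so True does not match 1 and 42 does not match 42.0.
    return type(a) is type(b) and a == b


def matches(metadata: dict, where: dict) -> bool:
    """Hypothetical evaluator for a small subset of the dictionary filter syntax."""
    if "$and" in where:
        return all(matches(metadata, clause) for clause in where["$and"])
    ((key, condition),) = where.items()
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # {"status": "published"} shorthand
    ((op, operand),) = condition.items()
    value = metadata.get(key)
    array_has = isinstance(value, list) and any(_same(x, operand) for x in value)
    if op == "$eq":
        return _same(value, operand)
    if op == "$gte":
        return isinstance(value, (int, float)) and value >= operand
    if op == "$contains":
        return array_has
    if op == "$not_contains":
        # Assumption: a record without an array under `key` does not contain the value.
        return not array_has
    raise NotImplementedError(f"operator {op} is not modeled in this sketch")


movies = {
    "m1": {"genres": ["action", "comedy"], "year": 2020},
    "m2": {"genres": ["drama"], "year": 2021},
    "m3": {"genres": ["action", "thriller"], "year": 2022},
}
where = {"$and": [{"genres": {"$contains": "action"}}, {"year": {"$gte": 2021}}]}
print([rid for rid, md in movies.items() if matches(md, where)])  # ['m3']
```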
````diff
@@ -317,12 +468,16 @@ You can also use dictionary syntax instead of K expressions. This is useful when
 { category: { $in: ["tech", "ai"] } } // Same as K("category").isIn(["tech", "ai"])
 { status: { $nin: ["draft", "deleted"] } } // Same as K("status").notIn(["draft", "deleted"])
 
-// String operators (currently K.DOCUMENT only)
+// String operators (K.DOCUMENT only)
 { "#document": { $contains: "API" } } // Same as K.DOCUMENT.contains("API")
-// { title: { $not_contains: "draft" } } // Not yet supported - metadata fields
 // { email: { $regex: ".*@example\\.com" } } // Not yet supported - metadata fields
 // { version: { $not_regex: "^beta" } } // Not yet supported - metadata fields
 
+// Array membership operators (metadata fields)
+{ genres: { $contains: "action" } } // Same as K("genres").contains("action")
+{ genres: { $not_contains: "draft" } } // Same as K("genres").notContains("draft")
+{ scores: { $contains: 42 } } // Works with numbers
+
 // Logical operators
 {
   $and: [
````
````diff
@@ -488,50 +643,56 @@ K("score").gt(90); // Undefined results when mixed types exist
 
 ### String Pattern Matching Limitations
 
-**Currently, `contains()`, `not_contains()`, `regex()`, and `not_regex()` operators only work on `K.DOCUMENT`**. These operators do not yet support metadata fields.
+**`regex()` and `not_regex()` only work on `K.DOCUMENT`**. These operators do not yet support metadata fields.
 
-Additionally, the pattern must contain at least 3 literal characters to ensure accurate results.
+`contains()` and `not_contains()` have different behavior depending on the field:
+- On `K.DOCUMENT`: substring search (the pattern must have at least 3 literal characters)
+- On metadata fields: array membership check (see [Array Metadata](#array-metadata) above)
+
+Substring matching on metadata scalar fields (e.g. checking if a string field contains a substring) is not yet supported.
 
 <CodeGroup>
 ```python Python
-# Currently supported - K.DOCUMENT only
+# Substring search on K.DOCUMENT - works
 K.DOCUMENT.contains("API") # Works
 K.DOCUMENT.regex(r"v\d\.\d\.\d") # Works
-K.DOCUMENT.contains("machine learning") # Works
 
-# NOT YET SUPPORTED - metadata fields
-K("title").contains("Python") # Not supported yet
-K("description").regex(r"API.*") # Not supported yet
+# Array membership on metadata fields - works
+K("tags").contains("action") # Works - checks if array contains value
+
+# Substring/regex on metadata scalar fields - NOT YET SUPPORTED
+# K("title").regex(r".*Python.*") # Not supported yet
 
-# Pattern length requirements (for K.DOCUMENT)
+# Pattern length requirements (for K.DOCUMENT substring search)
 K.DOCUMENT.contains("API") # 3 characters - good
 K.DOCUMENT.contains("AI") # Only 2 characters - may give incorrect results
 K.DOCUMENT.regex(r"\d+") # No literal characters - may give incorrect results
 ```
 
 ```typescript TypeScript
-// Currently supported - K.DOCUMENT only
+// Substring search on K.DOCUMENT - works
 K.DOCUMENT.contains("API"); // Works
 K.DOCUMENT.regex("v\\d\\.\\d\\.\\d"); // Works
-K.DOCUMENT.contains("machine learning"); // Works
 
-// NOT YET SUPPORTED - metadata fields
-K("title").contains("Python"); // Not supported yet
-K("description").regex("API.*"); // Not supported yet
+// Array membership on metadata fields - works
+K("tags").contains("action"); // Works - checks if array contains value
+
+// Substring/regex on metadata scalar fields - NOT YET SUPPORTED
+// K("title").regex(".*Python.*") // Not supported yet
 
-// Pattern length requirements (for K.DOCUMENT)
+// Pattern length requirements (for K.DOCUMENT substring search)
 K.DOCUMENT.contains("API"); // 3 characters - good
 K.DOCUMENT.contains("AI"); // Only 2 characters - may give incorrect results
 K.DOCUMENT.regex("\\d+"); // No literal characters - may give incorrect results
 ```
 </CodeGroup>
 
 <Warning>
-String pattern matching currently only works on `K.DOCUMENT`. Support for metadata fields is not yet available. Also, patterns with fewer than 3 literal characters may return incorrect results.
+`regex()` and `not_regex()` currently only work on `K.DOCUMENT`. Substring matching on metadata scalar fields is not yet available. Also, patterns with fewer than 3 literal characters may return incorrect results.
 </Warning>
 
 <Callout>
-String pattern matching on metadata fields is not currently supported. Full support is coming in a future release, which will allow users to opt-in to additional indexes for string pattern matching on specific metadata fields.
+Substring and regex matching on metadata scalar fields is not currently supported. Full support is coming in a future release, which will allow users to opt-in to additional indexes for string pattern matching on specific metadata fields.
 </Callout>
 
 ## Complete Example
````
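The 3-literal-character rule applies to `K.DOCUMENT` substring search. A client-side guard for plain substrings might look like the following. `check_document_substring` is a hypothetical helper, not a chromadb API. Every character of a plain substring is literal, so its length is the literal-character count. The guard does not try to count literal characters inside regex patterns.

```python
import warnings

# Threshold stated in the docs for K.DOCUMENT substring search.
MIN_LITERAL_CHARS = 3


def check_document_substring(pattern: str) -> str:
    """Hypothetical guard (not a chromadb API) for plain substring patterns.

    Warns when the pattern is shorter than the documented minimum; regex
    patterns are out of scope for this sketch.
    """
    if len(pattern) < MIN_LITERAL_CHARS:
        warnings.warn(
            f"{pattern!r} has fewer than {MIN_LITERAL_CHARS} literal characters; "
            "K.DOCUMENT.contains() may return incorrect results",
            stacklevel=2,
        )
    return pattern


check_document_substring("API")  # passes silently
check_document_substring("AI")   # emits a UserWarning
```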

docs/mintlify/docs/collections/add-data.mdx

Lines changed: 44 additions & 0 deletions
````diff
@@ -133,6 +133,50 @@ collection.add(
 ```
 </CodeGroup>
 
+## Metadata
+
+Metadata values can be strings, integers, floats, or booleans. Additionally, you can store arrays of these types.
+
+<CodeGroup>
+```python Python
+collection.add(
+    ids=["id1"],
+    documents=["lorem ipsum..."],
+    metadatas=[{
+        "chapter": 3,
+        "tags": ["fiction", "adventure"],
+        "scores": [1, 2, 3],
+    }],
+)
+```
+
+```typescript TypeScript
+await collection.add({
+  ids: ["id1"],
+  documents: ["lorem ipsum..."],
+  metadatas: [{
+    chapter: 3,
+    tags: ["fiction", "adventure"],
+    scores: [1, 2, 3],
+  }],
+});
+```
+
+```rust Rust
+use chroma::types::{Metadata, MetadataValue};
+
+let mut metadata = Metadata::new();
+metadata.insert("chapter".into(), MetadataValue::Int(3));
+metadata.insert(
+    "tags".into(),
+    MetadataValue::StringArray(vec!["fiction".to_string(), "adventure".to_string()]),
+);
+metadata.insert("scores".into(), MetadataValue::IntArray(vec![1, 2, 3]));
+```
+</CodeGroup>
+
+All elements in an array must be the same type, and empty arrays are not allowed. You can filter on array metadata using the `$contains` and `$not_contains` operators; see [Metadata Filtering](/docs/querying-collections/metadata-filtering#using-array-metadata) for details.
+
 ## Behaviors
 
 - If you add a record with an ID that already exists in the collection, it will be ignored without throwing an error. In order to overwrite data in your collection, you must [update](./update-data) the data.
````
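The Behaviors bullet says that adding a record whose ID already exists is silently ignored. The hypothetical `ToyCollection` below models only that rule, using array metadata as the payload. It is not the chromadb `Collection` class, and it does not model `update`. Applying the rule per record within a mixed batch is also an assumption.

```python
class ToyCollection:
    """Hypothetical in-memory stand-in that models only add()'s duplicate-ID rule."""

    def __init__(self):
        self.records = {}

    def add(self, ids, metadatas):
        for record_id, metadata in zip(ids, metadatas):
            if record_id in self.records:
                continue  # existing ID: skipped without raising, stored data kept
            self.records[record_id] = metadata


collection = ToyCollection()
collection.add(ids=["id1"], metadatas=[{"tags": ["fiction", "adventure"]}])
# id1 already exists, so its new metadata is ignored; id2 is new and is stored.
collection.add(ids=["id1", "id2"], metadatas=[{"tags": ["horror"]}, {"tags": ["drama"]}])
print(collection.records)
```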
