From 1e0d5c04e5669a14005f563ad0d37822606eeaa3 Mon Sep 17 00:00:00 2001
From: Garvit Gupta
Date: Mon, 25 Aug 2025 15:25:00 -0700
Subject: [PATCH 1/2] [Vectorize] Add documentation for the Vectorize list-vectors operation

---
 .../vectorize/best-practices/list-vectors.mdx | 65 +++++++++++++++++++
 .../docs/vectorize/platform/limits.mdx        |  3 +-
 .../docs/vectorize/reference/client-api.mdx   | 16 +++++
 .../workers/wrangler-commands/vectorize.mdx   | 15 +++++
 src/content/release-notes/vectorize.yaml      |  5 ++
 5 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 src/content/docs/vectorize/best-practices/list-vectors.mdx

diff --git a/src/content/docs/vectorize/best-practices/list-vectors.mdx b/src/content/docs/vectorize/best-practices/list-vectors.mdx
new file mode 100644
index 000000000000000..0517fa0e47bd9ef
--- /dev/null
+++ b/src/content/docs/vectorize/best-practices/list-vectors.mdx
@@ -0,0 +1,65 @@
+---
+title: List vectors
+pcx_content_type: concept
+sidebar:
+  order: 5
+
+---
+
+The list-vectors operation allows you to enumerate all vector identifiers in a Vectorize index using paginated requests. This guide covers best practices for efficiently using this operation.
+
+## When to use list-vectors
+
+Use list-vectors for:
+
+- **Bulk operations**: When you need to process all vectors in an index
+- **Auditing**: To verify the contents of your index or generate reports
+- **Data migration**: When moving vectors between indexes or systems
+- **Cleanup operations**: To identify and remove outdated vectors
+
+## Pagination behavior
+
+The list-vectors operation uses cursor-based pagination with important consistency guarantees:
+
+### Snapshot consistency
+
+Vector identifiers returned belong to the index snapshot captured at the time of the first list-vectors request. This ensures consistent pagination even when the index is being modified during iteration:
+
+- **New vectors**: Vectors inserted after the initial request will not appear in subsequent paginated results
+- **Deleted vectors**: Vectors deleted after the initial request will continue to appear in the remaining responses until pagination is complete
+
+### Starting a new iteration
+
+To see recently added or removed vectors, you must start a new list-vectors request sequence (without a cursor). This captures a fresh snapshot of the index.
+
+### Response structure
+
+Each response includes:
+- `count`: Number of vectors returned in this response
+- `totalCount`: Total number of vectors in the index
+- `isTruncated`: Whether there are more vectors available
+- `nextCursor`: Cursor for the next page (null if no more results)
+- `cursorExpirationTimestamp`: When the cursor expires
+- `vectors`: Array of vector identifiers
+
+### Cursor expiration
+
+Cursors have an expiration timestamp. If a cursor expires, you'll need to start a new list-vectors request sequence to continue pagination.
+
+## Performance considerations
+
+Take care to have sufficient gap between consecutive requests to avoid hitting rate-limits.
+
+## Example workflow
+
+Here's a typical pattern for processing all vectors in an index:
+
+```sh
+# Start iteration
+wrangler vectorize list-vectors my-index --count=1000
+
+# Continue with cursor from response
+wrangler vectorize list-vectors my-index --count=1000 --cursor=""
+
+# Repeat until no more results
+```
\ No newline at end of file
diff --git a/src/content/docs/vectorize/platform/limits.mdx b/src/content/docs/vectorize/platform/limits.mdx
index 71cea04e7182a28..2be546ed4bd98e2 100644
--- a/src/content/docs/vectorize/platform/limits.mdx
+++ b/src/content/docs/vectorize/platform/limits.mdx
@@ -11,12 +11,13 @@ The following limits apply to accounts, indexes and vectors (as specified):
 | ------------------------------------------------------------- | ----------------------------------- |
 | Indexes per account | 50,000 (Workers Paid) / 100 (Free) |
 | Maximum dimensions per vector | 1536 dimensions, 32 bits precision |
-| Precision per vector dimension | 32 bits (float32) |
+| Precision per vector dimension | 32 bits (float32) |
 | Maximum vector ID length | 64 bytes |
 | Metadata per vector | 10KiB |
 | Maximum returned results (`topK`) with values or metadata | 20 |
 | Maximum returned results (`topK`) without values and metadata | 100 |
 | Maximum upsert batch size (per batch) | 1000 (Workers) / 5000 (HTTP API) |
+| Maximum vectors in a list-vectors page | 1000 |
 | Maximum index name length | 64 bytes |
 | Maximum vectors per index | 5,000,000 |
 | Maximum namespaces per index | 50,000 (Workers Paid) / 1000 (Free) |
diff --git a/src/content/docs/vectorize/reference/client-api.mdx b/src/content/docs/vectorize/reference/client-api.mdx
index 2b8d065f692929b..ff9193c8dac9eab 100644
--- a/src/content/docs/vectorize/reference/client-api.mdx
+++ b/src/content/docs/vectorize/reference/client-api.mdx
@@ -127,6 +127,22 @@ const details = await env.YOUR_INDEX.describe();
 
 Retrieves the configuration of a given index directly, including its configured `dimensions` and distance `metric`.
 
+### List Vectors
+
+List all vector identifiers in an index using paginated requests, returning up to 1000 vector identifiers per page.
+
+```sh
+wrangler vectorize list-vectors [--count=] [--cursor=]
+```
+
+**Parameters:**
+
+- `` - The name of your Vectorize index
+- `--count` (optional) - Number of vector IDs to return per page. Must be between 1 and 1000 (default: 100)
+- `--cursor` (optional) - Pagination cursor from the previous response to continue listing from that position
+
+For detailed guidance on pagination behavior and best practices, refer to [List vectors best practices](/vectorize/best-practices/list-vectors/).
+
 ### Create Metadata Index
 
 Enable metadata filtering on the specified property. Limited to 10 properties.
diff --git a/src/content/partials/workers/wrangler-commands/vectorize.mdx b/src/content/partials/workers/wrangler-commands/vectorize.mdx
index f9dbe0b39d1b9ca..ab348d67e416a59 100644
--- a/src/content/partials/workers/wrangler-commands/vectorize.mdx
+++ b/src/content/partials/workers/wrangler-commands/vectorize.mdx
@@ -130,6 +130,21 @@ npx wrangler vectorize query [OPTIONS]
 - `--filter`
   - Filter vectors based on this metadata filter. Example: `'{ 'p1': 'abc', 'p2': { '$ne': true }, 'p3': 10, 'p4': false, 'nested.p5': 'abcd' }'`
 
+
+
+List vector identifiers in a Vectorize index in a paginated manner.
+
+```sh
+npx wrangler vectorize list-vectors [OPTIONS]
+```
+
+- `INDEX_NAME`
+  - The name of the Vectorize index from which vector identifiers need to be listed.
+- `--count`
+  - Number of vector IDs to return per page. Must be between 1 and 1000 (default: `100`).
+- `--cursor`
+  - Pagination cursor from the previous response to continue listing from that position.
+
 
 
 Fetch vectors from a Vectorize index using the provided ids.
diff --git a/src/content/release-notes/vectorize.yaml b/src/content/release-notes/vectorize.yaml
index 891a8eaab0000d7..107b3bfc26b98e5 100644
--- a/src/content/release-notes/vectorize.yaml
+++ b/src/content/release-notes/vectorize.yaml
@@ -3,6 +3,11 @@ link: "/vectorize/platform/changelog/"
 productName: Vectorize
 productLink: "/vectorize/"
 entries:
+  - publish_date: "2025-08-25"
+    title: Added support for the list-vectors operation
+    description: |-
+      Vectorize now supports iteration through all the vector identifiers in an index in a paginated manner using the list-vectors operation.
+
   - publish_date: "2024-12-20"
     title: Added support for index name reuse
     description: |-
From a8e151ce14527e350811c1b103f069e71300d224 Mon Sep 17 00:00:00 2001
From: Jun Lee
Date: Tue, 26 Aug 2025 09:16:59 +0100
Subject: [PATCH 2/2] PCX review

---
 src/content/docs/vectorize/best-practices/list-vectors.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/content/docs/vectorize/best-practices/list-vectors.mdx b/src/content/docs/vectorize/best-practices/list-vectors.mdx
index 0517fa0e47bd9ef..64a0d9743bad66b 100644
--- a/src/content/docs/vectorize/best-practices/list-vectors.mdx
+++ b/src/content/docs/vectorize/best-practices/list-vectors.mdx
@@ -12,9 +12,9 @@ The list-vectors operation allows you to enumerate all vector identifiers in a V
 
 Use list-vectors for:
 
-- **Bulk operations**: When you need to process all vectors in an index
+- **Bulk operations**: To process all vectors in an index
 - **Auditing**: To verify the contents of your index or generate reports
-- **Data migration**: When moving vectors between indexes or systems
+- **Data migration**: To move vectors between indexes or systems
 - **Cleanup operations**: To identify and remove outdated vectors
 
 ## Pagination behavior
@@ -39,7 +39,7 @@ Each response includes:
 - `totalCount`: Total number of vectors in the index
 - `isTruncated`: Whether there are more vectors available
 - `nextCursor`: Cursor for the next page (null if no more results)
-- `cursorExpirationTimestamp`: Timestamp of when the cursor expires
+- `cursorExpirationTimestamp`: Timestamp of when the cursor expires
 - `vectors`: Array of vector identifiers
 
 ### Cursor expiration
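
Review note: the pagination rules that `list-vectors.mdx` describes (start without a cursor, follow `nextCursor` while `isTruncated` is true, keep each page within the 1 to 1000 limit, and leave a gap between requests) could be illustrated with a small loop. The following is a minimal sketch, not part of this patch. The field names come from the "Response structure" list; `FetchPage`, `clampCount`, `listAllVectors`, and `delayMs` are hypothetical names introduced here, and the sketch assumes the caller supplies the function that actually issues each request.

```typescript
// One list-vectors page, using the field names from the "Response structure" list.
// The element type of `vectors` is generic because the guide only says
// "array of vector identifiers"; the timestamp format is likewise unspecified.
interface ListVectorsPage<T> {
  count: number;
  totalCount: number;
  isTruncated: boolean;
  nextCursor: string | null;
  cursorExpirationTimestamp?: unknown;
  vectors: T[];
}

// Hypothetical page fetcher: wire this to whatever actually performs the request.
type FetchPage<T> = (count: number, cursor?: string) => Promise<ListVectorsPage<T>>;

const MAX_PAGE_SIZE = 1000; // per-page limit added to limits.mdx in this patch

// Keep the requested page size inside the documented 1..1000 range.
function clampCount(count: number): number {
  return Math.min(MAX_PAGE_SIZE, Math.max(1, Math.floor(count)));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Walk one snapshot from the first page to the last and collect every entry.
async function listAllVectors<T>(
  fetchPage: FetchPage<T>,
  count = 100,
  delayMs = 0,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined; // no cursor on the first request: new snapshot
  while (true) {
    const page = await fetchPage(clampCount(count), cursor);
    all.push(...page.vectors);
    // Stop when the response reports that nothing is left.
    if (!page.isTruncated || page.nextCursor === null) break;
    cursor = page.nextCursor;
    // Optional pause between requests to stay clear of rate limits.
    if (delayMs > 0) await sleep(delayMs);
  }
  return all;
}
```

Because the fetcher is injected, the loop can be exercised against a stub without touching a real index. A caller that hits an expired cursor would need to call `listAllVectors` again from the start, which matches the "Cursor expiration" guidance.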