
vectors - list vectors - add segments and nextToken#9516

Open
alphaprinz wants to merge 1 commit into noobaa:master from alphaprinz:vectors_nextToken_segments

Conversation

@alphaprinz
Contributor

@alphaprinz alphaprinz commented Mar 25, 2026

Describe the Problem

Allow objects (vector buckets, vector indices, vectors) to be listed across several independent S3 op invocations.

Explain the Changes

  1. Implement the nextToken and segment params.
  2. For db objects (vector buckets and indices), skip ahead to nextToken and return from there.
  3. For vector segments, split up the table and use the Lance table offset to get to the next starting point.
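The segment math in step 3 can be sketched as follows. This is an illustrative reconstruction, not the PR's exact code; the function name and parameter names are assumptions:

```javascript
// Sketch of the segment windowing: split `rows` into `segment_count` equal
// slices; the last slice absorbs the remainder. Names are illustrative.
function segment_window(rows, segment_count = 1, segment_index = 0) {
    const segment = Math.floor(rows / segment_count); // rows per segment
    const start = segment * segment_index;
    const end = (segment_index + 1 === segment_count)
        ? rows                                   // last segment takes the remainder
        : segment * (segment_index + 1);
    return { start, end };
}

segment_window(10, 3, 0); // { start: 0, end: 3 }
segment_window(10, 3, 2); // { start: 6, end: 10 }
```

The resulting `[start, end)` window is then walked in pages of maxResults rows, with nextToken recording the current position.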

Issues: Fixed #xxx / Gap #xxx

  1. nextToken and segments params were ignored.

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Vector listing: pagination (maxResults now defaults to 500), segmentation (segmentCount/segmentIndex), and combined paged-segment retrieval; responses include nextToken until a segment is exhausted.
  • Bug Fixes

    • Validation added for nextToken, segmentCount, and segmentIndex; malformed inputs return structured validation errors with field-level details.
  • Tests

    • Integration tests added for pagination, segmentation, and combined paged-segment flows.

@coderabbitai

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

post_list_vectors adds request-body validation for nextToken and segment params, forwards maxResults/segmentCount/segmentIndex/nextToken to the SDK, and may throw VectorError with a fieldList. LanceConn.list_vectors now accepts a params object, implements segment-aware pagination and returns an optional nextToken. VectorError supports fieldList.

Changes

Cohort / File(s) Summary
Endpoint: list vectors
src/endpoint/vector/ops/vector_list_vectors.js
Validate nextToken (via next_token_sanity_check) and segmentCount/segmentIndex; collect field errors and throw VectorError (ValidationException) with fieldList; default max_results to 500 when falsy; forward segment_count, segment_index, and next_token to SDK.
Utils: Lance connector & wrapper
src/util/vectors_util.js
Refactor LanceConn.list_vectors and exported list_vectors to accept a single params object; derive table_name, compute/derive segment start/end when next_token absent, parse next_token as offset_end, apply offset and cap limit to segment end, return { vectors, nextToken }; add debug logging around table access and params.
Errors: validation payload
src/endpoint/vector/vector_errors.js
Extend VectorErrorSpec typedef and VectorError constructor to accept an optional fieldList; reply() now includes body.fieldList when present.
Tests: integration
src/test/integration_tests/api/vectors/test_vectors_ops.js
Add integration tests for pagination (maxResults), segment-based listing (segmentCount/segmentIndex), and combined segmentation+pagination; add ESLint suppression for long function.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant API as Vector API
  participant SDK as Vectors SDK (list_vectors)
  participant Storage as LanceConn/Table

  Client->>API: POST /vectors.list { maxResults, nextToken?, segmentCount?, segmentIndex? }
  API->>API: next_token_sanity_check(nextToken)\nsegment_validate(segmentCount, segmentIndex)\naccumulate fieldList
  alt invalid params
    API-->>Client: 400 VectorError { code: ValidationException, fieldList:[{path:"...", message:"..."}] }
  else valid params
    API->>SDK: list_vectors(params { max_results, next_token, segment_count, segment_index })
    SDK->>Storage: open table / compute count / determine segment start/end
    Storage-->>SDK: row count / table handle
    SDK->>Storage: query with offset and limit (capped to segment end)
    Storage-->>SDK: vectors page
    SDK-->>API: { vectors, nextToken? }
    API-->>Client: 200 { vectors, nextToken? }
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • guymguym
  • nimrod-becker
  • romayalon
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main changes: adding segment and nextToken support to the list vectors functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/endpoint/vector/ops/vector_list_vectors.js`:
- Around line 13-22: The call to req.vector_sdk.list_vectors uses
`req.body.maxResults || 500`, which converts an explicit 0 to 500; change that
to use the nullish coalescing operator so only null/undefined default: replace
the `max_results: req.body.maxResults || 500` usage with `max_results:
req.body.maxResults ?? 500` (locate this in the list_vectors call in
vector_list_vectors.js) so an explicit 0 is preserved and only null/undefined
falls back to 500.
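The distinction this comment relies on is standard JavaScript behavior: `||` substitutes the default for any falsy value, while `??` substitutes it only for null/undefined. A minimal demonstration:

```javascript
// `||` replaces every falsy value, so a caller's explicit 0 is lost:
const a = 0 || 500;          // 500
// `??` replaces only null/undefined, so 0 survives:
const b = 0 ?? 500;          // 0
const c = undefined ?? 500;  // 500
```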

In `@src/util/vectors_util.js`:
- Around line 114-125: The code assumes get_table(table_name) always returns a
table and immediately calls table.countRows(), which causes a TypeError when an
index exists but has no Lance table (created later by create_vector_index on
first insert); add a null/undefined guard after const table = await
this.get_table(table_name) to handle the "no table" case (treat as empty index
or return appropriate empty response/error) before any dereference, e.g., in the
logic that computes segments using table.countRows(), update the branch to
handle table === null/undefined (use 0 rows or short-circuit to the same
empty-result behavior as a non-existent index).
- Around line 121-143: The code trusts client-provided pagination values
(params.segment_count, params.segment_index, params.next_token,
params.max_results) before computing offsets and calling
query.offset()/query.limit(), so add defensive validation: after computing rows
(from table.countRows()) and before setting params.next_token or parsing it,
ensure params.segment_count and params.segment_index are integers >=1 and >=0
respectively and that segment_index < segment_count; when parsing
params.next_token check it has exactly two parts, both parse to finite
non-negative integers (offset and end), and that end > offset and offset < rows;
also validate params.max_results is a positive integer and clamp computed limit
to max(0, end - offset) to avoid NaN/negative values. Update the logic around
next_token_parts, offset, end, and the limit calculation used with
query.offset() and query.limit() to return an error/throw or normalize values
when validation fails.
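The checks listed above can be collected into one guard. This is a sketch under assumed names (`validate_pagination` is hypothetical, not part of the PR); `rows` would come from `table.countRows()`:

```javascript
// Hypothetical guard implementing the review's validation list.
// Returns null on invalid input; otherwise a safe { offset, limit, end }.
function validate_pagination(rows, { segment_count = 1, segment_index = 0,
                                     next_token, max_results = 500 } = {}) {
    if (!Number.isInteger(segment_count) || segment_count < 1) return null;
    if (!Number.isInteger(segment_index) || segment_index < 0 ||
        segment_index >= segment_count) return null;
    if (!Number.isInteger(max_results) || max_results < 1) return null;
    let offset;
    let end;
    if (next_token === undefined || next_token === null) {
        const segment = Math.floor(rows / segment_count);
        offset = segment * segment_index;
        end = (segment_index + 1 === segment_count) ? rows : segment * (segment_index + 1);
    } else {
        const parts = String(next_token).split('_');
        if (parts.length !== 2) return null;              // exactly two parts
        offset = Number(parts[0]);
        end = Number(parts[1]);
        if (!Number.isInteger(offset) || !Number.isInteger(end)) return null;
        if (offset < 0 || end <= offset || offset >= rows) return null;
    }
    // clamp so query.limit() never sees NaN or a negative value
    const limit = Math.max(0, Math.min(max_results, end - offset));
    return { offset, limit, end };
}
```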

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: abce8980-3e53-476e-b495-52d096218b5a

📥 Commits

Reviewing files that changed from the base of the PR and between 0d398c2 and a4c5854.

📒 Files selected for processing (3)
  • src/endpoint/vector/ops/vector_list_vectors.js
  • src/test/integration_tests/api/vectors/test_vectors_ops.js
  • src/util/vectors_util.js

Comment on lines 114 to +125
const table = await this.get_table(table_name);
//TODO - check if(!table)

//determine this request place in the table
if (!params.next_token) {
//if there's no next_token, we're either in a segmented list_vector op or not segmented
//for not-segmented, create a default dummy segment which will include all rows
params.segment_count = params.segment_count || 1;
params.segment_index = params.segment_index || 0;
//now that we definitely have a segment, calc it's start and end for the given index
const rows = await table.countRows();
const segment = Math.floor(rows / params.segment_count); //total number of rows in a segment


⚠️ Potential issue | 🟠 Major

Handle the empty-index path before dereferencing table.

Line 114 can legitimately return no table: create_vector_index() defers Lance table creation until the first insert. The next access to table.countRows() then turns a valid list on an empty index into a TypeError/500. If "no table" is also your no-such-index signal, do that existence check before calling into Lance.

🩹 Suggested guard
         const table = await this.get_table(table_name);
-        //TODO - check if(!table)
+        if (!table) {
+            return { vectors: [], nextToken: undefined };
+        }

@alphaprinz alphaprinz force-pushed the vectors_nextToken_segments branch from a4c5854 to f2bcbbf Compare March 25, 2026 19:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/util/vectors_util.js`:
- Around line 143-144: The limit calculation can receive undefined because
params.max_results may be missing; add a defensive default at the start of the
function that reads params (e.g., create a local const maxResults =
(params.max_results ?? SOME_DEFAULT) and coerce to a safe integer/positive
value), then replace uses of params.max_results in the limit calculation with
maxResults and call query.limit with that computed numeric value; update any
variable names accordingly in the function that contains offset and query.limit
to ensure no NaN/undefined is passed to query.limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fdcfab19-e448-4eda-851f-a610ce402f6c

📥 Commits

Reviewing files that changed from the base of the PR and between a4c5854 and f2bcbbf.

📒 Files selected for processing (3)
  • src/endpoint/vector/ops/vector_list_vectors.js
  • src/test/integration_tests/api/vectors/test_vectors_ops.js
  • src/util/vectors_util.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/test/integration_tests/api/vectors/test_vectors_ops.js

@alphaprinz alphaprinz force-pushed the vectors_nextToken_segments branch from f2bcbbf to 170e146 Compare March 25, 2026 23:14

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
src/util/vectors_util.js (2)

114-124: ⚠️ Potential issue | 🟠 Major

Still dereferencing table before the empty-index case.

Line 124 assumes get_table() always returns a Lance table, but create_vector_index() deliberately defers table creation until the first insert. A valid-but-empty index will still fail list_vectors() with a TypeError/500 unless this path short-circuits or maps to the not-found case first.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/util/vectors_util.js` around lines 114 - 124, The code in list_vectors
dereferences the result of get_table (using table.countRows()) without handling
the case where create_vector_index defers table creation for an empty index;
update the list_vectors flow to check whether get_table(table_name) returned a
valid table before calling table.countRows(), and if it returned null/undefined
treat it as the empty-index/not-found case (short-circuit to the same response
path used for a missing index) so that create_vector_index-created-but-empty
indexes do not throw a TypeError; reference get_table, list_vectors,
create_vector_index, and the table.countRows() call when locating and updating
the logic.

121-131: ⚠️ Potential issue | 🟠 Major

Still missing integer/bounds validation for segment pagination inputs.

segment_count / segment_index are still trusted here. Values like segment_count <= 0 or segment_index >= segment_count generate bogus start/end windows, and the || default on segment_count silently rewrites an explicit 0 into a different request. Please switch to nullish defaults plus explicit integer/bounds checks before computing the slice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/util/vectors_util.js` around lines 121 - 131, params.segment_count and
params.segment_index are trusted and can be invalid; replace the current falsy
defaults with nullish defaults (use params.segment_count ?? 1 and
params.segment_index ?? 0), validate they are integers (e.g., Number.isInteger)
and enforce bounds (segment_count must be >=1, segment_index must be >=0 and <
segment_count); clamp or throw a clear error if invalid, then compute segment
via table.countRows(), and set start/end/params.next_token using the validated
values (refer to params.segment_count, params.segment_index, table.countRows(),
segment, start, end, params.next_token).
src/endpoint/vector/ops/vector_list_vectors.js (1)

16-24: ⚠️ Potential issue | 🟠 Major

Make nextToken validation type-safe and integer-only.

Line 45 calls string methods on raw request input, so {} / 123 becomes a 500 instead of the ValidationException you intend. The helper also still accepts tokens like -1_5 or 1.5_4, even though the downstream pagination code treats both parts as row offsets.

🔒 Minimal hardening
 function next_token_sanity_check(next_token) {
-    if (!next_token) return true;
-     const delim = next_token.indexOf('_');
-    if (delim === -1) return false;
-    const split = next_token.split('_');
-    if (split.length !== 2) return false;
-    if (split[0] === '' || split[1] === '') return false;
-    const start = Number(split[0]);
-    const end = Number(split[1]);
-    if (Number.isNaN(start) || Number.isNaN(end)) return false;
-    if (start > end) return false;
-    return true;
+    if (next_token === undefined || next_token === null) return true;
+    if (typeof next_token !== 'string') return false;
+    const match = /^(\d+)_(\d+)$/.exec(next_token);
+    if (!match) return false;
+    return Number(match[1]) <= Number(match[2]);
 }

Also applies to: 42-55

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/endpoint/vector/ops/vector_list_vectors.js` around lines 16 - 24, The
nextToken validation currently treats raw input as a string and allows invalid
tokens like objects, numbers, negative or dotted/underscored strings; update
next_token_sanity_check (and the call site using next_token) to be type-safe and
accept only non-negative integer tokens: first check typeof next_token ===
'string' (or explicitly handle undefined/null as allowed/absent), then validate
against a strict regex like /^[0-9]+$/ (reject floats, negatives, underscores,
and non-string types), and if invalid throw the same
VectorError.ValidationException with fieldList [{path: 'nextToken', message:
'Bad nextToken'}]; reference functions/identifiers: next_token,
next_token_sanity_check, VectorError.ValidationException.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2abacf1a-28b2-482d-a5f5-3209fea71a7e

📥 Commits

Reviewing files that changed from the base of the PR and between f2bcbbf and 170e146.

📒 Files selected for processing (4)
  • src/endpoint/vector/ops/vector_list_vectors.js
  • src/endpoint/vector/vector_errors.js
  • src/test/integration_tests/api/vectors/test_vectors_ops.js
  • src/util/vectors_util.js
✅ Files skipped from review due to trivial changes (1)
  • src/test/integration_tests/api/vectors/test_vectors_ops.js

@alphaprinz alphaprinz force-pushed the vectors_nextToken_segments branch from 170e146 to 8eb52f0 Compare March 27, 2026 00:03

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/test/integration_tests/api/vectors/test_vectors_ops.js (1)

210-348: Consider adding negative test cases for invalid nextToken.

The new tests comprehensively cover happy paths. For completeness, consider adding a test that verifies the ValidationException is thrown for malformed nextToken values (e.g., "invalid", "5_3" where start > end, "1_2_3" with multiple underscores).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/test/integration_tests/api/vectors/test_vectors_ops.js` around lines 210
- 348, Add negative tests near the existing "should list vectors" cases that
send s3vectors.ListVectorsCommand via send(s3_vectors_client, ...) with
malformed nextToken values and assert a ValidationException is thrown;
specifically create tests that reuse vectorBucketName/indexName used in the
other cases and call ListVectorsCommand with nextToken values such as "invalid",
"5_3" (start > end), and "1_2_3" (too many underscores), verifying the response
throws/returns a ValidationException for each case so the ListVectorsCommand
validation logic is exercised.
src/endpoint/vector/ops/vector_list_vectors.js (1)

43-55: Redundant delimiter check.

Lines 45-46 are unnecessary since split('_') on a string without underscores returns a single-element array, which fails the split.length !== 2 check on line 48.

♻️ Simplified version
 function next_token_sanity_check(next_token) {
     if (!next_token) return true;
-     const delim = next_token.indexOf('_');
-    if (delim === -1) return false;
     const split = next_token.split('_');
     if (split.length !== 2) return false;
     if (split[0] === '' || split[1] === '') return false;
     const start = Number(split[0]);
     const end = Number(split[1]);
     if (Number.isNaN(start) || Number.isNaN(end)) return false;
     if (start > end) return false;
     return true;
 }
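For reference, the simplified checker behaves as follows on a few representative tokens (the function body matches the suggestion above; the sample calls are illustrative):

```javascript
// Simplified checker from the suggestion, shown with sample tokens.
function next_token_sanity_check(next_token) {
    if (!next_token) return true; // absent token is allowed
    const split = next_token.split('_');
    if (split.length !== 2) return false;
    if (split[0] === '' || split[1] === '') return false;
    const start = Number(split[0]);
    const end = Number(split[1]);
    if (Number.isNaN(start) || Number.isNaN(end)) return false;
    return start <= end;
}

next_token_sanity_check('0_10');  // true
next_token_sanity_check('5_3');   // false (start > end)
next_token_sanity_check('1_2_3'); // false (too many parts)
```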
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/endpoint/vector/ops/vector_list_vectors.js` around lines 43 - 55, The
function next_token_sanity_check contains a redundant delimiter check
(delim/indexOf) — remove the two lines that compute and check delim and instead
rely on split('_') and the subsequent split.length !== 2 check; implement the
function so it first returns true for falsy next_token, then splits with
next_token.split('_'), verifies split.length === 2, ensures neither split[0] nor
split[1] are empty, converts to Number and checks Number.isNaN and that start <=
end, preserving the existing validation behavior in next_token_sanity_check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 89c2d7d5-a616-4808-b4f9-529708b3a9ab

📥 Commits

Reviewing files that changed from the base of the PR and between 170e146 and 8eb52f0.

📒 Files selected for processing (4)
  • src/endpoint/vector/ops/vector_list_vectors.js
  • src/endpoint/vector/vector_errors.js
  • src/test/integration_tests/api/vectors/test_vectors_ops.js
  • src/util/vectors_util.js
✅ Files skipped from review due to trivial changes (1)
  • src/util/vectors_util.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/endpoint/vector/vector_errors.js

//TODO - this won't work for large tables.
//support aws segments/tokens?
query.offset(offset);
//limit can't get us over end
Member


We can't go beyond end but we seem to have no limits on the params.next_token. A client can construct a request like 0_9999999 right?

Contributor Author

@alphaprinz alphaprinz Mar 27, 2026


Well, yes. Although note max_results limit is 1000.

Member


max_results isn't being validated though.. right? It can be more than 1000 or negative too right?
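One way to close both gaps raised in this thread (unbounded token offsets and unvalidated max_results) is to clamp the inputs against the real row count before building the query. A sketch with illustrative names; the 1000 cap is the limit the contributor mentioned above:

```javascript
// Illustrative clamp for client-supplied pagination inputs (names assumed).
// rows is the real table size from table.countRows().
function clamp_page(rows, offset, end, max_results) {
    // out-of-range or non-integer max_results falls back to the 1000 cap
    if (!Number.isInteger(max_results) || max_results < 1 || max_results > 1000) {
        max_results = 1000;
    }
    end = Math.min(end, rows);                   // a forged "0_9999999" collapses to the table size
    offset = Math.min(Math.max(offset, 0), end); // keep 0 <= offset <= end
    const limit = Math.min(max_results, end - offset);
    return { offset, limit, end };
}
```

With this in place, a forged token past the end of the table yields limit 0 immediately instead of paging through empty results.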

@alphaprinz alphaprinz force-pushed the vectors_nextToken_segments branch from 8eb52f0 to 889deca Compare March 27, 2026 17:37

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (4)
src/endpoint/vector/ops/vector_list_vectors.js (2)

23-25: ⚠️ Potential issue | 🟠 Major

Validate maxResults, segmentCount, and segmentIndex as integers.

segment_validate() still accepts strings/floats, and maxResults is forwarded without any bounds check. That breaks the pagination math locally — e.g. segmentIndex: "1" computes the wrong segment end, and maxResults: -1 makes the page window move backwards.

Suggested fix
     const fieldList = [];
+    const max_results = req.body.maxResults;
+    if (max_results != null && (!Number.isInteger(max_results) || max_results < 1 || max_results > 500)) {
+        fieldList.push({ path: 'maxResults', message: 'Must be an integer between 1 and 500.' });
+    }

     const next_token = req.body.nextToken;
     if (!next_token_sanity_check(next_token)) {
         fieldList.push({path: 'nextToken', message: "Bad nextToken"})
     }
@@
     const list = await req.vector_sdk.list_vectors({
         vector_bucket_name: req.body.vectorBucketName,
         vector_index_name: req.body.indexName,
-        max_results: req.body.maxResults || 500,
+        max_results: max_results == null ? 500 : max_results,
         return_data: req.body.returnData,
         return_metadata: req.body.returnMetadata,
         segment_count,
         segment_index,
         next_token,
@@
 function segment_validate(count, index) {
     if (count == null && index == null) return;
     if (count == null) return {path: "segmentCount", message: "Missing"};
     if (index == null) return {path: "segmentIndex", message: "Missing"};
+    if (!Number.isInteger(count)) return {path: "segmentCount", message: "Must be an integer."};
+    if (!Number.isInteger(index)) return {path: "segmentIndex", message: "Must be an integer."};
     if (count <= 0) return {path: "segmentCount", message: "Must be greater than zero."};
     if (index < 0) return {path: "segmentIndex", message: "Cannot be negative."};
     if (index >= count) return {path: "segmentIndex", message: "Must be less than segmentCount."};
 }

Also applies to: 41-49, 57-64

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/endpoint/vector/ops/vector_list_vectors.js` around lines 23 - 25, Convert
and validate req.body.segmentCount, req.body.segmentIndex, and
req.body.maxResults to integers before use: parseInt the values used to set
segment_count, segment_index and maxResults, then call
segment_validate(segment_count, segment_index) with those integers; add bounds
checks to ensure segment_count and segment_index are non-negative integers and
that maxResults is an integer > 0 (and optionally cap it to a safe maximum).
Update all occurrences where segment_count/segment_index/maxResults are read
(e.g., the variables set in vector_list_vectors.js and later uses around the
pagination math) to use the validated integer values and return a
validation_error if any check fails.

18-20: ⚠️ Potential issue | 🟠 Major

Only null / undefined should mean “no nextToken”.

if (!next_token) treats "" and 0 as missing, so an explicitly bad token falls back to page 1 instead of returning a validation error. A truthy non-string like {} also throws at indexOf() and bypasses the fieldList path entirely.

Suggested fix
 function next_token_sanity_check(next_token) {
-    if (!next_token) return true;
+    if (next_token == null) return true;
+    if (typeof next_token !== 'string') return false;
     const delim = next_token.indexOf('_');
     if (delim === -1) return false;

Also applies to: 67-79

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/endpoint/vector/ops/vector_list_vectors.js` around lines 18 - 20, The
code treats any falsy nextToken as missing; instead explicitly treat only
null/undefined as absent and validate all other values. Change the logic around
the next_token variable so it first checks if next_token === null || next_token
=== undefined (handle as no token), else run next_token_sanity_check(next_token)
and push to fieldList when that check fails; apply the same change for the
second occurrence in the file (lines ~67-79). Reference: variable next_token,
function next_token_sanity_check, and fieldList.
src/util/vectors_util.js (2)

114-124: ⚠️ Potential issue | 🟠 Major

Guard empty indexes before dereferencing table.

create_vector_index() in this file intentionally defers Lance table creation until the first insert, so a valid list on a newly created index can still reach this path with table === undefined. The current countRows() / query() flow then turns an empty result into a 500.

Suggested fix
         const table = await this.get_table(table_name);
-        //TODO - check if(!table)
+        if (!table) {
+            return { vectors: [], nextToken: undefined };
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/util/vectors_util.js` around lines 114 - 124, The code assumes table
exists when calling table.countRows() / table.query() which breaks for indexes
created but with no Lance table; add a guard after const table = await
this.get_table(table_name) to handle table === undefined by returning an empty
paginated response (or treating rows as 0) so the subsequent logic that uses
table.countRows() and table.query() is skipped; update the logic in the
pagination branch that uses params.next_token, and ensure functions get_table
and any callers of this path (e.g., create_vector_index flow) get the empty
result behavior for missing tables to avoid a 500.

134-152: ⚠️ Potential issue | 🟠 Major

Clamp client-supplied tokens to the current row count.

A forged token like 0_9999999 still passes the handler’s syntax check. Once offset moves past the real table size, this code keeps returning vectors: [] with another nextToken, so callers can paginate through many empty pages.

Suggested fix
         const table = await this.get_table(table_name);
         if (!table) {
             return { vectors: [], nextToken: undefined };
         }
+        const rows = await table.countRows();

         //determine this request place in the table
         if (!params.next_token) {
             //if there's no next_token, we're either in a segmented list_vector op or not segmented
             //for not-segmented, create a default dummy segment which will include all rows
             params.segment_count = params.segment_count || 1;
             params.segment_index = params.segment_index || 0;
             //now that we definitely have a segment, calc it's start and end for the given index
-            const rows = await table.countRows();
             const segment = Math.floor(rows / params.segment_count); //total number of rows in a segment
             const start = segment * params.segment_index;
             //the last will pick up the remainder
             const end = (params.segment_index + 1 === params.segment_count) ? rows : (segment * (params.segment_index + 1));
             //continue as if we have next_token
             dbg.log0("rows = ", rows, ", segment = ", segment, ", start = ", start, ", end =", end);
             params.next_token = start + "_" + end;
         }

         //next_token syntax is currentOffset_endOfSegment
         const next_token_parts = params.next_token.split('_');
         const offset = Number(next_token_parts[0]);
-        const end = Number(next_token_parts[1]);
+        const end = Math.min(Number(next_token_parts[1]), rows);
+        if (offset >= end) {
+            return { vectors: [], nextToken: undefined };
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/util/vectors_util.js` around lines 134 - 152, Clamp the parsed next_token
bounds against the actual table size: after parsing next_token_parts into offset
and end, fetch the real row count from the Lance table (e.g. await table.count()
or table.numRows()) and set end = Math.min(end, totalRows) and offset =
Math.min(offset, totalRows); if offset >= totalRows return vectors: [] with
nextToken: undefined; otherwise compute limit using the clamped values (limit =
Math.min(params.max_results, end - offset)), apply query.offset(offset) and
query.limit(limit), and produce nextToken only if offset + limit < end. Update
references: next_token_parts, offset, end, query.offset, query.limit, nextToken,
and this._lance_to_aws.
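The segment and clamping arithmetic from the diff and prompt above can be exercised in isolation. The sketch below is a table-free reconstruction under assumed names (`segment_token`, `next_page`): it derives the initial `start_end` token for one segment, then clamps a client-supplied token against the real row count the way the suggested fix does.

```javascript
// Hypothetical, Lance-free sketch of the pagination math in the review.
// segment_token: derive the initial "start_end" token for one segment.
function segment_token(rows, segment_count, segment_index) {
    const segment = Math.floor(rows / segment_count); // rows per segment
    const start = segment * segment_index;
    // the last segment picks up the remainder
    const end = (segment_index + 1 === segment_count) ? rows : segment * (segment_index + 1);
    return start + '_' + end;
}

// next_page: clamp a token against the current row count, compute the
// page window, and emit the follow-up token (undefined when exhausted).
function next_page(token, rows, max_results) {
    const [raw_offset, raw_end] = token.split('_').map(Number);
    const end = Math.min(raw_end, rows);
    const offset = Math.min(raw_offset, rows);
    if (offset >= end) return { offset, limit: 0, nextToken: undefined };
    const limit = Math.min(max_results, end - offset);
    const nextToken = (offset + limit < end) ? (offset + limit) + '_' + end : undefined;
    return { offset, limit, nextToken };
}
```

With this clamping, a forged `0_9999999` token against a 10-row table yields one full page and no nextToken, instead of an endless run of empty pages.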
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/endpoint/vector/ops/vector_list_vectors.js`:
- Around line 23-25: Convert and validate req.body.segmentCount,
req.body.segmentIndex, and req.body.maxResults to integers before use: parseInt
the values used to set segment_count, segment_index and maxResults, then call
segment_validate(segment_count, segment_index) with those integers; add bounds
checks to ensure segment_count and segment_index are non-negative integers and
that maxResults is an integer > 0 (and optionally cap it to a safe maximum).
Update all occurrences where segment_count/segment_index/maxResults are read
(e.g., the variables set in vector_list_vectors.js and later uses around the
pagination math) to use the validated integer values and return a
validation_error if any check fails.
- Around line 18-20: The code treats any falsy nextToken as missing; instead
explicitly treat only null/undefined as absent and validate all other values.
Change the logic around the next_token variable so it first checks if next_token
=== null || next_token === undefined (handle as no token), else run
next_token_sanity_check(next_token) and push to fieldList when that check fails;
apply the same change for the second occurrence in the file (lines ~67-79).
Reference: variable next_token, function next_token_sanity_check, and fieldList.

In `@src/util/vectors_util.js`:
- Around line 114-124: The code assumes table exists when calling
table.countRows() / table.query() which breaks for indexes created but with no
Lance table; add a guard after const table = await this.get_table(table_name) to
handle table === undefined by returning an empty paginated response (or treating
rows as 0) so the subsequent logic that uses table.countRows() and table.query()
is skipped; update the logic in the pagination branch that uses
params.next_token, and ensure functions get_table and any callers of this path
(e.g., create_vector_index flow) get the empty result behavior for missing
tables to avoid a 500.
- Around line 134-152: Clamp the parsed next_token bounds against the actual
table size: after parsing next_token_parts into offset and end, fetch the real
row count from the Lance table (e.g. await table.count() or table.numRows()) and
set end = Math.min(end, totalRows) and offset = Math.min(offset, totalRows); if
offset >= totalRows return vectors: [] with nextToken: undefined; otherwise
compute limit using the clamped values (limit = Math.min(params.max_results, end
- offset)), apply query.offset(offset) and query.limit(limit), and produce
nextToken only if offset + limit < end. Update references: next_token_parts,
offset, end, query.offset, query.limit, nextToken, and this._lance_to_aws.
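The parameter-validation bullet above (segmentCount/segmentIndex/maxResults) can be sketched as one helper. All names here are assumptions for illustration, and the 500 ceiling mirrors the maxResults default mentioned in the PR summary rather than any confirmed cap in the code.

```javascript
// Hypothetical validation sketch for the list-vectors body parameters.
// Returns the failed field names (empty array = valid) plus the
// coerced integer values to use downstream.
const MAX_RESULTS_CAP = 500; // assumed ceiling, matching the documented default

function validate_list_params(body) {
    const bad = [];
    const as_int = v => (Number.isInteger(Number(v)) ? Number(v) : NaN);
    const segment_count = body.segmentCount === undefined ? 1 : as_int(body.segmentCount);
    const segment_index = body.segmentIndex === undefined ? 0 : as_int(body.segmentIndex);
    const max_results = body.maxResults === undefined ? MAX_RESULTS_CAP : as_int(body.maxResults);
    if (!(segment_count >= 1)) bad.push('segmentCount');
    if (!(segment_index >= 0 && segment_index < segment_count)) bad.push('segmentIndex');
    if (!(max_results > 0 && max_results <= MAX_RESULTS_CAP)) bad.push('maxResults');
    return { bad, segment_count, segment_index, max_results };
}
```

A handler would map a non-empty `bad` array onto the field-level validation errors described in the PR summary.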

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b156c22e-3f6c-4700-9c2d-66646defdf52

📥 Commits

Reviewing files that changed from the base of the PR and between 8eb52f0 and 889deca.

📒 Files selected for processing (4)
  • src/endpoint/vector/ops/vector_list_vectors.js
  • src/endpoint/vector/vector_errors.js
  • src/test/integration_tests/api/vectors/test_vectors_ops.js
  • src/util/vectors_util.js
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/endpoint/vector/vector_errors.js
  • src/test/integration_tests/api/vectors/test_vectors_ops.js

@alphaprinz alphaprinz force-pushed the vectors_nextToken_segments branch 3 times, most recently from 0a7e4f3 to cf26b25 Compare March 30, 2026 17:25
@alphaprinz alphaprinz requested a review from tangledbytes March 30, 2026 17:28
Member

@tangledbytes tangledbytes left a comment


LGTM in general. Just one comment related to max_results validation.

Signed-off-by: Amit Prinz Setter <alphaprinz@gmail.com>
@alphaprinz alphaprinz force-pushed the vectors_nextToken_segments branch from cf26b25 to c3f979e Compare April 1, 2026 18:20