vectors - list vectors - add segments and nextToken #9516
alphaprinz wants to merge 1 commit into noobaa:master
Conversation
**Note:** Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 **Walkthrough**

`post_list_vectors` adds request-body validation for `nextToken` and segment params, forwards `maxResults`/`segmentCount`/`segmentIndex`/`nextToken` to the SDK, and may throw `VectorError` with a `fieldList`. `LanceConn.list_vectors` now accepts a params object, implements segment-aware pagination, and returns an optional `nextToken`. `VectorError` supports `fieldList`.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as Vector API
    participant SDK as Vectors SDK (list_vectors)
    participant Storage as LanceConn/Table
    Client->>API: POST /vectors.list { maxResults, nextToken?, segmentCount?, segmentIndex? }
    API->>API: next_token_sanity_check(nextToken)
    API->>API: segment_validate(segmentCount, segmentIndex)
    API->>API: accumulate fieldList
    alt invalid params
        API-->>Client: 400 VectorError { code: ValidationException, fieldList: [{path: "...", message: "..."}] }
    else valid params
        API->>SDK: list_vectors(params { max_results, next_token, segment_count, segment_index })
        SDK->>Storage: open table / compute count / determine segment start/end
        Storage-->>SDK: row count / table handle
        SDK->>Storage: query with offset and limit (capped to segment end)
        Storage-->>SDK: vectors page
        SDK-->>API: { vectors, nextToken? }
        API-->>Client: 200 { vectors, nextToken? }
    end
```
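The segment windowing in the diagram above can be sketched as follows. `segment_window` is an illustrative helper, not a function from this PR, but the math mirrors the described start/end computation (the last segment absorbs the remainder, and the token encodes `currentOffset_endOfSegment`):

```javascript
// Illustrative sketch of the segment-aware windowing described above.
// Rows are split into segment_count contiguous windows; the last window
// absorbs the remainder, and the token encodes "currentOffset_endOfSegment".
function segment_window(rows, segment_count, segment_index) {
    const segment = Math.floor(rows / segment_count); // rows per segment
    const start = segment * segment_index;
    const end = (segment_index + 1 === segment_count)
        ? rows // last segment picks up the remainder
        : segment * (segment_index + 1);
    return { start, end, next_token: `${start}_${end}` };
}

console.log(segment_window(10, 3, 0)); // { start: 0, end: 3, next_token: '0_3' }
console.log(segment_window(10, 3, 2)); // { start: 6, end: 10, next_token: '6_10' }
```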
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/endpoint/vector/ops/vector_list_vectors.js`:
- Around line 13-22: The call to req.vector_sdk.list_vectors uses
`req.body.maxResults || 500`, which converts an explicit 0 to 500; change that
to use the nullish coalescing operator so only null/undefined default: replace
the `max_results: req.body.maxResults || 500` usage with `max_results:
req.body.maxResults ?? 500` (locate this in the list_vectors call in
vector_list_vectors.js) so an explicit 0 is preserved and only null/undefined
falls back to 500.
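The behavioral difference the comment is pointing at can be shown in isolation (standalone snippet, not the PR's code):

```javascript
// `||` substitutes the default for every falsy value; `??` only for null/undefined.
const body = { maxResults: 0 };
console.log(body.maxResults || 500); // 500: the explicit 0 is silently replaced
console.log(body.maxResults ?? 500); // 0: the explicit 0 is preserved
console.log(undefined ?? 500);       // 500: only null/undefined fall back to the default
```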
In `@src/util/vectors_util.js`:
- Around line 114-125: The code assumes get_table(table_name) always returns a
table and immediately calls table.countRows(), which causes a TypeError when an
index exists but has no Lance table (created later by create_vector_index on
first insert); add a null/undefined guard after const table = await
this.get_table(table_name) to handle the "no table" case (treat as empty index
or return appropriate empty response/error) before any dereference, e.g., in the
logic that computes segments using table.countRows(), update the branch to
handle table === null/undefined (use 0 rows or short-circuit to the same
empty-result behavior as a non-existent index).
- Around line 121-143: The code trusts client-provided pagination values
(params.segment_count, params.segment_index, params.next_token,
params.max_results) before computing offsets and calling
query.offset()/query.limit(), so add defensive validation: after computing rows
(from table.countRows()) and before setting params.next_token or parsing it,
ensure params.segment_count and params.segment_index are integers >=1 and >=0
respectively and that segment_index < segment_count; when parsing
params.next_token check it has exactly two parts, both parse to finite
non-negative integers (offset and end), and that end > offset and offset < rows;
also validate params.max_results is a positive integer and clamp computed limit
to max(0, end - offset) to avoid NaN/negative values. Update the logic around
next_token_parts, offset, end, and the limit calculation used with
query.offset() and query.limit() to return an error/throw or normalize values
when validation fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: abce8980-3e53-476e-b495-52d096218b5a
📒 Files selected for processing (3)
- src/endpoint/vector/ops/vector_list_vectors.js
- src/test/integration_tests/api/vectors/test_vectors_ops.js
- src/util/vectors_util.js
```js
const table = await this.get_table(table_name);
//TODO - check if(!table)

//determine this request place in the table
if (!params.next_token) {
    //if there's no next_token, we're either in a segmented list_vector op or not segmented
    //for not-segmented, create a default dummy segment which will include all rows
    params.segment_count = params.segment_count || 1;
    params.segment_index = params.segment_index || 0;
    //now that we definitely have a segment, calc it's start and end for the given index
    const rows = await table.countRows();
    const segment = Math.floor(rows / params.segment_count); //total number of rows in a segment
```
Handle the empty-index path before dereferencing table.
Line 114 can legitimately return no table: create_vector_index() defers Lance table creation until the first insert. The next access to table.countRows() then turns a valid list on an empty index into a TypeError/500. If "no table" is also your no-such-index signal, do that existence check before calling into Lance.
🩹 Suggested guard:

```diff
 const table = await this.get_table(table_name);
-//TODO - check if(!table)
+if (!table) {
+    return { vectors: [], nextToken: undefined };
+}
```
Force-pushed a4c5854 to f2bcbbf
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/util/vectors_util.js`:
- Around line 143-144: The limit calculation can receive undefined because
params.max_results may be missing; add a defensive default at the start of the
function that reads params (e.g., create a local const maxResults =
(params.max_results ?? SOME_DEFAULT) and coerce to a safe integer/positive
value), then replace uses of params.max_results in the limit calculation with
maxResults and call query.limit with that computed numeric value; update any
variable names accordingly in the function that contains offset and query.limit
to ensure no NaN/undefined is passed to query.limit.
📒 Files selected for processing (3)
- src/endpoint/vector/ops/vector_list_vectors.js
- src/test/integration_tests/api/vectors/test_vectors_ops.js
- src/util/vectors_util.js
🚧 Files skipped from review as they are similar to previous changes (1)
- src/test/integration_tests/api/vectors/test_vectors_ops.js
Force-pushed f2bcbbf to 170e146
♻️ Duplicate comments (3)
src/util/vectors_util.js (2)
114-124: ⚠️ Potential issue | 🟠 Major

Still dereferencing `table` before the empty-index case. Line 124 assumes `get_table()` always returns a Lance table, but `create_vector_index()` deliberately defers table creation until the first insert. A valid-but-empty index will still fail `list_vectors()` with a TypeError/500 unless this path short-circuits or maps to the not-found case first.
121-131: ⚠️ Potential issue | 🟠 Major

Still missing integer/bounds validation for segment pagination inputs. `segment_count`/`segment_index` are still trusted here. Values like `segment_count <= 0` or `segment_index >= segment_count` generate bogus `start`/`end` windows, and the `||` default on `segment_count` silently rewrites an explicit `0` into a different request. Please switch to nullish defaults plus explicit integer/bounds checks before computing the slice.
src/endpoint/vector/ops/vector_list_vectors.js (1)
16-24: ⚠️ Potential issue | 🟠 Major

Make `nextToken` validation type-safe and integer-only. Line 45 calls string methods on raw request input, so `{}`/`123` becomes a 500 instead of the `ValidationException` you intend. The helper also still accepts tokens like `-1_5` or `1.5_4`, even though the downstream pagination code treats both parts as row offsets.

🔒 Minimal hardening:

```diff
 function next_token_sanity_check(next_token) {
-    if (!next_token) return true;
-    const delim = next_token.indexOf('_');
-    if (delim === -1) return false;
-    const split = next_token.split('_');
-    if (split.length !== 2) return false;
-    if (split[0] === '' || split[1] === '') return false;
-    const start = Number(split[0]);
-    const end = Number(split[1]);
-    if (Number.isNaN(start) || Number.isNaN(end)) return false;
-    if (start > end) return false;
-    return true;
+    if (next_token === undefined || next_token === null) return true;
+    if (typeof next_token !== 'string') return false;
+    const match = /^(\d+)_(\d+)$/.exec(next_token);
+    if (!match) return false;
+    return Number(match[1]) <= Number(match[2]);
 }
```

Also applies to: 42-55
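A standalone version of the hardened check (hypothetical name `is_valid_next_token`, not code from the PR) rejects each of the token shapes called out above:

```javascript
// Strict token check: absent is fine; otherwise only "<digits>_<digits>"
// with start <= end passes, and everything else is a validation failure.
function is_valid_next_token(next_token) {
    if (next_token === undefined || next_token === null) return true;
    if (typeof next_token !== 'string') return false; // rejects {} and 123
    const match = /^(\d+)_(\d+)$/.exec(next_token);    // rejects "-1_5", "1.5_4", "1_2_3"
    if (!match) return false;
    return Number(match[1]) <= Number(match[2]);       // rejects "5_3"
}

console.log(is_valid_next_token('3_10')); // true
console.log(is_valid_next_token('-1_5')); // false
console.log(is_valid_next_token({}));     // false
```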
📒 Files selected for processing (4)
- src/endpoint/vector/ops/vector_list_vectors.js
- src/endpoint/vector/vector_errors.js
- src/test/integration_tests/api/vectors/test_vectors_ops.js
- src/util/vectors_util.js
✅ Files skipped from review due to trivial changes (1)
- src/test/integration_tests/api/vectors/test_vectors_ops.js
Force-pushed 170e146 to 8eb52f0
🧹 Nitpick comments (2)
src/test/integration_tests/api/vectors/test_vectors_ops.js (1)
210-348: Consider adding negative test cases for invalid `nextToken`. The new tests comprehensively cover happy paths. For completeness, consider adding a test that verifies the `ValidationException` is thrown for malformed `nextToken` values (e.g., `"invalid"`, `"5_3"` where start > end, `"1_2_3"` with multiple underscores).

src/endpoint/vector/ops/vector_list_vectors.js (1)
43-55: Redundant delimiter check. Lines 45-46 are unnecessary since `split('_')` on a string without underscores returns a single-element array, which fails the `split.length !== 2` check on line 48.

♻️ Simplified version:

```diff
 function next_token_sanity_check(next_token) {
     if (!next_token) return true;
-    const delim = next_token.indexOf('_');
-    if (delim === -1) return false;
     const split = next_token.split('_');
     if (split.length !== 2) return false;
     if (split[0] === '' || split[1] === '') return false;
     const start = Number(split[0]);
     const end = Number(split[1]);
     if (Number.isNaN(start) || Number.isNaN(end)) return false;
     if (start > end) return false;
     return true;
 }
```
📒 Files selected for processing (4)
- src/endpoint/vector/ops/vector_list_vectors.js
- src/endpoint/vector/vector_errors.js
- src/test/integration_tests/api/vectors/test_vectors_ops.js
- src/util/vectors_util.js
✅ Files skipped from review due to trivial changes (1)
- src/util/vectors_util.js
🚧 Files skipped from review as they are similar to previous changes (1)
- src/endpoint/vector/vector_errors.js
```js
//TODO - this won't work for large tables.
//support aws segments/tokens?
query.offset(offset);
//limit can't get us over end
```
We can't go beyond end but we seem to have no limits on the params.next_token. A client can construct a request like 0_9999999 right?
Well, yes. Although note max_results limit is 1000.
max_results isn't being validated though.. right? It can be more than 1000 or negative too right?
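The reviewer's point is easy to demonstrate: if `max_results` reaches a limit calculation like `Math.min(max_results, end - offset)` unvalidated, the computed limit silently degrades (standalone snippet, not the PR's code):

```javascript
// What a limit calculation like Math.min(max_results, end - offset) sees
// when max_results is missing or negative.
console.log(Math.min(undefined, 10)); // NaN: a missing max_results poisons the limit
console.log(Math.min(-5, 10));        // -5: a negative max_results yields a negative limit
```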
Force-pushed 8eb52f0 to 889deca
♻️ Duplicate comments (4)
src/endpoint/vector/ops/vector_list_vectors.js (2)
23-25: ⚠️ Potential issue | 🟠 Major

Validate `maxResults`, `segmentCount`, and `segmentIndex` as integers. `segment_validate()` still accepts strings/floats, and `maxResults` is forwarded without any bounds check. That breaks the pagination math locally — e.g. `segmentIndex: "1"` computes the wrong segment end, and `maxResults: -1` makes the page window move backwards.

Suggested fix:

```diff
 const fieldList = [];
+const max_results = req.body.maxResults;
+if (max_results != null && (!Number.isInteger(max_results) || max_results < 1 || max_results > 500)) {
+    fieldList.push({ path: 'maxResults', message: 'Must be an integer between 1 and 500.' });
+}
 const next_token = req.body.nextToken;
 if (!next_token_sanity_check(next_token)) {
     fieldList.push({path: 'nextToken', message: "Bad nextToken"})
 }
 @@
 const list = await req.vector_sdk.list_vectors({
     vector_bucket_name: req.body.vectorBucketName,
     vector_index_name: req.body.indexName,
-    max_results: req.body.maxResults || 500,
+    max_results: max_results == null ? 500 : max_results,
     return_data: req.body.returnData,
     return_metadata: req.body.returnMetadata,
     segment_count, segment_index, next_token,
 @@
 function segment_validate(count, index) {
     if (count == null && index == null) return;
     if (count == null) return {path: "segmentCount", message: "Missing"};
     if (index == null) return {path: "segmentIndex", message: "Missing"};
+    if (!Number.isInteger(count)) return {path: "segmentCount", message: "Must be an integer."};
+    if (!Number.isInteger(index)) return {path: "segmentIndex", message: "Must be an integer."};
     if (count <= 0) return {path: "segmentCount", message: "Must be greater than zero."};
     if (index < 0) return {path: "segmentIndex", message: "Cannot be negative."};
     if (index >= count) return {path: "segmentIndex", message: "Must be less than segmentCount."};
 }
```

Also applies to: 41-49, 57-64
18-20: ⚠️ Potential issue | 🟠 Major

Only `null`/`undefined` should mean "no nextToken". `if (!next_token)` treats `""` and `0` as missing, so an explicitly bad token falls back to page 1 instead of returning a validation error. A truthy non-string like `{}` also throws at `indexOf()` and bypasses the `fieldList` path entirely.

Suggested fix:

```diff
 function next_token_sanity_check(next_token) {
-    if (!next_token) return true;
+    if (next_token == null) return true;
+    if (typeof next_token !== 'string') return false;
     const delim = next_token.indexOf('_');
     if (delim === -1) return false;
```

Also applies to: 67-79
src/util/vectors_util.js (2)
114-124: ⚠️ Potential issue | 🟠 Major

Guard empty indexes before dereferencing `table`. `create_vector_index()` in this file intentionally defers Lance table creation until the first insert, so a valid list on a newly created index can still reach this path with `table === undefined`. The current `countRows()`/`query()` flow then turns an empty result into a 500.

Suggested fix:

```diff
 const table = await this.get_table(table_name);
-//TODO - check if(!table)
+if (!table) {
+    return { vectors: [], nextToken: undefined };
+}
```
134-152: ⚠️ Potential issue | 🟠 Major

Clamp client-supplied tokens to the current row count. A forged token like `0_9999999` still passes the handler's syntax check. Once `offset` moves past the real table size, this code keeps returning `vectors: []` with another `nextToken`, so callers can paginate through many empty pages.

Suggested fix:

```diff
 const table = await this.get_table(table_name);
 if (!table) {
     return { vectors: [], nextToken: undefined };
 }
+const rows = await table.countRows();
 //determine this request place in the table
 if (!params.next_token) {
     //if there's no next_token, we're either in a segmented list_vector op or not segmented
     //for not-segmented, create a default dummy segment which will include all rows
     params.segment_count = params.segment_count || 1;
     params.segment_index = params.segment_index || 0;
     //now that we definitely have a segment, calc it's start and end for the given index
-    const rows = await table.countRows();
     const segment = Math.floor(rows / params.segment_count); //total number of rows in a segment
     const start = segment * params.segment_index;
     //the last will pick up the remainder
     const end = (params.segment_index + 1 === params.segment_count) ? rows : (segment * (params.segment_index + 1));
     //continue as if we have next_token
     dbg.log0("rows = ", rows, ", segment = ", segment, ", start = ", start, ", end =", end);
     params.next_token = start + "_" + end;
 }
 //next_token syntax is currentOffset_endOfSegment
 const next_token_parts = params.next_token.split('_');
 const offset = Number(next_token_parts[0]);
-const end = Number(next_token_parts[1]);
+const end = Math.min(Number(next_token_parts[1]), rows);
+if (offset >= end) {
+    return { vectors: [], nextToken: undefined };
+}
```
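The clamping behavior can be exercised in isolation; `clamp_page` below is an illustrative sketch of the suggested logic, not code from the PR:

```javascript
// Clamp an "offset_end" token against the real row count before querying.
function clamp_page(next_token, rows, max_results) {
    const [offset, raw_end] = next_token.split('_').map(Number);
    const end = Math.min(raw_end, rows);
    if (offset >= end) return { vectors: [], nextToken: undefined }; // past the data
    const limit = Math.min(max_results, end - offset);
    const nextToken = (offset + limit < end) ? `${offset + limit}_${end}` : undefined;
    return { offset, limit, nextToken };
}

console.log(clamp_page('0_9999999', 42, 500));   // { offset: 0, limit: 42, nextToken: undefined }
console.log(clamp_page('100_9999999', 42, 500)); // { vectors: [], nextToken: undefined }
```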
📒 Files selected for processing (4)
- src/endpoint/vector/ops/vector_list_vectors.js
- src/endpoint/vector/vector_errors.js
- src/test/integration_tests/api/vectors/test_vectors_ops.js
- src/util/vectors_util.js
🚧 Files skipped from review as they are similar to previous changes (2)
- src/endpoint/vector/vector_errors.js
- src/test/integration_tests/api/vectors/test_vectors_ops.js
Force-pushed 0a7e4f3 to cf26b25
tangledbytes left a comment:

LGTM in general. Just one max_results validation related comment.
Signed-off-by: Amit Prinz Setter <alphaprinz@gmail.com>
Force-pushed cf26b25 to c3f979e
Describe the Problem
Allow objects (vector buckets, vector indices, vectors) to be listed in several independent s3 op invocations.
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
New Features
Bug Fixes
Tests