Conversation
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
|
Cursor Agent can help with this pull request. Just |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a comprehensive MSW Cloudflare API mock (AI Run and Vectorize) with an in-memory vector store and deterministic embeddings, new tests exercising those mocks, a tiny TypeScript cast in a transcription utility, and a small export addition for transistor mocks. (50 words) Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Test Client
participant MSW as MSW Handler
participant Store as In-Memory Vector Store
Client->>MSW: POST /accounts/:acct/ai/run (audio or text)
MSW->>MSW: parse path → model, check Content-Type
alt audio/*
MSW-->>Client: transcription envelope (result.text)
else text/embed
MSW->>MSW: deterministic embedding (SHA‑256)
MSW-->>Client: embeddings envelope (vectors)
end
Client->>MSW: POST /accounts/:acct/vectorize/.../upsert (NDJSON or multipart)
MSW->>MSW: parse NDJSON/multipart, validate
MSW->>Store: insert/upsert vectors with metadata
Store-->>MSW: ack
MSW-->>Client: upsert envelope (success)
Client->>MSW: POST /accounts/:acct/vectorize/.../query ({ query or vector })
MSW->>MSW: embed query if text, fetch vectors
MSW->>Store: retrieve candidates
MSW->>MSW: compute cosine similarity, normalize, sort
MSW-->>Client: query envelope (matches)
Client->>MSW: POST /accounts/:acct/vectorize/.../delete_by_ids ({ ids })
MSW->>Store: delete requested ids
Store-->>MSW: deleted count
MSW-->>Client: delete envelope (deleted count)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
mocks/cloudflare.ts (2)
30-33: Module-level mutable state with no reset mechanism.
vectorizeIndexesis a module-level singleton that accumulates state across test runs. There's no exported function to clear the store between tests (e.g., inafterEach). This will cause state leakage across tests, leading to flaky or order-dependent failures as the test suite grows.Consider exporting a
resetVectorizeStore()function:Proposed addition
const vectorizeIndexes = new Map<string, VectorizeIndexStore>() + +/** Clear all in-memory vector data. Call in `afterEach` / `beforeEach`. */ +export function resetVectorizeStore() { + vectorizeIndexes.clear() +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mocks/cloudflare.ts` around lines 30 - 33, The module-level Map vectorizeIndexes currently accumulates state across tests; add and export a resetVectorizeStore() function that clears its contents to allow test teardown to reset state. Implement resetVectorizeStore to call vectorizeIndexes.clear() (and also clear any nested VectorizeIndexStore Maps if you prefer deep-clearing) and export it from the module so tests can call it in afterEach to prevent cross-test leakage.
295-613: ~250 lines of near-identical v2/legacy handler duplication.Each operation (query, insert, upsert, delete_by_ids) is copy-pasted for v2 and legacy paths, differing only in the URL. This could be reduced to ~half the code with a small helper that generates handlers for both path variants:
Sketch
function vectorizeRoute( subpath: string, // e.g. "query", "insert", … handler: (args: { request: Request; accountId: string; indexName: string }) => Promise<Response>, ): HttpHandler[] { const cb = async ({ request, params }: any) => { requiredHeader(request.headers, 'authorization') return handler({ request, accountId: String(params.accountId), indexName: String(params.indexName), }) } return [ http.post(`${CLOUDFLARE_API_BASE}/accounts/:accountId/vectorize/v2/indexes/:indexName/${subpath}`, cb), http.post(`${CLOUDFLARE_API_BASE}/accounts/:accountId/vectorize/indexes/:indexName/${subpath}`, cb), ] } // Then flatten into cloudflareHandlers: export const cloudflareHandlers = [ aiRunHandler, ...vectorizeRoute('query', handleQuery), ...vectorizeRoute('insert', handleInsert), // … ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mocks/cloudflare.ts` around lines 295 - 613, The file duplicates near-identical v2 and legacy vectorize routes; create a small generator (e.g., vectorizeRoute) that takes the subpath ("query", "insert", "upsert", "delete_by_ids") and a core handler (e.g., handleQuery, handleInsert, handleUpsert, handleDeleteByIds) and returns two HttpHandler entries (v2 and legacy paths) that share a single callback which calls requiredHeader and extracts accountId/indexName from params; move the existing per-route logic into those core handlers and reuse utilities already in the file (parseVectorizeNdjsonVectors, ensureSeededIndex, getOrCreateIndexStore, allVectors, scoreFromSimilarity, cosineSimilarity) and then replace the duplicated http.post blocks with spread calls like ...vectorizeRoute('query', handleQuery) inside cloudflareHandlers.mocks/__tests__/cloudflare.test.ts (1)
1-13: Consider addingafterEachto reset mock server handlers and clear vector store.The test suite has no
afterEachcleanup. While the current tests use distinctaccountIdvalues to avoid collisions, this will become fragile as more tests are added. Once the store reset function is available (per the suggestion onmocks/cloudflare.ts), wiring it intoafterEachwould prevent order-dependent failures.afterEach(() => { server.resetHandlers() resetVectorizeStore() })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mocks/__tests__/cloudflare.test.ts` around lines 1 - 13, Add an afterEach cleanup that calls server.resetHandlers() and the vector store reset function to avoid test-order dependencies: in the test file where setupServer(...cloudflareHandlers) creates server and beforeAll/afterAll are defined, add an afterEach block that invokes server.resetHandlers() and resetVectorizeStore() (or the actual reset function exported from the cloudflare mocks) so mocks and in-memory vector state are cleared between tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mocks/cloudflare.ts`:
- Around line 442-466: The insert handler in the mock (the HTTP POST for
`${CLOUDFLARE_API_BASE}/.../insert`) currently overwrites existing vectors via
ns.set(v.id, v) giving upsert semantics; update the handler in
mocks/cloudflare.ts (the insert route that uses parseVectorizeNdjsonVectors and
getOrCreateIndexStore) to detect duplicates before writing—either reject the
request or skip duplicates per real Cloudflare insert behavior (e.g., check
ns.has(v.id) and respond with an appropriate error or count skipped items), or
at minimum add a clear comment next to the insert handler documenting that this
mock intentionally uses upsert semantics and does not enforce duplicate-id
rejection.
---
Nitpick comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 1-13: Add an afterEach cleanup that calls server.resetHandlers()
and the vector store reset function to avoid test-order dependencies: in the
test file where setupServer(...cloudflareHandlers) creates server and
beforeAll/afterAll are defined, add an afterEach block that invokes
server.resetHandlers() and resetVectorizeStore() (or the actual reset function
exported from the cloudflare mocks) so mocks and in-memory vector state are
cleared between tests.
In `@mocks/cloudflare.ts`:
- Around line 30-33: The module-level Map vectorizeIndexes currently accumulates
state across tests; add and export a resetVectorizeStore() function that clears
its contents to allow test teardown to reset state. Implement
resetVectorizeStore to call vectorizeIndexes.clear() (and also clear any nested
VectorizeIndexStore Maps if you prefer deep-clearing) and export it from the
module so tests can call it in afterEach to prevent cross-test leakage.
- Around line 295-613: The file duplicates near-identical v2 and legacy
vectorize routes; create a small generator (e.g., vectorizeRoute) that takes the
subpath ("query", "insert", "upsert", "delete_by_ids") and a core handler (e.g.,
handleQuery, handleInsert, handleUpsert, handleDeleteByIds) and returns two
HttpHandler entries (v2 and legacy paths) that share a single callback which
calls requiredHeader and extracts accountId/indexName from params; move the
existing per-route logic into those core handlers and reuse utilities already in
the file (parseVectorizeNdjsonVectors, ensureSeededIndex, getOrCreateIndexStore,
allVectors, scoreFromSimilarity, cosineSimilarity) and then replace the
duplicated http.post blocks with spread calls like ...vectorizeRoute('query',
handleQuery) inside cloudflareHandlers.
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
mocks/cloudflare.ts (2)
340-382:handleVectorizeInsertandhandleVectorizeUpsertare copy-pastes of each other.The only difference between the two is the
operationstring in thejsonOkresponse. Any future bug fix or behavioural change has to be applied twice.♻️ Proposed refactor
-const handleVectorizeInsert = async ({ request, params }) => { - requiredHeader(request.headers, 'authorization') - const accountId = String(params.accountId) - const indexName = String(params.indexName) - const parsed = await parseVectorizeNdjsonVectors(request) - if (!parsed.ok) return jsonError(400, parsed.error, 10005) - - const store = getOrCreateIndexStore(accountId, indexName) - let updated = 0 - for (const v of parsed.vectors) { - let ns = store.get(v.namespace) - if (!ns) { - ns = new Map() - store.set(v.namespace, ns) - } - ns.set(v.id, v) - updated++ - } - - return jsonOk({ operation: 'insert', updated }) -} - -const handleVectorizeUpsert = async ({ request, params }) => { - requiredHeader(request.headers, 'authorization') - const accountId = String(params.accountId) - const indexName = String(params.indexName) - const parsed = await parseVectorizeNdjsonVectors(request) - if (!parsed.ok) return jsonError(400, parsed.error, 10005) - - const store = getOrCreateIndexStore(accountId, indexName) - let updated = 0 - for (const v of parsed.vectors) { - let ns = store.get(v.namespace) - if (!ns) { - ns = new Map() - store.set(v.namespace, ns) - } - ns.set(v.id, v) - updated++ - } - - return jsonOk({ operation: 'upsert', updated }) -} +const makeVectorizeWriteHandler = + (operation: 'insert' | 'upsert'): HttpResponseResolver => + async ({ request, params }) => { + requiredHeader(request.headers, 'authorization') + const accountId = String(params.accountId) + const indexName = String(params.indexName) + const parsed = await parseVectorizeNdjsonVectors(request) + if (!parsed.ok) return jsonError(400, parsed.error, 10005) + + const store = getOrCreateIndexStore(accountId, indexName) + let updated = 0 + for (const v of parsed.vectors) { + let ns = store.get(v.namespace) + if (!ns) { + ns = new Map() + store.set(v.namespace, ns) + } + ns.set(v.id, v) + updated++ + } + + return jsonOk({ operation, updated }) + } + +const handleVectorizeInsert = makeVectorizeWriteHandler('insert') +const handleVectorizeUpsert = makeVectorizeWriteHandler('upsert')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mocks/cloudflare.ts` around lines 340 - 382, handleVectorizeInsert and handleVectorizeUpsert duplicate the same logic; extract the shared parsing/store/upsert loop into a single helper (e.g., processVectorizeRequest) that accepts (request, params) and an operation string or returns the updated count, then have handleVectorizeInsert call the helper and return jsonOk({ operation: 'insert', updated }) and handleVectorizeUpsert do the same with 'upsert'; reference the existing functions handleVectorizeInsert, handleVectorizeUpsert and the helpers parseVectorizeNdjsonVectors/getOrCreateIndexStore/jsonOk when moving the shared code.
210-215: Unquoted multipart boundary may include trailing whitespace.The unquoted branch
([^;]+)stops at;but not at whitespace, so a content-type likemultipart/form-data; boundary=someboundary(trailing space before end-of-string) setsboundaryto"someboundary ". The subsequent--${boundary}split then fails silently, returning a misleading'Missing "vectors" multipart field.'error.🛡️ Proposed fix
- const boundary = boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? null + const boundary = (boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? '').trim() || null🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mocks/cloudflare.ts` around lines 210 - 215, The regex that extracts the multipart boundary (used in the boundaryMatch/boundary code that parses contentType) allows an unquoted boundary to include trailing whitespace, causing later `--${boundary}` splits to fail; update the extraction so the boundary value is normalized by trimming surrounding whitespace and quotes (or adjust the regex to exclude trailing whitespace/semicolons) and then use the trimmed value for subsequent splitting and checks like the "vectors" field validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mocks/cloudflare.ts`:
- Line 33: The module-level Map vectorizeIndexes is persisting state between
tests; add a reset helper to clear it and export that helper (or the Map) so
test suites can call it in beforeEach/afterEach. Specifically, add and export a
function like resetVectorizeIndexes() that calls vectorizeIndexes.clear(), then
update tests to call this helper before/after each test so ensureSeededIndex()
will behave deterministically per-test.
---
Duplicate comments:
In `@mocks/cloudflare.ts`:
- Around line 349-356: The mock insert loop unconditionally overwrites existing
vectors (ns.set(v.id, v)), so change the logic in the handler that processes
parsed.vectors to check for existing IDs (use store.get and ns.has(v.id)) and
reject duplicates instead of upserting; e.g., when a duplicate is found in the
parsed.vectors loop, return or throw an error (or increment a duplicate counter
and respond with an error result) so the mock mimics insert semantics rather
than upsert for the insert handler that uses parsed.vectors, store, and ns.set.
---
Nitpick comments:
In `@mocks/cloudflare.ts`:
- Around line 340-382: handleVectorizeInsert and handleVectorizeUpsert duplicate
the same logic; extract the shared parsing/store/upsert loop into a single
helper (e.g., processVectorizeRequest) that accepts (request, params) and an
operation string or returns the updated count, then have handleVectorizeInsert
call the helper and return jsonOk({ operation: 'insert', updated }) and
handleVectorizeUpsert do the same with 'upsert'; reference the existing
functions handleVectorizeInsert, handleVectorizeUpsert and the helpers
parseVectorizeNdjsonVectors/getOrCreateIndexStore/jsonOk when moving the shared
code.
- Around line 210-215: The regex that extracts the multipart boundary (used in
the boundaryMatch/boundary code that parses contentType) allows an unquoted
boundary to include trailing whitespace, causing later `--${boundary}` splits to
fail; update the extraction so the boundary value is normalized by trimming
surrounding whitespace and quotes (or adjust the regex to exclude trailing
whitespace/semicolons) and then use the trimmed value for subsequent splitting
and checks like the "vectors" field validation.
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
mocks/cloudflare.ts (1)
503-507: Unquoted boundary regex may capture trailing whitespace.The unquoted capture group
[^;]+in the boundary regex retains any trailing whitespace before a semicolon or end-of-string. If theContent-Typeheader value has trailing spaces after the boundary token (e.g.,multipart/form-data; boundary=abc), the captured string includes those spaces, and--${boundary}won't match the actual CRLF-delimited delimiter in the body.♻️ Proposed defensive fix
-const boundary = boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? null +const boundary = (boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? '').trim() || null🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mocks/cloudflare.ts` around lines 503 - 507, The boundary regex can capture trailing whitespace into boundaryMatch/[2], causing mismatched delimiters; update the parsing in mocks/cloudflare.ts so the captured boundary is trimmed or the regex excludes whitespace (e.g., use a token like [^;\s]+) before using `boundary` in delimiter checks — specifically adjust the `/\bboundary=(?:"([^"]+)"|([^;]+))/i.exec(contentType)` logic and ensure the `boundary` variable (derived from `boundaryMatch?.[1] ?? boundaryMatch?.[2]`) is normalized (trimmed/validated) so `--${boundary}` matches actual multipart delimiters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 57-91: The test "Vectorize query uses match-sorter when embedding
text is known" is non-hermetic because it relies on getSearchCorpus() containing
a page titled "About KCD MCP"; make the test self-contained by explicitly
seeding the search corpus with that document before calling the query: create
and insert a document object with metadata.title = "About KCD MCP", metadata.url
containing "about-mcp", and the expected text into whatever mock data source or
index the handler uses (mirroring the approach in test 5), so matchSorter will
find it deterministically; update the test (the test block using fetch to the
vectorize/v2/indexes/semantic-index/query endpoint) to add the seeding step and
ensure getStaticDocs()/getSearchCorpus() are not relied upon for this assertion.
In `@mocks/cloudflare.ts`:
- Around line 455-458: The reset helper that clears vectorizeIndexes must also
clear the seededIndexKeys set to avoid the early-return in ensureSeededIndex
(function ensureSeededIndex) skipping reseeding; update the same reset/clear
utility to call seededIndexKeys.clear() (in addition to
vectorizeIndexes.clear()) so previously-seen index keys will be reseeded after a
reset and subsequent queries return correct results.
---
Duplicate comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 7-13: Add an afterEach teardown that resets the module-level
singletons to avoid cross-test pollution: clear or reassign vectorizeIndexes and
seededIndexKeys (the module-level collections in mocks/cloudflare.ts) after each
test so tests don't share accumulated state; keep the existing
beforeAll/afterAll server lifecycle (server.listen/server.close) but add an
afterEach that empties vectorizeIndexes and seededIndexKeys (or reinitializes
them) to ensure isolation between tests.
In `@mocks/cloudflare.ts`:
- Around line 687-707: handleVectorizeInsert currently silently overwrites
existing vectors via ns.set(v.id, v); change the logic to prevent silent upserts
by checking for existing IDs before setting: use ns.has(v.id) to detect
duplicates (from getOrCreateIndexStore and parsed.vectors returned by
parseVectorizeNdjsonVectors) and either skip the write and count/skipping via a
separate counter or return a 409/appropriate error for duplicates; only call
ns.set(v.id, v) when the ID does not already exist and adjust the returned
payload (operation/updated/inserted/duplicates) accordingly.
- Around line 52-53: The Map and Set (vectorizeIndexes and seededIndexKeys)
persist across test runs causing cross-test pollution; add a small reset
function (e.g., resetMockStores or clearVectorizeMocks) that calls
vectorizeIndexes.clear() and seededIndexKeys.clear(), export it from the mock
module, and ensure tests invoke it in their beforeEach/afterEach (or call it
from your test harness) so the stores are reinitialized between tests.
---
Nitpick comments:
In `@mocks/cloudflare.ts`:
- Around line 503-507: The boundary regex can capture trailing whitespace into
boundaryMatch/[2], causing mismatched delimiters; update the parsing in
mocks/cloudflare.ts so the captured boundary is trimmed or the regex excludes
whitespace (e.g., use a token like [^;\s]+) before using `boundary` in delimiter
checks — specifically adjust the
`/\bboundary=(?:"([^"]+)"|([^;]+))/i.exec(contentType)` logic and ensure the
`boundary` variable (derived from `boundaryMatch?.[1] ?? boundaryMatch?.[2]`) is
normalized (trimmed/validated) so `--${boundary}` matches actual multipart
delimiters.
| test('Vectorize query uses match-sorter when embedding text is known', async () => { | ||
| const embedRes = await fetch( | ||
| 'https://api.cloudflare.com/client/v4/accounts/acc123/ai/run/@cf/google/embeddinggemma-300m', | ||
| { | ||
| method: 'POST', | ||
| headers: { | ||
| Authorization: 'Bearer test-token', | ||
| 'Content-Type': 'application/json', | ||
| }, | ||
| body: JSON.stringify({ text: ['About KCD MCP'] }), | ||
| }, | ||
| ) | ||
| expect(embedRes.ok).toBe(true) | ||
| const embedJson = (await embedRes.json()) as any | ||
| const vector = embedJson.result.data[0] | ||
| expect(Array.isArray(vector)).toBe(true) | ||
|
|
||
| const queryRes = await fetch( | ||
| 'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/query', | ||
| { | ||
| method: 'POST', | ||
| headers: { | ||
| Authorization: 'Bearer test-token', | ||
| 'Content-Type': 'application/json', | ||
| }, | ||
| body: JSON.stringify({ vector, topK: 5, returnMetadata: 'all' }), | ||
| }, | ||
| ) | ||
| expect(queryRes.ok).toBe(true) | ||
| const queryJson = (await queryRes.json()) as any | ||
| expect(queryJson.success).toBe(true) | ||
| expect(queryJson.result.matches.length).toBeGreaterThan(0) | ||
| expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP') | ||
| expect(String(queryJson.result.matches[0].metadata.url)).toContain('about-mcp') | ||
| }) |
There was a problem hiding this comment.
Test 3 is non-hermetic — depends on a filesystem content file for "About KCD MCP".
The assertion expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP') at line 89 passes only if getSearchCorpus() returns a document with that exact title. The static seed list (getStaticDocs) and faker-generated podcast docs contain no such entry; the match must come from a file under content/pages/ with that exact frontmatter title.
If content/pages/about-mcp.mdx (or equivalent) is missing, renamed, or if the test runs in an environment where the content directory is unavailable, matchSorter returns no match, the handler falls back to cosine-similarity scoring, and the assertion fails with no obvious link to the missing file.
Consider making this test self-contained by seeding the target document explicitly before querying, similar to the approach in test 5:
🛡️ Suggested hermetic approach
test('Vectorize query uses match-sorter when embedding text is known', async () => {
+ // Seed the expected doc so the test is independent of the filesystem.
+ const seedBoundary = '----vitest-seed-boundary'
+ const seedNdjson = `${JSON.stringify({
+ id: '/about-mcp',
+ values: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+ metadata: { type: 'page', title: 'About KCD MCP', url: '/about-mcp', snippet: 'About KCD MCP' },
+ })}\n`
+ const seedMultipart = [
+ `--${seedBoundary}\r\n`,
+ 'Content-Disposition: form-data; name="vectors"; filename="vectors.ndjson"\r\n',
+ 'Content-Type: application/x-ndjson\r\n',
+ '\r\n',
+ seedNdjson,
+ `\r\n--${seedBoundary}--\r\n`,
+ ].join('')
+ await fetch(
+ 'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/upsert',
+ {
+ method: 'POST',
+ headers: { Authorization: 'Bearer test-token', 'Content-Type': `multipart/form-data; boundary=${seedBoundary}` },
+ body: seedMultipart,
+ },
+ )
+
const embedRes = await fetch( ...📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| test('Vectorize query uses match-sorter when embedding text is known', async () => { | |
| const embedRes = await fetch( | |
| 'https://api.cloudflare.com/client/v4/accounts/acc123/ai/run/@cf/google/embeddinggemma-300m', | |
| { | |
| method: 'POST', | |
| headers: { | |
| Authorization: 'Bearer test-token', | |
| 'Content-Type': 'application/json', | |
| }, | |
| body: JSON.stringify({ text: ['About KCD MCP'] }), | |
| }, | |
| ) | |
| expect(embedRes.ok).toBe(true) | |
| const embedJson = (await embedRes.json()) as any | |
| const vector = embedJson.result.data[0] | |
| expect(Array.isArray(vector)).toBe(true) | |
| const queryRes = await fetch( | |
| 'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/query', | |
| { | |
| method: 'POST', | |
| headers: { | |
| Authorization: 'Bearer test-token', | |
| 'Content-Type': 'application/json', | |
| }, | |
| body: JSON.stringify({ vector, topK: 5, returnMetadata: 'all' }), | |
| }, | |
| ) | |
| expect(queryRes.ok).toBe(true) | |
| const queryJson = (await queryRes.json()) as any | |
| expect(queryJson.success).toBe(true) | |
| expect(queryJson.result.matches.length).toBeGreaterThan(0) | |
| expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP') | |
| expect(String(queryJson.result.matches[0].metadata.url)).toContain('about-mcp') | |
| }) | |
| test('Vectorize query uses match-sorter when embedding text is known', async () => { | |
| // Seed the expected doc so the test is independent of the filesystem. | |
| const seedBoundary = '----vitest-seed-boundary' | |
| const seedNdjson = `${JSON.stringify({ | |
| id: '/about-mcp', | |
| values: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], | |
| metadata: { type: 'page', title: 'About KCD MCP', url: '/about-mcp', snippet: 'About KCD MCP' }, | |
| })}\n` | |
| const seedMultipart = [ | |
| `--${seedBoundary}\r\n`, | |
| 'Content-Disposition: form-data; name="vectors"; filename="vectors.ndjson"\r\n', | |
| 'Content-Type: application/x-ndjson\r\n', | |
| '\r\n', | |
| seedNdjson, | |
| `\r\n--${seedBoundary}--\r\n`, | |
| ].join('') | |
| await fetch( | |
| 'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/upsert', | |
| { | |
| method: 'POST', | |
| headers: { Authorization: 'Bearer test-token', 'Content-Type': `multipart/form-data; boundary=${seedBoundary}` }, | |
| body: seedMultipart, | |
| }, | |
| ) | |
| const embedRes = await fetch( | |
| 'https://api.cloudflare.com/client/v4/accounts/acc123/ai/run/@cf/google/embeddinggemma-300m', | |
| { | |
| method: 'POST', | |
| headers: { | |
| Authorization: 'Bearer test-token', | |
| 'Content-Type': 'application/json', | |
| }, | |
| body: JSON.stringify({ text: ['About KCD MCP'] }), | |
| }, | |
| ) | |
| expect(embedRes.ok).toBe(true) | |
| const embedJson = (await embedRes.json()) as any | |
| const vector = embedJson.result.data[0] | |
| expect(Array.isArray(vector)).toBe(true) | |
| const queryRes = await fetch( | |
| 'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/query', | |
| { | |
| method: 'POST', | |
| headers: { | |
| Authorization: 'Bearer test-token', | |
| 'Content-Type': 'application/json', | |
| }, | |
| body: JSON.stringify({ vector, topK: 5, returnMetadata: 'all' }), | |
| }, | |
| ) | |
| expect(queryRes.ok).toBe(true) | |
| const queryJson = (await queryRes.json()) as any | |
| expect(queryJson.success).toBe(true) | |
| expect(queryJson.result.matches.length).toBeGreaterThan(0) | |
| expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP') | |
| expect(String(queryJson.result.matches[0].metadata.url)).toContain('about-mcp') | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mocks/__tests__/cloudflare.test.ts` around lines 57 - 91, The test "Vectorize
query uses match-sorter when embedding text is known" is non-hermetic because it
relies on getSearchCorpus() containing a page titled "About KCD MCP"; make the
test self-contained by explicitly seeding the search corpus with that document
before calling the query: create and insert a document object with
metadata.title = "About KCD MCP", metadata.url containing "about-mcp", and the
expected text into whatever mock data source or index the handler uses
(mirroring the approach in test 5), so matchSorter will find it
deterministically; update the test (the test block using fetch to the
vectorize/v2/indexes/semantic-index/query endpoint) to add the seeding step and
ensure getStaticDocs()/getSearchCorpus() are not relied upon for this assertion.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
mocks/__tests__/cloudflare.test.ts (1)
7-17: Consider addingserver.resetHandlers()inafterEach.Standard MSW setup includes
afterEach(() => server.resetHandlers())to remove any runtime handlers added viaserver.use(). No current test usesserver.use(), but adding this guard is cheap insurance against future test additions leaking handler overrides.♻️ Suggested addition
+import { afterAll, afterEach, beforeAll, beforeEach, describe, expect, test } from 'vitest' afterAll(() => { server.close() }) +afterEach(() => { + server.resetHandlers() +})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mocks/__tests__/cloudflare.test.ts` around lines 7 - 17, Add a cleanup step to reset any runtime MSW handlers after each test by calling server.resetHandlers() in an afterEach; for example add afterEach(() => server.resetHandlers()) (or combine with your existing resetCloudflareMockState() into afterEach(() => { server.resetHandlers(); resetCloudflareMockState(); })) so any server.use() overrides won't leak between tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mocks/cloudflare.ts`:
- Around line 434-445: buildSearchCorpus currently calls getPodcastDocs()
outside its defensive try/catch which can let any runtime error reject the
module-level searchCorpusPromise permanently; wrap the getPodcastDocs()
invocation in the same try/catch (or add a dedicated try/catch around
docs.push(...getPodcastDocs())) so any errors are caught/handled (e.g., ignore
or log) and do not cause buildSearchCorpus to reject, ensuring
searchCorpusPromise isn't poisoned and resetCloudflareMockState() isn't relied
on to clear it.
---
Nitpick comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 7-17: Add a cleanup step to reset any runtime MSW handlers after
each test by calling server.resetHandlers() in an afterEach; for example add
afterEach(() => server.resetHandlers()) (or combine with your existing
resetCloudflareMockState() into afterEach(() => { server.resetHandlers();
resetCloudflareMockState(); })) so any server.use() overrides won't leak between
tests.
There was a problem hiding this comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@mocks/cloudflare.ts`:
- Around line 57-65: resetCloudflareMockState currently doesn't clear the cached
searchCorpusPromise and buildSearchCorpus calls getPodcastDocs() outside a
try/catch, which can cache a rejected promise; update resetCloudflareMockState
to set searchCorpusPromise = null, and modify buildSearchCorpus/getSearchCorpus
so the call to getPodcastDocs() is wrapped in try/catch (or ensure any rejection
is caught), and on error clear searchCorpusPromise (set it to null) before
rethrowing the error so a poisoned promise isn't retained and retries via
ensureSeededIndex / seededIndexPromises can succeed.
- Around line 724-738: The current loop over parsed.vectors writes items to
store (ns.set) before detecting an insert conflict, allowing partial commits;
fix by pre-validating the entire batch before any mutation: in the handler that
uses parsed.vectors, when operation === 'insert' first iterate parsed.vectors to
check for duplicate IDs within the batch and for any existing IDs in
store.get(v.namespace) (use ns.has(v.id)), and if any conflict call
jsonError(409, ...) and return without modifying store or touching updated; only
after validation do the actual ns.set(v.id, v) and increment updated. Ensure
updated is only updated during the second (commit) loop.
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
| vectorizeIndexes.clear() | ||
| seededIndexPromises.clear() | ||
| embeddingVectorToText.clear() | ||
| } |
There was a problem hiding this comment.
Reset function doesn't clear all module-level mutable state
Low Severity
resetCloudflareMockState() clears vectorizeIndexes, seededIndexPromises, and embeddingVectorToText, but does not clear searchCorpusPromise or docEmbeddingCache. The searchCorpusPromise depends on the mutable mockTransistorEpisodes array (which the transistor mock's POST handler mutates via episodes.push). Once the corpus is built and cached, subsequent resets won't reflect mutations to that shared array, creating a stale-data cross-test pollution vector that contradicts the function's documented purpose.
Additional Locations (1)
|
Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.
|


Add MSW mocks for Cloudflare Workers AI and Vectorize APIs to enable local development and testing.
Note
Medium Risk
Introduces a sizeable new mock implementation and hooks it into the global MSW server, which could change test/local-dev network behavior if the handlers match unexpected requests. Production logic change is minimal (a type-cast on the transcription request body).
Overview
Adds new MSW mocks for Cloudflare’s
Workers AIandVectorizeAPIs (mocks/cloudflare.ts) to support local development/testing, including embedding generation, audio transcription responses, and Vectorizequery/insert/upsert/delete_by_idsflows with seeded/searchable mock data.Wires these handlers into the existing mock server (
mocks/index.ts), adds a focused Vitest suite covering embeddings/transcription and Vectorize write/query/delete behaviors, and updatestranscribeMp3WithWorkersAito accept broaderUint8Arrayinputs by casting the request body to satisfy stricter fetch typings.Exports
episodesasmockTransistorEpisodesfrommocks/transistor.tsto seed the mock search corpus used by the Vectorize mock.Written by Cursor Bugbot for commit ffae634. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
Tests
Chores