Skip to content

Cloudflare API mocks#603

Merged
kentcdodds merged 15 commits intomainfrom
cursor/cloudflare-api-mocks-10d3
Feb 19, 2026
Merged

Cloudflare API mocks#603
kentcdodds merged 15 commits intomainfrom
cursor/cloudflare-api-mocks-10d3

Conversation

@kentcdodds
Copy link
Owner

@kentcdodds kentcdodds commented Feb 19, 2026

Add MSW mocks for Cloudflare Workers AI and Vectorize APIs to enable local development and testing.


Open in Cursor Open in Web


Note

Medium Risk
Introduces a sizeable new mock implementation and hooks it into the global MSW server, which could change test/local-dev network behavior if the handlers match unexpected requests. Production logic change is minimal (a type-cast on the transcription request body).

Overview
Adds new MSW mocks for Cloudflare’s Workers AI and Vectorize APIs (mocks/cloudflare.ts) to support local development/testing, including embedding generation, audio transcription responses, and Vectorize query/insert/upsert/delete_by_ids flows with seeded/searchable mock data.

Wires these handlers into the existing mock server (mocks/index.ts), adds a focused Vitest suite covering embeddings/transcription and Vectorize write/query/delete behaviors, and updates transcribeMp3WithWorkersAi to accept broader Uint8Array inputs by casting the request body to satisfy stricter fetch typings.

Exports episodes as mockTransistorEpisodes from mocks/transistor.ts to seed the mock search corpus used by the Vectorize mock.

Written by Cursor Bugbot for commit ffae634. This will update automatically on new commits. Configure here.

Summary by CodeRabbit

  • Tests

    • Added comprehensive tests for Cloudflare-style AI endpoints: embeddings, transcription, vectorize queries, upsert/query/delete flows.
  • Chores

    • Added a local Cloudflare API mock with seeded sample data to support AI run, embeddings, transcription, and vector insert/query/delete scenarios.
    • Improved request handling compatibility for transcription payloads.

cursoragent and others added 7 commits February 19, 2026 00:47
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
@cursor
Copy link

cursor bot commented Feb 19, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

@kentcdodds kentcdodds marked this pull request as ready for review February 19, 2026 01:00
@coderabbitai
Copy link

coderabbitai bot commented Feb 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a comprehensive MSW Cloudflare API mock (AI Run and Vectorize) with an in-memory vector store and deterministic embeddings, new tests exercising those mocks, a tiny TypeScript cast in a transcription utility, and a small export addition for transistor mocks. (50 words)

Changes

Cohort / File(s) Summary
TypeScript type adjustment
app/utils/cloudflare-ai-transcription.server.ts
Request body cast to Uint8Array<ArrayBuffer> to satisfy stricter TypeScript typings; comment added about Buffer/Uint8Array views. No runtime behavior change.
Cloudflare MSW mock implementation
mocks/cloudflare.ts
Adds cloudflareHandlers and resetCloudflareMockState() implementing MSW handlers for AI Run (transcription/embeddings) and Vectorize v2/legacy (query, insert/upsert, delete_by_ids). Implements in-memory index store, deterministic SHA‑256 embeddings, cosine-similarity scoring, NDJSON & multipart parsing, metadata/frontmatter extraction, seeding, and Cloudflare-style envelope responses.
Mock integration
mocks/index.ts
Imports and spreads cloudflareHandlers into the MSW handler list.
Tests for Cloudflare mocks
mocks/__tests__/cloudflare.test.ts
New test suite using msw/node validating embeddings, transcription, vector queries (seeded and direct), upsert→query→delete_by_ids flows, and response shapes.
Transistor mock export change
mocks/transistor.ts
Re-exports internal episodes as mockTransistorEpisodes alongside transistorHandlers (public export surface expanded).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Test Client
    participant MSW as MSW Handler
    participant Store as In-Memory Vector Store

    Client->>MSW: POST /accounts/:acct/ai/run (audio or text)
    MSW->>MSW: parse path → model, check Content-Type
    alt audio/*
        MSW-->>Client: transcription envelope (result.text)
    else text/embed
        MSW->>MSW: deterministic embedding (SHA‑256)
        MSW-->>Client: embeddings envelope (vectors)
    end

    Client->>MSW: POST /accounts/:acct/vectorize/.../upsert (NDJSON or multipart)
    MSW->>MSW: parse NDJSON/multipart, validate
    MSW->>Store: insert/upsert vectors with metadata
    Store-->>MSW: ack
    MSW-->>Client: upsert envelope (success)

    Client->>MSW: POST /accounts/:acct/vectorize/.../query ({ query or vector })
    MSW->>MSW: embed query if text, fetch vectors
    MSW->>Store: retrieve candidates
    MSW->>MSW: compute cosine similarity, normalize, sort
    MSW-->>Client: query envelope (matches)

    Client->>MSW: POST /accounts/:acct/vectorize/.../delete_by_ids ({ ids })
    MSW->>Store: delete requested ids
    Store-->>MSW: deleted count
    MSW-->>Client: delete envelope (deleted count)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Poem

🐰 I hopped through handlers, nibbling on code,
Mocked vectors and whispers in an in-memory node,
Tests boing and assert—upsert, query, delete,
Deterministic hashes make matches so neat,
A tiny rabbit cheers: the mocks are complete! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Cloudflare API mocks' accurately summarizes the main change: adding MSW mocks for Cloudflare APIs. It is concise, clear, and directly reflects the primary contribution of the changeset.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cursor/cloudflare-api-mocks-10d3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
mocks/cloudflare.ts (2)

30-33: Module-level mutable state with no reset mechanism.

vectorizeIndexes is a module-level singleton that accumulates state across test runs. There's no exported function to clear the store between tests (e.g., in afterEach). This will cause state leakage across tests, leading to flaky or order-dependent failures as the test suite grows.

Consider exporting a resetVectorizeStore() function:

Proposed addition
 const vectorizeIndexes = new Map<string, VectorizeIndexStore>()
+
+/** Clear all in-memory vector data. Call in `afterEach` / `beforeEach`. */
+export function resetVectorizeStore() {
+	vectorizeIndexes.clear()
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/cloudflare.ts` around lines 30 - 33, The module-level Map
vectorizeIndexes currently accumulates state across tests; add and export a
resetVectorizeStore() function that clears its contents to allow test teardown
to reset state. Implement resetVectorizeStore to call vectorizeIndexes.clear()
(and also clear any nested VectorizeIndexStore Maps if you prefer deep-clearing)
and export it from the module so tests can call it in afterEach to prevent
cross-test leakage.

295-613: ~250 lines of near-identical v2/legacy handler duplication.

Each operation (query, insert, upsert, delete_by_ids) is copy-pasted for v2 and legacy paths, differing only in the URL. This could be reduced to ~half the code with a small helper that generates handlers for both path variants:

Sketch
function vectorizeRoute(
  subpath: string, // e.g. "query", "insert", …
  handler: (args: { request: Request; accountId: string; indexName: string }) => Promise<Response>,
): HttpHandler[] {
  const cb = async ({ request, params }: any) => {
    requiredHeader(request.headers, 'authorization')
    return handler({
      request,
      accountId: String(params.accountId),
      indexName: String(params.indexName),
    })
  }
  return [
    http.post(`${CLOUDFLARE_API_BASE}/accounts/:accountId/vectorize/v2/indexes/:indexName/${subpath}`, cb),
    http.post(`${CLOUDFLARE_API_BASE}/accounts/:accountId/vectorize/indexes/:indexName/${subpath}`, cb),
  ]
}

// Then flatten into cloudflareHandlers:
export const cloudflareHandlers = [
  aiRunHandler,
  ...vectorizeRoute('query', handleQuery),
  ...vectorizeRoute('insert', handleInsert),
  // …
]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/cloudflare.ts` around lines 295 - 613, The file duplicates
near-identical v2 and legacy vectorize routes; create a small generator (e.g.,
vectorizeRoute) that takes the subpath ("query", "insert", "upsert",
"delete_by_ids") and a core handler (e.g., handleQuery, handleInsert,
handleUpsert, handleDeleteByIds) and returns two HttpHandler entries (v2 and
legacy paths) that share a single callback which calls requiredHeader and
extracts accountId/indexName from params; move the existing per-route logic into
those core handlers and reuse utilities already in the file
(parseVectorizeNdjsonVectors, ensureSeededIndex, getOrCreateIndexStore,
allVectors, scoreFromSimilarity, cosineSimilarity) and then replace the
duplicated http.post blocks with spread calls like ...vectorizeRoute('query',
handleQuery) inside cloudflareHandlers.
mocks/__tests__/cloudflare.test.ts (1)

1-13: Consider adding afterEach to reset mock server handlers and clear vector store.

The test suite has no afterEach cleanup. While the current tests use distinct accountId values to avoid collisions, this will become fragile as more tests are added. Once the store reset function is available (per the suggestion on mocks/cloudflare.ts), wiring it into afterEach would prevent order-dependent failures.

afterEach(() => {
  server.resetHandlers()
  resetVectorizeStore()
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/__tests__/cloudflare.test.ts` around lines 1 - 13, Add an afterEach
cleanup that calls server.resetHandlers() and the vector store reset function to
avoid test-order dependencies: in the test file where
setupServer(...cloudflareHandlers) creates server and beforeAll/afterAll are
defined, add an afterEach block that invokes server.resetHandlers() and
resetVectorizeStore() (or the actual reset function exported from the cloudflare
mocks) so mocks and in-memory vector state are cleared between tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mocks/cloudflare.ts`:
- Around line 442-466: The insert handler in the mock (the HTTP POST for
`${CLOUDFLARE_API_BASE}/.../insert`) currently overwrites existing vectors via
ns.set(v.id, v) giving upsert semantics; update the handler in
mocks/cloudflare.ts (the insert route that uses parseVectorizeNdjsonVectors and
getOrCreateIndexStore) to detect duplicates before writing—either reject the
request or skip duplicates per real Cloudflare insert behavior (e.g., check
ns.has(v.id) and respond with an appropriate error or count skipped items), or
at minimum add a clear comment next to the insert handler documenting that this
mock intentionally uses upsert semantics and does not enforce duplicate-id
rejection.

---

Nitpick comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 1-13: Add an afterEach cleanup that calls server.resetHandlers()
and the vector store reset function to avoid test-order dependencies: in the
test file where setupServer(...cloudflareHandlers) creates server and
beforeAll/afterAll are defined, add an afterEach block that invokes
server.resetHandlers() and resetVectorizeStore() (or the actual reset function
exported from the cloudflare mocks) so mocks and in-memory vector state are
cleared between tests.

In `@mocks/cloudflare.ts`:
- Around line 30-33: The module-level Map vectorizeIndexes currently accumulates
state across tests; add and export a resetVectorizeStore() function that clears
its contents to allow test teardown to reset state. Implement
resetVectorizeStore to call vectorizeIndexes.clear() (and also clear any nested
VectorizeIndexStore Maps if you prefer deep-clearing) and export it from the
module so tests can call it in afterEach to prevent cross-test leakage.
- Around line 295-613: The file duplicates near-identical v2 and legacy
vectorize routes; create a small generator (e.g., vectorizeRoute) that takes the
subpath ("query", "insert", "upsert", "delete_by_ids") and a core handler (e.g.,
handleQuery, handleInsert, handleUpsert, handleDeleteByIds) and returns two
HttpHandler entries (v2 and legacy paths) that share a single callback which
calls requiredHeader and extracts accountId/indexName from params; move the
existing per-route logic into those core handlers and reuse utilities already in
the file (parseVectorizeNdjsonVectors, ensureSeededIndex, getOrCreateIndexStore,
allVectors, scoreFromSimilarity, cosineSimilarity) and then replace the
duplicated http.post blocks with spread calls like ...vectorizeRoute('query',
handleQuery) inside cloudflareHandlers.

@cursor

This comment has been minimized.

cursoragent and others added 2 commits February 19, 2026 01:16
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
mocks/cloudflare.ts (2)

340-382: handleVectorizeInsert and handleVectorizeUpsert are copy-pastes of each other.

The only difference between the two is the operation string in the jsonOk response. Any future bug fix or behavioural change has to be applied twice.

♻️ Proposed refactor
-const handleVectorizeInsert = async ({ request, params }) => {
-	requiredHeader(request.headers, 'authorization')
-	const accountId = String(params.accountId)
-	const indexName = String(params.indexName)
-	const parsed = await parseVectorizeNdjsonVectors(request)
-	if (!parsed.ok) return jsonError(400, parsed.error, 10005)
-
-	const store = getOrCreateIndexStore(accountId, indexName)
-	let updated = 0
-	for (const v of parsed.vectors) {
-		let ns = store.get(v.namespace)
-		if (!ns) {
-			ns = new Map()
-			store.set(v.namespace, ns)
-		}
-		ns.set(v.id, v)
-		updated++
-	}
-
-	return jsonOk({ operation: 'insert', updated })
-}
-
-const handleVectorizeUpsert = async ({ request, params }) => {
-	requiredHeader(request.headers, 'authorization')
-	const accountId = String(params.accountId)
-	const indexName = String(params.indexName)
-	const parsed = await parseVectorizeNdjsonVectors(request)
-	if (!parsed.ok) return jsonError(400, parsed.error, 10005)
-
-	const store = getOrCreateIndexStore(accountId, indexName)
-	let updated = 0
-	for (const v of parsed.vectors) {
-		let ns = store.get(v.namespace)
-		if (!ns) {
-			ns = new Map()
-			store.set(v.namespace, ns)
-		}
-		ns.set(v.id, v)
-		updated++
-	}
-
-	return jsonOk({ operation: 'upsert', updated })
-}
+const makeVectorizeWriteHandler =
+	(operation: 'insert' | 'upsert'): HttpResponseResolver =>
+	async ({ request, params }) => {
+		requiredHeader(request.headers, 'authorization')
+		const accountId = String(params.accountId)
+		const indexName = String(params.indexName)
+		const parsed = await parseVectorizeNdjsonVectors(request)
+		if (!parsed.ok) return jsonError(400, parsed.error, 10005)
+
+		const store = getOrCreateIndexStore(accountId, indexName)
+		let updated = 0
+		for (const v of parsed.vectors) {
+			let ns = store.get(v.namespace)
+			if (!ns) {
+				ns = new Map()
+				store.set(v.namespace, ns)
+			}
+			ns.set(v.id, v)
+			updated++
+		}
+
+		return jsonOk({ operation, updated })
+	}
+
+const handleVectorizeInsert = makeVectorizeWriteHandler('insert')
+const handleVectorizeUpsert = makeVectorizeWriteHandler('upsert')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/cloudflare.ts` around lines 340 - 382, handleVectorizeInsert and
handleVectorizeUpsert duplicate the same logic; extract the shared
parsing/store/upsert loop into a single helper (e.g., processVectorizeRequest)
that accepts (request, params) and an operation string or returns the updated
count, then have handleVectorizeInsert call the helper and return jsonOk({
operation: 'insert', updated }) and handleVectorizeUpsert do the same with
'upsert'; reference the existing functions handleVectorizeInsert,
handleVectorizeUpsert and the helpers
parseVectorizeNdjsonVectors/getOrCreateIndexStore/jsonOk when moving the shared
code.

210-215: Unquoted multipart boundary may include trailing whitespace.

The unquoted branch ([^;]+) stops at ; but not at whitespace, so a content-type like multipart/form-data; boundary=someboundary (trailing space before end-of-string) sets boundary to "someboundary ". The subsequent --${boundary} split then fails silently, returning a misleading 'Missing "vectors" multipart field.' error.

🛡️ Proposed fix
-	const boundary = boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? null
+	const boundary = (boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? '').trim() || null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/cloudflare.ts` around lines 210 - 215, The regex that extracts the
multipart boundary (used in the boundaryMatch/boundary code that parses
contentType) allows an unquoted boundary to include trailing whitespace, causing
later `--${boundary}` splits to fail; update the extraction so the boundary
value is normalized by trimming surrounding whitespace and quotes (or adjust the
regex to exclude trailing whitespace/semicolons) and then use the trimmed value
for subsequent splitting and checks like the "vectors" field validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mocks/cloudflare.ts`:
- Line 33: The module-level Map vectorizeIndexes is persisting state between
tests; add a reset helper to clear it and export that helper (or the Map) so
test suites can call it in beforeEach/afterEach. Specifically, add and export a
function like resetVectorizeIndexes() that calls vectorizeIndexes.clear(), then
update tests to call this helper before/after each test so ensureSeededIndex()
will behave deterministically per-test.

---

Duplicate comments:
In `@mocks/cloudflare.ts`:
- Around line 349-356: The mock insert loop unconditionally overwrites existing
vectors (ns.set(v.id, v)), so change the logic in the handler that processes
parsed.vectors to check for existing IDs (use store.get and ns.has(v.id)) and
reject duplicates instead of upserting; e.g., when a duplicate is found in the
parsed.vectors loop, return or throw an error (or increment a duplicate counter
and respond with an error result) so the mock mimics insert semantics rather
than upsert for the insert handler that uses parsed.vectors, store, and ns.set.

---

Nitpick comments:
In `@mocks/cloudflare.ts`:
- Around line 340-382: handleVectorizeInsert and handleVectorizeUpsert duplicate
the same logic; extract the shared parsing/store/upsert loop into a single
helper (e.g., processVectorizeRequest) that accepts (request, params) and an
operation string or returns the updated count, then have handleVectorizeInsert
call the helper and return jsonOk({ operation: 'insert', updated }) and
handleVectorizeUpsert do the same with 'upsert'; reference the existing
functions handleVectorizeInsert, handleVectorizeUpsert and the helpers
parseVectorizeNdjsonVectors/getOrCreateIndexStore/jsonOk when moving the shared
code.
- Around line 210-215: The regex that extracts the multipart boundary (used in
the boundaryMatch/boundary code that parses contentType) allows an unquoted
boundary to include trailing whitespace, causing later `--${boundary}` splits to
fail; update the extraction so the boundary value is normalized by trimming
surrounding whitespace and quotes (or adjust the regex to exclude trailing
whitespace/semicolons) and then use the trimmed value for subsequent splitting
and checks like the "vectors" field validation.

cursoragent and others added 2 commits February 19, 2026 01:18
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
mocks/cloudflare.ts (1)

503-507: Unquoted boundary regex may capture trailing whitespace.

The unquoted capture group [^;]+ in the boundary regex retains any trailing whitespace before a semicolon or end-of-string. If the Content-Type header value has trailing spaces after the boundary token (e.g., multipart/form-data; boundary=abc ), the captured string includes those spaces, and --${boundary} won't match the actual CRLF-delimited delimiter in the body.

♻️ Proposed defensive fix
-const boundary = boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? null
+const boundary = (boundaryMatch?.[1] ?? boundaryMatch?.[2] ?? '').trim() || null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/cloudflare.ts` around lines 503 - 507, The boundary regex can capture
trailing whitespace into boundaryMatch/[2], causing mismatched delimiters;
update the parsing in mocks/cloudflare.ts so the captured boundary is trimmed or
the regex excludes whitespace (e.g., use a token like [^;\s]+) before using
`boundary` in delimiter checks — specifically adjust the
`/\bboundary=(?:"([^"]+)"|([^;]+))/i.exec(contentType)` logic and ensure the
`boundary` variable (derived from `boundaryMatch?.[1] ?? boundaryMatch?.[2]`) is
normalized (trimmed/validated) so `--${boundary}` matches actual multipart
delimiters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 57-91: The test "Vectorize query uses match-sorter when embedding
text is known" is non-hermetic because it relies on getSearchCorpus() containing
a page titled "About KCD MCP"; make the test self-contained by explicitly
seeding the search corpus with that document before calling the query: create
and insert a document object with metadata.title = "About KCD MCP", metadata.url
containing "about-mcp", and the expected text into whatever mock data source or
index the handler uses (mirroring the approach in test 5), so matchSorter will
find it deterministically; update the test (the test block using fetch to the
vectorize/v2/indexes/semantic-index/query endpoint) to add the seeding step and
ensure getStaticDocs()/getSearchCorpus() are not relied upon for this assertion.

In `@mocks/cloudflare.ts`:
- Around line 455-458: The reset helper that clears vectorizeIndexes must also
clear the seededIndexKeys set to avoid the early-return in ensureSeededIndex
(function ensureSeededIndex) skipping reseeding; update the same reset/clear
utility to call seededIndexKeys.clear() (in addition to
vectorizeIndexes.clear()) so previously-seen index keys will be reseeded after a
reset and subsequent queries return correct results.

---

Duplicate comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 7-13: Add an afterEach teardown that resets the module-level
singletons to avoid cross-test pollution: clear or reassign vectorizeIndexes and
seededIndexKeys (the module-level collections in mocks/cloudflare.ts) after each
test so tests don't share accumulated state; keep the existing
beforeAll/afterAll server lifecycle (server.listen/server.close) but add an
afterEach that empties vectorizeIndexes and seededIndexKeys (or reinitializes
them) to ensure isolation between tests.

In `@mocks/cloudflare.ts`:
- Around line 687-707: handleVectorizeInsert currently silently overwrites
existing vectors via ns.set(v.id, v); change the logic to prevent silent upserts
by checking for existing IDs before setting: use ns.has(v.id) to detect
duplicates (from getOrCreateIndexStore and parsed.vectors returned by
parseVectorizeNdjsonVectors) and either skip the write and count/skipping via a
separate counter or return a 409/appropriate error for duplicates; only call
ns.set(v.id, v) when the ID does not already exist and adjust the returned
payload (operation/updated/inserted/duplicates) accordingly.
- Around line 52-53: The Map and Set (vectorizeIndexes and seededIndexKeys)
persist across test runs causing cross-test pollution; add a small reset
function (e.g., resetMockStores or clearVectorizeMocks) that calls
vectorizeIndexes.clear() and seededIndexKeys.clear(), export it from the mock
module, and ensure tests invoke it in their beforeEach/afterEach (or call it
from your test harness) so the stores are reinitialized between tests.

---

Nitpick comments:
In `@mocks/cloudflare.ts`:
- Around line 503-507: The boundary regex can capture trailing whitespace into
boundaryMatch/[2], causing mismatched delimiters; update the parsing in
mocks/cloudflare.ts so the captured boundary is trimmed or the regex excludes
whitespace (e.g., use a token like [^;\s]+) before using `boundary` in delimiter
checks — specifically adjust the
`/\bboundary=(?:"([^"]+)"|([^;]+))/i.exec(contentType)` logic and ensure the
`boundary` variable (derived from `boundaryMatch?.[1] ?? boundaryMatch?.[2]`) is
normalized (trimmed/validated) so `--${boundary}` matches actual multipart
delimiters.

Comment on lines 57 to 91
test('Vectorize query uses match-sorter when embedding text is known', async () => {
const embedRes = await fetch(
'https://api.cloudflare.com/client/v4/accounts/acc123/ai/run/@cf/google/embeddinggemma-300m',
{
method: 'POST',
headers: {
Authorization: 'Bearer test-token',
'Content-Type': 'application/json',
},
body: JSON.stringify({ text: ['About KCD MCP'] }),
},
)
expect(embedRes.ok).toBe(true)
const embedJson = (await embedRes.json()) as any
const vector = embedJson.result.data[0]
expect(Array.isArray(vector)).toBe(true)

const queryRes = await fetch(
'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/query',
{
method: 'POST',
headers: {
Authorization: 'Bearer test-token',
'Content-Type': 'application/json',
},
body: JSON.stringify({ vector, topK: 5, returnMetadata: 'all' }),
},
)
expect(queryRes.ok).toBe(true)
const queryJson = (await queryRes.json()) as any
expect(queryJson.success).toBe(true)
expect(queryJson.result.matches.length).toBeGreaterThan(0)
expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP')
expect(String(queryJson.result.matches[0].metadata.url)).toContain('about-mcp')
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test 3 is non-hermetic — depends on a filesystem content file for "About KCD MCP".

The assertion expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP') at line 89 passes only if getSearchCorpus() returns a document with that exact title. The static seed list (getStaticDocs) and faker-generated podcast docs contain no such entry; the match must come from a file under content/pages/ with that exact frontmatter title.

If content/pages/about-mcp.mdx (or equivalent) is missing, renamed, or if the test runs in an environment where the content directory is unavailable, matchSorter returns no match, the handler falls back to cosine-similarity scoring, and the assertion fails with no obvious link to the missing file.

Consider making this test self-contained by seeding the target document explicitly before querying, similar to the approach in test 5:

🛡️ Suggested hermetic approach
 test('Vectorize query uses match-sorter when embedding text is known', async () => {
+    // Seed the expected doc so the test is independent of the filesystem.
+    const seedBoundary = '----vitest-seed-boundary'
+    const seedNdjson = `${JSON.stringify({
+        id: '/about-mcp',
+        values: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+        metadata: { type: 'page', title: 'About KCD MCP', url: '/about-mcp', snippet: 'About KCD MCP' },
+    })}\n`
+    const seedMultipart = [
+        `--${seedBoundary}\r\n`,
+        'Content-Disposition: form-data; name="vectors"; filename="vectors.ndjson"\r\n',
+        'Content-Type: application/x-ndjson\r\n',
+        '\r\n',
+        seedNdjson,
+        `\r\n--${seedBoundary}--\r\n`,
+    ].join('')
+    await fetch(
+        'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/upsert',
+        {
+            method: 'POST',
+            headers: { Authorization: 'Bearer test-token', 'Content-Type': `multipart/form-data; boundary=${seedBoundary}` },
+            body: seedMultipart,
+        },
+    )
+
     const embedRes = await fetch( ...
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test('Vectorize query uses match-sorter when embedding text is known', async () => {
const embedRes = await fetch(
'https://api.cloudflare.com/client/v4/accounts/acc123/ai/run/@cf/google/embeddinggemma-300m',
{
method: 'POST',
headers: {
Authorization: 'Bearer test-token',
'Content-Type': 'application/json',
},
body: JSON.stringify({ text: ['About KCD MCP'] }),
},
)
expect(embedRes.ok).toBe(true)
const embedJson = (await embedRes.json()) as any
const vector = embedJson.result.data[0]
expect(Array.isArray(vector)).toBe(true)
const queryRes = await fetch(
'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/query',
{
method: 'POST',
headers: {
Authorization: 'Bearer test-token',
'Content-Type': 'application/json',
},
body: JSON.stringify({ vector, topK: 5, returnMetadata: 'all' }),
},
)
expect(queryRes.ok).toBe(true)
const queryJson = (await queryRes.json()) as any
expect(queryJson.success).toBe(true)
expect(queryJson.result.matches.length).toBeGreaterThan(0)
expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP')
expect(String(queryJson.result.matches[0].metadata.url)).toContain('about-mcp')
})
test('Vectorize query uses match-sorter when embedding text is known', async () => {
// Seed the expected doc so the test is independent of the filesystem.
const seedBoundary = '----vitest-seed-boundary'
const seedNdjson = `${JSON.stringify({
id: '/about-mcp',
values: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
metadata: { type: 'page', title: 'About KCD MCP', url: '/about-mcp', snippet: 'About KCD MCP' },
})}\n`
const seedMultipart = [
`--${seedBoundary}\r\n`,
'Content-Disposition: form-data; name="vectors"; filename="vectors.ndjson"\r\n',
'Content-Type: application/x-ndjson\r\n',
'\r\n',
seedNdjson,
`\r\n--${seedBoundary}--\r\n`,
].join('')
await fetch(
'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/upsert',
{
method: 'POST',
headers: { Authorization: 'Bearer test-token', 'Content-Type': `multipart/form-data; boundary=${seedBoundary}` },
body: seedMultipart,
},
)
const embedRes = await fetch(
'https://api.cloudflare.com/client/v4/accounts/acc123/ai/run/@cf/google/embeddinggemma-300m',
{
method: 'POST',
headers: {
Authorization: 'Bearer test-token',
'Content-Type': 'application/json',
},
body: JSON.stringify({ text: ['About KCD MCP'] }),
},
)
expect(embedRes.ok).toBe(true)
const embedJson = (await embedRes.json()) as any
const vector = embedJson.result.data[0]
expect(Array.isArray(vector)).toBe(true)
const queryRes = await fetch(
'https://api.cloudflare.com/client/v4/accounts/acc123/vectorize/v2/indexes/semantic-index/query',
{
method: 'POST',
headers: {
Authorization: 'Bearer test-token',
'Content-Type': 'application/json',
},
body: JSON.stringify({ vector, topK: 5, returnMetadata: 'all' }),
},
)
expect(queryRes.ok).toBe(true)
const queryJson = (await queryRes.json()) as any
expect(queryJson.success).toBe(true)
expect(queryJson.result.matches.length).toBeGreaterThan(0)
expect(queryJson.result.matches[0].metadata.title).toBe('About KCD MCP')
expect(String(queryJson.result.matches[0].metadata.url)).toContain('about-mcp')
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/__tests__/cloudflare.test.ts` around lines 57 - 91, The test "Vectorize
query uses match-sorter when embedding text is known" is non-hermetic because it
relies on getSearchCorpus() containing a page titled "About KCD MCP"; make the
test self-contained by explicitly seeding the search corpus with that document
before calling the query: create and insert a document object with
metadata.title = "About KCD MCP", metadata.url containing "about-mcp", and the
expected text into whatever mock data source or index the handler uses
(mirroring the approach in test 5), so matchSorter will find it
deterministically; update the test (the test block using fetch to the
vectorize/v2/indexes/semantic-index/query endpoint) to add the seeding step and
ensure getStaticDocs()/getSearchCorpus() are not relied upon for this assertion.

@cursor

This comment has been minimized.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
mocks/__tests__/cloudflare.test.ts (1)

7-17: Consider adding server.resetHandlers() in afterEach.

Standard MSW setup includes afterEach(() => server.resetHandlers()) to remove any runtime handlers added via server.use(). No current test uses server.use(), but adding this guard is cheap insurance against future test additions leaking handler overrides.

♻️ Suggested addition
+import { afterAll, afterEach, beforeAll, beforeEach, describe, expect, test } from 'vitest'

 afterAll(() => {
 	server.close()
 })

+afterEach(() => {
+	server.resetHandlers()
+})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mocks/__tests__/cloudflare.test.ts` around lines 7 - 17, Add a cleanup step
to reset any runtime MSW handlers after each test by calling
server.resetHandlers() in an afterEach; for example add afterEach(() =>
server.resetHandlers()) (or combine with your existing
resetCloudflareMockState() into afterEach(() => { server.resetHandlers();
resetCloudflareMockState(); })) so any server.use() overrides won't leak between
tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mocks/cloudflare.ts`:
- Around line 434-445: buildSearchCorpus currently calls getPodcastDocs()
outside its defensive try/catch which can let any runtime error reject the
module-level searchCorpusPromise permanently; wrap the getPodcastDocs()
invocation in the same try/catch (or add a dedicated try/catch around
docs.push(...getPodcastDocs())) so any errors are caught/handled (e.g., ignore
or log) and do not cause buildSearchCorpus to reject, ensuring
searchCorpusPromise isn't poisoned and resetCloudflareMockState() isn't relied
on to clear it.

---

Nitpick comments:
In `@mocks/__tests__/cloudflare.test.ts`:
- Around line 7-17: Add a cleanup step to reset any runtime MSW handlers after
each test by calling server.resetHandlers() in an afterEach; for example add
afterEach(() => server.resetHandlers()) (or combine with your existing
resetCloudflareMockState() into afterEach(() => { server.resetHandlers();
resetCloudflareMockState(); })) so any server.use() overrides won't leak between
tests.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@mocks/cloudflare.ts`:
- Around line 57-65: resetCloudflareMockState currently doesn't clear the cached
searchCorpusPromise and buildSearchCorpus calls getPodcastDocs() outside a
try/catch, which can cache a rejected promise; update resetCloudflareMockState
to set searchCorpusPromise = null, and modify buildSearchCorpus/getSearchCorpus
so the call to getPodcastDocs() is wrapped in try/catch (or ensure any rejection
is caught), and on error clear searchCorpusPromise (set it to null) before
rethrowing the error so a poisoned promise isn't retained and retries via
ensureSeededIndex / seededIndexPromises can succeed.
- Around line 724-738: The current loop over parsed.vectors writes items to
store (ns.set) before detecting an insert conflict, allowing partial commits;
fix by pre-validating the entire batch before any mutation: in the handler that
uses parsed.vectors, when operation === 'insert' first iterate parsed.vectors to
check for duplicate IDs within the batch and for any existing IDs in
store.get(v.namespace) (use ns.has(v.id)), and if any conflict call
jsonError(409, ...) and return without modifying store or touching updated; only
after validation do the actual ns.set(v.id, v) and increment updated. Ensure
updated is only updated during the second (commit) loop.

cursoragent and others added 2 commits February 19, 2026 02:02
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
Co-authored-by: Kent C. Dodds <me+github@kentcdodds.com>
@kentcdodds kentcdodds merged commit 0ce48c0 into main Feb 19, 2026
7 checks passed
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.

vectorizeIndexes.clear()
seededIndexPromises.clear()
embeddingVectorToText.clear()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset function doesn't clear all module-level mutable state

Low Severity

resetCloudflareMockState() clears vectorizeIndexes, seededIndexPromises, and embeddingVectorToText, but does not clear searchCorpusPromise or docEmbeddingCache. The searchCorpusPromise depends on the mutable mockTransistorEpisodes array (which the transistor mock's POST handler mutates via episodes.push). Once the corpus is built and cached, subsequent resets won't reflect mutations to that shared array, creating a stale-data cross-test pollution vector that contradicts the function's documented purpose.

Additional Locations (1)

Fix in Cursor Fix in Web

@cursor
Copy link

cursor bot commented Feb 19, 2026

Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.

  • ✅ Fixed: Reset function doesn't clear all module-level mutable state
    • Reset now clears the cached search corpus and document embedding map to fully reset mock state between tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments