Conversation
… `EncodedTextAst` and `QueryHandler`.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings. Use the following commands to manage reviews:
Walkthrough

This pull request adds CMake build configuration for multiple dependencies (Boost, ystdlib, ANTLR, date, fmt, SIMDJSON, SPDLOG) and introduces a comprehensive test suite infrastructure for CLP stream readers with utilities, test data setup, and test cases for both structured and unstructured IR stream formats.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@package.json`:
- Around line 16-18: The "lint" npm script references a missing "lint:check"
script, causing npm run lint to fail; update package.json scripts so
"lint:check" exists (e.g., add a "lint:check" script that runs "npm run
lint:check:js") or change the "lint" script to call "lint:check:js" directly;
modify the scripts section to include the new "lint:check" key (or update
"lint") so "lint", "lint:check", "lint:check:js", and "lint:fix:js" are
consistent.
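The fix described above can be sketched as a consistent `scripts` section; the eslint invocation shown is an assumption for illustration, not the project's actual command:

```json
{
  "scripts": {
    "lint": "npm run lint:check",
    "lint:check": "npm run lint:check:js",
    "lint:check:js": "eslint . --max-warnings 0",
    "lint:fix:js": "eslint . --fix --max-warnings 0"
  }
}
```

With this layout, `npm run lint` resolves through `lint:check` to `lint:check:js`, so all four script names stay consistent.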
In `@test/StructuredIrStreamReader.test.ts`:
- Around line 171-183: The test uses a tautological assertion
(toBeGreaterThanOrEqual(0)); replace it with meaningful bounds by asserting the
filtered result length is between 0 and the total number of events: after
calling reader.deserializeStream(), call reader.filterLogEvents(...) and then
getFilteredLogEventMap(); assert combinedMap.length is >= 0 and <= the original
total event count obtained from the reader (e.g. reader.getLogEventMap() or
whatever method returns the unfiltered event list), or assert an expected exact
count when known; apply the same change to the other test block that uses the
same tautological check.
In `@test/UnstructuredIrStreamReader.test.ts`:
- Around line 75-93: Add an explicit guard before destructuring the decoded
events: after calling reader.decodeRange(...) and the
expect(...).toHaveLength(...) assertion, check that events is non-null and
events.length > 0 (or call expect(events && events.length).toBeGreaterThan(0))
before doing const [event] = events; this ensures the test fails with a clear
message rather than throwing on destructuring; reference reader.decodeRange,
events, DECODE_CHUNK_SIZE and numEvents when locating where to add the guard.
In `@test/utils.ts`:
- Around line 30-38: downloadFileIfNeeded currently performs an unbounded stream
download directly to the final path with no status check or hard timeout; change
it to create the directory, start an AbortController-based hard-deadline timer
(e.g., 30_000ms) and pass its signal into axios.get, validate response.status
(expect 200) before streaming, stream into a temporary file (e.g., filePath +
".tmp") via pipeline, clear the timer and atomically rename the temp file to
filePath on success, and on any error/abort ensure the controller is aborted,
the temp file is removed and the write stream is closed; keep references to
axios.get, AbortController, pipeline, fs.createWriteStream, fs.rename and
fs.unlink when implementing these steps.
```typescript
it("should decode first events with expected fields", () => {
    const numEvents = reader.deserializeStream();

    const events = reader.decodeRange(0, Math.min(DECODE_CHUNK_SIZE, numEvents), false);

    expect(events).not.toBeNull();
    expect(events).toHaveLength(Math.min(DECODE_CHUNK_SIZE, numEvents));

    const [event] = events as NonNullable<typeof events>;

    expect(typeof event.logEventNum).toBe("number");
    expect(typeof event.logLevel).toBe("number");
    expect(typeof event.message).toBe("string");
    expect(typeof event.timestamp).toBe("bigint");
    expect([
        "bigint",
        "number",
    ]).toContain(typeof event.utcOffset);
});
```
🧹 Nitpick | 🔵 Trivial
Consider adding a guard before destructuring.
While the expect(events).toHaveLength(...) assertion on line 81 ensures the array has elements, the destructuring on line 83 happens unconditionally. If the assertion fails, the test would throw a different error on destructuring rather than a clear test failure.
This is a minor robustness concern, but acceptable given the assertion order.
♻️ Optional: Add explicit length check before destructuring
```diff
 expect(events).not.toBeNull();
 expect(events).toHaveLength(Math.min(DECODE_CHUNK_SIZE, numEvents));
-const [event] = events as NonNullable<typeof events>;
+const nonNullEvents = events as NonNullable<typeof events>;
+expect(nonNullEvents.length).toBeGreaterThan(0);
+const [event] = nonNullEvents;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
it("should decode first events with expected fields", () => {
    const numEvents = reader.deserializeStream();
    const events = reader.decodeRange(0, Math.min(DECODE_CHUNK_SIZE, numEvents), false);

    expect(events).not.toBeNull();
    expect(events).toHaveLength(Math.min(DECODE_CHUNK_SIZE, numEvents));

    const nonNullEvents = events as NonNullable<typeof events>;
    expect(nonNullEvents.length).toBeGreaterThan(0);
    const [event] = nonNullEvents;

    expect(typeof event.logEventNum).toBe("number");
    expect(typeof event.logLevel).toBe("number");
    expect(typeof event.message).toBe("string");
    expect(typeof event.timestamp).toBe("bigint");
    expect([
        "bigint",
        "number",
    ]).toContain(typeof event.utcOffset);
});
```
test/utils.ts
Outdated
```typescript
const downloadFileIfNeeded = async (url: string, filePath: string): Promise<void> => {
    if (fs.existsSync(filePath)) {
        return;
    }

    fs.mkdirSync(path.dirname(filePath), {recursive: true});

    const response = await axios.get(url, {responseType: "stream"});
    await pipeline(response.data, fs.createWriteStream(filePath));
```
🧩 Analysis chain
🌐 Web query:
Axios 1.13.4 timeout behavior responseType stream Node.js download
💡 Result:
In Axios v1.x (incl. 1.13.4) on Node.js with responseType: 'stream', timeout is not a “total download time limit.” In practice it behaves like a response/socket inactivity timeout: if the server keeps sending chunks, the timer effectively never trips, so a long streaming download can run well past timeout and only fail if the connection goes idle/stalls. This is a known/raised behavior for chunked/streaming responses. [1][2]
If you need a hard deadline for a streaming download, use cancellation (AbortController / signal) and abort on your own timer (or AbortSignal.timeout() where available). Axios docs explicitly recommend combining timeout with cancellation to avoid hangs. [3]
Example (hard 30s cap, then abort the stream request):
```typescript
import axios from "axios";
import { pipeline } from "node:stream/promises";
import fs from "node:fs";

const controller = new AbortController();
const t = setTimeout(() => controller.abort(new Error("Download deadline exceeded")), 30_000);

try {
    const res = await axios.get(url, { responseType: "stream", signal: controller.signal });
    await pipeline(res.data, fs.createWriteStream("file.bin"));
} finally {
    clearTimeout(t);
}
```

Sources:
[1] Axios request config (timeout, responseType: 'stream') https://axios.rest/pages/advanced/request-config
[2] Axios GitHub issue: timeout not respected for chunked/streaming responses #5886 axios/axios#5886
[3] Axios docs: Cancellation (combine timeout + signal) https://axios-http.com/docs/cancellation
Harden test-data downloads to avoid hangs and partial files.
Right now there's no timeout or status validation, and the stream writes directly to the final path. This can hang CI or leave corrupted files if the download fails or multiple workers run in parallel.
The proposed `timeout: 30_000` alone is insufficient: in Axios 1.13.4 with `responseType: "stream"`, `timeout` is only a socket inactivity threshold, not a hard deadline, so long downloads can exceed it while chunks still arrive. Enforce a hard deadline using an `AbortController` with a separate timer instead.
🛠️ Proposed fix (hard timeout + status validation + atomic write)
```diff
 const downloadFileIfNeeded = async (url: string, filePath: string): Promise<void> => {
     if (fs.existsSync(filePath)) {
         return;
     }

     fs.mkdirSync(path.dirname(filePath), {recursive: true});

-    const response = await axios.get(url, {responseType: "stream"});
-    await pipeline(response.data, fs.createWriteStream(filePath));
+    const tmpPath = `${filePath}.tmp`;
+    const controller = new AbortController();
+    const timeout = setTimeout(() => controller.abort(new Error("Download timeout")), 30_000);
+    try {
+        const response = await axios.get(url, {
+            responseType: "stream",
+            validateStatus: (status) => 200 <= status && status < 300,
+            signal: controller.signal,
+        });
+        await pipeline(response.data, fs.createWriteStream(tmpPath));
+        fs.renameSync(tmpPath, filePath);
+    } catch (error) {
+        if (fs.existsSync(tmpPath)) {
+            fs.unlinkSync(tmpPath);
+        }
+        throw error;
+    } finally {
+        clearTimeout(timeout);
+    }
 };
```

📝 Committable suggestion
```typescript
const downloadFileIfNeeded = async (url: string, filePath: string): Promise<void> => {
    if (fs.existsSync(filePath)) {
        return;
    }

    fs.mkdirSync(path.dirname(filePath), {recursive: true});

    const tmpPath = `${filePath}.tmp`;
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(new Error("Download timeout")), 30_000);
    try {
        const response = await axios.get(url, {
            responseType: "stream",
            validateStatus: (status) => 200 <= status && status < 300,
            signal: controller.signal,
        });
        await pipeline(response.data, fs.createWriteStream(tmpPath));
        fs.renameSync(tmpPath, filePath);
    } catch (error) {
        if (fs.existsSync(tmpPath)) {
            fs.unlinkSync(tmpPath);
        }
        throw error;
    } finally {
        clearTimeout(timeout);
    }
};
```
… event extraction in stream reader tests
…add browser testing with Playwright
`vitest.config.ts` did not match `vitest.browser.config.ts` or `vitest.node.config.ts`, so changes to those files would not trigger CI.
…ey don't take effect
…structured dataset names
…m reader test cases
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@package.json`:
- Line 18: The lint:fix:js script incorrectly forwards --fix to lint:check:js
causing tsc to receive an unknown --fix flag; update package.json so lint:fix:js
does not append --fix to the command that runs tsc. For example, create a
dedicated lint:fix:js script that runs the type check and lint fix separately
(invoke tsc --noEmit or npm run lint:check:types, then run eslint . --fix
--max-warnings 0), or change lint:check:js and lint:fix:js so only the eslint
invocation receives --fix; reference the npm script names lint:fix:js,
lint:check:js and the tsc invocation to locate where to split the commands.
- Line 19: The package.json currently uses the "prepare" lifecycle script to run
"npx playwright install", which triggers heavy browser downloads for every npm
install; remove the "prepare" entry and instead add a developer-only script like
"setup:test" (or "setup:playwright") that runs "npx playwright install", update
README/contributing docs to instruct contributors to run that script before
running browser tests, or conditionally run the install only in CI/dev by
checking an env var if you need automation.
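The replacement described above might look like this in `package.json` (the script name `setup:test` comes from the suggestion; the layout here is illustrative):

```json
{
  "scripts": {
    "setup:test": "npx playwright install"
  }
}
```

Contributors would then run `npm run setup:test` once before the browser tests, instead of paying the browser download on every `npm install`.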
In `@tsconfig.json`:
- Around line 33-37: The tsconfig's "include" array currently lists
"eslint.config.mjs", which is a JavaScript module and has no effect for
TypeScript compilation; either remove "eslint.config.mjs" from the "include"
array in tsconfig.json or rename the ESLint config file to a TypeScript module
(e.g., "eslint.config.ts") if you intend TypeScript to type-check the ESLint
config—update the "include" entry accordingly.
- Around line 16-31: The tsconfig currently sets "strict": true and also
explicitly repeats individual strict flags (e.g., "alwaysStrict",
"strictNullChecks", "strictBindCallApply", "strictFunctionTypes",
"strictPropertyInitialization", "noImplicitAny", "noImplicitThis",
"useUnknownInCatchVariables"); remove the redundant explicit keys to reduce
verbosity by keeping only "strict": true (or alternatively remove "strict" and
keep the explicit list if you prefer self-documenting flags) — update the
tsconfig entries around "strict" and the listed option names accordingly.
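A minimal sketch of the de-duplicated compiler options; `"strict": true` already implies every flag listed in the comment:

```jsonc
{
  "compilerOptions": {
    // "strict": true enables alwaysStrict, strictNullChecks,
    // strictBindCallApply, strictFunctionTypes,
    // strictPropertyInitialization, noImplicitAny, noImplicitThis,
    // and useUnknownInCatchVariables.
    "strict": true
  }
}
```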
…d improved module support
PR Split Plan for `vitest` Branch

Prerequisites

- `build(deps): Upgrade CLP to de9fc08` has been merged into `main`.
- The build task has been run so `dist/ClpFfiJs-node.js` exists.

Source Branch
All changes originate from the `vitest` branch. Base comparison commit: `fe1d1684b379ebc116e3e22e2bc7a361a9edfd46`.

Total diff: ~914 lines (excluding `package-lock.json`).

Checklist

- `build(deps): Upgrade CLP to de9fc08`
- `test: Add Vitest and ESLint infrastructure` (PR 1 below)
- `test: Add test utilities and ClpStreamReader tests` (PR 2 below)
- `test: Add Playwright browser config to Vitest; Add GitHub test workflow` (PR 3 below)
- `test: Add shared stream reader test suite`
- `test: Add unstructured IR stream reader tests`
- `test: Add structured IR stream reader tests`

PR 1: Build infrastructure — MERGED
PR: #115
Status: Merged into `main`

Files

- `.gitignore` — `node_modules`, `/test/data` entries
- `package.json`, `package-lock.json` — `npm install`
- `tsconfig.json`
- `eslint.config.mjs`
- `vitest.config.ts`
- `lint-tasks.yml` — `check-js`, `fix-js` tasks
- `Taskfile.yml` — `test: "taskfiles/test.yaml"` include
- `taskfiles/test.yaml`

PR 2: Test utilities and globalSetup — OPEN
PR: #118
Branch name: `test/utilities`
Base: `main`
Title: `test: Add test utilities and ClpStreamReader tests`
Diff size: ~219 lines

Files

- `test/utils.ts` — `createModule`, `loadTestData`, `createReader`, `assertNonNull`
- `test/globalSetup.ts` — populates `test/data/` before tests run
- `test/ClpStreamReader.test.ts` — `IrStreamType`, `MERGED_KV_PAIRS_*`
- `vitest.config.ts` — `globalSetup` reference

Notes

- `globalSetup.ts` downloads test data once; Node reads from disk, the browser fetches from Vite's dev server.
- `utils.ts` uses runtime detection (`isNodeRuntime()`) to select Node.js or browser code paths.

Validation
Expected: 2 tests pass.
PR 3: Playwright browser config and CI workflow — MERGED
PR: #125
Status: Merged into `main`

Files

- `.github/workflows/test.yaml` — runs `task test:js`
- `package.json` — `@vitest/browser-playwright`, `test:init` script
- `package-lock.json`
- `vitest.config.ts`
- `taskfiles/test.yaml` — `init` task for Playwright browser install
- `test/ClpStreamReader.test.ts` — `expect(true).toBe(true)`
- `test/globalSetup.ts`

Notes
PR 4: Test constants
Branch name: `test/constants`
Base: `main` (after PRs #118 and #125 are merged)
Title: `test: Add test constants`
Diff size: ~86 lines

Files

- `test/constants.ts`

Validation
Expected: 2 tests pass (from PR #118).
PR 5: Common stream reader tests
Branch name: `test/common-stream-tests`
Base: PR 4's branch
Title: `test: Add shared stream reader test suite`
Diff size: ~160 lines

Files

- `test/commonStreamReaderTests.ts` — `describeCommonTests()`, 9 shared tests for both structured and unstructured streams

Notes

- Exports `describeCommonTests(params)`, which registers shared `it()` tests.
- Called from `describe()` blocks in both stream reader test files.
- Covers filter, find nearest timestamp, and out-of-bounds `decodeRange`.
Validation
No runnable tests yet (shared suite only).
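The shared-suite pattern described above can be sketched as follows; the parameter shape (`CommonTestParams`) and the injected registrar are assumptions for illustration, not the actual API of `commonStreamReaderTests.ts`:

```typescript
// Hypothetical shape of the per-stream parameters.
interface CommonTestParams {
    streamName: string;
    expectedNumEvents: number;
}

type TestFn = (name: string, body: () => void) => void;

// Registers the same tests for any stream type. The registrar is
// injected to keep this sketch framework-agnostic; in the real suite
// it would be vitest's `it`.
const describeCommonTests = (params: CommonTestParams, it: TestFn): void => {
    it(`${params.streamName}: deserializes the expected number of events`, () => {
        // shared assertions against the reader would go here
    });
    it(`${params.streamName}: rejects out-of-bounds decodeRange`, () => {
        // shared assertions would go here
    });
};

// Show the call shape with a minimal stand-in registrar.
const registered: string[] = [];
describeCommonTests(
    {streamName: "unstructured", expectedNumEvents: 12},
    (name) => { registered.push(name); },
);
console.log(registered.length); // → 2
```

Each concrete test file would call this once inside its own `describe()` block, so structured and unstructured streams share one suite definition.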
PR 6: Unstructured IR stream tests
Branch name: `test/unstructured-ir`
Base: PR 5's branch
Title: `test: Add unstructured IR stream reader tests`
Diff size: ~96 lines

Files

- `test/UnstructuredIrStreamReader.test.ts`

Notes

- Uses `loadTestData("unstructured-yarn.clp.zst")`.

Validation
Expected: 14 tests pass (2 + 12).
PR 7: Structured IR stream tests
Branch name: `test/structured-ir`
Base: PR 6's branch
Title: `test: Add structured IR stream reader tests`
Diff size: ~160 lines

Files

- `test/StructuredIrStreamReader.test.ts`

Notes

- Uses `loadTestData("structured-cockroachdb.clp.zst")`.
- `describe` blocks:
  - One without extracted keys.
  - `logLevelKey`: 4 tests (extract log levels, filter INFO events, filtered decoding, combined KQL + log level with extracted keys).
Validation
Expected: 30 tests pass (2 + 12 + 16).
Summary
| PR | Title | Status | Cumulative Tests |
|----|-------------------------------------------------|---------|:----------------:|
| 1 | Build infrastructure (#115) | Merged | 0 |
| 2 | Test utilities and ClpStreamReader tests (#118) | Open | 2 |
| 3 | Playwright browser config and CI (#125) | Merged | 2 |
| 4 | Test constants | Pending | 2 |
| 5 | Common stream reader tests | Pending | 2 |
| 6 | Unstructured IR stream tests | Pending | 14 |
| 7 | Structured IR stream tests | Pending | 30 |
File Distribution
- `.gitignore`
- `package.json`
- `package-lock.json`
- `tsconfig.json`
- `eslint.config.mjs`
- `vitest.config.ts`
- `lint-tasks.yml`
- `taskfile.yaml`
- `taskfiles/test.yaml`
- `test/utils.ts`
- `test/globalSetup.ts`
- `.github/workflows/test.yaml`
- `test/constants.ts`
- `test/ClpStreamReader.test.ts`
- `test/commonStreamReaderTests.ts`
- `test/UnstructuredIrStreamReader.test.ts`
- `test/StructuredIrStreamReader.test.ts`