fix(ccusage): avoid RangeError when parsing large transcript JSONL files #875
MumuTW wants to merge 4 commits into ryoppippi:main from
Conversation
📝 Walkthrough

Rewrites ccusage transcript parsing to stream-read JSONL files line-by-line, with an early skip for `file-history-snapshot` files.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant FS as File System
    participant Stream as Stream Reader
    participant Parser as Per-line Parser/Validator
    participant Aggregator as Usage Aggregator
    participant Pricing as PricingFetcher
    FS->>Stream: open JSONL (createReadStream)
    Stream->>Parser: emit next line
    Parser-->>Stream: parsed object or error
    alt first-line indicates file-history-snapshot
        Parser->>Aggregator: signal skip -> return null
    else assistant usage line found
        Parser->>Aggregator: update latestUsage (inputTokens, cacheTokens)
        Aggregator->>Stream: continue reading
    end
    Stream->>Aggregator: EOF
    Aggregator->>Pricing: request contextLimit (modelId)
    Pricing-->>Aggregator: contextLimit or failure
    Aggregator->>Caller: compute percentage or return null
```
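The streaming flow above can be sketched roughly as follows. This is a hypothetical, simplified version — `latestAssistantUsage`, the `Usage` type, and the field handling are illustrative, not the PR's actual implementation:

```typescript
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

type Usage = { input_tokens: number; cache_read_input_tokens?: number };

// Read a JSONL transcript line-by-line, bail out early on
// file-history-snapshot files, and keep only the latest assistant usage record.
export async function latestAssistantUsage(path: string): Promise<Usage | null> {
	const rl = createInterface({
		input: createReadStream(path, { encoding: 'utf-8' }),
		crlfDelay: Infinity,
	});
	let firstNonEmptyLineSeen = false;
	let latestUsage: Usage | null = null;
	for await (const line of rl) {
		const trimmed = line.trim();
		if (trimmed === '') {
			continue;
		}
		let obj: any;
		try {
			obj = JSON.parse(trimmed);
		}
		catch {
			continue; // tolerate malformed lines instead of aborting the whole file
		}
		if (!firstNonEmptyLineSeen) {
			firstNonEmptyLineSeen = true;
			if (obj?.type === 'file-history-snapshot') {
				rl.close();
				return null; // not a conversation transcript: skip the file
			}
		}
		if (obj?.type === 'assistant' && obj.message?.usage?.input_tokens != null) {
			latestUsage = obj.message.usage as Usage;
		}
	}
	return latestUsage;
}
```

Because `readline` yields one line at a time over a stream, memory stays bounded by the longest line rather than the whole file.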
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)
1295-1314: Type assignment may not fully narrow `input_tokens` to required.

The assignment at line 1309 assigns `obj.message.usage` (where `input_tokens` is optional per `transcriptUsageSchema`) to `latestUsage` (where `input_tokens` is required). While the check at line 1307 ensures `input_tokens != null` at runtime, TypeScript's property narrowing may not fully narrow the parent object type. Consider using a type assertion or explicit object construction to ensure type safety:
💡 Suggested refactor for explicit type construction
```diff
 if (
 	obj.type === 'assistant'
 	&& obj.message != null
 	&& obj.message.usage != null
 	&& obj.message.usage.input_tokens != null
 ) {
-	latestUsage = obj.message.usage;
+	latestUsage = {
+		input_tokens: obj.message.usage.input_tokens,
+		cache_creation_input_tokens: obj.message.usage.cache_creation_input_tokens,
+		cache_read_input_tokens: obj.message.usage.cache_read_input_tokens,
+	};
 }
```

🤖 Prompt for AI Agents
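A standalone illustration of why the reviewer suggests explicit construction. The `TranscriptUsage` and `LatestUsage` types below are hypothetical stand-ins for the project's schemas: checking a nested optional property narrows only that property, not the parent object, so rebuilding the object from the checked values is what convinces the compiler:

```typescript
// Hypothetical shapes mirroring the review comment: the source type has an
// optional input_tokens, the target type requires it.
type TranscriptUsage = { input_tokens?: number; cache_read_input_tokens?: number };
type LatestUsage = { input_tokens: number; cache_read_input_tokens?: number };

export function narrow(usage: TranscriptUsage): LatestUsage | null {
	if (usage.input_tokens != null) {
		// `usage` is still TranscriptUsage here; only `usage.input_tokens` is
		// narrowed to number, so constructing a fresh object type-checks while
		// `return usage;` would not.
		return {
			input_tokens: usage.input_tokens,
			cache_read_input_tokens: usage.cache_read_input_tokens,
		};
	}
	return null;
}
```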
Verify each finding against the current code and only fix it if needed. In `@apps/ccusage/src/data-loader.ts` around lines 1295-1314: the assignment of `obj.message.usage` to `latestUsage` can leave TypeScript unconvinced that `input_tokens` is present, because `transcriptUsageSchema` marks it optional. To fix, explicitly construct or cast a value with the required shape before assigning to `latestUsage` — e.g., after the runtime check (`obj.message.usage != null && obj.message.usage.input_tokens != null`), create a new object with the needed properties (or use a type assertion to the required type) and assign that to `latestUsage`, so the compiler sees a value that definitely satisfies `latestUsage`'s required fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f65d700-2b4a-40b7-bd2d-fc00166209e9
📒 Files selected for processing (1)
apps/ccusage/src/data-loader.ts
Follow-up for #829: silenced logger output in JSON mode for ccusage-opencode.

What changed:
- Set when is active in:

Validation:

```
vinext | WARN The field "pnpm.peerDependencyRules" was found in /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/vinext/package.json. This will not take effect. You should configure "pnpm.peerDependencyRules" at the root of the workspace instead.
RUN v4.0.15 /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/ccusage/apps/opencode
 ✓ src/data-loader.ts (2 tests) 4ms
 Test Files  2 passed (2)
```
thanks! lmc
@ccusage/amp
ccusage
@ccusage/codex
@ccusage/mcp
@ccusage/opencode
@ccusage/pi
commit:
…ement

The `latestUsage` variable requires `input_tokens` as a non-optional number, but `obj.message.usage` has it as optional. Explicitly construct the object after the null check so TypeScript can see the narrowed type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1269-1288: The current fast-path checks the first non-empty line
via readline (createInterface) which forces Node to buffer the entire line and
crashes on huge single-line records; fix by reading a bounded prefix from the
file before creating the readline reader: open transcriptPath with fs (e.g.,
fs.open + filehandle.read or createReadStream with { start: 0, end: N-1 }), read
a small prefix (e.g., 4 KiB), trim leading whitespace, attempt to parse only
that prefix (or regex-extract the initial {"type":...} token) to detect if type
=== "file-history-snapshot", and if so log via logger.debug and return null;
otherwise close the temp handle/stream and then create the original
createReadStream + createInterface and continue as before. Ensure you properly
close file handles/streams (or destroy the temp stream) and preserve the
existing variables firstNonEmptyLineSeen and the rest of the processing flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fc0d4887-4b7a-4c69-9bb9-34c1e89f1600
📒 Files selected for processing (1)
apps/ccusage/src/data-loader.ts
…tion

Read only the first 4 KiB of the file to detect the `file-history-snapshot` type instead of using readline, which buffers the entire first line and crashes on huge single-line records (e.g. 734 MB).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)
1299-1320: Keep the new transcript path in the repo's `Result` style.

This adds fresh `try/catch` JSON parsing plus repeated `Result.isSuccess(...)` checks. Switching the throwable parse to `Result.try()` and branching on `Result.isFailure(contextLimitResult)` would match the project's byethrow conventions and keep the happy path flatter. As per coding guidelines: "Prefer `@praha/byethrow` Result type over traditional try-catch for functional error handling", "Use `Result.try()` for wrapping operations that may throw (JSON parsing, etc.)", and "Use `Result.isFailure()` for checking errors (more readable than `!Result.isSuccess()`)".

Also applies to: 1342-1352
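To illustrate the pattern the reviewer is asking for without depending on `@praha/byethrow`'s exact API (which may differ from this sketch), here is a minimal hand-rolled `Result` showing how a throwable JSON parse gets wrapped once so callers branch on failure instead of nesting `try/catch`:

```typescript
// Minimal Result type for illustration only; the project uses @praha/byethrow.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Wrap a throwing operation once, at the boundary.
function tryCatch<T>(fn: () => T): Result<T, unknown> {
	try {
		return { ok: true, value: fn() };
	}
	catch (error) {
		return { ok: false, error };
	}
}

// Callers now check a value instead of catching an exception,
// which keeps the happy path flat.
export function parseLine(line: string): Result<unknown, unknown> {
	return tryCatch(() => JSON.parse(line));
}
```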
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1267-1284: The fast-path currently assumes the first 4KiB begins
with {"type":...} and misses snapshots when "type" is not the first property;
change the probe to extract the first line from prefixBuf (find first newline
CR/LF within PREFIX_SIZE), parse that first-line substring as JSON (safe
try/catch) and read its top-level "type" property (instead of regex anchored to
the start) to detect "file-history-snapshot" and short-circuit (symbols:
PREFIX_SIZE, prefixBuf, readSync, transcriptPath, typeMatch/logger.debug); if no
newline is present in the prefix keep the existing fallback to readline; add a
regression test that writes a snapshot line where "type" is not the first field
to ensure detection still works.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2c75f78d-d9c4-4ca0-9955-a44ec79acec1
📒 Files selected for processing (1)
apps/ccusage/src/data-loader.ts
```ts
// Fast-path: read a small prefix to detect file-history-snapshot without
// buffering a potentially huge first line via readline (see #873).
const PREFIX_SIZE = 4096;
const prefixBuf = Buffer.alloc(PREFIX_SIZE);
const fd = openSync(transcriptPath, 'r');
let bytesRead: number;
try {
	bytesRead = readSync(fd, prefixBuf, 0, PREFIX_SIZE, 0);
}
finally {
	closeSync(fd);
}
if (bytesRead > 0) {
	const prefix = prefixBuf.subarray(0, bytesRead).toString('utf-8').trimStart();
	const typeMatch = prefix.match(/^\s*\{\s*"type"\s*:\s*"([^"]+)"/);
	if (typeMatch != null && typeMatch[1] === 'file-history-snapshot') {
		logger.debug('Skipping file-history-snapshot transcript file for context tokens');
		return null;
	}
```
Don't make the snapshot fast-path depend on "type" being the first key.
The new probe only recognizes file-history-snapshot when the first record starts with `{"type": ...}` and that field appears inside the first 4 KiB. A valid snapshot line with another leading property will miss this check, fall back to readline, and reopen the huge-line crash path this PR is fixing. Please extract the first line's top-level `type` without assuming field order, and add a regression test where `type` is not serialized first.
Also applies to: 4780-4799
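The reviewer's suggestion can be sketched like this — probe a bounded prefix, cut out the first line, and `JSON.parse` it so the top-level `type` field is found regardless of property order. `isSnapshotFile` and the fallback behavior are illustrative assumptions, not the PR's final code:

```typescript
import { closeSync, openSync, readSync } from 'node:fs';

const PREFIX_SIZE = 4096;

// Detect a file-history-snapshot first record without buffering huge lines
// and without assuming "type" is the first serialized property.
export function isSnapshotFile(path: string): boolean {
	const buf = Buffer.alloc(PREFIX_SIZE);
	const fd = openSync(path, 'r');
	let bytesRead: number;
	try {
		bytesRead = readSync(fd, buf, 0, PREFIX_SIZE, 0);
	}
	finally {
		closeSync(fd);
	}
	if (bytesRead === 0) {
		return false;
	}
	const prefix = buf.subarray(0, bytesRead).toString('utf-8');
	const newlineIdx = prefix.search(/\r?\n/);
	// No newline inside the prefix means the first record is huge; a snapshot's
	// first line is small, so report "not a snapshot" here (real code would
	// fall back to the streaming path).
	if (newlineIdx === -1) {
		return false;
	}
	try {
		const first = JSON.parse(prefix.slice(0, newlineIdx));
		return first?.type === 'file-history-snapshot';
	}
	catch {
		return false;
	}
}
```

Parsing the whole first line instead of regex-matching the prefix start is what makes detection field-order independent.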
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/ccusage/src/data-loader.ts` around lines 1267 - 1284, The fast-path
currently assumes the first 4KiB begins with {"type":...} and misses snapshots
when "type" is not the first property; change the probe to extract the first
line from prefixBuf (find first newline CR/LF within PREFIX_SIZE), parse that
first-line substring as JSON (safe try/catch) and read its top-level "type"
property (instead of regex anchored to the start) to detect
"file-history-snapshot" and short-circuit (symbols: PREFIX_SIZE, prefixBuf,
readSync, transcriptPath, typeMatch/logger.debug); if no newline is present in
the prefix keep the existing fallback to readline; add a regression test that
writes a snapshot line where "type" is not the first field to ensure detection
still works.
Summary
- Replace `calculateContextTokens` full-file `readFile` parsing with streaming readline-based parsing
- Skip `type: "file-history-snapshot"` transcript inputs early

Testing
- `pnpm --dir ccusage --filter ccusage test`

Fixes #873
Summary by CodeRabbit
Performance
Bug Fixes
Behavior Changes
Tests