fix(ccusage): avoid RangeError when parsing large transcript JSONL files by MumuTW · Pull Request #875 · ryoppippi/ccusage

MumuTW · 2026-03-06T03:58:12Z

Summary

- Replace calculateContextTokens full-file readFile parsing with streaming readline-based parsing
- Skip transcript files early when the first non-empty line has type: "file-history-snapshot"
- Add a regression test for file-history-snapshot transcript inputs
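
The streaming approach in the summary can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Usage` shape and the names `usageFromLine` and `readLatestUsage` are assumptions made for the example, and schema validation is reduced to a few manual checks.

```typescript
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Hypothetical shape: only the fields this sketch needs.
type Usage = {
	input_tokens: number;
	cache_creation_input_tokens?: number;
	cache_read_input_tokens?: number;
};

// Parse one JSONL line. Returns the usage of an assistant entry that
// carries input_tokens, or null for anything else (including bad JSON).
function usageFromLine(line: string): Usage | null {
	const trimmed = line.trim();
	if (trimmed === '') {
		return null;
	}
	try {
		const obj = JSON.parse(trimmed) as {
			type?: unknown;
			message?: { usage?: Partial<Usage> | null } | null;
		} | null;
		if (obj == null || obj.type !== 'assistant') {
			return null;
		}
		const usage = obj.message?.usage;
		if (usage == null || typeof usage.input_tokens !== 'number') {
			return null;
		}
		return {
			input_tokens: usage.input_tokens,
			cache_creation_input_tokens: usage.cache_creation_input_tokens,
			cache_read_input_tokens: usage.cache_read_input_tokens,
		};
	}
	catch {
		return null; // malformed line: skip it and keep reading
	}
}

// Stream the file line by line and remember the most recent assistant usage,
// so the whole file is never held in memory as one string.
async function readLatestUsage(path: string): Promise<Usage | null> {
	const rl = createInterface({
		input: createReadStream(path, { encoding: 'utf-8' }),
		crlfDelay: Infinity,
	});
	let latest: Usage | null = null;
	for await (const line of rl) {
		latest = usageFromLine(line) ?? latest;
	}
	return latest;
}
```

Note that readline still buffers each individual line in full, so a single very large line remains a problem; the later review comments in this thread address that case with a bounded prefix read.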

Testing

pnpm --dir ccusage --filter ccusage test

Fixes #873

Summary by CodeRabbit

Performance
- Reduced memory use and improved speed for large transcript processing via incremental streaming parsing.
Bug Fixes
- More robust error handling and resilience during data loading.
- More accurate context-token calculations and usage-percentage reporting.
- Early-skip handling for specific transcript types to avoid incorrect results.
Behavior Changes
- JSON output mode now suppresses standard log output for quieter machine-readable results.
Tests
- Added coverage for edge-case transcript processing and early-skip behavior.

…shots

coderabbitai · 2026-03-06T03:58:31Z

📝 Walkthrough

Rewrites ccusage transcript parsing to stream-read JSONL files line-by-line with an early skip for file-history-snapshot entries and incremental assistant-usage extraction to compute context token percentages. Separately, opencode CLI commands now silence logs when JSON output is requested.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Streaming JSONL parser `apps/ccusage/src/data-loader.ts` | Replaces full-file reads with createReadStream + readline streaming; fast-prefix check to early-return on `file-history-snapshot`; per-line JSON parse + `transcriptMessageSchema` validation; track latest assistant usage (tokens, cacheTokens); request context limit from `PricingFetcher` when modelId present; robust per-line error handling; preserves null semantics for no usable data. |
| CLI JSON output logging changes `apps/opencode/src/commands/daily.ts`, `apps/opencode/src/commands/weekly.ts`, `apps/opencode/src/commands/monthly.ts`, `apps/opencode/src/commands/session.ts` | When `--json` / `jsonOutput` is set, set `logger.level = 0` at start to silence normal logging before loading/output. |
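
The logging change in the second row can be illustrated with a small stand-in. Everything here is hypothetical: the `Logger` type, its level threshold, `createLogger`, `runDaily`, and the placeholder row are assumptions for the sketch and are not taken from the opencode commands or from any real logging library.

```typescript
// Hypothetical minimal logger: messages are emitted only when level >= 3.
type Logger = { level: number; info: (msg: string) => void };

function createLogger(sink: string[]): Logger {
	const logger: Logger = {
		level: 3,
		info(msg) {
			if (logger.level >= 3) {
				sink.push(msg);
			}
		},
	};
	return logger;
}

// Hypothetical command body: silence the logger first when JSON is requested,
// so that only the JSON payload reaches the output.
function runDaily(opts: { json: boolean }, logger: Logger): string {
	if (opts.json) {
		logger.level = 0;
	}
	logger.info('Loading usage data...');
	const rows = [{ date: '2026-03-06', totalTokens: 1234 }]; // placeholder data
	return opts.json ? JSON.stringify(rows) : `${rows.length} day(s) loaded`;
}
```

Setting the level before any loading happens matters because a single stray log line ahead of the payload is enough to break a consumer such as `jq`.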

Sequence Diagram(s)

sequenceDiagram
  participant FS as File System
  participant Stream as Stream Reader
  participant Parser as Per-line Parser/Validator
  participant Aggregator as Usage Aggregator
  participant Pricing as PricingFetcher

  FS->>Stream: open JSONL (createReadStream)
  Stream->>Parser: emit next line
  Parser-->>Stream: parsed object or error
  alt first-line indicates file-history-snapshot
    Parser->>Aggregator: signal skip -> return null
  else assistant usage line found
    Parser->>Aggregator: update latestUsage (inputTokens, cacheTokens)
    Aggregator->>Stream: continue reading
  end
  Stream->>Aggregator: EOF
  Aggregator->>Pricing: request contextLimit (modelId)
  Pricing-->>Aggregator: contextLimit or failure
  Aggregator->>Caller: compute percentage or return null
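
The final step of the diagram (compute a percentage or return null) might look like the sketch below. The formula is an assumption: it sums input and cache tokens and rounds to a whole percent, which may differ from what `calculateContextTokens` actually does. The names `Usage` and `contextPercentage` are hypothetical.

```typescript
// Hypothetical shape for the latest assistant usage.
type Usage = {
	input_tokens: number;
	cache_creation_input_tokens?: number;
	cache_read_input_tokens?: number;
};

// Returns null when no usable context limit is available, mirroring the
// "percentage or null" outcome in the diagram.
function contextPercentage(
	usage: Usage,
	contextLimit: number | null,
): { inputTokens: number; percentage: number } | null {
	if (contextLimit == null || contextLimit <= 0) {
		return null;
	}
	// Assumption: cached tokens count toward the context window.
	const inputTokens
		= usage.input_tokens
			+ (usage.cache_creation_input_tokens ?? 0)
			+ (usage.cache_read_input_tokens ?? 0);
	// Clamp to 0..100 so an over-limit transcript does not report more than 100%.
	const percentage = Math.min(100, Math.max(0, Math.round((inputTokens / contextLimit) * 100)));
	return { inputTokens, percentage };
}
```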

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix(ccusage): use streaming to handle large JSONL files #706: Also converts JSONL handling to line-by-line streaming — directly related to the streaming approach in data-loader.ts.
feat: add context token display to statusline command #480: Prior work on calculateContextTokens and transcript parsing/schemas — overlaps the same function and validation logic.

Suggested reviewers

ryoppippi

Poem

🐰 I hop through lines and parse with care,

Skipping snapshot mountains, light as air.
I count the tokens, gently, one by one,
Streaming safe and tidy — job well done. 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	⚠️ Warning	Changes to apps/opencode/src/commands files (daily.ts, monthly.ts, session.ts, weekly.ts) adding logger.level=0 for JSON output appear out of scope relative to the linked issue `#873`, which focuses solely on fixing RangeError in ccusage data-loader.ts.	Remove logger-level changes from opencode commands or link a separate issue documenting this JSON logging behavior as a requirement.
Linked Issues check	❓ Inconclusive	The PR addresses the core requirements from issue `#873`: streaming-based parsing with early file-type detection [`#873`], skip file-history-snapshot files [`#873`], and prevent RangeError crashes [`#873`]. However, changes to opencode commands (daily.ts, monthly.ts, session.ts, weekly.ts) adding logger.level=0 for JSON mode appear unrelated to the linked issue.	Clarify whether logger-level changes in opencode commands are part of issue `#873` or a separate concern, as they appear unrelated to the stated objective of fixing file parsing crashes.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: replacing file-read approach with streaming to avoid RangeError when parsing large JSONL files, which is the primary objective of this PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

🧹 Nitpick comments (1)

apps/ccusage/src/data-loader.ts (1)

1295-1314: Type assignment may not fully narrow input_tokens to required.

The assignment at line 1309 assigns obj.message.usage (where input_tokens is optional per transcriptUsageSchema) to latestUsage (where input_tokens is required). While the check at line 1307 ensures input_tokens != null at runtime, TypeScript's property narrowing may not fully narrow the parent object type.

Consider using a type assertion or explicit object construction to ensure type safety:

💡 Suggested refactor for explicit type construction

 if (
     obj.type === 'assistant' &&
     obj.message != null &&
     obj.message.usage != null &&
     obj.message.usage.input_tokens != null
 ) {
-    latestUsage = obj.message.usage;
+    latestUsage = {
+        input_tokens: obj.message.usage.input_tokens,
+        cache_creation_input_tokens: obj.message.usage.cache_creation_input_tokens,
+        cache_read_input_tokens: obj.message.usage.cache_read_input_tokens,
+    };
 }
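
The reason behind this suggestion can be shown in isolation. In the sketch below, `TranscriptUsage`, `RequiredUsage`, and `toRequiredUsage` are hypothetical names standing in for the schema-derived types in data-loader.ts; only the optional-versus-required `input_tokens` distinction is taken from the review comment.

```typescript
// Hypothetical stand-ins for the two shapes discussed in the review.
type TranscriptUsage = {
	input_tokens?: number;
	cache_creation_input_tokens?: number;
	cache_read_input_tokens?: number;
};

type RequiredUsage = {
	input_tokens: number;
	cache_creation_input_tokens?: number;
	cache_read_input_tokens?: number;
};

function toRequiredUsage(usage: TranscriptUsage | null | undefined): RequiredUsage | null {
	if (usage == null || usage.input_tokens == null) {
		return null;
	}
	// The check above narrows the property access `usage.input_tokens` to number,
	// but the type of `usage` itself is still TranscriptUsage. Returning `usage`
	// directly would therefore fail to type-check, so the object is rebuilt from
	// the narrowed property.
	return {
		input_tokens: usage.input_tokens,
		cache_creation_input_tokens: usage.cache_creation_input_tokens,
		cache_read_input_tokens: usage.cache_read_input_tokens,
	};
}
```

A type assertion would also satisfy the compiler, but explicit construction keeps the runtime check and the static type tied together.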

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/ccusage/src/data-loader.ts` around lines 1295 - 1314, The assignment of
obj.message.usage to latestUsage can leave TypeScript unconvinced that
input_tokens is present because transcriptUsageSchema marks it optional; to fix,
explicitly construct or cast a value with the required shape before assigning to
latestUsage — e.g., after the runtime check (obj.message.usage != null &&
obj.message.usage.input_tokens != null) create a new object with the needed
properties (or use a type assertion to the required type) and assign that to
latestUsage; update the logic around transcriptMessageSchema,
transcriptUsageSchema, obj, input_tokens, and latestUsage in the try block so
the compiler sees a value that definitely satisfies latestUsage's required
fields.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f65d700-2b4a-40b7-bd2d-fc00166209e9

📥 Commits

Reviewing files that changed from the base of the PR and between c40ea6e and 0896c64.

📒 Files selected for processing (1)

apps/ccusage/src/data-loader.ts

MumuTW · 2026-03-06T05:10:23Z

Follow-up for #829: silenced logger output in JSON mode for ccusage-opencode.

What changed:
- Set logger.level = 0 when --json is active in:
  - apps/opencode/src/commands/daily.ts
  - apps/opencode/src/commands/monthly.ts
  - apps/opencode/src/commands/session.ts
  - apps/opencode/src/commands/weekly.ts

Validation:
- pnpm --filter @ccusage/opencode test

vinext | WARN The field "pnpm.peerDependencyRules" was found in /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/vinext/package.json. This will not take effect. You should configure "pnpm.peerDependencyRules" at the root of the workspace instead.
vinext | WARN The field "pnpm.onlyBuiltDependencies" was found in /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/vinext/package.json. This will not take effect. You should configure "pnpm.onlyBuiltDependencies" at the root of the workspace instead.

@ccusage/opencode@18.0.8 test /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/ccusage/apps/opencode
TZ=UTC vitest

RUN v4.0.15 /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/ccusage/apps/opencode

✓ src/data-loader.ts (2 tests) 4ms
✓ src/commands/weekly.ts (4 tests) 4ms

Test Files 2 passed (2)
Tests 6 passed (6)
Start at 05:10:22
Duration 463ms (transform 238ms, setup 0ms, import 435ms, tests 8ms, environment 0ms)

- bun ./src/index.ts daily --json | jq . (run from apps/opencode)

Commit: 9995939

MumuTW · 2026-03-06T05:10:33Z

Follow-up for #829: silenced logger output in JSON mode for ccusage-opencode.

What changed:

Set logger.level = 0 when --json is active in:
- apps/opencode/src/commands/daily.ts
- apps/opencode/src/commands/monthly.ts
- apps/opencode/src/commands/session.ts
- apps/opencode/src/commands/weekly.ts

Validation:

pnpm --filter @ccusage/opencode test
bun ./src/index.ts daily --json | jq . (run from apps/opencode)

Commit: 9995939

ryoppippi · 2026-03-06T08:27:38Z

thanks! lmc

pkg-pr-new · 2026-03-06T08:28:43Z

@ccusage/amp

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/amp@875

ccusage

npm i https://pkg.pr.new/ryoppippi/ccusage@875

@ccusage/codex

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/codex@875

@ccusage/mcp

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/mcp@875

@ccusage/opencode

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/opencode@875

@ccusage/pi

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/pi@875

commit: 9995939

…ement

The latestUsage variable requires input_tokens as a non-optional number, but obj.message.usage has it as optional. Explicitly construct the object after the null check so TypeScript can see the narrowed type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1269-1288: The current fast-path checks the first non-empty line
via readline (createInterface) which forces Node to buffer the entire line and
crashes on huge single-line records; fix by reading a bounded prefix from the
file before creating the readline reader: open transcriptPath with fs (e.g.,
fs.open + filehandle.read or createReadStream with { start: 0, end: N-1 }), read
a small prefix (e.g., 4 KiB), trim leading whitespace, attempt to parse only
that prefix (or regex-extract the initial {"type":...} token) to detect if type
=== "file-history-snapshot", and if so log via logger.debug and return null;
otherwise close the temp handle/stream and then create the original
createReadStream + createInterface and continue as before. Ensure you properly
close file handles/streams (or destroy the temp stream) and preserve the
existing variables firstNonEmptyLineSeen and the rest of the processing flow.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc0d4887-4b7a-4c69-9bb9-34c1e89f1600

📥 Commits

Reviewing files that changed from the base of the PR and between 9995939 and b9fd7cb.

📒 Files selected for processing (1)

apps/ccusage/src/data-loader.ts

…tion

Read only the first 4 KiB of the file to detect file-history-snapshot type instead of using readline, which buffers the entire first line and crashes on huge single-line records (e.g. 734 MB).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

apps/ccusage/src/data-loader.ts (1)

1299-1320: Keep the new transcript path in the repo’s Result style.

This adds fresh try/catch JSON parsing plus repeated Result.isSuccess(...) checks. Switching the throwable parse to Result.try() and branching on Result.isFailure(contextLimitResult) would match the project’s byethrow conventions and keep the happy path flatter. As per coding guidelines, "Prefer @praha/byethrow Result type over traditional try-catch for functional error handling", "Use Result.try() for wrapping operations that may throw (JSON parsing, etc.)", and "Use Result.isFailure() for checking errors (more readable than !Result.isSuccess())".

Also applies to: 1342-1352
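
The style this nitpick asks for can be sketched with a hand-rolled stand-in. The `Result` type and the helpers `tryCatch` and `isFailure` below are written for this example only; they are not the @praha/byethrow API, whose exact signatures should be checked against its documentation. `parseLineType` is likewise a hypothetical name.

```typescript
// Stand-in Result type for illustration; not the @praha/byethrow API.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Wrap a throwing operation so the caller branches on a value instead of catching.
function tryCatch<T>(fn: () => T): Result<T, Error> {
	try {
		return { ok: true, value: fn() };
	}
	catch (error) {
		return { ok: false, error: error instanceof Error ? error : new Error(String(error)) };
	}
}

function isFailure<T, E>(result: Result<T, E>): result is { ok: false; error: E } {
	return !result.ok;
}

// Read the top-level "type" of one JSONL line with an early return on failure,
// which keeps the happy path flat.
function parseLineType(line: string): string | null {
	const parsed = tryCatch(() => JSON.parse(line) as unknown);
	if (isFailure(parsed)) {
		return null;
	}
	const value = parsed.value;
	if (typeof value !== 'object' || value === null) {
		return null;
	}
	const type = (value as { type?: unknown }).type;
	return typeof type === 'string' ? type : null;
}
```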

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1267-1284: The fast-path currently assumes the first 4KiB begins
with {"type":...} and misses snapshots when "type" is not the first property;
change the probe to extract the first line from prefixBuf (find first newline
CR/LF within PREFIX_SIZE), parse that first-line substring as JSON (safe
try/catch) and read its top-level "type" property (instead of regex anchored to
the start) to detect "file-history-snapshot" and short-circuit (symbols:
PREFIX_SIZE, prefixBuf, readSync, transcriptPath, typeMatch/logger.debug); if no
newline is present in the prefix keep the existing fallback to readline; add a
regression test that writes a snapshot line where "type" is not the first field
to ensure detection still works.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2c75f78d-d9c4-4ca0-9955-a44ec79acec1

📥 Commits

Reviewing files that changed from the base of the PR and between b9fd7cb and a3bc6cb.

📒 Files selected for processing (1)

apps/ccusage/src/data-loader.ts

coderabbitai · 2026-03-06T11:44:11Z

apps/ccusage/src/data-loader.ts

+		// Fast-path: read a small prefix to detect file-history-snapshot without
+		// buffering a potentially huge first line via readline (see #873).
+		const PREFIX_SIZE = 4096;
+		const prefixBuf = Buffer.alloc(PREFIX_SIZE);
+		const fd = openSync(transcriptPath, 'r');
+		let bytesRead: number;
+		try {
+			bytesRead = readSync(fd, prefixBuf, 0, PREFIX_SIZE, 0);
+		} finally {
+			closeSync(fd);
+		}
+		if (bytesRead > 0) {
+			const prefix = prefixBuf.subarray(0, bytesRead).toString('utf-8').trimStart();
+			const typeMatch = prefix.match(/^\s*\{\s*"type"\s*:\s*"([^"]+)"/);
+			if (typeMatch != null && typeMatch[1] === 'file-history-snapshot') {
+				logger.debug('Skipping file-history-snapshot transcript file for context tokens');
+				return null;
+			}


⚠️ Potential issue | 🟠 Major

Don't make the snapshot fast-path depend on "type" being the first key.

The new probe only recognizes file-history-snapshot when the first record starts with {"type": ...} and that field appears inside the first 4 KiB. A valid snapshot line with another leading property will miss this check, fall back to readline, and reopen the huge-line crash path this PR is fixing. Please extract the first line’s top-level type without assuming field order, and add a regression where type is not serialized first.

Also applies to: 4780-4799
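
One way to read the first line's top-level type without assuming field order is sketched below. This is an illustration of the reviewer's suggestion under stated assumptions, not the code that was merged: `firstLineType` is a hypothetical helper, it takes the already-read prefix buffer as input, and it only inspects the very first line rather than the first non-empty one.

```typescript
import { Buffer } from 'node:buffer';

// Returns the top-level "type" of the first line in the prefix, or null when
// the first line does not end inside the prefix or is not a JSON object with a
// string "type". On null the caller would fall back to the streaming path.
function firstLineType(prefix: Buffer): string | null {
	const newline = prefix.indexOf(0x0a); // position of the first '\n'
	if (newline === -1) {
		return null; // first line is longer than the prefix
	}
	// trim() also drops a trailing '\r' from CRLF line endings.
	const firstLine = prefix.subarray(0, newline).toString('utf-8').trim();
	if (firstLine === '') {
		return null;
	}
	try {
		const obj: unknown = JSON.parse(firstLine);
		if (typeof obj !== 'object' || obj === null) {
			return null;
		}
		const type = (obj as { type?: unknown }).type;
		return typeof type === 'string' ? type : null;
	}
	catch {
		return null; // not valid JSON
	}
}
```

Because the full first line is parsed as JSON, `type` is found wherever it was serialized. The trade-off is that a snapshot whose first line exceeds the prefix size is still not detected by this check alone.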

fix(ccusage): stream context token parsing and skip file-history snap…

0896c64

…shots

fix(opencode): silence logger in json output mode

9995939

2 participants