fix(ccusage): Fix underreporting output tokens #835
Claude Code writes multiple JSONL entries per API response during streaming. Each entry shares the same messageId:requestId hash, but output_tokens accumulates incrementally, starting near 0 and reaching the final count when the response completes.
The old dedup logic kept the first entry per hash, so some responses were recorded with partial (near-zero) output_tokens values. The impact varied: in my Claude Code sessions, sometimes there was no difference at all (presumably because no incremental response streaming occurred), and sometimes it made a ~15% difference (in cost terms) within a given 5-hour block.
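The fix above can be sketched as follows. This is a hypothetical illustration, not the actual ccusage code: the `UsageEntry` shape and function name are made up, but it shows the core idea of keeping, per messageId:requestId hash, the entry with the highest output_tokens rather than the first one seen.

```typescript
// Hypothetical sketch (names and shape are assumptions, not ccusage's API):
// streamed JSONL entries share a messageId:requestId key, and earlier
// entries carry partial output_tokens counts, so keep the max per key.

interface UsageEntry {
  messageId: string;
  requestId: string;
  output_tokens: number;
}

function dedupeByMaxOutputTokens(entries: UsageEntry[]): UsageEntry[] {
  const best = new Map<string, UsageEntry>();
  for (const entry of entries) {
    const key = `${entry.messageId}:${entry.requestId}`;
    const seen = best.get(key);
    // Replace the stored entry whenever a later one reports more tokens.
    if (seen === undefined || entry.output_tokens > seen.output_tokens) {
      best.set(key, entry);
    }
  }
  return [...best.values()];
}
```

With "keep first" semantics, the first (partial) entry would win; here the completed entry with the full count does.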
Review note: I'm not sure the change to the `should deduplicate entries across sessions` test makes sense and matches the goals of the project. I figure we'll have a problem regardless of how we assign tokens (session 1 vs. session 2)? Is there something more sophisticated we can do here?