jstasiak commented Feb 1, 2026

Claude Code writes multiple JSONL entries per API response during streaming. Each entry shares the same messageId:requestId hash, but output_tokens accumulates incrementally, starting near 0 and reaching the final count when the response completes.

The old dedup logic kept the first entry per hash, so some responses were counted with an early, low output_tokens value instead of the final one. The impact varied.

In my Claude Code sessions it ranged from no difference at all (presumably because those responses were not streamed incrementally) to a ~15% difference in cost terms within a given 5-hour block.
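
To make the difference concrete, here is a minimal sketch comparing the two strategies on one streamed response. The `UsageEntry` shape, the helper names, and the token values are assumptions made for illustration; they are not taken from the ccusage code.

```typescript
// Hypothetical, simplified shape of one JSONL usage entry.
interface UsageEntry {
  messageId: string;
  requestId: string;
  outputTokens: number;
}

// Dedup key as described above: messageId:requestId.
function uniqueHash(entry: UsageEntry): string {
  return `${entry.messageId}:${entry.requestId}`;
}

// Old behaviour: the first entry seen for a hash wins.
function keepFirst(entries: UsageEntry[]): UsageEntry[] {
  const seen = new Map<string, UsageEntry>();
  for (const entry of entries) {
    const hash = uniqueHash(entry);
    if (!seen.has(hash)) {
      seen.set(hash, entry);
    }
  }
  return Array.from(seen.values());
}

// New behaviour: a later entry overwrites the earlier one for the same hash.
function keepLast(entries: UsageEntry[]): UsageEntry[] {
  const seen = new Map<string, UsageEntry>();
  for (const entry of entries) {
    seen.set(uniqueHash(entry), entry);
  }
  return Array.from(seen.values());
}

// Three made-up entries written while a single response streams.
const streamed: UsageEntry[] = [
  { messageId: "msg_1", requestId: "req_1", outputTokens: 1 },
  { messageId: "msg_1", requestId: "req_1", outputTokens: 120 },
  { messageId: "msg_1", requestId: "req_1", outputTokens: 450 },
];

const sumTokens = (entries: UsageEntry[]): number =>
  entries.reduce((sum, e) => sum + e.outputTokens, 0);

const firstTotal = sumTokens(keepFirst(streamed));
const lastTotal = sumTokens(keepLast(streamed));
console.log(firstTotal, lastTotal); // 1 450
```

With these made-up numbers, keeping the first entry reports 1 output token for a response that actually produced 450.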


Review note: I'm not sure the change to the "should deduplicate entries across sessions" test makes sense or matches the goals of the project. I suspect we'll have a problem whichever session the tokens are assigned to (session 1 vs session 2). Is there something more sophisticated we can do here?

Summary by CodeRabbit

  • Bug Fixes
    • Improved accuracy of streaming output token counts in usage data by ensuring the most complete entry state is retained during processing.


coderabbitai bot commented Feb 1, 2026

Walkthrough

This PR refactors the deduplication logic in the data loader from per-entry helper functions to a hash-based Map approach. It replaces isDuplicateEntry and markAsProcessed with hash grouping across daily usage, session, and block data loading functions, ensuring the last complete entry is retained per unique identifier (message ID + request ID).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Data Deduplication Refactoring (apps/ccusage/src/data-loader.ts) | Replaces per-entry deduplication helpers with Map-based hash grouping; modifies loadDailyUsageData, loadSessionData, and loadSessionBlockData to group entries by uniqueHash and keep the last entry for each hash; entries without a hash are collected separately and appended after hashed entries; updates error logging in line processing. |
| Test Expectations (apps/ccusage/src/...*test.ts) | Updates test expectations to reflect "last entry wins" deduplication semantics, including adjustments to date values, token counts, and last-entry behavior across multiple test cases. |
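
The grouping summarized in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from data-loader.ts: the `LoadedEntry` shape, the `dedupeKeepLast` name, and the sample values are hypothetical, and only the behaviour (last entry per hash, unhashed entries appended afterwards) comes from the summary.

```typescript
// Hypothetical entry shape; uniqueHash is null when an entry has no hash.
interface LoadedEntry {
  uniqueHash: string | null;
  outputTokens: number;
}

function dedupeKeepLast(entries: LoadedEntry[]): LoadedEntry[] {
  const byHash = new Map<string, LoadedEntry>();
  const unhashed: LoadedEntry[] = [];

  for (const entry of entries) {
    if (entry.uniqueHash === null) {
      // No hash: nothing to deduplicate against, so keep every such entry.
      unhashed.push(entry);
    } else {
      // Overwriting keeps the last entry seen for each hash.
      byHash.set(entry.uniqueHash, entry);
    }
  }

  // Hashed entries first, then the entries without a hash.
  return Array.from(byHash.values()).concat(unhashed);
}

const sample: LoadedEntry[] = [
  { uniqueHash: "msg_1:req_1", outputTokens: 2 },
  { uniqueHash: null, outputTokens: 7 },
  { uniqueHash: "msg_1:req_1", outputTokens: 30 },
  { uniqueHash: "msg_2:req_2", outputTokens: 5 },
];

const dedupedTokens = dedupeKeepLast(sample).map((e) => e.outputTokens);
console.log(dedupedTokens); // [ 30, 5, 7 ]
```

One property of this sketch worth noting: a JavaScript `Map` keeps a key at the position where it was first inserted, so the output preserves the order in which hashes were first seen even though the stored value is the last one.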

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • ryoppippi

Poem

🐰 Hop, hash, and gather round,
Last entry wins on dedup ground!
Maps replace the helpers old,
Streaming tokens now ring true and bold! 🔔

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and directly summarizes the main change: fixing underreported output tokens by switching from first-entry to last-entry deduplication. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
