This repository was archived by the owner on Nov 16, 2025. It is now read-only.

Commit 4960277

steipete and claude committed
Add documentation for token parsing fix
Document the token parsing bug fix and cache invalidation implementation:
- Explain the root cause (incorrect JSON structure expectations)
- Detail the changes made to support cache tokens and costUSD
- Document the automatic cache invalidation system

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 091c320 commit 4960277

2 files changed, +203 -0 lines changed

TOKEN_PARSING_FIX_SUMMARY.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Token Parsing Fix Summary

## Problem
The ClaudeCodeLogParser was returning 0 tokens even though the log files contained valid token data. The ccost app was able to parse the same logs correctly.

## Root Cause
The parser was looking for the wrong JSON structure. The actual Claude Code log format has tokens inside `message.usage` with snake_case field names:

```json
{
  "timestamp": "2025-06-03T21:55:26.847Z",
  "version": "1.0.10",
  "message": {
    "model": "claude-sonnet-4-20250514",
    "usage": {
      "input_tokens": 4,
      "output_tokens": 2,
      "cache_creation_input_tokens": 6755,
      "cache_read_input_tokens": 10177
    }
  },
  "costUSD": 0.123
}
```
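
A minimal sketch of how this shape could be decoded with `Codable`, assuming explicit `CodingKeys` for the snake_case names. The type names (`UsageSketch`, `MessageSketch`, `LogLineSketch`) and the helper `decodeLogLine` are hypothetical and are not taken from ClaudeCodeLogParser.swift:

```swift
import Foundation

// Hypothetical types for illustration; the real parser's types may differ.
struct UsageSketch: Decodable {
    let inputTokens: Int
    let outputTokens: Int
    let cacheCreationTokens: Int?
    let cacheReadTokens: Int?

    // Map the snake_case keys found in the log to Swift property names.
    enum CodingKeys: String, CodingKey {
        case inputTokens = "input_tokens"
        case outputTokens = "output_tokens"
        case cacheCreationTokens = "cache_creation_input_tokens"
        case cacheReadTokens = "cache_read_input_tokens"
    }
}

struct MessageSketch: Decodable {
    let model: String?
    let usage: UsageSketch
}

struct LogLineSketch: Decodable {
    let timestamp: String
    let message: MessageSketch
    let costUSD: Double?
}

/// Decodes one JSONL line; returns nil when the line does not match the expected shape.
func decodeLogLine(_ line: String) -> LogLineSketch? {
    guard let data = line.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode(LogLineSketch.self, from: data)
}
```

A line that only carries top-level camelCase fields (the structure the old parser expected) fails to decode here and returns nil, which is the mismatch described above seen from the other side.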

## Changes Made

### 1. Updated ClaudeCodeLogParser.swift
- Fixed the log line filtering to check for `message.usage` structure
- Updated `parseClaudeCodeFormat` to match the actual log structure
- Added support for cache tokens (cache_creation_input_tokens, cache_read_input_tokens)
- Added support for costUSD field

### 2. Updated ClaudeUsageData.swift (ClaudeLogEntry model)
- Added `cacheCreationTokens: Int?` field
- Added `cacheReadTokens: Int?` field
- Added `costUSD: Double?` field
- Updated Codable implementation to handle these new fields
- Updated convenience initializer with default parameters

### 3. Updated ClaudeDailyUsage aggregate properties
- Added `totalCacheCreationTokens` computed property
- Added `totalCacheReadTokens` computed property
- Updated `totalTokens` to include cache tokens
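
The model changes listed above can be sketched roughly as follows. The three optional fields and the three totals use the names from this summary; the type names `LogEntrySketch` and `DailyUsageSketch`, the `entries` array, and the input/output totals are assumptions made for illustration:

```swift
// Sketch only: not the actual ClaudeLogEntry / ClaudeDailyUsage definitions.
struct LogEntrySketch {
    let inputTokens: Int
    let outputTokens: Int
    let cacheCreationTokens: Int?
    let cacheReadTokens: Int?
    let costUSD: Double?

    // Default parameters keep call sites that predate the cache fields compiling.
    init(inputTokens: Int,
         outputTokens: Int,
         cacheCreationTokens: Int? = nil,
         cacheReadTokens: Int? = nil,
         costUSD: Double? = nil) {
        self.inputTokens = inputTokens
        self.outputTokens = outputTokens
        self.cacheCreationTokens = cacheCreationTokens
        self.cacheReadTokens = cacheReadTokens
        self.costUSD = costUSD
    }
}

struct DailyUsageSketch {
    let entries: [LogEntrySketch]

    var totalInputTokens: Int { entries.reduce(0) { $0 + $1.inputTokens } }
    var totalOutputTokens: Int { entries.reduce(0) { $0 + $1.outputTokens } }

    // Missing cache counts are treated as zero.
    var totalCacheCreationTokens: Int { entries.reduce(0) { $0 + ($1.cacheCreationTokens ?? 0) } }
    var totalCacheReadTokens: Int { entries.reduce(0) { $0 + ($1.cacheReadTokens ?? 0) } }

    // Total now includes both kinds of cache tokens.
    var totalTokens: Int {
        totalInputTokens + totalOutputTokens + totalCacheCreationTokens + totalCacheReadTokens
    }
}
```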

## Key Differences from Previous Implementation
1. The parser was looking for camelCase field names (inputTokens) instead of snake_case (input_tokens)
2. The parser wasn't looking in the correct nested structure (message.usage)
3. The model didn't support cache tokens or costUSD fields that are present in the logs

## Testing
The fix should now correctly parse:
- Regular input/output tokens
- Cache creation tokens
- Cache read tokens
- Cost USD field
- Model information

All token counts should now match what ccost reports.

## Cache Invalidation
The fix includes automatic cache invalidation to ensure old cached data is cleared when the parser format changes:

- Added `currentCacheVersion = 2` in ClaudeLogManager
- Cache is automatically cleared on app startup if the version is outdated
- This ensures users will see the correct token counts immediately after updating
- The cache versioning system prevents stale data from being shown after parser updates
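
One way such a version gate could look is sketched below. Only `currentCacheVersion = 2` comes from this summary; the class `VersionedCacheSketch`, its in-memory `entries` dictionary, and `invalidateIfOutdated()` are hypothetical, and the real ClaudeLogManager presumably persists the stored version between launches rather than keeping it in memory:

```swift
// Sketch only: an in-memory stand-in for a persisted, versioned cache.
final class VersionedCacheSketch {
    static let currentCacheVersion = 2

    private(set) var storedVersion: Int
    private(set) var entries: [String: Int]   // hypothetical: file name -> cached token count

    init(storedVersion: Int, entries: [String: Int]) {
        self.storedVersion = storedVersion
        self.entries = entries
    }

    /// Clears the cache when it was written by an older parser version.
    /// Returns true if the cache was cleared.
    @discardableResult
    func invalidateIfOutdated() -> Bool {
        guard storedVersion < Self.currentCacheVersion else { return false }
        entries.removeAll()
        storedVersion = Self.currentCacheVersion
        return true
    }
}
```

Calling `invalidateIfOutdated()` once at startup is enough under this design: the first call after an update clears the stale entries and bumps the stored version, and later calls are no-ops.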

TOKEN_PARSING_SUMMARY.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# Claude Token Parsing in VibeMeter

## Overview

VibeMeter parses Claude Code usage logs to track token consumption. The app expects JSONL (JSON Lines) format files located in the `.claude/projects` directory.

## Token Parsing Architecture

### 1. **ClaudeLogManager** (Main Orchestrator)
- Manages access to Claude log files via security-scoped bookmarks
- Coordinates the scanning and parsing process
- Caches parsed results for 5 minutes to improve performance
- Uses SHA-256 hashing to detect file changes

### 2. **ClaudeLogFileScanner**
- Scans the `.claude/projects` directory for JSONL files
- Filters out files older than 30 days based on filename dates or modification time
- Returns files sorted by modification date (newest first)
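
The scanner's filter-and-sort step might look roughly like this. The sketch covers only the modification-time check (not dates embedded in filenames), and `ScannedFileSketch` and `recentLogFiles` are hypothetical names rather than the scanner's actual API:

```swift
import Foundation

// Hypothetical value type: a file URL plus its modification date.
struct ScannedFileSketch {
    let url: URL
    let modificationDate: Date
}

/// Keeps `.jsonl` files modified within `maxAgeDays`, newest first.
func recentLogFiles(_ files: [ScannedFileSketch],
                    now: Date = Date(),
                    maxAgeDays: Int = 30) -> [ScannedFileSketch] {
    let cutoff = now.addingTimeInterval(-Double(maxAgeDays) * 24 * 60 * 60)
    return files
        .filter { $0.url.pathExtension == "jsonl" && $0.modificationDate >= cutoff }
        .sorted { $0.modificationDate > $1.modificationDate }
}
```

Passing `now` as a parameter keeps the cutoff deterministic in tests.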

### 3. **ClaudeCodeLogParser**
- The core parsing logic that supports multiple log formats
- Uses 4 different parsing strategies in order:
  1. Standard nested format (`message.usage`)
  2. Top-level usage format
  3. Claude Code specific formats
  4. Regex-based extraction as fallback
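
As an illustration of the last strategy, a regex fallback could pull the counts out of a line that is not valid JSON (for example, a truncated one). This is a sketch under assumptions: `firstInteger(forKey:in:)` and `regexExtractTokens(from:)` are hypothetical names, and the pattern is one plausible choice, not the parser's actual expression:

```swift
import Foundation

/// Returns the first integer that follows `"key":` in `line`, or nil if absent.
func firstInteger(forKey key: String, in line: String) -> Int? {
    // The leading quote prevents "input_tokens" from matching inside
    // "cache_creation_input_tokens".
    let pattern = "\"" + NSRegularExpression.escapedPattern(for: key) + "\"\\s*:\\s*(\\d+)"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return nil }
    let fullRange = NSRange(line.startIndex..., in: line)
    guard let match = regex.firstMatch(in: line, options: [], range: fullRange),
          let digits = Range(match.range(at: 1), in: line) else { return nil }
    return Int(line[digits])
}

/// Tries snake_case keys first, then camelCase; nil when either count is missing.
func regexExtractTokens(from line: String) -> (input: Int, output: Int)? {
    let input = firstInteger(forKey: "input_tokens", in: line)
        ?? firstInteger(forKey: "inputTokens", in: line)
    let output = firstInteger(forKey: "output_tokens", in: line)
        ?? firstInteger(forKey: "outputTokens", in: line)
    guard let inputCount = input, let outputCount = output else { return nil }
    return (inputCount, outputCount)
}
```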

### 4. **ClaudeLogProcessor** (Background Actor)
- Processes files asynchronously for better performance
- Uses memory-mapped files for efficient reading
- Processes data in 64KB chunks to manage memory
- Skips files smaller than 100 bytes

## Supported Log Formats

### 1. Standard Nested Format (Original Claude API)
```json
{
  "timestamp": "2025-01-06T10:30:00.000Z",
  "model": "claude-3-5-sonnet",
  "message": {
    "usage": {
      "input_tokens": 100,
      "output_tokens": 50
    }
  }
}
```

### 2. Top-Level Usage Format
```json
{
  "timestamp": "2025-01-06T10:30:00.000Z",
  "model": "claude-3-5-sonnet",
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}
```

### 3. Claude Code Format with CamelCase
```json
{
  "timestamp": "2025-01-06T10:30:00.000Z",
  "model": "claude-3-5-sonnet",
  "message": {
    "usage": {
      "inputTokens": 100,
      "outputTokens": 50
    }
  }
}
```

### 4. Mixed Formats
The parser can handle:
- Both `input_tokens`/`output_tokens` and `inputTokens`/`outputTokens`
- Usage data at top level or nested in `message.usage`
- Additional fields that are ignored (type, event, metadata, etc.)
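
The formats above can be handled by a single lookup that checks both locations and both naming styles. The following is a minimal sketch using `JSONSerialization`; the function name `extractTokens(fromLine:)` is hypothetical and the real parser may structure its strategies differently:

```swift
import Foundation

/// Extracts input/output token counts from one JSONL line, accepting usage data
/// either nested in `message.usage` or at the top level, in snake_case or camelCase.
func extractTokens(fromLine line: String) -> (input: Int, output: Int)? {
    guard let data = line.data(using: .utf8),
          let root = (try? JSONSerialization.jsonObject(with: data, options: [])) as? [String: Any]
    else { return nil }

    // Prefer the nested location, then fall back to a top-level "usage" object.
    let nestedUsage = (root["message"] as? [String: Any])?["usage"] as? [String: Any]
    guard let usage = nestedUsage ?? (root["usage"] as? [String: Any]) else { return nil }

    // Returns the first key that holds an integer value.
    func integer(_ keys: String...) -> Int? {
        for key in keys {
            if let value = usage[key] as? Int { return value }
        }
        return nil
    }

    guard let input = integer("input_tokens", "inputTokens"),
          let output = integer("output_tokens", "outputTokens") else { return nil }
    return (input, output)
}
```

Unknown keys such as `type` or `metadata` are simply never read, which matches the "additional fields are ignored" behavior described above.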

## Data Structure

### ClaudeLogEntry
```swift
struct ClaudeLogEntry {
    let timestamp: Date
    let model: String?
    let inputTokens: Int
    let outputTokens: Int
}
```

## Parsing Process

1. **File Discovery**
   - Looks for JSONL files in `~/.claude/projects/`
   - Filters files by age (30-day cutoff)
   - Sorts by modification date

2. **Line-by-Line Processing**
   - Each JSONL file contains one JSON object per line
   - Lines are processed individually
   - Non-relevant lines are skipped early (summary, user messages, etc.)

3. **Token Extraction**
   - First attempts structured JSON parsing
   - Falls back to regex extraction for malformed JSON
   - Supports multiple field name variations

4. **Filtering Rules**
   - Skip lines containing: `"type":"summary"`, `"type":"user"`, `leafUuid`, `sessionId`, `parentUuid`
   - Only process lines containing "tokens" or "Tokens"
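
The filtering rules in step 4 amount to a cheap substring check before any JSON parsing. A sketch that follows the rules exactly as listed, with `skipMarkers` and `shouldParse` as hypothetical names:

```swift
import Foundation

// Substrings that mark a line as irrelevant, taken from the list above.
let skipMarkers = [
    #""type":"summary""#,
    #""type":"user""#,
    "leafUuid",
    "sessionId",
    "parentUuid"
]

/// Returns true when a line mentions tokens and contains none of the skip markers.
func shouldParse(_ line: String) -> Bool {
    guard line.contains("tokens") || line.contains("Tokens") else { return false }
    return !skipMarkers.contains { line.contains($0) }
}
```

Because these are plain substring matches, a marker written with different spacing (for example `"type": "summary"`) would not be caught by this sketch.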

## Performance Optimizations

1. **Caching**: 5-minute cache with SHA-256 file hashing
2. **Memory-mapped files**: For efficient large file reading
3. **Chunk processing**: 64KB chunks with autoreleasepool
4. **Early filtering**: Skip small files and non-token lines
5. **Parallel processing**: Background actor for async operations

## Error Handling

- Invalid JSON lines are skipped silently
- Lines with missing token fields are skipped
- Malformed timestamps use the current date as a fallback
- File access errors are logged but don't stop processing
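
The timestamp fallback can be sketched with `ISO8601DateFormatter`, assuming timestamps carry fractional seconds as in the samples in this document. `parseTimestamp` and its `fallback` parameter are hypothetical:

```swift
import Foundation

/// Parses an ISO 8601 timestamp with fractional seconds;
/// returns `fallback` (the current date by default) when parsing fails.
func parseTimestamp(_ raw: String, fallback: Date = Date()) -> Date {
    let formatter = ISO8601DateFormatter()
    formatter.formatOptions = [.withInternetDateTime, .withFractionalSeconds]
    return formatter.date(from: raw) ?? fallback
}
```

With these options a timestamp without fractional seconds would also hit the fallback, so a fuller version might retry with `.withInternetDateTime` alone.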

## Usage in UI

The parsed data is:
- Grouped by day
- Used to calculate costs based on token pricing
- Displayed in the Claude Usage Report view
- Used for 5-hour window calculations for Pro/Max tiers
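
The grouping step could be sketched as below. `EntrySketch` and `totalTokensByDay` are hypothetical, the calendar is passed in so that the day boundary is explicit, and pricing is left out because no rates are given here:

```swift
import Foundation

// Hypothetical, reduced entry type: just what the grouping needs.
struct EntrySketch {
    let timestamp: Date
    let inputTokens: Int
    let outputTokens: Int
}

/// Sums input and output tokens per calendar day, keyed by the start of that day.
func totalTokensByDay(_ entries: [EntrySketch], calendar: Calendar) -> [Date: Int] {
    var totals: [Date: Int] = [:]
    for entry in entries {
        let day = calendar.startOfDay(for: entry.timestamp)
        totals[day, default: 0] += entry.inputTokens + entry.outputTokens
    }
    return totals
}
```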
