Fix token usage / cost often being underreported #6122
// PREV: We need to let the request finish for openrouter to
// get generation details.
// UPDATE: It's better UX to interrupt the request at the
// cost of the API cost not being retrieved.
The comment calls out OpenRouter specifically, but I think I've seen usage being misreported with other providers as well.
- Add error handling in drainStreamInBackgroundToFindAllUsage with try-catch
- Add 30-second timeout to prevent potential memory leaks from hanging streams
- Include model ID in console warning for better debugging
- Fix race condition by moving updateApiReqMsg() call inside background task
- Ensure saveClineMessages() is called after usage data is updated

- Handle unhandled promise rejection by adding .catch() to background task
- Fix race condition by using local variables in background task before atomically updating shared state
- Eliminate duplicate telemetry capture logic with captureUsageData helper function
- Make usage collection timeout configurable via CLINE_USAGE_COLLECTION_TIMEOUT env var (default: 30s)
- Improve code organization and readability
- Include model ID in all warning messages for better debugging

…ce condition fixes
- Add proper error handling for background promise with .catch()
- Fix race condition by using local variables in background task
- Make timeout configurable via DEFAULT_USAGE_COLLECTION_TIMEOUT_MS constant
- Extract captureUsageData helper for better code organization
- Ensure model ID is included in warning messages
- Update comments to reflect the actual implementation
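The commit messages above describe a background task that keeps reading the stream for usage data, with a timeout, error handling, and local accumulation before shared state is touched. A minimal sketch of that pattern follows. It is not the PR's actual code: the Chunk and Usage shapes and the drainStreamForUsage and collectUsageInBackground names are hypothetical, and only DEFAULT_USAGE_COLLECTION_TIMEOUT_MS and its 30-second default come from the commit messages.

```typescript
// Hypothetical chunk and usage shapes; the real types in Task.ts may differ.
type Chunk =
	| { type: "text"; text: string }
	| { type: "usage"; inputTokens: number; outputTokens: number; totalCost?: number }

interface Usage {
	inputTokens: number
	outputTokens: number
	totalCost: number
}

// Constant name and 30 s default are taken from the commit messages.
const DEFAULT_USAGE_COLLECTION_TIMEOUT_MS = 30_000

// Reads the rest of an already-started stream and returns any usage it finds.
// Totals are accumulated in local variables only; nothing shared is mutated here.
async function drainStreamForUsage(
	iterator: AsyncIterator<Chunk>,
	modelId: string,
	timeoutMs: number = DEFAULT_USAGE_COLLECTION_TIMEOUT_MS,
): Promise<Usage | undefined> {
	const local: Usage = { inputTokens: 0, outputTokens: 0, totalCost: 0 }
	let usageFound = false
	const deadline = Date.now() + timeoutMs

	try {
		while (true) {
			const remaining = Math.max(0, deadline - Date.now())
			let timer: ReturnType<typeof setTimeout> | undefined
			const timedOut = new Promise<"timeout">((resolve) => {
				timer = setTimeout(() => resolve("timeout"), remaining)
			})
			const pending = iterator.next()
			let result: IteratorResult<Chunk> | "timeout"
			try {
				result = await Promise.race([pending, timedOut])
			} finally {
				clearTimeout(timer)
			}

			if (result === "timeout") {
				pending.catch(() => {}) // the abandoned read must not become an unhandled rejection
				console.warn(`Usage collection timed out for model ${modelId}`)
				break
			}
			if (result.done) break
			if (result.value.type === "usage") {
				usageFound = true
				local.inputTokens += result.value.inputTokens
				local.outputTokens += result.value.outputTokens
				local.totalCost += result.value.totalCost ?? 0
			}
		}
	} catch (error) {
		console.warn(`Usage collection failed for model ${modelId}:`, error)
	}
	return usageFound ? local : undefined
}

// Fire-and-forget wrapper: shared totals are updated in one synchronous step,
// and the trailing .catch() keeps the background promise from rejecting unhandled.
function collectUsageInBackground(
	iterator: AsyncIterator<Chunk>,
	modelId: string,
	shared: Usage,
	timeoutMs?: number,
): Promise<void> {
	return drainStreamForUsage(iterator, modelId, timeoutMs)
		.then((usage) => {
			if (!usage) return
			shared.inputTokens += usage.inputTokens
			shared.outputTokens += usage.outputTokens
			shared.totalCost += usage.totalCost
		})
		.catch((error) => console.warn(`Background usage task failed for model ${modelId}:`, error))
}
```

A caller would invoke collectUsageInBackground without awaiting it, so the UI can move on while the totals are filled in later. Until the timeout fires, the provider request stays open, which is the cost concern raised later in this conversation.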
I've pushed some improvements to address the review comments:
One concern I noticed: with this implementation, when a user cancels a request, the background stream continues running to collect usage data. This differs from the original behavior where streams would be immediately terminated. Should we keep this new behavior for better usage tracking, or would it be better to add a check to stop the background task immediately when
If it is decided to abort the stream I do feel there should be a visual indication that the cost figure is no longer reliable. Same goes for the timeout. (I can make a PR for this.)
…dToFindAllUsage
- Merged the if (usageFound) and else if blocks, which both called captureUsageData with identical parameters, into a single condition
- Eliminates code duplication and improves maintainability
91cf43b to e8a6ac5
@chrarnoldus Let me know what you think!
Ok, it's just that not every provider supports cancellation so costs could rack up anyway.
@daniel-lxs can you consider taking this if we make the timeout very small? Then it shouldn't impact cost much. I can make a PR for the visual indicator afterwards.
@chrarnoldus I can give this the green light and let the team decide if this is something they want. Can you take a look at the failing CI workflow?
@daniel-lxs I fixed the initial failure; I don't think the new one is related to the changes in this PR.
Kilo Code PR: Kilo-Org/kilocode#1447
Description
Roo Code often underreports the total cost of a task, because the cost of some requests is missing. For example:
The first $0.00677 request is missing its cost in the UI, while requests 2-4 show theirs. As a result, the total cost (should be $0.0816) is also off.
This is because a request is abandoned before the usage block has been processed when an LLM tries to use multiple tools (only one is allowed in the rules):
Roo-Code/src/core/task/Task.ts
Line 1430 in 714fafd
(this code is inherited from Cline)
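To make the failure mode concrete, here is a small self-contained sketch rather than the code at the referenced line. The chunk shape, the tool names, the token counts, and the consumeAndAbandon and consumeAndDrain functions are hypothetical; only the $0.00677 figure comes from the example above. It assumes, as the description implies, that the usage chunk arrives after the tool-use chunks.

```typescript
// Hypothetical chunk shape; the real stream types in Task.ts may differ.
type Chunk =
	| { type: "text"; text: string }
	| { type: "tool_use"; name: string }
	| { type: "usage"; inputTokens: number; outputTokens: number; totalCost?: number }

interface Usage {
	inputTokens: number
	outputTokens: number
	totalCost: number
}

// A made-up response in which the model tries to use two tools,
// and the usage block only arrives at the end of the stream.
async function* fakeStream(): AsyncGenerator<Chunk> {
	yield { type: "text", text: "Reading the file first." }
	yield { type: "tool_use", name: "read_file" }
	yield { type: "tool_use", name: "write_to_file" } // second tool use: not allowed
	yield { type: "usage", inputTokens: 1200, outputTokens: 80, totalCost: 0.00677 }
}

function addUsage(usage: Usage, chunk: Chunk): void {
	if (chunk.type !== "usage") return
	usage.inputTokens += chunk.inputTokens
	usage.outputTokens += chunk.outputTokens
	usage.totalCost += chunk.totalCost ?? 0
}

// Abandoning the stream at the second tool use never reaches the usage chunk.
async function consumeAndAbandon(stream: AsyncIterable<Chunk>): Promise<Usage> {
	const usage: Usage = { inputTokens: 0, outputTokens: 0, totalCost: 0 }
	let toolUses = 0
	for await (const chunk of stream) {
		if (chunk.type === "tool_use" && ++toolUses > 1) break
		addUsage(usage, chunk)
	}
	return usage
}

// Continuing to read, but ignoring everything except usage after the
// interruption, still picks up the cost.
async function consumeAndDrain(stream: AsyncIterable<Chunk>): Promise<Usage> {
	const usage: Usage = { inputTokens: 0, outputTokens: 0, totalCost: 0 }
	let toolUses = 0
	let interrupted = false
	for await (const chunk of stream) {
		if (chunk.type === "tool_use" && ++toolUses > 1) interrupted = true
		if (interrupted && chunk.type !== "usage") continue
		addUsage(usage, chunk)
	}
	return usage
}
```

With this fake stream, consumeAndAbandon reports a cost of 0 while consumeAndDrain reports 0.00677, which mirrors the missing first-request cost described above.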
Test Procedure
Run tasks with a variety of providers and models and verify all requests have a cost reported. Kimi K2 seems to be especially affected. Sonnet 4 sometimes, Gemini 2.5 Pro not so much.
I did test this with OpenRouter and costs are reported more reliably than before. Also tested with Anthropic and Ollama for regressions.
Pre-Submission Checklist
Screenshots / Videos
In this screenshot all costs are reported and the tally is correct ($0.0591):
Additional Notes
I'll add some comments to the code.
Get in Touch
Christiaan in shared Slack
Important
Fixes underreporting of token usage and cost in Task.ts by processing all request chunks and updating usage data even if requests are interrupted.
- Task class in Task.ts: ensures all request chunks are processed.
- drainStreamInBackgroundToFindAllUsage(): processes remaining chunks and updates usage data even if a request is interrupted.
- recursivelyMakeClineRequests(): handles incomplete requests by processing all chunks in the background.
- updateApiReqMsg(): includes all token and cost data.
- TelemetryService call for capturing LLM completion in the main loop: now handled in the background task.
This description was generated automatically for 60f0a03.