Conversation

@gn00295120 (Contributor)
Summary

This PR fixes issue #1973 where usage tracking was completely lost when streaming fails mid-request.

Problem

When streaming fails (API errors, connection drops, context window exceeded, etc.), usage tracking is completely lost. This happens because usage is only accumulated when a ResponseCompletedEvent arrives; if the model provider raises an exception before yielding that event, the async for loop exits without ever updating usage.

Why This Happens

In src/agents/run.py, lines 1254-1272, usage is only tracked when a ResponseCompletedEvent is received:

if isinstance(event, ResponseCompletedEvent):
    usage = Usage(...)
    context_wrapper.usage.add(usage)

If an error occurs before this event (e.g., connection drops, rate limits, context window exceeded), the loop exits and no usage is recorded at all.
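The failure mode can be reproduced with a small self-contained sketch. The `Usage` class and string events below are toy stand-ins for the SDK's real types (assumptions for illustration, not the actual run.py code):

```python
import asyncio

class Usage:
    """Toy stand-in for the SDK's Usage accumulator."""
    def __init__(self, requests=0):
        self.requests = requests
    def add(self, other):
        self.requests += other.requests

async def broken_stream():
    yield "response.output_text.delta"        # partial events arrive...
    raise ConnectionError("stream dropped")   # ...then the provider fails

async def run_turn():
    usage = Usage()
    try:
        async for event in broken_stream():
            if event == "response.completed":  # never yielded on failure
                usage.add(Usage(requests=1))
    except ConnectionError:
        pass  # the loop has already exited; no usage was ever recorded
    return usage.requests

requests_counted = asyncio.run(run_turn())
print(requests_counted)  # 0: the API call happened but was never counted
```

The accumulation step lives inside the loop body, so any exception raised by the stream skips it entirely.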

Solution

Wrap the streaming loop in a try/except block so that at least the request count is tracked when streaming fails:

try:
    async for event in model.stream_response(...):
        # existing event-handling logic
except Exception:
    if final_response is None:
        # Stream failed before completion: record the request; tokens stay 0.
        context_wrapper.usage.add(Usage(requests=1))
    raise  # re-raise so callers still see the failure

This approach:

  • ✅ Tracks that a request was made (important for monitoring and cost estimation)
  • ✅ Doesn't introduce new dependencies (no tiktoken needed)
  • ✅ Preserves existing behavior for successful streaming
  • ✅ Accurately reflects unavailable data (token counts stay at 0)

While we cannot estimate token counts without introducing dependencies or making model-specific assumptions, tracking the request count provides valuable information for:

  • Monitoring API call frequency
  • Debugging failed requests
  • Cost estimation (users know a request was made)
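The overall shape of the fix can be sketched end-to-end with toy stand-ins (a hypothetical `Usage` class and fake event stream, not the SDK's real types): the except branch records the request before re-raising, so the counter survives a mid-stream failure.

```python
import asyncio

class Usage:
    """Toy stand-in for the SDK's Usage accumulator (assumption for illustration)."""
    def __init__(self, requests=0, total_tokens=0):
        self.requests = requests
        self.total_tokens = total_tokens
    def add(self, other):
        self.requests += other.requests
        self.total_tokens += other.total_tokens

async def failing_stream():
    yield "response.output_text.delta"           # partial output arrives...
    raise ConnectionError("connection dropped")  # ...before completion

async def run_streamed(stream, usage):
    final_response = None
    try:
        async for event in stream:
            if event == "response.completed":    # never reached on failure
                final_response = event
                usage.add(Usage(requests=1, total_tokens=42))
    except Exception:
        if final_response is None:
            # Stream died before completion: count the request, leave tokens at 0.
            usage.add(Usage(requests=1))
        raise  # propagate so callers still see the failure

usage = Usage()
try:
    asyncio.run(run_streamed(failing_stream(), usage))
except ConnectionError:
    pass
print(usage.requests, usage.total_tokens)  # 1 0
```

On the happy path the except branch is never entered, so successful streaming accumulates usage exactly as before.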

Changes

src/agents/run.py:

  • Added try-except around streaming loop (lines 1237-1310)
  • Tracks request count on exception if no response received
  • Preserves existing behavior for successful streaming

tests/test_usage_tracking_on_error.py (new file):

  • Test request tracking on streaming error
  • Test normal usage tracking still works
  • Test multi-turn scenarios with error

Testing

All tests pass:

pytest tests/test_usage_tracking_on_error.py -v
# 3 passed

pytest tests/test_cancel_streaming.py tests/test_agent_runner_streamed.py -v  
# 29 passed

Fixes #1973

Copilot AI review requested due to automatic review settings October 22, 2025 15:50

Copilot AI left a comment

Pull Request Overview

This PR fixes issue #1973 by ensuring request counts are tracked even when streaming fails mid-request, preventing complete loss of usage tracking data.

Key Changes:

  • Wrapped the streaming loop in try-except to track request counts when exceptions occur before ResponseCompletedEvent
  • Added comprehensive test coverage for error scenarios including single-turn failures, successful streaming, and multi-turn scenarios with errors

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

src/agents/run.py: Added try-except wrapper around the streaming loop to track the request count on failure before re-raising the exception.
tests/test_usage_tracking_on_error.py: New test file covering request tracking on streaming errors, preserved success behavior, and multi-turn error scenarios.


@gn00295120 (Contributor Author)

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

@seratch (Member)

seratch commented Oct 22, 2025

Thanks but I don't think this is a right solution.

@seratch seratch closed this Oct 22, 2025
@gn00295120 gn00295120 deleted the fix-1973-usage-tracking branch October 23, 2025 00:30
Linked issue: Usage tracking lost when streaming fails mid-request