Skip to content

Conversation

@hannesrudolph
Copy link
Collaborator

@hannesrudolph hannesrudolph commented Aug 9, 2025

Summary

This PR implements comprehensive GPT-5 support including the Responses API, conversation continuity, and proper token management.

Major Changes

1. Complete GPT-5 Responses API Implementation

  • Full streaming support via OpenAI SDK with fallback to fetch-based SSE
  • Conversation continuity using previous_response_id for efficient multi-turn conversations
  • Automatic reasoning summaries with summary: "auto"
  • Support for all reasoning effort levels including new "minimal" level
  • Comprehensive event handling for 40+ different GPT-5 streaming event types
  • Response ID tracking and persistence

2. Token Management

  • Added explicit max_output_tokens parameter to prevent defaulting to excessive limits (e.g., 120k)
  • Uses Roo's calculated reserved output tokens via model.maxTokens
  • Applies 20% clamping rule for output tokens

3. Task Persistence Layer

  • Stores GPT-5 metadata per conversation turn:
    • previous_response_id for conversation continuity
    • Instructions used for each turn
    • Reasoning summaries
  • Enables efficient context management across long conversations

4. UI and Settings Updates

  • Updated thinking budget interface
  • Modified API options display
  • Added GPT-5-specific settings

5. Comprehensive Test Coverage

  • 950+ lines of new tests
  • Complete coverage of GPT-5 streaming events
  • Tests for both SDK and fallback SSE paths
  • Validation of max_output_tokens inclusion

6. Internationalization

  • Added GPT-5 related translations across all 18 supported locales

Files Changed

  • Core Implementation: src/api/providers/openai-native.ts (1000+ lines)
  • Tests: src/api/providers/__tests__/openai-native.spec.ts (950+ lines)
  • Task Persistence: src/core/task/Task.ts
  • Type Definitions: packages/types/src/ files
  • UI Components: webview-ui/src/components/settings/
  • Translations: All locale files (18 languages)

Testing

✅ All 40 tests in openai-native.spec.ts passing
✅ Verified GPT-5 streaming with all event types
✅ Confirmed max_output_tokens properly included
✅ Tested conversation continuity with previous_response_id
✅ Validated fallback from SDK to SSE

Impact

This is a significant feature addition that enables full GPT-5 support with proper token management, conversation continuity, and comprehensive error handling.

- Added max_output_tokens parameter to GPT-5 request body using model.maxTokens
- This prevents GPT-5 from defaulting to very large token limits (e.g., 120k)
- Updated tests to expect max_output_tokens in GPT-5 request bodies
- Fixed test for handling unhandled stream events by properly mocking SDK fallback
Copilot AI review requested due to automatic review settings August 9, 2025 01:00
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. bug Something isn't working labels Aug 9, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new minimal reasoning effort level for GPT-5 models and updates all internationalization files to include translations for this new option. The minimal effort provides the fastest time-to-first-token response from GPT-5 models.

  • Adds minimal reasoning effort support for GPT-5 models with internationalization
  • Updates UI components to conditionally show the minimal option for GPT-5 models
  • Modifies reasoning transformation logic to handle the new minimal effort level

Reviewed Changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
webview-ui/src/i18n/locales/*/settings.json Adds "minimal (fastest)" translation entries across 16 locales
webview-ui/src/components/settings/ThinkingBudget.tsx Adds GPT-5 model detection and conditional minimal reasoning effort option
webview-ui/src/components/settings/ApiOptions.tsx Clears reasoning effort when switching models and shows verbosity only for GPT-5
src/shared/api.ts Adds enableGpt5ReasoningSummary option to ApiHandlerOptions
src/core/task/Task.ts Implements GPT-5 conversation continuity with previous_response_id metadata
src/api/transform/reasoning.ts Updates reasoning transformations to exclude minimal effort from API calls
src/api/transform/model-params.ts Updates type definitions for ReasoningEffortWithMinimal
src/api/providers/requesty.ts Filters out minimal effort from API requests
src/api/providers/openai.ts Updates type casting for reasoning effort parameters
src/api/providers/openai-native.ts Major refactor for GPT-5 Responses API support with conversation continuity
src/api/providers/tests/openai-native.spec.ts Comprehensive test updates for GPT-5 Responses API behavior
src/api/index.ts Adds previousResponseId to metadata interface
packages/types/src/providers/openai.ts Adds GPT-5 model default reasoning effort and temperature
packages/types/src/provider-settings.ts Defines ReasoningEffortWithMinimal type and schema
packages/types/src/message.ts Adds GPT-5 metadata schema for conversation continuity

@hannesrudolph hannesrudolph changed the title fix: add explicit max_output_tokens for GPT-5 Responses API GPT5 OpenAI Fix Aug 9, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 9, 2025
Copy link
Contributor

@roomote roomote bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! I've reviewed the changes and found that the PR successfully addresses the stated problem of adding explicit max_output_tokens for GPT-5 requests. The implementation goes beyond the initial scope by also adding minimal reasoning effort support and conversation continuity features.

Positive observations:

  • Excellent test coverage with 40+ tests including comprehensive GPT-5 functionality
  • Good separation of concerns with dedicated methods for GPT-5 handling
  • Proper implementation of the max_output_tokens parameter as intended
  • Comprehensive handling of various GPT-5 response formats

I've left some suggestions inline that could improve code maintainability and reduce duplication.

- Renamed metadata field from 'previous_response_id' to 'response_id' for clarity
- Fixed logic to correctly use the response_id from the previous message as previous_response_id for the next request
- This resolves the 'Previous response with id not found' errors that occurred after multiple turns in the same session
- Automatically retry without previous_response_id when it's not found (400 error)
- Clear stored lastResponseId to prevent reusing stale IDs
- Handle errors in both SDK and SSE fallback paths
- Log warnings when retrying to help with debugging
- Add promise-based synchronization for response ID persistence
- Wait for pending response ID from previous request before using it
- Resolve promise when response ID is received or cleared
- Add 100ms timeout to avoid blocking too long on ID resolution
- Properly clean up resolver on errors to prevent memory leaks

This fixes the race condition where fast nano model responses could cause
the next request to be initiated before the response ID was fully persisted.
Copy link
Collaborator Author

@hannesrudolph hannesrudolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Job ROO!

- Extract usage normalization helper to reduce duplication
- Suppress conversation continuity for first message (but respect explicit metadata)
- Deduplicate response ID resolver logic
- Remove dead enableGpt5ReasoningSummary option references
- DRY up GPT-5 event/usage handling with normalizeGpt5Usage helper
- Centralize default GPT-5 reasoning effort using model info
- Fix Indonesian locale minimal string misplacement
- Add clarifying comments for Developer prefix usage
- Add TODO for future verbosity UI capability gating
- Fix failing test in reasoning.spec.ts
…ndard GPT-5 SSE event types to shared processor to reduce duplication\n- Add JSDoc for response ID accessors\n- Standardize key error messages for GPT-5 Responses API fallback\n- Extract persistGpt5Metadata() in Task to simplify metadata writes\n- Add malformed JSON SSE parsing test\n
…tOpenAI incl. cache); enforce 'skip once' continuity via suppressPreviousResponseId; dedupe responseId resolver on SSE 400; feat: gate reasoning.summary by enableGpt5ReasoningSummary; centralize default reasoning effort; types/ui: add ModelInfo.supportsVerbosity and gate Verbosity UI by capability; refactor: avoid duplicate usage emission in SSE done/completed
…d align enableGpt5ReasoningSummary default docs
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Review] in Roo Code Roadmap Aug 9, 2025
Copy link
Member

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 9, 2025
@hannesrudolph hannesrudolph removed the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 9, 2025
@mrubens mrubens merged commit cda67a8 into main Aug 9, 2025
16 checks passed
@mrubens mrubens deleted the fix/gpt5-max-output-tokens branch August 9, 2025 18:52
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Aug 9, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 9, 2025
hannesrudolph added a commit that referenced this pull request Aug 11, 2025
Removes unused translation strings introduced in PR #6864 that are no longer needed after PR #6921 refactoring.

Keys removed: includeMaxOutputTokens, includeMaxOutputTokensDescription, maxOutputTokensLabel, maxTokensGenerateDescription
daniel-lxs added a commit that referenced this pull request Aug 11, 2025
daniel-lxs added a commit that referenced this pull request Aug 13, 2025
- Fix manual condensing bug by setting skipPrevResponseIdOnce flag in Task.ts
- Revert to string-based format for full conversations after condensing
- Add proper image support with structured format when using previous_response_id
- Update test suite to match new implementation where all models use Responses API
- Handle 400 errors gracefully when previous_response_id is not found

Fixes issues introduced in PR #6864
daniel-lxs added a commit that referenced this pull request Aug 18, 2025
- Fix manual condensing bug by setting skipPrevResponseIdOnce flag in Task.ts
- Revert to string-based format for full conversations after condensing
- Add proper image support with structured format when using previous_response_id
- Update test suite to match new implementation where all models use Responses API
- Handle 400 errors gracefully when previous_response_id is not found

Fixes issues introduced in PR #6864
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer PR - Needs Review size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants