
fix: add support for max_completion_tokens parameter in newer OpenAI models #35

Open

caffeinum wants to merge 6 commits into fuergaosi233:main from caffeinum:fix/gpt5-max-completion-tokens

Conversation


@caffeinum caffeinum commented Aug 8, 2025

✨ sorry this all is AI-generated but the fix works for me ✨

Summary

  • Add support for OpenAI's newer reasoning models (o1, o3, o4) and GPT-5
  • Fix 400 errors caused by using the deprecated max_tokens parameter
  • Properly detect and use max_completion_tokens for compatible models

Changes

  • Modified src/conversion/request_converter.py to detect newer model prefixes
  • Added conditional logic (sketched after this list) to use max_completion_tokens instead of max_tokens for:
    • o1 models (reasoning models)
    • o3 models (latest reasoning models)
    • o4 models
    • GPT-5 models (including gpt5 variant)
  • Maintained backward compatibility with older models using max_tokens
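
A minimal sketch of that conditional logic, in the spirit of src/conversion/request_converter.py (the function names and the exact prefix tuple here are illustrative, not the repository's actual code):

```python
# Sketch of the parameter selection described above. Names are
# illustrative; the real logic lives in request_converter.py.
REASONING_MODEL_PREFIXES = ("o1", "o3", "o4", "gpt-5", "gpt5")

def is_reasoning_model(model: str) -> bool:
    """Return True for OpenAI models that require max_completion_tokens."""
    return model.lower().startswith(REASONING_MODEL_PREFIXES)

def apply_token_limit(payload: dict, model: str, limit: int) -> dict:
    """Set the token-limit parameter appropriate for the target model."""
    if is_reasoning_model(model):
        # Newer reasoning models reject max_tokens with a 400 error.
        payload["max_completion_tokens"] = limit
    else:
        # Older chat models still expect max_tokens.
        payload["max_tokens"] = limit
    return payload
```

The same prefix check also drives the temperature and streaming handling added in the later commits.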

Problem Solved

The proxy was failing with 400 errors when used with newer OpenAI models:

Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead."

Test Plan

  • Test with GPT-5 models to ensure max_completion_tokens is used
  • Test with older GPT models to ensure max_tokens still works
  • Verify no 400 errors occur with reasoning models
  • Check that token limits are properly applied

🤖 Generated with yolocode

caffeinum and others added 6 commits August 7, 2025 21:47
…models

- Add detection for o1, o3, o4, and gpt-5 model prefixes
- Use max_completion_tokens instead of max_tokens for these models
- Fixes compatibility with OpenAI's newer reasoning models
- Resolves 400 errors about unsupported max_tokens parameter

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Force temperature=1 for o1/o3/o4/gpt-5 models (they don't support other values; see the sketch below)
- Add warning when temperature is adjusted
- Add logging for streaming attempts with reasoning models
- Improve model detection with is_reasoning_model flag

These models have specific API restrictions that differ from traditional GPT models.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
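
A minimal sketch of the temperature coercion this commit describes; the helper name and logging setup are assumptions, not the repo's code:

```python
import logging

logger = logging.getLogger(__name__)

REASONING_MODEL_PREFIXES = ("o1", "o3", "o4", "gpt-5", "gpt5")

def coerce_temperature(payload: dict, model: str) -> dict:
    """Force temperature=1 for reasoning models and log the coercion."""
    if model.lower().startswith(REASONING_MODEL_PREFIXES):
        requested = payload.get("temperature")
        if requested not in (None, 1):
            logger.warning(
                "Model %s only supports temperature=1; overriding %s",
                model, requested,
            )
        payload["temperature"] = 1
    return payload
```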
… errors

- Automatically disable streaming for o1/o3/o4/gpt-5 models (sketched below)
- Prevents "organization must be verified" errors
- Logs warning when streaming is disabled
- Can be reverted once org is verified on OpenAI platform

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
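
A rough sketch of the streaming guard; the function name and log wording are illustrative:

```python
import logging

logger = logging.getLogger(__name__)

REASONING_MODEL_PREFIXES = ("o1", "o3", "o4", "gpt-5", "gpt5")

def disable_streaming_if_needed(payload: dict, model: str) -> dict:
    """Turn off streaming for reasoning models to avoid verification errors."""
    if model.lower().startswith(REASONING_MODEL_PREFIXES) and payload.get("stream"):
        logger.warning(
            "Disabling streaming for %s: unverified organizations cannot "
            "stream responses from this model", model,
        )
        payload["stream"] = False
    return payload
```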
ensure o1/o3/o4/gpt-5/gpt5 models use temp=1 and log coercion to comply with provider restrictions

@Joonel Joonel left a comment


The fix actually does work, thank you @caffeinum


TimDaub commented Aug 10, 2025

I tried this and it seems to work

aaaronmiller pushed a commit to aaaronmiller/claude-code-proxy that referenced this pull request Nov 19, 2025
…ion, passthrough mode, Docker improvements

This commit implements changes from PRs fuergaosi233#35, fuergaosi233#24, fuergaosi233#39, fuergaosi233#37, and fuergaosi233#32:

## PR fuergaosi233#35 + fuergaosi233#24: OpenAI Newer Models Support (o1, o3, o4, gpt-5)
- Add `is_newer_openai_model()` method to ModelManager to detect o1, o3, o4, gpt-5 (sketched after this section)
- Add `is_o3_model()` method for specific o3 model detection
- Use `max_completion_tokens` instead of `max_tokens` for newer models
- Enforce `temperature=1` for newer OpenAI reasoning models
- Update test-connection endpoint to handle newer models correctly
- Fix compatibility with OpenAI's newer model API requirements
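
A rough shape of the ModelManager additions listed above; this is a standalone approximation, not the repository's class:

```python
class ModelManager:
    # Prefixes for models that need the newer API parameters.
    NEWER_PREFIXES = ("o1", "o3", "o4", "gpt-5", "gpt5")

    def is_newer_openai_model(self, model: str) -> bool:
        """Detect o1/o3/o4/gpt-5 models requiring max_completion_tokens."""
        return model.lower().startswith(self.NEWER_PREFIXES)

    def is_o3_model(self, model: str) -> bool:
        """Detect o3-family models specifically."""
        return model.lower().startswith("o3")
```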

## PR fuergaosi233#39: Enhanced Conversion & Error Handling
- Add comprehensive input validation to request converter
- Validate Claude request structure before processing
- Add enhanced error handling in response converter
- Validate OpenAI response structure with detailed error messages
- Add prompt cache support for non-streaming responses
- Extract and include `cache_read_input_tokens` from prompt_tokens_details (sketched after this section)
- Improve error messages for debugging
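
A hedged sketch of the cache-token extraction, assuming the OpenAI usage payload exposes `prompt_tokens_details.cached_tokens` as the commit describes; the function name is illustrative:

```python
def extract_cache_read_tokens(openai_response: dict) -> int:
    """Return cached prompt tokens reported in OpenAI usage data, if any."""
    usage = openai_response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    # Maps to Claude's cache_read_input_tokens field in the converted response.
    return details.get("cached_tokens", 0)
```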

## PR fuergaosi233#37: Multi-Tenant Passthrough Mode
- Add passthrough mode support for multi-tenant deployments
- Make OPENAI_API_KEY optional - enable passthrough mode when not set
- Add per-request API key support via `openai-api-key` header (sketched after this section)
- Update OpenAIClient to accept per-request API keys
- Add API key format validation (sk- prefix, minimum length)
- Update endpoints to extract and validate OpenAI API keys from headers
- Support both proxy mode (server key) and passthrough mode (user keys)
- Enable separate billing per tenant in passthrough mode
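
A minimal sketch of the key resolution described above; the header name comes from the commit text, while the function name and validation thresholds are assumptions:

```python
def resolve_api_key(headers: dict, server_key: str | None) -> str:
    """Prefer a per-request key (passthrough mode), else the server key."""
    user_key = headers.get("openai-api-key")
    if user_key:
        # Basic format validation: sk- prefix and a minimum length.
        if not user_key.startswith("sk-") or len(user_key) < 20:
            raise ValueError("Invalid OpenAI API key format")
        return user_key
    if server_key:
        return server_key
    raise ValueError(
        "No API key available: set OPENAI_API_KEY or pass an "
        "openai-api-key header"
    )
```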

## PR fuergaosi233#32: Docker Improvements
- Add comprehensive .dockerignore file
- Exclude development files, tests, and sensitive data from Docker builds
- Reduce Docker image size by excluding unnecessary files
- Improve build performance with proper ignore patterns

## Additional Improvements
- Better error messages and logging throughout
- Enhanced API key validation
- Improved request/response handling

All changes maintain backward compatibility with existing deployments.
