
Add gpt-5-3-codex model to resolve_model_config.py #2292

Merged

juanmichelini merged 2 commits into main from openhands/add-gpt-5-3-codex-v2 on Mar 3, 2026

Conversation

@juanmichelini (Collaborator) commented Mar 3, 2026

Summary

Adds the gpt-5-3-codex model to resolve_model_config.py for evaluation purposes.

This PR follows the guidelines in .github/run-eval/ADDINGMODEL.md by:

  • Only adding new content - No existing code was modified
  • Adding a test_model alias for backward compatibility instead of modifying existing test imports
  • Not modifying any existing test assertions or mock configurations
  • Adding only the necessary model configuration and test

Changes

  • Added gpt-5-3-codex to the MODELS dictionary in resolve_model_config.py
  • Added test_gpt_5_3_codex_config() test function in tests/github_workflows/test_resolve_model_config.py
  • Added test_model = check_model alias for backward compatibility with existing test imports

Configuration

  • Model ID: gpt-5-3-codex
  • Display Name: GPT-5.3 Codex
  • Model Path: litellm_proxy/gpt-5-3-codex
  • Special Parameters: None (uses default configuration)
  • Temperature: Not specified (uses provider default)
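
The configuration above can be sketched as a dictionary entry. The field names and dictionary shape below are assumptions inferred from the PR description, not the actual contents of resolve_model_config.py:

```python
# Hypothetical sketch of the new entry; field names are assumed from the PR text.
MODELS = {
    # ... existing entries ...
    "gpt-5-3-codex": {
        "display_name": "GPT-5.3 Codex",
        "model": "litellm_proxy/gpt-5-3-codex",
        # No special parameters; temperature is left unset so the
        # provider default applies.
    },
}

config = MODELS["gpt-5-3-codex"]
print(config["model"])  # litellm_proxy/gpt-5-3-codex
```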

Why This Approach?

The existing branch openhands/add-gpt-5-3-codex violated the ADDINGMODEL.md guidelines by:

  1. Changing test_model to check_model in imports (explicitly forbidden in line 197)
  2. Modifying existing test assertions (explicitly forbidden in lines 11-13)
  3. Renaming test classes from TestTestModel to TestCheckModel (modifying existing code)

This PR takes a cleaner approach by:

  1. Adding a test_model alias to maintain backward compatibility
  2. Not modifying any existing tests or assertions
  3. Only adding new content as required by the guidelines

Additional Notes

  • Variant Detection: gpt-5.3-codex is already included in the GPT-5 codex variant patterns in model_prompt_spec.py (line 2)
  • Model Features: Pattern matching for "gpt-5" in model_features.py already covers this model, so no explicit additions needed
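
A minimal sketch of why prefix matching already covers the new model (illustrative only; the real matching logic in model_features.py is not shown in this PR):

```python
from fnmatch import fnmatch

def matches_gpt5_pattern(model_name: str) -> bool:
    # Strip an optional provider prefix such as "litellm_proxy/".
    base = model_name.rsplit("/", 1)[-1]
    # Any model whose name starts with "gpt-5" matches, including gpt-5-3-codex.
    return fnmatch(base, "gpt-5*")
```

Because "gpt-5-3-codex" begins with "gpt-5", no explicit entry is needed in the feature tables.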

Test Results

$ uv run pytest tests/github_workflows/test_resolve_model_config.py::test_gpt_5_3_codex_config -v
tests/github_workflows/test_resolve_model_config.py::test_gpt_5_3_codex_config PASSED [100%]

$ uv run pytest tests/github_workflows/test_resolve_model_config.py::test_all_models_valid_with_pydantic -v
tests/github_workflows/test_resolve_model_config.py::test_all_models_valid_with_pydantic PASSED [100%]

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this? ✅ Added test_gpt_5_3_codex_config()
  • If there is an example, have you run the example to make sure that it works? N/A - No examples affected
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works? ✅ Ran manual verification with resolve_model_config.py
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? N/A - Internal model configuration
  • Is the github CI passing? ⏳ Waiting for CI

Fixes #2289


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image                                 | Docs / Tags |
|---------|---------------|--------------------------------------------|-------------|
| java    | amd64, arm64  | eclipse-temurin:17-jdk                     | Link        |
| python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22 | Link        |
| golang  | amd64, arm64  | golang:1.21-bookworm                       | Link        |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d872b05-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d872b05-python \
  ghcr.io/openhands/agent-server:d872b05-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d872b05-golang-amd64
ghcr.io/openhands/agent-server:d872b05-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d872b05-golang-arm64
ghcr.io/openhands/agent-server:d872b05-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d872b05-java-amd64
ghcr.io/openhands/agent-server:d872b05-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d872b05-java-arm64
ghcr.io/openhands/agent-server:d872b05-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d872b05-python-amd64
ghcr.io/openhands/agent-server:d872b05-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:d872b05-python-arm64
ghcr.io/openhands/agent-server:d872b05-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:d872b05-golang
ghcr.io/openhands/agent-server:d872b05-java
ghcr.io/openhands/agent-server:d872b05-python

About Multi-Architecture Support

  • Each variant tag (e.g., d872b05-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d872b05-python-amd64) are also available if needed

- Added gpt-5-3-codex model configuration to MODELS dictionary
- Added test_gpt_5_3_codex_config() test function
- Added test_model alias for backward compatibility with tests

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot (Contributor) commented Mar 3, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

github-actions bot (Contributor) commented Mar 3, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

@all-hands-bot (Collaborator) left a comment

🟢 Good taste - Clean addition following existing patterns.

Assessment:

  • Model config follows the exact pattern of gpt-5.2-codex - no surprises
  • The test_model = check_model alias is correctly preserving backward compatibility with existing test imports (used in TestTestModel class)
  • Test coverage mirrors other model tests - sufficient for a config entry

Verdict: ✅ LGTM - straightforward config addition with proper backward compatibility handling.

@juanmichelini juanmichelini enabled auto-merge (squash) March 3, 2026 23:19
@juanmichelini juanmichelini merged commit 10ccfad into main Mar 3, 2026
21 checks passed
@juanmichelini juanmichelini deleted the openhands/add-gpt-5-3-codex-v2 branch March 3, 2026 23:22
github-actions bot (Contributor) commented Mar 3, 2026

🧪 Integration Tests Results

Overall Success Rate: 0.0%
Total Cost: $0.00
Models Tested: 1
Timestamp: 2026-03-03 23:38:36 UTC

📊 Summary

| Model                       | Overall | Tests Passed | Skipped | Total | Cost  | Tokens |
|-----------------------------|---------|--------------|---------|-------|-------|--------|
| litellm_proxy_gpt_5_3_codex | 0.0%    | 0/16         | 2       | 18    | $0.00 | 0      |

📋 Detailed Results

litellm_proxy_gpt_5_3_codex

  • Success Rate: 0.0% (0/16)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_gpt_5_3_codex_10ccfad_gpt_5_3_codex_run_N18_20260303_233749
  • Skipped Tests: 2

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.
  • c01_thinking_block_condenser: Model litellm_proxy/gpt-5-3-codex does not support extended thinking or reasoning effort

Failed Tests:

All 16 failures hit the same proxy-side routing error (each run had a distinct conversation id):

  • t01_fix_simple_typo, t02_add_bash_hello, t03_jupyter_write_file, t04_git_staging, t05_simple_browsing, t06_github_pr_browsing, t07_interactive_commands
  • b01_no_premature_implementation, b02_no_oververification, b03_no_useless_backward_compatibility, b04_each_tool_call_has_a_concise_explanation, b05_do_not_create_redundant_files
  • c02_hard_context_reset, c03_delayed_condensation, c04_token_condenser, c05_size_condenser

Common error (code 400, Cost: $0.00 per test):

litellm.BadRequestError: You passed in model=gpt-5-3-codex. There are no healthy deployments for this model. No fallback model group found for original model_group=gpt-5-3-codex. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]
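
Since every failure carries the same "no healthy deployments" message, the problem is litellm proxy routing (no deployment registered for the gpt-5-3-codex model group) rather than the configuration added in this PR. A hypothetical triage helper, not part of the actual test harness, that distinguishes this class of failure:

```python
def is_missing_deployment_error(message: str) -> bool:
    """Detect litellm proxy routing failures (illustrative helper only)."""
    return "no healthy deployments" in message.lower()

err = (
    "litellm.BadRequestError: You passed in model=gpt-5-3-codex. "
    "There are no healthy deployments for this model"
)
print(is_missing_deployment_error(err))  # True
```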

zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026

Development

Successfully merging this pull request may close these issues.

Add gpt-5-3-codex to resolve_model_config.py

3 participants