
Add gpt-5-3-codex model to resolve_model_config.py #2292

Merged

juanmichelini merged 2 commits into main from openhands/add-gpt-5-3-codex-v2 on Mar 3, 2026

Conversation

@juanmichelini (Collaborator) commented Mar 3, 2026

Summary

Adds the gpt-5-3-codex model to resolve_model_config.py for evaluation purposes.

This PR follows the guidelines in .github/run-eval/ADDINGMODEL.md by:

  • Only adding new content - No existing code was modified
  • Adding a test_model alias for backward compatibility instead of modifying existing test imports
  • Not modifying any existing test assertions or mock configurations
  • Adding only the necessary model configuration and test

Changes

  • Added gpt-5-3-codex to the MODELS dictionary in resolve_model_config.py
  • Added test_gpt_5_3_codex_config() test function in tests/github_workflows/test_resolve_model_config.py
  • Added test_model = check_model alias for backward compatibility with existing test imports

Configuration

  • Model ID: gpt-5-3-codex
  • Display Name: GPT-5.3 Codex
  • Model Path: litellm_proxy/gpt-5-3-codex
  • Special Parameters: None (uses default configuration)
  • Temperature: Not specified (uses provider default)
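
The configuration above can be sketched as a dictionary entry. The field names and dictionary shape below are assumptions inferred from the PR description, not the actual contents of resolve_model_config.py:

```python
# Hypothetical sketch of the new entry; field names are assumed from the PR text.
MODELS = {
    # ... existing entries ...
    "gpt-5-3-codex": {
        "display_name": "GPT-5.3 Codex",
        "model": "litellm_proxy/gpt-5-3-codex",
        # No special parameters; temperature is left unset so the
        # provider default applies.
    },
}

config = MODELS["gpt-5-3-codex"]
print(config["model"])  # litellm_proxy/gpt-5-3-codex
```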

Why This Approach?

The existing branch openhands/add-gpt-5-3-codex violated the ADDINGMODEL.md guidelines by:

  1. Changing test_model to check_model in imports (explicitly forbidden in line 197)
  2. Modifying existing test assertions (explicitly forbidden in lines 11-13)
  3. Renaming test classes from TestTestModel to TestCheckModel (modifying existing code)

This PR takes a cleaner approach by:

  1. Adding a test_model alias to maintain backward compatibility
  2. Not modifying any existing tests or assertions
  3. Only adding new content as required by the guidelines

Additional Notes

  • Variant Detection: gpt-5.3-codex is already included in the GPT-5 codex variant patterns in model_prompt_spec.py (line 2)
  • Model Features: Pattern matching for "gpt-5" in model_features.py already covers this model, so no explicit additions needed
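
A minimal sketch of why prefix matching already covers the new model (illustrative only; the real matching logic in model_features.py is not shown in this PR):

```python
from fnmatch import fnmatch

def matches_gpt5_pattern(model_name: str) -> bool:
    # Strip an optional provider prefix such as "litellm_proxy/".
    base = model_name.rsplit("/", 1)[-1]
    # Any model whose name starts with "gpt-5" matches, including gpt-5-3-codex.
    return fnmatch(base, "gpt-5*")
```

Because "gpt-5-3-codex" begins with "gpt-5", no explicit entry is needed in the feature tables.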

Test Results

$ uv run pytest tests/github_workflows/test_resolve_model_config.py::test_gpt_5_3_codex_config -v
tests/github_workflows/test_resolve_model_config.py::test_gpt_5_3_codex_config PASSED [100%]

$ uv run pytest tests/github_workflows/test_resolve_model_config.py::test_all_models_valid_with_pydantic -v
tests/github_workflows/test_resolve_model_config.py::test_all_models_valid_with_pydantic PASSED [100%]

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this? ✅ Added test_gpt_5_3_codex_config()
  • If there is an example, have you run the example to make sure that it works? N/A - No examples affected
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works? ✅ Ran manual verification with resolve_model_config.py
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? N/A - Internal model configuration
  • Is the github CI passing? ⏳ Waiting for CI

Fixes #2289


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image                                 | Docs / Tags |
|---------|---------------|--------------------------------------------|-------------|
| java    | amd64, arm64  | eclipse-temurin:17-jdk                     | Link        |
| python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22 | Link        |
| golang  | amd64, arm64  | golang:1.21-bookworm                       | Link        |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d872b05-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d872b05-python \
  ghcr.io/openhands/agent-server:d872b05-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d872b05-golang-amd64
ghcr.io/openhands/agent-server:d872b05-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d872b05-golang-arm64
ghcr.io/openhands/agent-server:d872b05-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d872b05-java-amd64
ghcr.io/openhands/agent-server:d872b05-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d872b05-java-arm64
ghcr.io/openhands/agent-server:d872b05-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d872b05-python-amd64
ghcr.io/openhands/agent-server:d872b05-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:d872b05-python-arm64
ghcr.io/openhands/agent-server:d872b05-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:d872b05-golang
ghcr.io/openhands/agent-server:d872b05-java
ghcr.io/openhands/agent-server:d872b05-python

About Multi-Architecture Support

  • Each variant tag (e.g., d872b05-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d872b05-python-amd64) are also available if needed

- Added gpt-5-3-codex model configuration to MODELS dictionary
- Added test_gpt_5_3_codex_config() test function
- Added test_model alias for backward compatibility with tests

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot (Contributor) commented Mar 3, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

github-actions bot (Contributor) commented Mar 3, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

@all-hands-bot (Collaborator) left a comment

🟢 Good taste - Clean addition following existing patterns.

Assessment:

  • Model config follows the exact pattern of gpt-5.2-codex - no surprises
  • The test_model = check_model alias is correctly preserving backward compatibility with existing test imports (used in TestTestModel class)
  • Test coverage mirrors other model tests - sufficient for a config entry

Verdict: ✅ LGTM - straightforward config addition with proper backward compatibility handling.

@juanmichelini juanmichelini enabled auto-merge (squash) March 3, 2026 23:19
@juanmichelini juanmichelini merged commit 10ccfad into main Mar 3, 2026
21 checks passed
@juanmichelini juanmichelini deleted the openhands/add-gpt-5-3-codex-v2 branch March 3, 2026 23:22
github-actions bot (Contributor) commented Mar 3, 2026

🧪 Integration Tests Results

Overall Success Rate: 0.0%
Total Cost: $0.00
Models Tested: 1
Timestamp: 2026-03-03 23:38:36 UTC

📊 Summary

| Model                       | Overall | Tests Passed | Skipped | Total | Cost  | Tokens |
|-----------------------------|---------|--------------|---------|-------|-------|--------|
| litellm_proxy_gpt_5_3_codex | 0.0%    | 0/16         | 2       | 18    | $0.00 | 0      |

📋 Detailed Results

litellm_proxy_gpt_5_3_codex

  • Success Rate: 0.0% (0/16)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_gpt_5_3_codex_10ccfad_gpt_5_3_codex_run_N18_20260303_233749
  • Skipped Tests: 2

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.
  • c01_thinking_block_condenser: Model litellm_proxy/gpt-5-3-codex does not support extended thinking or reasoning effort

Failed Tests:

All 16 failures hit the same proxy-side routing error (each run had a distinct conversation id):

  • t01_fix_simple_typo, t02_add_bash_hello, t03_jupyter_write_file, t04_git_staging, t05_simple_browsing, t06_github_pr_browsing, t07_interactive_commands
  • b01_no_premature_implementation, b02_no_oververification, b03_no_useless_backward_compatibility, b04_each_tool_call_has_a_concise_explanation, b05_do_not_create_redundant_files
  • c02_hard_context_reset, c03_delayed_condensation, c04_token_condenser, c05_size_condenser

Common error (code 400, Cost: $0.00 per test):

litellm.BadRequestError: You passed in model=gpt-5-3-codex. There are no healthy deployments for this model. No fallback model group found for original model_group=gpt-5-3-codex. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]
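
Since every failure carries the same "no healthy deployments" message, the problem is litellm proxy routing (no deployment registered for the gpt-5-3-codex model group) rather than the configuration added in this PR. A hypothetical triage helper, not part of the actual test harness, that distinguishes this class of failure:

```python
def is_missing_deployment_error(message: str) -> bool:
    """Detect litellm proxy routing failures (illustrative helper only)."""
    return "no healthy deployments" in message.lower()

err = (
    "litellm.BadRequestError: You passed in model=gpt-5-3-codex. "
    "There are no healthy deployments for this model"
)
print(is_missing_deployment_error(err))  # True
```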

zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026

Development

Successfully merging this pull request may close these issues.

Add gpt-5-3-codex to resolve_model_config.py

3 participants