This file (`resolve_model_config.py`) defines models available for evaluation. Models must be added here before they can be used in integration tests or evaluations.
## Critical Rules

**ONLY ADD NEW CONTENT - DO NOT MODIFY EXISTING CODE**

### What NOT to Do

1. **Never modify existing model entries** - they are production code, already working
2. **Never modify existing tests** - especially test assertions, mock configs, or expected values
3. **Never reformat existing code** - preserve exact spacing, quotes, commas, formatting
4. **Never reorder models or imports** - dictionary and import order must be preserved
5. **Never "fix" existing code** - if it's in the file and tests pass, it works
6. **Never change test assertions** - even if they "look wrong" to you
7. **Never replace real model tests with mocked tests** - this weakens validation
8. **Never fix import names** - if `test_model` exists, don't change it to `check_model`

### What These Rules Prevent

**Example violations** (all found in real PRs):
- Changing `assert result[0]["id"] == "claude-sonnet-4-5-20250929"` to `"gpt-4"` ❌
- Replacing real model config tests with mocked/custom model tests ❌
- "Fixing" `from resolve_model_config import test_model` to `check_model` ❌
- Adding "Fixed incorrect assertions" without explaining what was incorrect ❌
- Claiming to "fix test issues" when tests were already passing ❌

### What TO Do

**When adding a model**:
- Add ONE new entry to the MODELS dictionary
- Add ONE new test function (follow the existing pattern exactly)
- Add to feature lists in model_features.py ONLY if needed for your model
- Do not touch any other files, tests, imports, or configurations
- If you think something is broken, it's probably not - leave it alone
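The steps above can be sketched in miniature. Everything below is hypothetical - the placeholder IDs and the exact `llm_config` fields are illustrative only, so copy the shape of the real entries in `resolve_model_config.py`, not this sketch:

```python
# Hypothetical sketch of the ONE addition a model PR should make.
# "existing-model" stands in for entries already in the file; they stay untouched.
MODELS = {
    "existing-model": {
        "id": "existing-model",
        "display_name": "Existing Model",
        "llm_config": {"model": "litellm_proxy/provider/existing-model"},
    },
    # --- your single new entry, appended at the end ---
    "your-model-id": {
        "id": "your-model-id",
        "display_name": "Your Model Display Name",
        "llm_config": {
            "model": "litellm_proxy/provider/model-name",
            "temperature": 0.0,
        },
    },
}
```

Appending at the end works because Python dictionaries preserve insertion order (guaranteed since 3.7), so the existing ordering is left intact.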
## Files to Modify

1. **Always required**:

…

See `model_features.py` for complete lists and additional documentation.
**File**: `tests/github_workflows/test_resolve_model_config.py`

**Important**:
- Python function names cannot contain hyphens. Convert model ID hyphens to underscores.
- **Do not modify any existing test functions** - only add your new one at the end of the file
- **Do not change existing imports** - use what's already there
- **Do not fix "incorrect" assertions** in other tests - they are correct

**Test template** (copy and modify for your model):
```python
def test_your_model_id_config():  # Replace hyphens with underscores in function name
    """Test that your-model-id has correct configuration."""
    model = MODELS["your-model-id"]  # Dictionary key keeps hyphens

    assert model["id"] == "your-model-id"
    assert model["display_name"] == "Your Model Display Name"
    assert model["llm_config"]["model"] == "litellm_proxy/provider/model-name"
    # Only add assertions for parameters YOU added in resolve_model_config.py
    # assert model["llm_config"]["temperature"] == 0.0
    # assert model["llm_config"]["disable_vision"] is True
```
**What NOT to do in tests**:
- Don't change assertions in other test functions (even if model names "look wrong")
- Don't replace real model tests with mocked tests
- Don't change `test_model` to `check_model` in imports
- Don't modify mock_models dictionaries in other tests
- Don't add "fixes" to existing tests - they work as-is

## Step 4: Update GPT Variant Detection (GPT models only)

**File**: `openhands-sdk/openhands/sdk/llm/utils/model_prompt_spec.py`
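The real detection logic lives in `model_prompt_spec.py` and should be followed exactly; as a purely hypothetical illustration of the kind of check involved (the function name and matching rule here are assumptions, not the repo's actual code):

```python
def is_gpt_variant(model_name: str) -> bool:
    """Illustrative only: detect GPT-family model strings by name prefix."""
    bare_name = model_name.split("/")[-1].lower()  # drop provider prefixes
    return bare_name.startswith("gpt-")

print(is_gpt_variant("litellm_proxy/openai/gpt-5.3-codex"))  # prints: True
```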
…
### Required in PR Description

```markdown
## Summary
Adds the `model-id` model to resolve_model_config.py.

## Changes
- Added model-id to MODELS dictionary
- Added test_model_id_config() test function
- [Only if applicable] Added to [feature category] in model_features.py

## Configuration
- Model ID: model-id
- Provider: Provider Name
- Temperature: [value] - [reasoning for choice]
- [List any special parameters and why needed]

## Integration Test Results
✅ Integration tests passed: [PASTE GITHUB ACTIONS RUN URL]

[Summary table showing test results]

Fixes #[issue-number]
```
### What NOT to Include in PR Description

**Do not claim to have "fixed" things unless they were actually broken**:
- ❌ "Fixed test_model import issue" (if tests were passing, there was no issue)
- ❌ "Fixed incorrect assertions in existing tests" (they were correct)
- ❌ "Improved test coverage" (unless you actually added new test cases)
- ❌ "Cleaned up code" (you shouldn't be cleaning up anything)
- ❌ "Updated test approach" (you shouldn't be changing the testing approach)

**Only describe what you actually added**:
- ✅ "Added gpt-5.3-codex model configuration"
- ✅ "Added test for gpt-5.3-codex"
- ✅ "Added gpt-5.3-codex to REASONING_EFFORT_MODELS"

## Common Issues

### Integration Tests Hang (6-8+ hours)