
feat: add latest SOTA models and recalibrate intelligence scores #426

Open

Tony363 wants to merge 2 commits into BeehiveInnovations:main from Tony363:feat/update-sota-models-april-2026

Conversation


Tony363 commented on Apr 6, 2026

Summary

  • New models added: GPT-5.4/mini/nano (OpenAI), Gemini 3.1 Pro (Google), Claude Opus/Sonnet 4.6 (Anthropic via OpenRouter), Llama 4 Scout/Maverick (Meta), Mistral Large 3, Gemma 4 31B/26B
  • Intelligence score recalibration: New flagships at score 20, previous top models shifted down (e.g., GPT-5.2 18→17, Gemini 3 Pro 18→17, Grok-4 18→17)
  • Alias migrations: opus→Opus 4.6, sonnet→Sonnet 4.6, llama→Llama 4 Maverick, mistral→Mistral Large 3 (versioned aliases preserved on older models)
  • Provider preferences: Updated OpenAI preference lists — GPT-5.4 for EXTENDED_REASONING/BALANCED, GPT-5.4-mini for FAST_RESPONSE
  • Tests updated: All 7 affected test files updated to match new preferences and alias mappings
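The shorthand-alias migration described above can be sketched as a simple lookup. The mappings mirror the PR description, but the actual data lives in the conf/*.json files, and the resolver function here is a hypothetical stand-in:

```python
# Hypothetical sketch of the shorthand-alias migration; the real mappings
# are defined in the provider config JSON files, not hardcoded like this.
ALIAS_MIGRATIONS = {
    "opus": "claude-opus-4.6",
    "sonnet": "claude-sonnet-4.6",
    "llama": "llama-4-maverick",
    "mistral": "mistral-large-3",
}

def resolve_alias(name: str) -> str:
    """Resolve a shorthand alias to its canonical model name, or pass through."""
    return ALIAS_MIGRATIONS.get(name, name)
```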

Config changes by file

| File | New Models | Score Changes |
| --- | --- | --- |
| conf/openai_models.json | 3 (GPT-5.4 family) | 6 recalibrated |
| conf/gemini_models.json | 1 (Gemini 3.1 Pro) | 2 recalibrated |
| conf/openrouter_models.json | 11 (mirrors + new vendors) | 8 recalibrated |
| conf/xai_models.json | 0 | 1 recalibrated |
| conf/nebius_models.json | 0 | 4 recalibrated |

Test plan

  • ./code_quality_checks.sh passes (866 tests, 0 failures)
  • Ruff linting, Black formatting, isort all clean
  • No duplicate alias collisions in _build_maps()
  • Run listmodels tool to verify new models appear correctly
  • Smoke test alias resolution (e.g., opus → claude-opus-4.6, llama → llama-4-maverick)
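The duplicate-collision check in the test plan can be illustrated with a minimal sketch; build_alias_map() below is a hypothetical stand-in for the real _build_maps(), and the model entries are illustrative:

```python
# Minimal sketch of a duplicate-alias collision check, in the spirit of the
# _build_maps() check mentioned in the test plan (names are hypothetical).
def build_alias_map(models: dict) -> dict:
    """Map every alias to its canonical model, rejecting cross-model collisions."""
    alias_map = {}
    for canonical, aliases in models.items():
        for alias in aliases:
            if alias in alias_map and alias_map[alias] != canonical:
                raise ValueError(
                    f"alias {alias!r} maps to both {alias_map[alias]!r} and {canonical!r}"
                )
            alias_map[alias] = canonical
    return alias_map
```

With the migrated aliases, build_alias_map({"claude-opus-4.6": ["opus"], "claude-sonnet-4.6": ["sonnet"]}) succeeds, while assigning "opus" to two different canonical models raises immediately.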

🤖 Generated with Claude Code

Tony363 and others added 2 commits February 6, 2026 23:56
Adds Nebius Token Factory as a new provider, giving access to open-source
models (Qwen3, DeepSeek, Llama, GLM, GPT-OSS, Kimi, Gemma, Nemotron)
through their OpenAI-compatible API.

Also includes:
- DeepSeek V3.2 cloud model in custom_models.json
- Grok-4 intelligence score adjustment (16 → 18)
- Auto-formatting fixes from black

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add GPT-5.4 family (flagship/mini/nano), Gemini 3.1 Pro, Claude Opus/Sonnet 4.6,
Llama 4 Scout/Maverick, Mistral Large 3, and Gemma 4 models across all provider
configs. Recalibrate intelligence_scores with new flagships at 20, migrate shorthand
aliases (opus, sonnet, llama, mistral) to newest models, and update OpenAI provider
preference lists and test expectations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request introduces the Nebius Token Factory as a new model provider, adding the NebiusModelProvider and its associated registry. It expands the model catalogs for OpenAI, Gemini, and OpenRouter with new flagship models, including GPT-5.4, Gemini 3.1 Pro, and Claude 4.6, while updating intelligence scores and aliases for existing entries. Furthermore, a new excluded_tools capability has been added to the model metadata to support tool-specific restrictions, and the provider priority order has been modified to deprioritize Azure OpenAI. Overall, I have no further feedback to provide.


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9c86e1656


Comment thread: providers/registry.py, lines 45 to 46

    ProviderType.OPENROUTER, # Catch-all for cloud models
    ProviderType.AZURE, # Azure-hosted OpenAI deployments (last)


P1 Badge Move Azure ahead of OpenRouter in provider priority

Placing ProviderType.AZURE after ProviderType.OPENROUTER changes routing for any model name/alias that both providers can resolve: ModelProviderRegistry.get_provider_for_model() now selects OpenRouter first, so requests that previously stayed on Azure deployments are silently sent to OpenRouter when both are configured. This breaks the documented “OpenRouter catch-all” behavior and can redirect production traffic/cost/compliance paths in mixed Azure+OpenRouter environments.
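The routing change can be sketched as first-match resolution over the priority list. The ProviderType values mirror the diff, but the body of get_provider_for_model() and the resolvable-model sets are simplified assumptions, not the registry's actual implementation:

```python
from enum import Enum
from typing import Optional

class ProviderType(Enum):
    OPENAI = "openai"
    OPENROUTER = "openrouter"
    AZURE = "azure"

# Illustrative priority list matching the reviewed ordering: OpenRouter is a
# catch-all, so placing Azure after it means Azure can never win a tie.
PROVIDER_PRIORITY = [
    ProviderType.OPENAI,
    ProviderType.OPENROUTER,  # catch-all for cloud models
    ProviderType.AZURE,       # after the catch-all: loses any shared model name
]

def get_provider_for_model(model: str, resolvable: dict) -> Optional[ProviderType]:
    """Return the first provider in priority order that resolves the model."""
    for provider in PROVIDER_PRIORITY:
        if model in resolvable.get(provider, set()):
            return provider
    return None
```

Under this ordering, any model name that both Azure and OpenRouter resolve is routed to OpenRouter, which is exactly the silent redirection the review flags.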


    ENV_VARS = {
        ProviderType.OPENAI: "OPENAI_ALLOWED_MODELS",
        ProviderType.GOOGLE: "GOOGLE_ALLOWED_MODELS",
        ProviderType.NEBIUS: "NEBIUS_ALLOWED_MODELS",

P2 Badge Validate Nebius allowlists with known-model checks

This commit enables NEBIUS_ALLOWED_MODELS, but startup allowlist validation still only checks a fixed provider subset (Google/OpenAI/XAI/DIAL), so Nebius typos are never surfaced as warnings. In Nebius-only or Nebius-priority setups, a misspelled allowlist entry can leave users with no usable models and a hard-to-diagnose failure path instead of the expected early validation feedback.
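The missing check could look roughly like this sketch. The NEBIUS_ALLOWED_MODELS variable name comes from the diff; the known-model set and the validate_allowlists() helper are hypothetical illustrations of the early-validation feedback the review asks for:

```python
# Hypothetical known-model validation for the Nebius allowlist: surface typos
# as warnings at startup instead of leaving users with no usable models.
KNOWN_NEBIUS_MODELS = {"deepseek-v3.2", "qwen3-235b", "llama-4-maverick"}

def validate_allowlists(env: dict) -> list:
    """Return warning strings for allowlist entries matching no known model."""
    warnings = []
    raw = env.get("NEBIUS_ALLOWED_MODELS", "")
    for name in (n.strip() for n in raw.split(",")):
        if name and name not in KNOWN_NEBIUS_MODELS:
            warnings.append(f"NEBIUS_ALLOWED_MODELS: unknown model {name!r}")
    return warnings
```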

