feat: add latest SOTA models and recalibrate intelligence scores #426
Tony363 wants to merge 2 commits into BeehiveInnovations:main
Conversation
Adds Nebius Token Factory as a new provider, giving access to open-source models (Qwen3, DeepSeek, Llama, GLM, GPT-OSS, Kimi, Gemma, Nemotron) through their OpenAI-compatible API. Also includes:

- DeepSeek V3.2 cloud model in custom_models.json
- Grok-4 intelligence score adjustment (16 → 18)
- Auto-formatting fixes from black

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add GPT-5.4 family (flagship/mini/nano), Gemini 3.1 Pro, Claude Opus/Sonnet 4.6, Llama 4 Scout/Maverick, Mistral Large 3, and Gemma 4 models across all provider configs. Recalibrate intelligence_scores with new flagships at 20, migrate shorthand aliases (opus, sonnet, llama, mistral) to the newest models, and update OpenAI provider preference lists and test expectations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Code Review
This pull request introduces the Nebius Token Factory as a new model provider, adding the NebiusModelProvider and its associated registry. It expands the model catalogs for OpenAI, Gemini, and OpenRouter with new flagship models, including GPT-5.4, Gemini 3.1 Pro, and Claude 4.6, while updating intelligence scores and aliases for existing entries. Furthermore, a new excluded_tools capability has been added to the model metadata to support tool-specific restrictions, and the provider priority order has been modified to deprioritize Azure OpenAI. Overall, I have no specific feedback to provide.
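The excluded_tools capability described above can be sketched as a per-model metadata field that the registry consults when choosing candidates for a tool. This is a minimal illustration only; the class, field, and function names here are assumptions, not the project's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a model's metadata lists tool names it must not
# serve, and the registry filters candidates for a tool accordingly.
@dataclass
class ModelCapabilities:
    model_name: str
    intelligence_score: int = 10
    excluded_tools: set[str] = field(default_factory=set)

def models_for_tool(tool_name: str, catalog: list[ModelCapabilities]) -> list[str]:
    """Return model names allowed to serve the given tool."""
    return [m.model_name for m in catalog if tool_name not in m.excluded_tools]
```

A model with an empty excluded_tools set remains eligible for every tool, so existing catalog entries keep their current behavior by default.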
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9c86e1656
```python
ProviderType.OPENROUTER,  # Catch-all for cloud models
ProviderType.AZURE,       # Azure-hosted OpenAI deployments (last)
```
Move Azure ahead of OpenRouter in provider priority
Placing ProviderType.AZURE after ProviderType.OPENROUTER changes routing for any model name or alias that both providers can resolve: ModelProviderRegistry.get_provider_for_model() now selects OpenRouter first, so requests that previously stayed on Azure deployments are silently sent to OpenRouter when both are configured. This breaks the documented "OpenRouter catch-all" behavior and can redirect traffic, cost, and compliance paths in mixed Azure + OpenRouter production environments.
```python
ENV_VARS = {
    ProviderType.OPENAI: "OPENAI_ALLOWED_MODELS",
    ProviderType.GOOGLE: "GOOGLE_ALLOWED_MODELS",
    ProviderType.NEBIUS: "NEBIUS_ALLOWED_MODELS",
```
Validate Nebius allowlists with known-model checks
This commit enables NEBIUS_ALLOWED_MODELS, but startup allowlist validation still checks only a fixed subset of providers (Google/OpenAI/XAI/DIAL), so typos in a Nebius allowlist are never surfaced as warnings. In Nebius-only or Nebius-priority setups, a misspelled allowlist entry can leave users with no usable models and a hard-to-diagnose failure instead of the expected early validation feedback.
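The missing check could take roughly this shape: compare each allowlist entry against the provider's known model names at startup and warn about anything unrecognized. The function name and the illustrative Nebius model names below are assumptions, not the project's actual code.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative known-model set; the real registry would supply this.
KNOWN_NEBIUS_MODELS = {"deepseek-v3.2", "qwen3-235b", "llama-4-maverick"}

def validate_allowlist(env_value: str, known: set[str]) -> list[str]:
    """Warn about, and return, allowlist entries matching no known model."""
    entries = [e.strip() for e in env_value.split(",") if e.strip()]
    unknown = [e for e in entries if e.lower() not in known]
    for name in unknown:
        logger.warning(
            "NEBIUS_ALLOWED_MODELS entry %r matches no known Nebius model", name
        )
    return unknown
```

Running this once at startup turns a silent "no usable models" state into an immediate, actionable warning about the misspelled entry.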
Summary

- Alias migrations: opus → Opus 4.6, sonnet → Sonnet 4.6, llama → Llama 4 Maverick, mistral → Mistral Large 3 (versioned aliases preserved on older models)

Config changes by file

- conf/openai_models.json
- conf/gemini_models.json
- conf/openrouter_models.json
- conf/xai_models.json
- conf/nebius_models.json

Test plan

- ./code_quality_checks.sh passes (866 tests, 0 failures)
- _build_maps()
- listmodels tool to verify new models appear correctly
- Alias resolution checks (opus → claude-opus-4.6, llama → llama-4-maverick)

🤖 Generated with Claude Code
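The alias spot-checks from the test plan can be sketched as a simple shorthand-to-canonical mapping. This is a hypothetical illustration: the real project resolves aliases through its model registry, and the canonical identifiers below (e.g. "claude-sonnet-4.6", "mistral-large-3") are assumed spellings inferred from the PR description.

```python
# Assumed shorthand -> canonical-name table reflecting the migrated aliases.
ALIASES = {
    "opus": "claude-opus-4.6",
    "sonnet": "claude-sonnet-4.6",
    "llama": "llama-4-maverick",
    "mistral": "mistral-large-3",
}

def resolve_alias(name: str) -> str:
    """Map a shorthand alias to its canonical model name, or pass through."""
    return ALIASES.get(name.lower(), name)
```

A full model name with no alias entry passes through unchanged, so the check only asserts on the four migrated shorthands.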