feat: Add Cerebras Inference provider (ZAI-GLM, GPT-OSS, Qwen3, Llama) #429
New file `conf/cerebras_models.json` (`@@ -0,0 +1,111 @@`):

```json
{
  "_README": {
    "description": "Model metadata for Cerebras Inference API.",
    "documentation": "https://inference-docs.cerebras.ai/models",
    "usage": "Models listed here are exposed directly through the Cerebras provider. Aliases are case-insensitive.",
    "field_notes": "Matches providers/shared/model_capabilities.py.",
    "field_descriptions": {
      "model_name": "The model identifier (e.g., 'zai-glm-4.7')",
      "aliases": "Array of short names users can type instead of the full model name",
      "context_window": "Total number of tokens the model can process (input + output combined)",
      "max_output_tokens": "Maximum number of tokens the model can generate in a single response",
      "supports_extended_thinking": "Whether the model supports extended reasoning tokens",
      "supports_json_mode": "Whether the model can guarantee valid JSON output",
      "supports_function_calling": "Whether the model supports function/tool calling",
      "supports_images": "Whether the model can process images/visual input",
      "supports_temperature": "Whether the model accepts temperature parameter in API calls",
      "description": "Human-readable description of the model",
      "intelligence_score": "1-20 human rating used as the primary signal for auto-mode model ordering"
    }
  },
  "models": [
    {
      "model_name": "gpt-oss-120b",
      "friendly_name": "Cerebras (gpt-oss-120b)",
      "aliases": ["gpt-oss", "oss-120b", "openai-oss"],
      "intelligence_score": 17,
      "description": "OpenAI GPT-OSS 120B — ultra-fast inference (~3000 tok/s), 128K context, internal chain-of-thought reasoning, strong agentic/tool-use",
      "context_window": 131072,
      "max_output_tokens": 40000,
      "supports_extended_thinking": false,
      "supports_system_prompts": true,
      "supports_streaming": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_images": false,
      "supports_temperature": true,
      "temperature_constraint": "range"
    },
    {
      "model_name": "qwen-3-235b-a22b-instruct-2507",
      "friendly_name": "Cerebras (qwen-3-235b-a22b-instruct-2507)",
      "aliases": ["qwen3", "qwen-3", "qwen235b", "qwen3-235b"],
      "intelligence_score": 16,
      "description": "Qwen3-235B-A22B Instruct 2507 — fast inference (~1400 tok/s), 128K context, strong coding/reasoning/tool use",
      "context_window": 131072,
      "max_output_tokens": 40000,
      "supports_extended_thinking": false,
      "supports_system_prompts": true,
      "supports_streaming": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_images": false,
      "supports_temperature": true,
      "temperature_constraint": "range"
    },
    {
      "model_name": "zai-glm-4.7",
      "friendly_name": "Cerebras (zai-glm-4.7)",
      "aliases": ["cerebras", "glm", "glm-4.7", "zai", "zai-glm"],
      "intelligence_score": 14,
      "description": "Cerebras ZAI-GLM 4.7 — fast inference (~1000 tok/s), 128K context, reasoning model with tool calling",
      "context_window": 131072,
      "max_output_tokens": 40000,
      "supports_extended_thinking": false,
      "supports_system_prompts": true,
      "supports_streaming": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_images": false,
      "supports_temperature": true,
      "temperature_constraint": "range"
    },
    {
      "model_name": "llama3.1-8b",
      "friendly_name": "Cerebras (llama3.1-8b)",
      "aliases": ["llama8b", "llama-8b", "llama3.1", "llama3-8b"],
      "intelligence_score": 9,
      "description": "Meta Llama 3.1 8B — fastest small model on Cerebras (~2200 tok/s), 32K context, ideal for real-time and high-throughput tasks",
      "context_window": 32768,
      "max_output_tokens": 8192,
      "supports_extended_thinking": false,
      "supports_system_prompts": true,
      "supports_streaming": true,
      "supports_function_calling": true,
      "supports_json_mode": true,
      "supports_images": false,
      "supports_temperature": true,
      "temperature_constraint": "range"
    }
  ]
}
```
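The config above documents aliases as case-insensitive, and each model's output budget must fit inside its context window. A short standalone sketch of how those invariants can be checked (the inline `CONFIG_JSON` excerpt and `build_alias_map` helper are illustrative, not the repo's actual loader):

```python
import json

# Inline excerpt standing in for conf/cerebras_models.json (two entries only).
CONFIG_JSON = """
{
  "models": [
    {"model_name": "zai-glm-4.7",
     "aliases": ["cerebras", "glm", "glm-4.7", "zai", "zai-glm"],
     "context_window": 131072, "max_output_tokens": 40000},
    {"model_name": "llama3.1-8b",
     "aliases": ["llama8b", "llama-8b", "llama3.1", "llama3-8b"],
     "context_window": 32768, "max_output_tokens": 8192}
  ]
}
"""

def build_alias_map(config: dict) -> dict[str, str]:
    """Map every lowercased alias (and canonical name) to its canonical
    model name, raising if two models claim the same alias."""
    alias_map: dict[str, str] = {}
    for entry in config["models"]:
        canonical = entry["model_name"]
        for name in [canonical, *entry.get("aliases", [])]:
            key = name.lower()  # aliases are documented as case-insensitive
            if alias_map.get(key, canonical) != canonical:
                raise ValueError(f"alias {name!r} claimed by two models")
            alias_map[key] = canonical
        # the output budget must fit inside the context window
        assert entry["max_output_tokens"] < entry["context_window"]
    return alias_map

amap = build_alias_map(json.loads(CONFIG_JSON))
```

With this map, `amap["GLM-4.7".lower()]` resolves to `"zai-glm-4.7"`, matching the case-insensitive behavior the `_README` describes.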
New file `providers/cerebras.py` (`@@ -0,0 +1,83 @@`):

```python
"""Cerebras Inference model provider implementation."""

from __future__ import annotations

import logging
from typing import TYPE_CHECKING, ClassVar

if TYPE_CHECKING:
    from tools.models import ToolModelCategory

from .openai_compatible import OpenAICompatibleProvider
from .registries.cerebras import CerebrasModelRegistry
from .registry_provider_mixin import RegistryBackedProviderMixin
from .shared import ModelCapabilities, ProviderType

logger = logging.getLogger(__name__)


class CerebrasModelProvider(RegistryBackedProviderMixin, OpenAICompatibleProvider):
    """Integration for Cerebras Inference API.

    Publishes capability metadata for the officially supported deployments and
    maps tool-category preferences to the appropriate Cerebras model.

    Model routing by category:
        BALANCED           → zai-glm-4.7 (default; only model on Cerebras Code plan)
        EXTENDED_REASONING → gpt-oss-120b (strongest reasoning, ~3000 tok/s; paid tier)
        FAST_RESPONSE      → llama3.1-8b (fastest small model, ~2200 tok/s; paid tier)
    """

    FRIENDLY_NAME = "Cerebras"

    REGISTRY_CLASS = CerebrasModelRegistry
    MODEL_CAPABILITIES: ClassVar[dict[str, ModelCapabilities]] = {}

    # Category routing — ordered preference lists (first available wins).
    # zai-glm-4.7 is the default: it is the only model on the Cerebras Code
    # (free) plan and must always be the BALANCED fallback.
    _REASONING_PREFERENCE = ["gpt-oss-120b", "qwen-3-235b-a22b-instruct-2507", "zai-glm-4.7", "llama3.1-8b"]
    _BALANCED_PREFERENCE = ["zai-glm-4.7", "qwen-3-235b-a22b-instruct-2507", "gpt-oss-120b", "llama3.1-8b"]
    _FAST_PREFERENCE = ["llama3.1-8b", "zai-glm-4.7", "qwen-3-235b-a22b-instruct-2507", "gpt-oss-120b"]

    def __init__(self, api_key: str, **kwargs):
        """Initialize Cerebras provider with API key."""
        kwargs.setdefault("base_url", "https://api.cerebras.ai/v1")
        self._ensure_registry()
        super().__init__(api_key, **kwargs)
        self._invalidate_capability_cache()

    def get_provider_type(self) -> ProviderType:
        """Get the provider type."""
        return ProviderType.CEREBRAS

    def get_preferred_model(self, category: ToolModelCategory, allowed_models: list[str]) -> str | None:
        """Get Cerebras's preferred model for a given category from allowed models.

        Args:
            category: The tool category requiring a model
            allowed_models: Pre-filtered list of models allowed by restrictions

        Returns:
            Preferred model name or None
        """
        if not allowed_models:
            return None

        from tools.models import ToolModelCategory

        if category == ToolModelCategory.EXTENDED_REASONING:
            preference = self._REASONING_PREFERENCE
        elif category == ToolModelCategory.FAST_RESPONSE:
            preference = self._FAST_PREFERENCE
        else:  # BALANCED or default
            preference = self._BALANCED_PREFERENCE

        for model in preference:
            if model in allowed_models:
                return model

        return allowed_models[0]


# Load registry data at import time
CerebrasModelProvider._ensure_registry()
```

**Review comment on lines +76 to +78:** This selector assumes …

**Author reply:** Great catch — confirmed real bug and fixed in c10f78d. The fix (smaller and lower-risk than restructuring the routing): added … Regression coverage was added in the same commit; both tests would have caught this bug. Thanks!
New file `providers/registries/cerebras.py` (`@@ -0,0 +1,19 @@`):

```python
"""Registry loader for Cerebras model capabilities."""

from __future__ import annotations

from ..shared import ProviderType
from .base import CapabilityModelRegistry


class CerebrasModelRegistry(CapabilityModelRegistry):
    """Capability registry backed by ``conf/cerebras_models.json``."""

    def __init__(self, config_path: str | None = None) -> None:
        super().__init__(
            env_var_name="CEREBRAS_MODELS_CONFIG_PATH",
            default_filename="cerebras_models.json",
            provider=ProviderType.CEREBRAS,
            friendly_prefix="Cerebras ({model})",
            config_path=config_path,
        )
```

**Review comment:** This registry assumes …

**Author reply:** Thanks for the catch — fixed in de64532.

Critical assessment: the underlying observation is correct, but the runtime impact statement is not.

Fixed: added … Verification: rebuilt the wheel and confirmed …

Important nuance on the runtime impact claim: the PR did not actually break Cerebras for wheel installs as stated. The file was already shipped via the … The wheel-build verification before the fix confirmed this. That said, the missing …
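The `env_var_name`/`default_filename` pair the registry passes to `CapabilityModelRegistry` implies a two-step path lookup: an environment-variable override wins, otherwise the packaged default under `conf/` is used. A hypothetical sketch of that resolution (the `locate_config` helper is illustrative; the real logic lives in the base registry class):

```python
import json
import os
import tempfile
from pathlib import Path

def locate_config(env_var: str, default_filename: str, conf_dir: Path) -> Path:
    """Resolve a registry config path: an env-var override wins, otherwise
    fall back to the packaged default under conf/ (hypothetical helper)."""
    override = os.environ.get(env_var)
    if override:
        return Path(override)
    return conf_dir / default_filename

# Usage: point the env var at a custom config and confirm it takes precedence.
with tempfile.TemporaryDirectory() as tmp:
    custom = Path(tmp) / "my_models.json"
    custom.write_text(json.dumps({"models": []}))
    os.environ["CEREBRAS_MODELS_CONFIG_PATH"] = str(custom)
    try:
        resolved = locate_config("CEREBRAS_MODELS_CONFIG_PATH",
                                 "cerebras_models.json", Path("conf"))
        assert resolved == custom
    finally:
        del os.environ["CEREBRAS_MODELS_CONFIG_PATH"]
```

Without the override set, the same call falls back to `conf/cerebras_models.json`, which is why the packaging discussion above matters: the default file must actually ship in the wheel.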
Two simulator-test files get a one-line cleanup. Hunk `@@ -12,7 +12,6 @@` (context: a docstring ending with "5. Proper tool chaining with context", followed by `from .conversation_base_test import ConversationBaseTest`) and hunk `@@ -9,7 +9,6 @@` (context: "- Model alias resolution for local models", followed by `from .base_test import BaseSimulatorTest`) each drop a single blank line between the module docstring and the first import.
**Review comment:** The `CerebrasModelProvider` class is missing the `_resolve_model_name` method. While the base `ModelProvider` might have a generic implementation, native providers in this repository typically implement this method to leverage the registry's optimized `alias_map` for resolving shorthands (like `cerebras` or `glm`) to canonical names before making API calls. This is critical for ensuring that aliases defined in `conf/cerebras_models.json` work correctly in `generate_content`.

**Author reply:** Respectfully pushing back on this one — I believe the suggestion is unnecessary, and the suggested code would not work as drafted.

- **Why no override is needed.** The base class `ModelProvider._resolve_model_name()` (`providers/base.py:404`) already does proper alias resolution by calling `get_all_model_capabilities()` and `ModelCapabilities.collect_aliases()`. `OpenAICompatibleProvider` calls `self._resolve_model_name()` in three places (`providers/openai_compatible.py:83, 550, 725`) before any API call.
- **The reference XAI provider doesn't override it either.** `providers/xai.py` extends the same `RegistryBackedProviderMixin` + `OpenAICompatibleProvider` pair and relies on the base class implementation. The Cerebras provider intentionally mirrors that pattern for consistency with the existing codebase.
- **It's already test-covered.** `tests/test_cerebras_provider.py::test_generate_content_resolves_alias_before_api_call` mocks the OpenAI client and asserts that when `generate_content(model_name="cerebras")` is called, the underlying API receives `"zai-glm-4.7"` (the resolved canonical name), not the alias. `test_generate_content_other_aliases` does the same for `glm`, `glm-4.7`, `zai`, and `zai-glm`. `test_resolve_model_name` verifies the resolution directly.
- **It has been validated end-to-end against the live Cerebras API.** Before opening this PR, all 16 aliases (4 for gpt-oss-120b, 4 for qwen3-235b, 5 for zai-glm-4.7, 4 for llama3.1-8b) were manually exercised via real `api.cerebras.ai/v1` calls and confirmed to resolve correctly.
- **The suggested code would not work as drafted.** `self._registry` is not the attribute used by `RegistryBackedProviderMixin` — the mixin exposes a class-level registry via `_REGISTRY` and the `_ensure_registry()` hook. Adopting the snippet as written would either crash or silently no-op on first call. If there's a different optimization path you have in mind, happy to discuss, but the current implementation works correctly and matches the established pattern in the repo.
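The "alias resolved before the API call" assertion described in the reply can be sketched without the real provider. This is a hypothetical, self-contained version of that test shape: `DummyProvider` and `ALIASES` stand in for `CerebrasModelProvider` and its registry, and a `MagicMock` client captures what would have been sent over the wire.

```python
from unittest.mock import MagicMock

# Stand-in alias table (mirrors a few entries from conf/cerebras_models.json).
ALIASES = {"cerebras": "zai-glm-4.7", "glm": "zai-glm-4.7", "zai": "zai-glm-4.7"}

class DummyProvider:
    """Hypothetical stand-in for CerebrasModelProvider."""

    def __init__(self, client):
        self.client = client

    def _resolve_model_name(self, name: str) -> str:
        # Case-insensitive lookup; unknown names pass through unchanged.
        return ALIASES.get(name.lower(), name)

    def generate_content(self, prompt: str, model_name: str):
        # Resolve the alias BEFORE the API call, as the base class does.
        canonical = self._resolve_model_name(model_name)
        return self.client.chat.completions.create(
            model=canonical,
            messages=[{"role": "user", "content": prompt}],
        )

client = MagicMock()
provider = DummyProvider(client)
provider.generate_content("hi", model_name="cerebras")

# The mocked client received the canonical name, not the alias.
sent_model = client.chat.completions.create.call_args.kwargs["model"]
assert sent_model == "zai-glm-4.7"
```

The real tests assert the same property against the actual provider with a mocked OpenAI client, which is what makes a `_resolve_model_name` override unnecessary to verify.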