
feat: Add Cerebras Inference provider (ZAI-GLM, GPT-OSS, Qwen3, Llama)#429

Open
tommyyzhao wants to merge 10 commits into BeehiveInnovations:main from tommyyzhao:feat/cerebras-provider

Conversation


@tommyyzhao commented Apr 8, 2026

Description

Adds first-class support for the Cerebras Inference API as a new native provider, exposing four models across the Cerebras catalogue (ZAI-GLM, OpenAI GPT-OSS, Qwen3, and Llama 3.1). The provider is implemented as an OpenAI-compatible registry-backed provider, mirroring the existing X.AI (providers/xai.py) pattern so the addition feels native to the codebase.

Cerebras delivers industry-leading inference speeds (~1000–3000 tok/s) which makes it a strong default for fast-turnaround tools like chat, consensus, and planner workflows. The default zai-glm-4.7 is also the only model available on the Cerebras Code (Tier 1) plan, so BALANCED auto-mode routing prefers it first to ensure out-of-the-box compatibility with free-tier users.

Changes Made

  • New provider: providers/cerebras.py — CerebrasModelProvider extending RegistryBackedProviderMixin + OpenAICompatibleProvider with category-aware routing across all four models.

  • New registry loader: providers/registries/cerebras.py — CerebrasModelRegistry backed by conf/cerebras_models.json (overridable via CEREBRAS_MODELS_CONFIG_PATH).

  • New model manifest: conf/cerebras_models.json — capability metadata for all four models. Specs verified against the live /models endpoint and Cerebras inference documentation:

    | Model | Context | Max Out | Speed | Intelligence | Aliases |
    |---|---|---|---|---|---|
    | gpt-oss-120b | 131K | 40K | ~3000 tok/s | 17 | gpt-oss, oss-120b, openai-oss |
    | qwen-3-235b-a22b-instruct-2507 | 131K | 40K | ~1400 tok/s | 16 | qwen3, qwen-3, qwen235b, qwen3-235b |
    | zai-glm-4.7 (default) | 131K | 40K | ~1000 tok/s | 14 | cerebras, glm, glm-4.7, zai, zai-glm |
    | llama3.1-8b | 32K | 8K | ~2200 tok/s | 9 | llama8b, llama-8b, llama3.1, llama3-8b |
  • Provider type & registration: Added ProviderType.CEREBRAS to providers/shared/provider_type.py, wired CEREBRAS_API_KEY mapping and priority slot (after XAI, before DIAL) in providers/registry.py, and registered in server.configure_providers() in server.py.

  • listmodels tool integration: Added ProviderType.CEREBRAS entry to the provider_info display table in tools/listmodels.py so Cerebras models show up alongside the other native providers.

  • Category routing: get_preferred_model() maps BALANCED → zai-glm-4.7 (Code plan default), EXTENDED_REASONING → gpt-oss-120b, FAST_RESPONSE → llama3.1-8b, with ordered fallback lists across the full catalogue.

  • Model restrictions: Supports CEREBRAS_ALLOWED_MODELS environment variable for tenant-level restriction.

  • .env.example: Added Cerebras section with API key URL, optional restrictions variable, config path override, and full list of models + aliases.

  • docs/configuration.md: Added Cerebras row to the provider catalogue table and added conf/cerebras_models.json to the manifest list.

  • Unit tests (tests/test_cerebras_provider.py): 22 new tests covering provider initialization, alias resolution, per-model capabilities, category routing, fallback behaviour, model restrictions, and critical alias-resolution-before-API-call verification (mock-based).

  • Test fixture update (tests/test_auto_mode_model_listing.py): Added CEREBRAS_API_KEY to environment cleanup lists so existing auto-mode tests continue to isolate Cerebras correctly.

  • Breaking changes: none.

  • Dependencies: none added or removed (reuses the existing openai SDK).

Test plan

  • All unit tests pass: 888 passed, 4 skipped, 16 deselected (22 new Cerebras tests + no regressions in the existing suite)
  • Linting clean: ruff check ., black --check ., and isort --check-only . all pass on touched files
  • Live API validation: All four models + their aliases exercised end-to-end against the production Cerebras API (https://api.cerebras.ai/v1):
    • model: "cerebras" → resolved to zai-glm-4.7 → ✅ round-trip success
    • model: "gpt-oss" → resolved to gpt-oss-120b → ✅ round-trip success
    • model: "qwen3" → resolved to qwen-3-235b-a22b-instruct-2507 → ✅ round-trip success
    • model: "llama8b" → resolved to llama3.1-8b → ✅ round-trip success
  • listmodels verification: Confirmed the Cerebras section renders all four models and all 16 aliases correctly
  • Cerebras /models endpoint audit: The four canonical model IDs in the manifest match exactly what the Cerebras API currently exposes

Design Notes

Pattern consistency. The implementation closely mirrors providers/xai.py + providers/registries/xai.py + conf/xai_models.json so readers already familiar with the XAI provider will find the Cerebras one structurally identical.

supports_extended_thinking is intentionally false for all four models. Cerebras's reasoning models (gpt-oss-120b, zai-glm-4.7) do reason internally, but they do not expose the reasoning-token protocol that the tools layer uses to inject thinking_mode parameters. Setting the flag to true would cause the tools layer to send parameters the Cerebras API would silently ignore. If Cerebras adds a compatible reasoning-effort parameter in the future, the flag can be flipped and the OpenAICompatibleProvider.generate_content() path extended to forward it.

zai-glm-4.7 as the BALANCED default. This is a deliberate choice: it is the only model available on the Cerebras Code (free) plan. Routing BALANCED to any paid-tier model first would break auto-mode for free-tier users. The preference lists still upgrade to gpt-oss-120b / llama3.1-8b for their specialist categories when those models are available (paid tier).

llama3.1-8b context window (32K, not 128K). This reflects the Cerebras-side serving limit, not the base Meta model spec. Documented in the manifest description so users aren't surprised.
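For orientation, a manifest entry of roughly the shape discussed in the review threads might look like this. Field names and the max-output value are inferred from the conversation and are illustrative, not the authoritative conf/cerebras_models.json schema.

```json
{
  "model_name": "zai-glm-4.7",
  "aliases": ["cerebras", "glm", "glm-4.7", "zai", "zai-glm"],
  "context_window": 131072,
  "max_output_tokens": 40960,
  "supports_extended_thinking": false,
  "temperature_constraint": "range",
  "intelligence_score": 14,
  "description": "ZAI GLM 4.7 — fast inference (~1000 tok/s), 128K context, Cerebras Code plan default"
}
```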

Related Files (for reviewer navigation)

| File | Role |
|---|---|
| providers/cerebras.py | Provider class + routing logic |
| providers/registries/cerebras.py | Registry loader |
| conf/cerebras_models.json | Model capability manifest |
| tests/test_cerebras_provider.py | Unit tests (22 tests) |
| providers/shared/provider_type.py | ProviderType.CEREBRAS enum addition |
| providers/registry.py | API key mapping + priority order |
| server.py | configure_providers() registration |
| tools/listmodels.py | provider_info display entry |
| .env.example | User-facing configuration docs |
| docs/configuration.md | Provider catalogue table |

tommyyzhao and others added 6 commits April 6, 2026 00:41
Add support for Cerebras Inference API with the zai-glm-4.7 model.
Implements RegistryBackedProviderMixin + OpenAICompatibleProvider pattern
matching the existing XAI provider structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set supports_extended_thinking to false (no reasoning-token protocol)
- Collapse get_preferred_model to single branch (single-model provider)
- Remove dead FALLBACK_MODEL constant (identical to PRIMARY_MODEL)
- Fix "131K" to "128K" in .env.example comments (131072 tokens = 128K)
- Add missing blank line between Cerebras and DIAL sections in .env.example
- Add missing zai-glm alias to .env.example model docs
- Use "range" string for temperature_constraint (matching registry format)
- Reorder ProviderType enum to match PROVIDER_PRIORITY_ORDER
- Remove duplicate monkeypatch.delenv in test_auto_mode_model_listing
- Remove redundant inner MagicMock import in test
- Update all test assertions to match corrected capabilities

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The listmodels tool hardcodes a provider_info dict for native providers
but was missing CEREBRAS, so it never appeared in the output even though
the provider was correctly registered at startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verified via Cerebras /models endpoint. All three are available on the
paid tier. Key specs (from inference-docs.cerebras.ai):

- gpt-oss-120b: 131K ctx, 40K out, ~3000 tok/s, internal chain-of-thought
- qwen-3-235b-a22b-instruct-2507: 131K ctx, 40K out, ~1400 tok/s
- llama3.1-8b: 32K ctx (Cerebras limit), 8K out, ~2200 tok/s

Also introduces real category routing in get_preferred_model now that
multiple models exist: EXTENDED_REASONING→gpt-oss-120b, BALANCED→qwen3,
FAST_RESPONSE→llama3.1-8b with graceful fallbacks throughout.

Adds 7 new tests (22 total) covering per-model capabilities, alias
resolution, category routing, and fallback behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zai-glm-4.7 is the only model on the Cerebras Code (free) plan and
must be the default. BALANCED routing now prefers it first so Code plan
users always get a working model in auto mode. Paid-tier models
(gpt-oss-120b, qwen-3-235b, llama3.1-8b) remain preferred for their
respective specialist categories (EXTENDED_REASONING, FAST_RESPONSE).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Now that the Cerebras provider ships with four models (zai-glm-4.7,
gpt-oss-120b, qwen-3-235b, llama3.1-8b), update stale single-model
references in .env.example, server.py startup log, provider priority
comment, and the configuration docs table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c54584df81

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread providers/cerebras.py, lines +76 to +78:

```python
for model in preference:
    if model in allowed_models:
        return model
```

P1: Apply Cerebras allowlist before category model routing

This selector assumes allowed_models is already restriction-filtered, but CEREBRAS_ALLOWED_MODELS is not wired into ModelRestrictionService.ENV_VARS, so auto-mode can pass the full Cerebras catalog here. In that case EXTENDED_REASONING/FAST_RESPONSE may pick gpt-oss-120b or llama3.1-8b even when only zai-glm-4.7 is configured, and the request then fails later when generate_content() enforces the provider allowlist. Please enforce the provider allowlist before choosing from preference (or add Cerebras to the centralized restriction mapping) so auto-mode never selects a disallowed model.


Author

Great catch — confirmed real bug and fixed in c10f78d.

CEREBRAS_ALLOWED_MODELS was documented in .env.example but missing from ModelRestrictionService.ENV_VARS (utils/model_restrictions.py:51), so the env var was silently ignored. The downstream impact was exactly as you described: auto-mode could route EXTENDED_REASONING/FAST_RESPONSE to gpt-oss-120b or llama3.1-8b even when a Code-plan user tried to restrict to zai-glm-4.7.

Fix (smaller and lower-risk than restructuring the routing): added ProviderType.CEREBRAS: "CEREBRAS_ALLOWED_MODELS" to the centralized ENV_VARS mapping. This is the correct architectural location — it's where every other native provider is wired (OPENAI, GOOGLE, XAI, OPENROUTER, DIAL) — and it lets the existing _get_allowed_models_for_provider() filter naturally pre-filter the catalogue before get_preferred_model() ever sees it. No changes needed to the routing code itself.

Regression coverage added in the same commit:

  1. test_model_restrictions now asserts that paid-tier models (gpt-oss-120b, qwen-3-235b, llama3.1-8b) are explicitly rejected when only zai-glm-4.7 is allowlisted — not just that the allowed model passes.
  2. New test_restrictions_filter_auto_mode_routing test that exercises the full ModelProviderRegistry._get_allowed_models_for_provider() → get_preferred_model() path with CEREBRAS_ALLOWED_MODELS=zai-glm-4.7 set, asserting all three categories (BALANCED, EXTENDED_REASONING, FAST_RESPONSE) return zai-glm-4.7 and never the paid-tier models.

Both tests would have caught this bug. Thanks!
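The fix amounts to one entry in the centralized mapping. A sketch of the shape described above, with stand-in names for the surrounding class and enum (the real ones live in utils/model_restrictions.py and providers/shared/provider_type.py):

```python
# Sketch (not the actual utils/model_restrictions.py) of wiring CEREBRAS
# into the centralized env-var mapping so the existing allowlist filter
# applies before category routing ever runs.
import os
from enum import Enum


class ProviderType(Enum):  # stand-in for providers/shared/provider_type.py
    CEREBRAS = "cerebras"
    # ... other native providers (OPENAI, GOOGLE, XAI, OPENROUTER, DIAL) elided


class ModelRestrictionService:
    ENV_VARS = {
        # ... entries for the other native providers elided
        ProviderType.CEREBRAS: "CEREBRAS_ALLOWED_MODELS",  # the added entry
    }

    @classmethod
    def allowed_models_for(cls, provider: ProviderType, catalogue: list[str]) -> list[str]:
        """Filter a provider's catalogue by its *_ALLOWED_MODELS env var, if set."""
        raw = os.environ.get(cls.ENV_VARS.get(provider, ""), "")
        allowed = {m.strip() for m in raw.split(",") if m.strip()}
        return [m for m in catalogue if m in allowed] if allowed else list(catalogue)
```

With CEREBRAS_ALLOWED_MODELS=zai-glm-4.7 set, the catalogue handed to the routing code shrinks to the single allowed model, so no category can select a paid-tier model.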

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces integration for the Cerebras Inference API, adding a new model provider, registry loader, and metadata configuration for models such as GPT-OSS, Qwen-3, and Llama 3.1. The implementation includes category-based model routing, environment variable support, and comprehensive unit tests. Feedback for this PR includes adding a _resolve_model_name method to the provider to ensure model aliases are correctly resolved before API calls and correcting context window descriptions in the model metadata for consistency with binary prefixes.

Comment thread providers/cerebras.py:

```python
_REASONING_PREFERENCE = ["gpt-oss-120b", "qwen-3-235b-a22b-instruct-2507", "zai-glm-4.7", "llama3.1-8b"]
_BALANCED_PREFERENCE = ["zai-glm-4.7", "qwen-3-235b-a22b-instruct-2507", "gpt-oss-120b", "llama3.1-8b"]
_FAST_PREFERENCE = ["llama3.1-8b", "zai-glm-4.7", "qwen-3-235b-a22b-instruct-2507", "gpt-oss-120b"]
```

Contributor

high

The CerebrasModelProvider class is missing the _resolve_model_name method. While the base ModelProvider might have a generic implementation, native providers in this repository typically implement this method to leverage the registry's optimized alias_map for resolving shorthands (like cerebras or glm) to canonical names before making API calls. This is critical for ensuring that aliases defined in conf/cerebras_models.json work correctly in generate_content.

Suggested change:

```python
def _resolve_model_name(self, model_name: str) -> str:
    """Resolve model name or alias to canonical name."""
    self._ensure_registry()
    if self._registry:
        config = self._registry.resolve(model_name)
        if config:
            return config.model_name
    return model_name
```

Author

Respectfully pushing back on this one — I believe the suggestion is unnecessary and the suggested code would not work as drafted.

Why no override is needed: The base class ModelProvider._resolve_model_name() (providers/base.py:404) already does proper alias resolution by calling get_all_model_capabilities() and ModelCapabilities.collect_aliases(). OpenAICompatibleProvider calls self._resolve_model_name() in three places (providers/openai_compatible.py:83, 550, 725) before any API call.

The reference XAI provider doesn't override it either. providers/xai.py extends the same RegistryBackedProviderMixin + OpenAICompatibleProvider pair and relies on the base class implementation. The Cerebras provider intentionally mirrors that pattern for consistency with the existing codebase.

It's already test-covered. tests/test_cerebras_provider.py::test_generate_content_resolves_alias_before_api_call mocks the OpenAI client and asserts that when generate_content(model_name="cerebras") is called, the underlying API receives "zai-glm-4.7" (the resolved canonical name), not the alias. test_generate_content_other_aliases does the same for glm, glm-4.7, zai, and zai-glm. test_resolve_model_name verifies the resolution directly.

It has been validated end-to-end against the live Cerebras API. Before opening this PR, all 16 aliases (4 for gpt-oss-120b, 4 for qwen3-235b, 5 for zai-glm-4.7, 4 for llama3.1-8b) were manually exercised via real api.cerebras.ai/v1 calls and confirmed to resolve correctly.

The suggested code would not work as drafted. self._registry is not the attribute used by RegistryBackedProviderMixin — the mixin exposes a class-level registry via _REGISTRY and the _ensure_registry() hook. Adopting the snippet as written would either crash or silently no-op on first call. If there's a different optimization path you have in mind, happy to discuss, but the current implementation works correctly and matches the established pattern in the repo.
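The alias-before-API-call guarantee described above can be checked with a mock along these lines. This is a sketch with a stand-in provider class, not the actual tests/test_cerebras_provider.py code; the alias-to-canonical pairs come from the PR's alias table.

```python
# Sketch of the mock-based alias-resolution check described above. The
# StubProvider and dict-based resolver are illustrative stand-ins; the
# real provider resolves via the registry's capability metadata.
from unittest.mock import MagicMock

ALIAS_MAP = {"cerebras": "zai-glm-4.7", "glm": "zai-glm-4.7", "gpt-oss": "gpt-oss-120b"}


class StubProvider:
    def __init__(self, client):
        self.client = client

    def _resolve_model_name(self, model_name: str) -> str:
        return ALIAS_MAP.get(model_name, model_name)

    def generate_content(self, model_name: str, prompt: str):
        canonical = self._resolve_model_name(model_name)  # resolve BEFORE the API call
        return self.client.chat.completions.create(
            model=canonical, messages=[{"role": "user", "content": prompt}]
        )


client = MagicMock()
StubProvider(client).generate_content("cerebras", "hello")
# The mocked API must have received the canonical name, never the alias.
assert client.chat.completions.create.call_args.kwargs["model"] == "zai-glm-4.7"
```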

Comment thread conf/cerebras_models.json (outdated):

```json
      "openai-oss"
    ],
    "intelligence_score": 17,
    "description": "OpenAI GPT-OSS 120B — ultra-fast inference (~3000 tok/s), 131K context, internal chain-of-thought reasoning, strong agentic/tool-use",
```
Contributor
medium

The description for gpt-oss-120b mentions a 131K context, but the context_window value of 131072 is exactly 128K (128 * 1024). For consistency with zai-glm-4.7 (line 76) and llama3.1-8b (line 98), which use binary prefixes, this should be updated to 128K.

Suggested change:

```diff
- "description": "OpenAI GPT-OSS 120B — ultra-fast inference (~3000 tok/s), 131K context, internal chain-of-thought reasoning, strong agentic/tool-use",
+ "description": "OpenAI GPT-OSS 120B — ultra-fast inference (~3000 tok/s), 128K context, internal chain-of-thought reasoning, strong agentic/tool-use",
```

Author

Good catch, fixed in c10f78d. Updated the description to 128K context for consistency with the zai-glm-4.7 and llama3.1-8b entries which already use binary prefixes (131072 = 128 × 1024 = 128K).

Comment thread conf/cerebras_models.json (outdated):

```json
      "qwen3-235b"
    ],
    "intelligence_score": 16,
    "description": "Qwen3-235B-A22B Instruct 2507 — fast inference (~1400 tok/s), 131K context, strong coding/reasoning/tool use",
```
Contributor

medium

Similar to the gpt-oss-120b entry, the description for qwen-3 mentions 131K context while the value 131072 corresponds to 128K. Please update the description for consistency.

Suggested change:

```diff
- "description": "Qwen3-235B-A22B Instruct 2507 — fast inference (~1400 tok/s), 131K context, strong coding/reasoning/tool use",
+ "description": "Qwen3-235B-A22B Instruct 2507 — fast inference (~1400 tok/s), 128K context, strong coding/reasoning/tool use",
```

Author

Fixed in c10f78d, same change as the gpt-oss-120b entry above. 131K → 128K in the qwen-3 description. Also updated .env.example to be consistent across all four entries.

Addresses code review feedback on PR BeehiveInnovations#429.

CRITICAL (Codex P1): CEREBRAS was missing from
ModelRestrictionService.ENV_VARS, so the CEREBRAS_ALLOWED_MODELS env var
documented in .env.example was silently ignored. Auto-mode could pick
gpt-oss-120b or llama3.1-8b for EXTENDED_REASONING/FAST_RESPONSE even
when a Cerebras Code (free) plan user tried to restrict to zai-glm-4.7,
causing later API failures. Now correctly wired alongside the other
*_ALLOWED_MODELS env vars.

Strengthens test_model_restrictions to assert paid-tier models are
REJECTED (not just that allowed models pass), and adds a new
test_restrictions_filter_auto_mode_routing regression test that proves
the registry's allowlist filter prevents get_preferred_model from
selecting disallowed paid-tier models. Also fixes test_multiple_model_restrictions
to register the provider with the registry so alias-to-canonical
resolution works inside the restriction service.

DOCS (Gemini medium): Fix "131K context" → "128K context" in the
gpt-oss-120b and qwen-3-235b descriptions for consistency with the
other entries (131072 = 128 × 1024 = 128K binary).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tommyyzhao
Author

Review feedback addressed (commit c10f78d)

Thanks @gemini-code-assist and @chatgpt-codex-connector for the review. Here is a summary of how each comment was triaged:

✅ Codex P1 — CEREBRAS_ALLOWED_MODELS not wired into restriction service (fixed)

Confirmed real bug. CEREBRAS was missing from ModelRestrictionService.ENV_VARS, so the env var documented in .env.example was silently ignored. Auto-mode could pick paid-tier models (gpt-oss-120b, llama3.1-8b) even when a Code-plan user tried to restrict to zai-glm-4.7, causing later API failures.

Fix: added ProviderType.CEREBRAS: "CEREBRAS_ALLOWED_MODELS" to the centralized ENV_VARS mapping in utils/model_restrictions.py — same pattern as every other native provider. The existing _get_allowed_models_for_provider() filter now correctly pre-filters the catalogue before get_preferred_model() ever sees it.

Regression coverage added in the same commit:

  • Strengthened test_model_restrictions to assert paid-tier models are explicitly rejected when only zai-glm-4.7 is allowlisted
  • New test_restrictions_filter_auto_mode_routing test exercises the full registry → routing path and asserts all three categories return zai-glm-4.7 (not paid-tier models) when restrictions are active
  • Also fixed pre-existing test_multiple_model_restrictions to register the provider with the registry (it was previously a no-op since restrictions weren't loading)

⚠️ Gemini high-pri — _resolve_model_name override (declined with rationale)

The base class ModelProvider._resolve_model_name() (providers/base.py:404) already handles alias resolution via get_all_model_capabilities() + ModelCapabilities.collect_aliases(). The reference XAI provider doesn't override it either. Behavior is verified by 3 unit tests (test_resolve_model_name, test_generate_content_resolves_alias_before_api_call, test_generate_content_other_aliases) and end-to-end live API validation against all 16 aliases. The suggested code as drafted references self._registry which is not the attribute used by RegistryBackedProviderMixin. Detailed reply in the inline thread.

✅ Gemini medium x2 — "131K context" → "128K context" (fixed)

Updated descriptions for gpt-oss-120b and qwen-3-235b-a22b-instruct-2507 in conf/cerebras_models.json to use binary prefix consistently with the other entries (131072 = 128 × 1024 = 128K). Also updated .env.example for consistency.

Test results after fixes

889 passed, 4 skipped, 16 deselected

+1 net test from the new restriction-routing regression test. All linting (ruff/black/isort) clean. Live API validation against api.cerebras.ai/v1 re-confirmed for all 4 models post-fix.

cc @guidedways — would love a maintainer review whenever you have a moment.

@gemini-code-assist
Contributor

Thanks for the detailed breakdown and for addressing the CEREBRAS_ALLOWED_MODELS bug. The added regression tests in test_model_restrictions and test_restrictions_filter_auto_mode_routing provide excellent coverage for this fix. Regarding the _resolve_model_name override, I appreciate the clarification and the verification via your unit tests; it makes sense to rely on the base class implementation given the existing test coverage. The consistency updates in conf/cerebras_models.json and .env.example are also appreciated. Everything looks solid.

The two restriction tests that exercise ModelProviderRegistry.get_provider()
were failing in CI because that path requires a real API key from the
environment, and CI does not set CEREBRAS_API_KEY. Locally the tests
passed because the developer .env file provided a key.

Adds CEREBRAS_API_KEY="test-key" to the patch.dict so the registry
returns a real provider instance regardless of the host environment.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c10f78d359


Comment thread providers/registries/cerebras.py:

```python
def __init__(self, config_path: str | None = None) -> None:
    super().__init__(
        env_var_name="CEREBRAS_MODELS_CONFIG_PATH",
        default_filename="cerebras_models.json",
```

P1: Package Cerebras model manifest with the distribution

This registry assumes cerebras_models.json is available at runtime, but the packaging metadata was not updated to ship that new manifest (the existing setuptools data-files list still omits it), so wheel installs end up without conf/cerebras_models.json. In that environment _load_config_data() falls back to an empty model list, leaving CerebrasModelProvider with no capabilities and breaking Cerebras model resolution/auto routing for pip-installed deployments.


Author

Thanks for the catch — fixed in de64532. Critical assessment: the underlying observation is correct, but the runtime impact statement is not.

Fixed: Added conf/cerebras_models.json to the [tool.setuptools.data-files] list in pyproject.toml for consistency with the other seven model manifests.

Verification: Rebuilt the wheel and confirmed cerebras_models.json now appears in both install locations:

```shell
$ python -m zipfile -l pal_mcp_server-9.8.2-py3-none-any.whl | grep cerebras_models
conf/cerebras_models.json                                       4343
pal_mcp_server-9.8.2.data/data/conf/cerebras_models.json        4343
```

Important nuance on the runtime impact claim: The PR did not actually break Cerebras for wheel installs as stated. The file was already shipped via the [tool.setuptools.package-data] glob ("conf/*.json", line 24 of pyproject.toml), which catches all conf JSONs including the new one. The runtime registry loader (providers/registries/base.py:46-54) uses importlib.resources.files("conf").joinpath(default_filename) which reads from the package-data location, not the data-files location, so _load_config_data() would have found the manifest correctly in any wheel install.

The wheel-build verification before the fix confirmed this:

```shell
$ python -m zipfile -l pal_mcp_server-9.8.2-py3-none-any.whl | grep cerebras_models
conf/cerebras_models.json                                       4343   ← already shipped
```

That said, the missing data-files entry was a real consistency oversight worth fixing — every other model manifest is listed there, and any deployment path that reads from <prefix>/conf/ (e.g. some uvx flows, or custom packagers) would have been inconsistent. So the fix is committed and the PR now matches the established pattern across all eight model manifests.
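The runtime-loading path the reply describes — importlib.resources reading from the package-data location, with an empty-model fallback — follows this general idiom. This is a sketch, not the actual providers/registries/base.py; the "conf" package and filename mirror the PR's layout.

```python
# Illustration of loading a JSON manifest shipped as package data via
# importlib.resources, falling back to an empty model list when the
# manifest (or its package) is missing — the failure mode discussed above.
import json
from importlib import resources


def load_manifest(package: str, filename: str) -> dict:
    """Read a JSON manifest from package data; fall back to an empty model list."""
    try:
        text = resources.files(package).joinpath(filename).read_text(encoding="utf-8")
    except (FileNotFoundError, ModuleNotFoundError):
        return {"models": []}
    return json.loads(text)
```

Because this reads from the package-data install location, the [tool.setuptools.package-data] glob is what keeps wheel installs working; the data-files entry only matters for paths that read from `<prefix>/conf/`.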

The repo's CI lint job started failing across all open PRs because
black 26.x removes the blank line between module docstrings and the
first import. Ten pre-existing files in main were not yet conformant
to this style.

This change is purely mechanical (black formatting only, no semantic
edits) and is required to unblock CI for this PR. All affected files
are unrelated to the Cerebras provider, but without this fix the
lint job blocks the merge.
@tommyyzhao
Author

CI failures fixed (commits c67bce3 + a7c9386)

Two issues caught by CI on the previous push:

1. Test failures (test (3.10/3.11/3.12)) — fixed in c67bce3

test_restrictions_filter_auto_mode_routing and test_multiple_model_restrictions were calling ModelProviderRegistry.get_provider(ProviderType.CEREBRAS), which requires a real CEREBRAS_API_KEY in the environment. CI does not set one, so the registry returned None and the assertions failed. Locally the tests passed because the developer .env provided a key.

Fix: added "CEREBRAS_API_KEY": "test-key" to the @patch.dict(os.environ, ...) decorator on both tests so they're hermetic. Verified locally with env -u CEREBRAS_API_KEY -u CEREBRAS_ALLOWED_MODELS python -m pytest tests/test_cerebras_provider.py → 23/23 pass.

2. Lint failure (black --check) — fixed in a7c9386

The lint job is failing on 10 files unrelated to this PR (simulator_tests/test_*.py, tests/test_directory_expansion_tracking.py, tests/test_docker_implementation.py, tests/test_prompt_regression.py). This appears to be pre-existing drift on main: black 26.x removes the blank line between module docstrings and the first import, but those files in main still have the old style.

This is not unique to this PR — it affects every recent PR (looking at the Actions tab, multiple unrelated PRs from the past two days are failing the same lint check). The last successful Tests run on a non-cerebras branch was PR #426 (feat/update-sota-models-april-2026) on April 6, before this style drift surfaced.

Fix: ran black . --exclude="test_simulation_files/" and committed the result as a separate style: commit so it's easy for the maintainer to review or revert independently from the Cerebras work. The change is purely mechanical — only blank-line removal, no semantic edits — and was required to get our PR's lint job green.

If you'd prefer to handle the formatting drift in a separate PR, I'm happy to revert a7c9386 — but in that case our PR will remain blocked until main is reformatted.

Verification

```
889 passed, 4 skipped, 16 deselected   ← unit tests (full suite, with CEREBRAS_API_KEY unset)
black --check .                        ← 252 files clean
ruff check .                           ← all checks passed
isort --check-only .                   ← clean
```

cc @guidedways

Addresses Codex review feedback. The new cerebras_models.json manifest
was already shipped via the [tool.setuptools.package-data] glob
('conf/*.json'), so importlib.resources-based runtime loading works
correctly for wheel installs. However the explicit
[tool.setuptools.data-files] list — which mirrors the package-data
glob and ships every other model manifest to <prefix>/conf/ — was
missing the cerebras entry.

Adding it for consistency with the seven other model JSON files and
to ensure any deployment path that relies on the data-files install
location (e.g. uvx) finds the manifest.

Verified by rebuilding the wheel and confirming cerebras_models.json
now appears in both 'conf/' (package-data) and
'pal_mcp_server-9.8.2.data/data/conf/' (data-files) locations.