feat: Add Cerebras Inference provider (ZAI-GLM, GPT-OSS, Qwen3, Llama) #429

tommyyzhao wants to merge 10 commits into BeehiveInnovations:main from
Conversation
Add support for Cerebras Inference API with the zai-glm-4.7 model. Implements RegistryBackedProviderMixin + OpenAICompatibleProvider pattern matching the existing XAI provider structure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set supports_extended_thinking to false (no reasoning-token protocol)
- Collapse get_preferred_model to single branch (single-model provider)
- Remove dead FALLBACK_MODEL constant (identical to PRIMARY_MODEL)
- Fix "131K" to "128K" in .env.example comments (131072 tokens = 128K)
- Add missing blank line between Cerebras and DIAL sections in .env.example
- Add missing zai-glm alias to .env.example model docs
- Use "range" string for temperature_constraint (matching registry format)
- Reorder ProviderType enum to match PROVIDER_PRIORITY_ORDER
- Remove duplicate monkeypatch.delenv in test_auto_mode_model_listing
- Remove redundant inner MagicMock import in test
- Update all test assertions to match corrected capabilities

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The listmodels tool hardcodes a provider_info dict for native providers but was missing CEREBRAS, so it never appeared in the output even though the provider was correctly registered at startup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verified via Cerebras /models endpoint. All three are available on the paid tier. Key specs (from inference-docs.cerebras.ai):

- gpt-oss-120b: 131K ctx, 40K out, ~3000 tok/s, internal chain-of-thought
- qwen-3-235b-a22b-instruct-2507: 131K ctx, 40K out, ~1400 tok/s
- llama3.1-8b: 32K ctx (Cerebras limit), 8K out, ~2200 tok/s

Also introduces real category routing in get_preferred_model now that multiple models exist: EXTENDED_REASONING → gpt-oss-120b, BALANCED → qwen3, FAST_RESPONSE → llama3.1-8b, with graceful fallbacks throughout.

Adds 7 new tests (22 total) covering per-model capabilities, alias resolution, category routing, and fallback behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zai-glm-4.7 is the only model on the Cerebras Code (free) plan and must be the default. BALANCED routing now prefers it first so Code plan users always get a working model in auto mode. Paid-tier models (gpt-oss-120b, qwen-3-235b, llama3.1-8b) remain preferred for their respective specialist categories (EXTENDED_REASONING, FAST_RESPONSE). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Now that the Cerebras provider ships with four models (zai-glm-4.7, gpt-oss-120b, qwen-3-235b, llama3.1-8b), update stale single-model references in .env.example, server.py startup log, provider priority comment, and the configuration docs table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c54584df81
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
for model in preference:
    if model in allowed_models:
        return model
```
Apply Cerebras allowlist before category model routing
This selector assumes allowed_models is already restriction-filtered, but CEREBRAS_ALLOWED_MODELS is not wired into ModelRestrictionService.ENV_VARS, so auto-mode can pass the full Cerebras catalog here. In that case EXTENDED_REASONING/FAST_RESPONSE may pick gpt-oss-120b or llama3.1-8b even when only zai-glm-4.7 is configured, and the request then fails later when generate_content() enforces the provider allowlist. Please enforce the provider allowlist before choosing from preference (or add Cerebras to the centralized restriction mapping) so auto-mode never selects a disallowed model.
Great catch — confirmed real bug and fixed in c10f78d.
CEREBRAS_ALLOWED_MODELS was documented in .env.example but missing from ModelRestrictionService.ENV_VARS (utils/model_restrictions.py:51), so the env var was silently ignored. The downstream impact was exactly as you described: auto-mode could route EXTENDED_REASONING/FAST_RESPONSE to gpt-oss-120b or llama3.1-8b even when a Code-plan user tried to restrict to zai-glm-4.7.
Fix (smaller and lower-risk than restructuring the routing): added ProviderType.CEREBRAS: "CEREBRAS_ALLOWED_MODELS" to the centralized ENV_VARS mapping. This is the correct architectural location — it's where every other native provider is wired (OPENAI, GOOGLE, XAI, OPENROUTER, DIAL) — and it lets the existing _get_allowed_models_for_provider() filter naturally pre-filter the catalogue before get_preferred_model() ever sees it. No changes needed to the routing code itself.
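For context, the shape of that mapping change looks roughly like this. The `CEREBRAS` entry and env-var name are from this thread; the enum members and env-var names for the other providers are assumed for illustration.

```python
from enum import Enum

class ProviderType(Enum):
    OPENAI = "openai"
    GOOGLE = "google"
    XAI = "xai"
    CEREBRAS = "cerebras"

# Centralized provider -> restriction-env-var mapping, in the style of
# ModelRestrictionService.ENV_VARS; the fix adds the CEREBRAS entry so the
# documented env var is no longer silently ignored.
ENV_VARS = {
    ProviderType.OPENAI: "OPENAI_ALLOWED_MODELS",
    ProviderType.GOOGLE: "GOOGLE_ALLOWED_MODELS",
    ProviderType.XAI: "XAI_ALLOWED_MODELS",
    ProviderType.CEREBRAS: "CEREBRAS_ALLOWED_MODELS",  # previously missing
}
```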
Regression coverage added in the same commit:
- `test_model_restrictions` now asserts that paid-tier models (`gpt-oss-120b`, `qwen-3-235b`, `llama3.1-8b`) are explicitly rejected when only `zai-glm-4.7` is allowlisted — not just that the allowed model passes.
- New `test_restrictions_filter_auto_mode_routing` test that exercises the full `ModelProviderRegistry._get_allowed_models_for_provider()` → `get_preferred_model()` path with `CEREBRAS_ALLOWED_MODELS=zai-glm-4.7` set, asserting all three categories (BALANCED, EXTENDED_REASONING, FAST_RESPONSE) return `zai-glm-4.7` and never the paid-tier models.
Both tests would have caught this bug. Thanks!
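A self-contained approximation of what the new routing regression test protects against. The allowlist parsing and filtering below are hypothetical stand-ins for the registry's internal filter, not the repo's actual code:

```python
import os
from unittest.mock import patch

def allowed_models(catalog: list[str], env_var: str) -> list[str]:
    """Apply a comma-separated *_ALLOWED_MODELS env var to a model catalog."""
    raw = os.environ.get(env_var, "").strip()
    if not raw:
        return catalog  # no restriction configured: full catalog allowed
    allowed = {name.strip().lower() for name in raw.split(",")}
    return [m for m in catalog if m.lower() in allowed]

CATALOG = ["zai-glm-4.7", "gpt-oss-120b", "qwen-3-235b-a22b-instruct-2507", "llama3.1-8b"]

with patch.dict(os.environ, {"CEREBRAS_ALLOWED_MODELS": "zai-glm-4.7"}):
    filtered = allowed_models(CATALOG, "CEREBRAS_ALLOWED_MODELS")

# Paid-tier models must be rejected, not merely deprioritized, so that
# category routing downstream can never select them.
assert filtered == ["zai-glm-4.7"]
```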
Code Review
This pull request introduces integration for the Cerebras Inference API, adding a new model provider, registry loader, and metadata configuration for models such as GPT-OSS, Qwen-3, and Llama 3.1. The implementation includes category-based model routing, environment variable support, and comprehensive unit tests. Feedback for this PR includes adding a _resolve_model_name method to the provider to ensure model aliases are correctly resolved before API calls and correcting context window descriptions in the model metadata for consistency with binary prefixes.
```python
_REASONING_PREFERENCE = ["gpt-oss-120b", "qwen-3-235b-a22b-instruct-2507", "zai-glm-4.7", "llama3.1-8b"]
_BALANCED_PREFERENCE = ["zai-glm-4.7", "qwen-3-235b-a22b-instruct-2507", "gpt-oss-120b", "llama3.1-8b"]
_FAST_PREFERENCE = ["llama3.1-8b", "zai-glm-4.7", "qwen-3-235b-a22b-instruct-2507", "gpt-oss-120b"]
```
The CerebrasModelProvider class is missing the _resolve_model_name method. While the base ModelProvider might have a generic implementation, native providers in this repository typically implement this method to leverage the registry's optimized alias_map for resolving shorthands (like cerebras or glm) to canonical names before making API calls. This is critical for ensuring that aliases defined in conf/cerebras_models.json work correctly in generate_content.
```python
def _resolve_model_name(self, model_name: str) -> str:
    """Resolve model name or alias to canonical name."""
    self._ensure_registry()
    if self._registry:
        config = self._registry.resolve(model_name)
        if config:
            return config.model_name
    return model_name
```
Respectfully pushing back on this one — I believe the suggestion is unnecessary and the suggested code would not work as drafted.
Why no override is needed: The base class ModelProvider._resolve_model_name() (providers/base.py:404) already does proper alias resolution by calling get_all_model_capabilities() and ModelCapabilities.collect_aliases(). OpenAICompatibleProvider calls self._resolve_model_name() in three places (providers/openai_compatible.py:83, 550, 725) before any API call.
The reference XAI provider doesn't override it either. providers/xai.py extends the same RegistryBackedProviderMixin + OpenAICompatibleProvider pair and relies on the base class implementation. The Cerebras provider intentionally mirrors that pattern for consistency with the existing codebase.
It's already test-covered. tests/test_cerebras_provider.py::test_generate_content_resolves_alias_before_api_call mocks the OpenAI client and asserts that when generate_content(model_name="cerebras") is called, the underlying API receives "zai-glm-4.7" (the resolved canonical name), not the alias. test_generate_content_other_aliases does the same for glm, glm-4.7, zai, and zai-glm. test_resolve_model_name verifies the resolution directly.
It has been validated end-to-end against the live Cerebras API. Before opening this PR, all 16 aliases (4 for gpt-oss-120b, 4 for qwen3-235b, 5 for zai-glm-4.7, 4 for llama3.1-8b) were manually exercised via real api.cerebras.ai/v1 calls and confirmed to resolve correctly.
The suggested code would not work as drafted. self._registry is not the attribute used by RegistryBackedProviderMixin — the mixin exposes a class-level registry via _REGISTRY and the _ensure_registry() hook. Adopting the snippet as written would either crash or silently no-op on first call. If there's a different optimization path you have in mind, happy to discuss, but the current implementation works correctly and matches the established pattern in the repo.
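For readers unfamiliar with that base-class path, the alias resolution it performs amounts to roughly the following. This is a simplified stand-in: the real implementation builds its lookup from `ModelCapabilities.collect_aliases()`, while the alias table here is copied from this PR's manifest.

```python
# Case-insensitive alias -> canonical-name resolution, falling back to
# the input unchanged when nothing matches.
MODEL_ALIASES = {
    "zai-glm-4.7": ["cerebras", "glm", "glm-4.7", "zai", "zai-glm"],
    "gpt-oss-120b": ["gpt-oss", "oss-120b", "openai-oss"],
}

def resolve_model_name(model_name: str) -> str:
    lookup: dict[str, str] = {}
    for canonical, aliases in MODEL_ALIASES.items():
        lookup[canonical.lower()] = canonical
        for alias in aliases:
            lookup[alias.lower()] = canonical
    return lookup.get(model_name.lower(), model_name)
```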
```json
        "openai-oss"
    ],
    "intelligence_score": 17,
    "description": "OpenAI GPT-OSS 120B — ultra-fast inference (~3000 tok/s), 131K context, internal chain-of-thought reasoning, strong agentic/tool-use",
```
The description for gpt-oss-120b mentions a 131K context, but the context_window value of 131072 is exactly 128K (128 * 1024). For consistency with zai-glm-4.7 (line 76) and llama3.1-8b (line 98), which use binary prefixes, this should be updated to 128K.
```diff
- "description": "OpenAI GPT-OSS 120B — ultra-fast inference (~3000 tok/s), 131K context, internal chain-of-thought reasoning, strong agentic/tool-use",
+ "description": "OpenAI GPT-OSS 120B — ultra-fast inference (~3000 tok/s), 128K context, internal chain-of-thought reasoning, strong agentic/tool-use",
```
Good catch, fixed in c10f78d. Updated the description to 128K context for consistency with the zai-glm-4.7 and llama3.1-8b entries which already use binary prefixes (131072 = 128 × 1024 = 128K).
```json
        "qwen3-235b"
    ],
    "intelligence_score": 16,
    "description": "Qwen3-235B-A22B Instruct 2507 — fast inference (~1400 tok/s), 131K context, strong coding/reasoning/tool use",
```
Similar to the gpt-oss-120b entry, the description for qwen-3 mentions 131K context while the value 131072 corresponds to 128K. Please update the description for consistency.
```diff
- "description": "Qwen3-235B-A22B Instruct 2507 — fast inference (~1400 tok/s), 131K context, strong coding/reasoning/tool use",
+ "description": "Qwen3-235B-A22B Instruct 2507 — fast inference (~1400 tok/s), 128K context, strong coding/reasoning/tool use",
```
Fixed in c10f78d, same change as the gpt-oss-120b entry above. 131K → 128K in the qwen-3 description. Also updated .env.example to be consistent across all four entries.
Addresses code review feedback on PR BeehiveInnovations#429.

CRITICAL (Codex P1): CEREBRAS was missing from ModelRestrictionService.ENV_VARS, so the CEREBRAS_ALLOWED_MODELS env var documented in .env.example was silently ignored. Auto-mode could pick gpt-oss-120b or llama3.1-8b for EXTENDED_REASONING/FAST_RESPONSE even when a Cerebras Code (free) plan user tried to restrict to zai-glm-4.7, causing later API failures. Now correctly wired alongside the other *_ALLOWED_MODELS env vars.

Strengthens test_model_restrictions to assert paid-tier models are REJECTED (not just that allowed models pass), and adds a new test_restrictions_filter_auto_mode_routing regression test that proves the registry's allowlist filter prevents get_preferred_model from selecting disallowed paid-tier models. Also fixes test_multiple_model_restrictions to register the provider with the registry so alias-to-canonical resolution works inside the restriction service.

DOCS (Gemini medium): Fix "131K context" → "128K context" in the gpt-oss-120b and qwen-3-235b descriptions for consistency with the other entries (131072 = 128 × 1024 = 128K binary).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review feedback addressed (commit c10f78d)

Thanks @gemini-code-assist and @chatgpt-codex-connector for the review. Here is a summary of how each comment was triaged: ✅ Codex P1 —
Thanks for the detailed breakdown and for addressing the
The two restriction tests that exercise ModelProviderRegistry.get_provider() were failing in CI because that path requires a real API key from the environment, and CI does not set CEREBRAS_API_KEY. Locally the tests passed because the developer .env file provided a key. Adds CEREBRAS_API_KEY="test-key" to the patch.dict so the registry returns a real provider instance regardless of the host environment.
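The fix amounts to the standard `patch.dict` pattern. A minimal sketch, not the repo's actual test code:

```python
import os
from unittest.mock import patch

# patch.dict injects the dummy key for the duration of the block so the
# registry's get_provider() path works in CI without a real key, and it
# restores the previous environment afterwards.
before = os.environ.get("CEREBRAS_API_KEY")
with patch.dict(os.environ, {"CEREBRAS_API_KEY": "test-key"}):
    assert os.environ["CEREBRAS_API_KEY"] == "test-key"
assert os.environ.get("CEREBRAS_API_KEY") == before
```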
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c10f78d359
```python
def __init__(self, config_path: str | None = None) -> None:
    super().__init__(
        env_var_name="CEREBRAS_MODELS_CONFIG_PATH",
        default_filename="cerebras_models.json",
```
Package Cerebras model manifest with the distribution
This registry assumes cerebras_models.json is available at runtime, but the packaging metadata was not updated to ship that new manifest (the existing setuptools data-files list still omits it), so wheel installs end up without conf/cerebras_models.json. In that environment _load_config_data() falls back to an empty model list, leaving CerebrasModelProvider with no capabilities and breaking Cerebras model resolution/auto routing for pip-installed deployments.
Thanks for the catch — fixed in de64532. Critical assessment: the underlying observation is correct, but the runtime impact statement is not.
Fixed: Added conf/cerebras_models.json to the [tool.setuptools.data-files] list in pyproject.toml for consistency with the other seven model manifests.
Verification: Rebuilt the wheel and confirmed cerebras_models.json now appears in both install locations:
```
$ python -m zipfile -l pal_mcp_server-9.8.2-py3-none-any.whl | grep cerebras_models
conf/cerebras_models.json                                   4343
pal_mcp_server-9.8.2.data/data/conf/cerebras_models.json    4343
```
Important nuance on the runtime impact claim: The PR did not actually break Cerebras for wheel installs as stated. The file was already shipped via the [tool.setuptools.package-data] glob ("conf/*.json", line 24 of pyproject.toml), which catches all conf JSONs including the new one. The runtime registry loader (providers/registries/base.py:46-54) uses importlib.resources.files("conf").joinpath(default_filename) which reads from the package-data location, not the data-files location, so _load_config_data() would have found the manifest correctly in any wheel install.
The wheel-build verification before the fix confirmed this:
```
$ python -m zipfile -l pal_mcp_server-9.8.2-py3-none-any.whl | grep cerebras_models
conf/cerebras_models.json                                   4343   ← already shipped
```
That said, the missing data-files entry was a real consistency oversight worth fixing — every other model manifest is listed there, and any deployment path that reads from <prefix>/conf/ (e.g. some uvx flows, or custom packagers) would have been inconsistent. So the fix is committed and the PR now matches the established pattern across all eight model manifests.
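For reference, the two pyproject.toml mechanisms discussed above look roughly like this. Section names follow setuptools conventions; the exact keys and entries in this repo's pyproject.toml are assumptions for illustration, with unrelated entries elided.

```toml
[tool.setuptools.package-data]
# Glob that already ships every conf JSON inside the wheel; this is the
# location importlib.resources reads from at runtime.
"*" = ["conf/*.json"]

[tool.setuptools.data-files]
# Explicit per-file list installing manifests to <prefix>/conf/; the fix
# adds the cerebras entry alongside the other model manifests.
"conf" = [
    "conf/xai_models.json",
    "conf/cerebras_models.json",
]
```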
The repo's CI lint job started failing across all open PRs because black 26.x removes the blank line between module docstrings and the first import. Ten pre-existing files in main were not yet conformant to this style. This change is purely mechanical (black formatting only, no semantic edits) and is required to unblock CI for this PR. All affected files are unrelated to the Cerebras provider, but without this fix the lint job blocks the merge.
CI failures fixed (commits c67bce3 + a7c9386)

Two issues caught by CI on the previous push: 1. Test failures (
Addresses Codex review feedback. The new cerebras_models.json manifest
was already shipped via the [tool.setuptools.package-data] glob
('conf/*.json'), so importlib.resources-based runtime loading works
correctly for wheel installs. However the explicit
[tool.setuptools.data-files] list — which mirrors the package-data
glob and ships every other model manifest to <prefix>/conf/ — was
missing the cerebras entry.
Adding it for consistency with the seven other model JSON files and
to ensure any deployment path that relies on the data-files install
location (e.g. uvx) finds the manifest.
Verified by rebuilding the wheel and confirming cerebras_models.json
now appears in both 'conf/' (package-data) and
'pal_mcp_server-9.8.2.data/data/conf/' (data-files) locations.
Description
Adds first-class support for the Cerebras Inference API as a new native provider, exposing four models across the Cerebras catalogue (ZAI-GLM, OpenAI GPT-OSS, Qwen3, and Llama 3.1). The provider is implemented as an OpenAI-compatible registry-backed provider, mirroring the existing X.AI (`providers/xai.py`) pattern so the addition feels native to the codebase.

Cerebras delivers industry-leading inference speeds (~1000–3000 tok/s), which makes it a strong default for fast-turnaround tools like `chat`, `consensus`, and `planner` workflows. The default `zai-glm-4.7` is also the only model available on the Cerebras Code (Tier 1) plan, so BALANCED auto-mode routing prefers it first to ensure out-of-the-box compatibility with free-tier users.

Changes Made
- New provider: `providers/cerebras.py` — `CerebrasModelProvider` extending `RegistryBackedProviderMixin + OpenAICompatibleProvider` with category-aware routing across all four models.
- New registry loader: `providers/registries/cerebras.py` — `CerebrasModelRegistry` backed by `conf/cerebras_models.json` (overridable via `CEREBRAS_MODELS_CONFIG_PATH`).
- New model manifest: `conf/cerebras_models.json` — capability metadata for all four models. Specs verified against the live `/models` endpoint and Cerebras inference documentation:

  | Model | Aliases |
  |---|---|
  | `gpt-oss-120b` | `gpt-oss`, `oss-120b`, `openai-oss` |
  | `qwen-3-235b-a22b-instruct-2507` | `qwen3`, `qwen-3`, `qwen235b`, `qwen3-235b` |
  | `zai-glm-4.7` (default) | `cerebras`, `glm`, `glm-4.7`, `zai`, `zai-glm` |
  | `llama3.1-8b` | `llama8b`, `llama-8b`, `llama3.1`, `llama3-8b` |

- Provider type & registration: Added `ProviderType.CEREBRAS` to `providers/shared/provider_type.py`, wired the `CEREBRAS_API_KEY` mapping and priority slot (after XAI, before DIAL) in `providers/registry.py`, and registered in `server.configure_providers()` in `server.py`.
- `listmodels` tool integration: Added a `ProviderType.CEREBRAS` entry to the `provider_info` display table in `tools/listmodels.py` so Cerebras models show up alongside the other native providers.
- Category routing: `get_preferred_model()` maps `BALANCED → zai-glm-4.7` (Code plan default), `EXTENDED_REASONING → gpt-oss-120b`, `FAST_RESPONSE → llama3.1-8b`, with ordered fallback lists across the full catalogue.
- Model restrictions: Supports the `CEREBRAS_ALLOWED_MODELS` environment variable for tenant-level restriction.
- `.env.example`: Added a Cerebras section with API key URL, optional restrictions variable, config path override, and the full list of models + aliases.
- `docs/configuration.md`: Added a Cerebras row to the provider catalogue table and added `conf/cerebras_models.json` to the manifest list.
- Unit tests (`tests/test_cerebras_provider.py`): 22 new tests covering provider initialization, alias resolution, per-model capabilities, category routing, fallback behaviour, model restrictions, and critical alias-resolution-before-API-call verification (mock-based).
- Test fixture update (`tests/test_auto_mode_model_listing.py`): Added `CEREBRAS_API_KEY` to environment cleanup lists so existing auto-mode tests continue to isolate Cerebras correctly.

Breaking changes introduced: N/A
Dependencies added/removed: N/A — reuses the existing `openai` SDK
888 passed, 4 skipped, 16 deselected(22 new Cerebras tests + no regressions in the existing suite)ruff check .,black --check ., andisort --check-only .all pass on touched fileshttps://api.cerebras.ai/v1):model: "cerebras"→ resolved tozai-glm-4.7→ ✅ round-trip successmodel: "gpt-oss"→ resolved togpt-oss-120b→ ✅ round-trip successmodel: "qwen3"→ resolved toqwen-3-235b-a22b-instruct-2507→ ✅ round-trip successmodel: "llama8b"→ resolved tollama3.1-8b→ ✅ round-trip successlistmodelsverification: Confirmed the Cerebras section renders all four models and all 16 aliases correctly/modelsendpoint audit: The four canonical model IDs in the manifest match exactly what the Cerebras API currently exposesDesign Notes
- Pattern consistency. The implementation closely mirrors `providers/xai.py` + `providers/registries/xai.py` + `conf/xai_models.json`, so readers already familiar with the XAI provider will find the Cerebras one structurally identical.
- `supports_extended_thinking` is intentionally `false` for all four models. Cerebras's reasoning models (`gpt-oss-120b`, `zai-glm-4.7`) do reason internally, but they do not expose the reasoning-token protocol that the tools layer uses to inject `thinking_mode` parameters. Setting the flag to `true` would cause the tools layer to send parameters the Cerebras API would silently ignore. If Cerebras adds a compatible reasoning-effort parameter in the future, the flag can be flipped and the `OpenAICompatibleProvider.generate_content()` path extended to forward it.
- `zai-glm-4.7` as the BALANCED default. This is a deliberate choice: it is the only model available on the Cerebras Code (free) plan. Routing BALANCED to any paid-tier model first would break auto-mode for free-tier users. The preference lists still upgrade to `gpt-oss-120b` / `llama3.1-8b` for their specialist categories when those models are available (paid tier).
- `llama3.1-8b` context window (32K, not 128K). This reflects the Cerebras-side serving limit, not the base Meta model spec. Documented in the manifest description so users aren't surprised.

Related Files (for reviewer navigation)
- `providers/cerebras.py`
- `providers/registries/cerebras.py`
- `conf/cerebras_models.json`
- `tests/test_cerebras_provider.py`
- `providers/shared/provider_type.py` — `ProviderType.CEREBRAS` enum addition
- `providers/registry.py`
- `server.py` — `configure_providers()` registration
- `tools/listmodels.py` — `provider_info` display entry
- `.env.example`
- `docs/configuration.md`