65 changes: 65 additions & 0 deletions docs/models/cerebras.md
@@ -0,0 +1,65 @@
# Cerebras

Cerebras provides ultra-fast inference on its Wafer-Scale Engine (WSE), with consistent, predictable latency across workloads.

## Installation

To use Cerebras, you need to either install `pydantic-ai`, or install `pydantic-ai-slim` with the `cerebras` optional group:

```bash
# pip
pip install "pydantic-ai-slim[cerebras]"

# uv
uv add "pydantic-ai-slim[cerebras]"
```

## Configuration

To use Cerebras, go to [cloud.cerebras.ai](https://cloud.cerebras.ai/?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc) to get an API key.

### Environment Variable

Set your API key as an environment variable:

```bash
export CEREBRAS_API_KEY='your-api-key'
```

### Available Models

Cerebras supports the following models:

- `llama-3.3-70b` (recommended) - Latest Llama 3.3 model
- `llama-3.1-8b` - Llama 3.1 8B (faster, smaller)
- `qwen-3-235b-a22b-instruct-2507` - Qwen 3 235B
- `qwen-3-32b` - Qwen 3 32B
- `gpt-oss-120b` - GPT-OSS 120B
- `zai-glm-4.6` - GLM 4.6 model

See the [Cerebras documentation](https://inference-docs.cerebras.ai/introduction?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc) for the latest models.

## Usage

```python
from pydantic_ai import Agent

agent = Agent('cerebras:llama-3.3-70b')
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> The capital of France is Paris.
```

## Why Cerebras?

- **Ultra-fast inference** - Powered by the world's largest AI chip (WSE)
- **Predictable performance** - Consistent latency for any workload
- **OpenAI-compatible** - Drop-in replacement for OpenAI API
- **Cost-effective** - Competitive pricing with superior performance
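Because the API is OpenAI-compatible, you can also call the endpoint directly with any OpenAI-style client or plain HTTP. A minimal sketch, assuming the standard `https://api.cerebras.ai/v1` base URL (requires a live `CEREBRAS_API_KEY`):

```bash
curl https://api.cerebras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```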

## Resources

- [Cerebras Inference Documentation](https://inference-docs.cerebras.ai?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc)
- [Get API Key](https://cloud.cerebras.ai/?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc)
- [Model Pricing](https://cerebras.ai/pricing?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -30,6 +30,7 @@ nav:
- models/anthropic.md
- models/google.md
- models/bedrock.md
- models/cerebras.md
- models/cohere.md
- models/groq.md
- models/mistral.md
26 changes: 8 additions & 18 deletions pydantic_ai_slim/pydantic_ai/providers/cerebras.py
Expand Up @@ -10,7 +10,6 @@
from pydantic_ai.models import cached_async_http_client
from pydantic_ai.profiles.harmony import harmony_model_profile
from pydantic_ai.profiles.meta import meta_model_profile
-from pydantic_ai.profiles.openai import OpenAIJsonSchemaTransformer, OpenAIModelProfile
from pydantic_ai.profiles.qwen import qwen_model_profile
from pydantic_ai.providers import Provider

@@ -39,27 +38,18 @@ def client(self) -> AsyncOpenAI:
return self._client

    def model_profile(self, model_name: str) -> ModelProfile | None:
-        prefix_to_profile = {'llama': meta_model_profile, 'qwen': qwen_model_profile, 'gpt-oss': harmony_model_profile}
+        prefix_to_profile = {
+            'llama': meta_model_profile,
+            'qwen': qwen_model_profile,
+            'gpt-oss': harmony_model_profile,
+        }

-        profile = None
        for prefix, profile_func in prefix_to_profile.items():
            model_name = model_name.lower()
            if model_name.startswith(prefix):
-                profile = profile_func(model_name)
-
-        # According to https://inference-docs.cerebras.ai/resources/openai#currently-unsupported-openai-features,
-        # Cerebras doesn't support some model settings.
-        unsupported_model_settings = (
-            'frequency_penalty',
-            'logit_bias',
-            'presence_penalty',
-            'parallel_tool_calls',
-            'service_tier',
-        )
-        return OpenAIModelProfile(
-            json_schema_transformer=OpenAIJsonSchemaTransformer,
-            openai_unsupported_model_settings=unsupported_model_settings,
-        ).update(profile)
+                return profile_func(model_name)

+        return None

@overload
def __init__(self) -> None: ...
1 change: 1 addition & 0 deletions pydantic_ai_slim/pyproject.toml
@@ -72,6 +72,7 @@ cohere = ["cohere>=5.18.0; platform_system != 'Emscripten'"]
vertexai = ["google-auth>=2.36.0", "requests>=2.32.2"]
google = ["google-genai>=1.51.0"]
anthropic = ["anthropic>=0.70.0"]
cerebras = ["openai>=1.107.2"]
groq = ["groq>=0.25.0"]
mistral = ["mistralai>=1.9.10"]
bedrock = ["boto3>=1.40.14"]
3 changes: 1 addition & 2 deletions tests/providers/test_cerebras.py
@@ -82,5 +82,4 @@ def test_cerebras_provider_model_profile(mocker: MockerFixture):
assert openai_profile.json_schema_transformer == OpenAIJsonSchemaTransformer

    unknown_profile = provider.model_profile('unknown-model')
-    assert unknown_profile is not None
-    assert unknown_profile.json_schema_transformer == OpenAIJsonSchemaTransformer
+    assert unknown_profile is None