
Commit 34969b2

Add Cerebras provider support
- Add CerebrasProvider using OpenAI SDK for Cerebras's OpenAI-compatible API
- Support for Llama 3.3, Qwen 3, GPT-OSS, and GLM models
- Add cerebras optional dependency group
- Add documentation with usage examples and UTM tracking
1 parent eae558b commit 34969b2

File tree

4 files changed: +93 −18 lines changed


docs/models/cerebras.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
# Cerebras

Cerebras provides ultra-fast inference using their Wafer-Scale Engine (WSE), delivering predictable performance for any workload.

## Installation

To use Cerebras, you need to either install `pydantic-ai`, or install `pydantic-ai-slim` with the `cerebras` optional group:

```bash
# pip
pip install "pydantic-ai-slim[cerebras]"

# uv
uv add "pydantic-ai-slim[cerebras]"
```

## Configuration

To use Cerebras, go to [cloud.cerebras.ai](https://cloud.cerebras.ai/?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc) to get an API key.

### Environment Variable

Set your API key as an environment variable:

```bash
export CEREBRAS_API_KEY='your-api-key'
```

### Available Models

Cerebras supports the following models:

- `llama-3.3-70b` (recommended) - Latest Llama 3.3 model
- `llama-3.1-8b` - Llama 3.1 8B (faster, smaller)
- `qwen-3-235b-a22b-instruct-2507` - Qwen 3 235B
- `qwen-3-32b` - Qwen 3 32B
- `gpt-oss-120b` - GPT-OSS 120B
- `zai-glm-4.6` - GLM 4.6 model

See the [Cerebras documentation](https://inference-docs.cerebras.ai/introduction?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc) for the latest models.

## Usage

### Simple Usage (Recommended)

```python
from pydantic_ai import Agent

agent = Agent('cerebras:llama-3.3-70b')
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> The capital of France is Paris.
```
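The `'cerebras:llama-3.3-70b'` string follows pydantic-ai's `provider:model_name` convention. As a minimal sketch (not the library's actual resolution code), splitting on the first colon shows how such a string decomposes:

```python
# Illustration only: how a 'provider:model' string decomposes.
# pydantic-ai resolves these strings internally; this just shows
# the naming convention used in the examples above.
model_string = 'cerebras:llama-3.3-70b'
provider, model_name = model_string.split(':', maxsplit=1)
print(provider)    # cerebras
print(model_name)  # llama-3.3-70b
```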
### Async Usage

```python
import asyncio

from pydantic_ai import Agent

agent = Agent('cerebras:llama-3.3-70b')

async def main():
    result = await agent.run('What is the capital of France?')
    print(result.output)
    #> The capital of France is Paris.

asyncio.run(main())
```

## Why Cerebras?

- **Ultra-fast inference** - Powered by the world's largest AI chip (WSE)
- **Predictable performance** - Consistent latency for any workload
- **OpenAI-compatible** - Drop-in replacement for the OpenAI API
- **Cost-effective** - Competitive pricing with superior performance

## Resources

- [Cerebras Inference Documentation](https://inference-docs.cerebras.ai?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc)
- [Get API Key](https://cloud.cerebras.ai/?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc)
- [Model Pricing](https://cerebras.ai/pricing?utm_source=3pi_pydantic-ai&utm_campaign=partner_doc)

mkdocs.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -30,6 +30,7 @@ nav:
     - models/anthropic.md
     - models/google.md
     - models/bedrock.md
+    - models/cerebras.md
     - models/cohere.md
     - models/groq.md
     - models/mistral.md
```

pydantic_ai_slim/pydantic_ai/providers/cerebras.py

Lines changed: 8 additions & 18 deletions
```diff
@@ -10,7 +10,6 @@
 from pydantic_ai.models import cached_async_http_client
 from pydantic_ai.profiles.harmony import harmony_model_profile
 from pydantic_ai.profiles.meta import meta_model_profile
-from pydantic_ai.profiles.openai import OpenAIJsonSchemaTransformer, OpenAIModelProfile
 from pydantic_ai.profiles.qwen import qwen_model_profile
 from pydantic_ai.providers import Provider

@@ -39,27 +38,18 @@ def client(self) -> AsyncOpenAI:
         return self._client

     def model_profile(self, model_name: str) -> ModelProfile | None:
-        prefix_to_profile = {'llama': meta_model_profile, 'qwen': qwen_model_profile, 'gpt-oss': harmony_model_profile}
+        prefix_to_profile = {
+            'llama': meta_model_profile,
+            'qwen': qwen_model_profile,
+            'gpt-oss': harmony_model_profile,
+        }

-        profile = None
         for prefix, profile_func in prefix_to_profile.items():
             model_name = model_name.lower()
             if model_name.startswith(prefix):
-                profile = profile_func(model_name)
-
-                # According to https://inference-docs.cerebras.ai/resources/openai#currently-unsupported-openai-features,
-                # Cerebras doesn't support some model settings.
-                unsupported_model_settings = (
-                    'frequency_penalty',
-                    'logit_bias',
-                    'presence_penalty',
-                    'parallel_tool_calls',
-                    'service_tier',
-                )
-                return OpenAIModelProfile(
-                    json_schema_transformer=OpenAIJsonSchemaTransformer,
-                    openai_unsupported_model_settings=unsupported_model_settings,
-                ).update(profile)
+                return profile_func(model_name)
+
+        return None

     @overload
     def __init__(self) -> None: ...
```

pydantic_ai_slim/pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -72,6 +72,7 @@ cohere = ["cohere>=5.18.0; platform_system != 'Emscripten'"]
 vertexai = ["google-auth>=2.36.0", "requests>=2.32.2"]
 google = ["google-genai>=1.51.0"]
 anthropic = ["anthropic>=0.70.0"]
+cerebras = ["openai>=1.107.2"]
 groq = ["groq>=0.25.0"]
 mistral = ["mistralai>=1.9.10"]
 bedrock = ["boto3>=1.40.14"]
```
