
Commit a70c2a5

Add OutlinesModel to run Transformers, Llama.cpp, MLXLM, SGLang and vLLM via Outlines (#2623)
1 parent bfbf2ca commit a70c2a5

File tree: 24 files changed, +4361 −35 lines

.github/workflows/ci.yml

Lines changed: 26 additions & 0 deletions
```diff
@@ -169,6 +169,15 @@ jobs:
 
       - run: uv run mcp-run-python example --deps=numpy
       - run: uv sync --only-dev
+
+      - name: cache HuggingFace models
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/huggingface
+          key: hf-${{ runner.os }}-${{ hashFiles('**/pyproject.toml') }}
+          restore-keys: |
+            hf-${{ runner.os }}-
+
       - run: uv run ${{ matrix.install.command }} coverage run -m pytest --durations=100 -n auto --dist=loadgroup
         env:
           COVERAGE_FILE: .coverage/.coverage.${{ matrix.python-version }}-${{ matrix.install.name }}
@@ -206,6 +215,15 @@ jobs:
       - run: mkdir .coverage
 
       - run: uv sync --group dev
+
+      - name: cache HuggingFace models
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/huggingface
+          key: hf-${{ runner.os }}-${{ hashFiles('**/pyproject.toml') }}
+          restore-keys: |
+            hf-${{ runner.os }}-
+
       - run: uv run mcp-run-python example --deps=numpy
 
       - run: unset UV_FROZEN
@@ -239,6 +257,14 @@ jobs:
         with:
           enable-cache: true
 
+      - name: cache HuggingFace models
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/huggingface
+          key: hf-${{ runner.os }}-${{ hashFiles('**/pyproject.toml') }}
+          restore-keys: |
+            hf-${{ runner.os }}-
+
       - run: uv run --all-extras python tests/import_examples.py
 
   coverage:
```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -39,7 +39,7 @@ We built Pydantic AI with one simple aim: to bring that FastAPI feeling to GenAI
    [Pydantic Validation](https://docs.pydantic.dev/latest/) is the validation layer of the OpenAI SDK, the Google ADK, the Anthropic SDK, LangChain, LlamaIndex, AutoGPT, Transformers, CrewAI, Instructor and many more. _Why use the derivative when you can go straight to the source?_ :smiley:
 
 2. **Model-agnostic**:
-   Supports virtually every [model](https://ai.pydantic.dev/models/overview) and provider: OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, and Perplexity; Azure AI Foundry, Amazon Bedrock, Google Vertex AI, Ollama, LiteLLM, Groq, OpenRouter, Together AI, Fireworks AI, Cerebras, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud. If your favorite model or provider is not listed, you can easily implement a [custom model](https://ai.pydantic.dev/models/overview#custom-models).
+   Supports virtually every [model](https://ai.pydantic.dev/models/overview) and provider: OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, and Perplexity; Azure AI Foundry, Amazon Bedrock, Google Vertex AI, Ollama, LiteLLM, Groq, OpenRouter, Together AI, Fireworks AI, Cerebras, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud, and Outlines. If your favorite model or provider is not listed, you can easily implement a [custom model](https://ai.pydantic.dev/models/overview#custom-models).
 
 3. **Seamless Observability**:
    Tightly [integrates](https://ai.pydantic.dev/logfire) with [Pydantic Logfire](https://pydantic.dev/logfire), our general-purpose OpenTelemetry observability platform, for real-time debugging, evals-based performance monitoring, and behavior, tracing, and cost tracking. If you already have an observability platform that supports OTel, you can [use that too](https://ai.pydantic.dev/logfire#alternative-observability-backends).
```

docs/api/models/outlines.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -0,0 +1,7 @@
+# `pydantic_ai.models.outlines`
+
+## Setup
+
+For details on how to set up this model, see [model configuration for Outlines](../../models/outlines.md).
+
+::: pydantic_ai.models.outlines
```

docs/builtin-tools.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -38,6 +38,7 @@ making it ideal for queries that require up-to-date data.
 | Mistral     | ❌ | Not supported |
 | Cohere      | ❌ | Not supported |
 | HuggingFace | ❌ | Not supported |
+| Outlines    | ❌ | Not supported |
 
 ### Usage
 
@@ -129,6 +130,7 @@ in a secure environment, making it perfect for computational tasks, data analysi
 | Mistral     | ❌ |  |
 | Cohere      | ❌ |  |
 | HuggingFace | ❌ |  |
+| Outlines    | ❌ |  |
 
 ### Usage
 
@@ -321,6 +323,7 @@ allowing it to pull up-to-date information from the web.
 | Mistral     | ❌ |  |
 | Cohere      | ❌ |  |
 | HuggingFace | ❌ |  |
+| Outlines    | ❌ |  |
 
 ### Usage
```

docs/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@ We built Pydantic AI with one simple aim: to bring that FastAPI feeling to GenAI
    [Pydantic Validation](https://docs.pydantic.dev/latest/) is the validation layer of the OpenAI SDK, the Google ADK, the Anthropic SDK, LangChain, LlamaIndex, AutoGPT, Transformers, CrewAI, Instructor and many more. _Why use the derivative when you can go straight to the source?_ :smiley:
 
 2. **Model-agnostic**:
-   Supports virtually every [model](models/overview.md) and provider: OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, and Perplexity; Azure AI Foundry, Amazon Bedrock, Google Vertex AI, Ollama, LiteLLM, Groq, OpenRouter, Together AI, Fireworks AI, Cerebras, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud. If your favorite model or provider is not listed, you can easily implement a [custom model](models/overview.md#custom-models).
+   Supports virtually every [model](models/overview.md) and provider: OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, and Perplexity; Azure AI Foundry, Amazon Bedrock, Google Vertex AI, Ollama, LiteLLM, Groq, OpenRouter, Together AI, Fireworks AI, Cerebras, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud, and Outlines. If your favorite model or provider is not listed, you can easily implement a [custom model](models/overview.md#custom-models).
 
 3. **Seamless Observability**:
    Tightly [integrates](logfire.md) with [Pydantic Logfire](https://pydantic.dev/logfire), our general-purpose OpenTelemetry observability platform, for real-time debugging, evals-based performance monitoring, and behavior, tracing, and cost tracking. If you already have an observability platform that supports OTel, you can [use that too](logfire.md#alternative-observability-backends).
```

docs/install.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -51,6 +51,11 @@ pip/uv-add "pydantic-ai-slim[openai]"
 * `cohere` - installs `cohere` [PyPI ↗](https://pypi.org/project/cohere){:target="_blank"}
 * `bedrock` - installs `boto3` [PyPI ↗](https://pypi.org/project/boto3){:target="_blank"}
 * `huggingface` - installs `huggingface-hub[inference]` [PyPI ↗](https://pypi.org/project/huggingface-hub){:target="_blank"}
+* `outlines-transformers` - installs `outlines[transformers]` [PyPI ↗](https://pypi.org/project/outlines){:target="_blank"}
+* `outlines-llamacpp` - installs `outlines[llamacpp]` [PyPI ↗](https://pypi.org/project/outlines){:target="_blank"}
+* `outlines-mlxlm` - installs `outlines[mlxlm]` [PyPI ↗](https://pypi.org/project/outlines){:target="_blank"}
+* `outlines-sglang` - installs `outlines[sglang]` [PyPI ↗](https://pypi.org/project/outlines){:target="_blank"}
+* `outlines-vllm-offline` - installs `outlines[vllm-offline]` [PyPI ↗](https://pypi.org/project/outlines){:target="_blank"}
 * `duckduckgo` - installs `ddgs` [PyPI ↗](https://pypi.org/project/ddgs){:target="_blank"}
 * `tavily` - installs `tavily-python` [PyPI ↗](https://pypi.org/project/tavily-python){:target="_blank"}
 * `cli` - installs `rich` [PyPI ↗](https://pypi.org/project/rich){:target="_blank"}, `prompt-toolkit` [PyPI ↗](https://pypi.org/project/prompt-toolkit){:target="_blank"}, and `argcomplete` [PyPI ↗](https://pypi.org/project/argcomplete){:target="_blank"}
```

docs/models/outlines.md

Lines changed: 241 additions & 0 deletions
# Outlines

## Install

As Outlines is a library for running models from various providers, it does not include the necessary dependencies for any provider by default. As a result, to use the [`OutlinesModel`][pydantic_ai.models.OutlinesModel], you must install `pydantic-ai-slim` with an optional group named `outlines-<provider>`, where `<provider>` is the name of the specific model provider you want to use through Outlines. For instance:

```bash
pip/uv-add "pydantic-ai-slim[outlines-transformers]"
```

Or

```bash
pip/uv-add "pydantic-ai-slim[outlines-mlxlm]"
```

There are five optional groups, one for each of the five model providers supported through Outlines:

- `outlines-transformers`
- `outlines-llamacpp`
- `outlines-mlxlm`
- `outlines-sglang`
- `outlines-vllm-offline`
## Model Initialization

As Outlines is not an inference provider, but a library for running both local and API-based models, instantiating the model works a bit differently from the other models available in Pydantic AI.

To initialize the `OutlinesModel` through the `__init__` method, the first argument must be an `outlines.Model` or an `outlines.AsyncModel` instance.

For instance:

```python {test="skip"}
import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer

from pydantic_ai.models.outlines import OutlinesModel

outlines_model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained('erwanf/gpt2-mini'),
    AutoTokenizer.from_pretrained('erwanf/gpt2-mini')
)
model = OutlinesModel(outlines_model)
```

As you are already providing an Outlines model instance, there is no need to provide an `OutlinesProvider` yourself.
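The `__init__` method accepts an async Outlines model in the same way. Here is a minimal sketch, not from this commit: it assumes a running SGLang server at the given URL, and that Outlines' `from_sglang` wraps an OpenAI-compatible client, yielding an `outlines.AsyncModel` when given an async client:

```python {test="skip"}
import openai
import outlines

from pydantic_ai.models.outlines import OutlinesModel

# Assumption: an AsyncOpenAI client pointed at an SGLang server produces an
# outlines.AsyncModel, which OutlinesModel accepts just like a sync model.
async_outlines_model = outlines.from_sglang(
    openai.AsyncOpenAI(base_url='http://localhost:11434/v1', api_key='api_key')
)
model = OutlinesModel(async_outlines_model)
```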
### Model Loading Methods

Alternatively, you can use the `OutlinesModel` class methods that load a specific type of Outlines model directly. They take the same arguments you would pass to the corresponding Outlines model loading function (except in the case of SGLang, where you pass the server's base URL, API key, and model name directly, as shown below).

There are methods for the five Outlines model types officially supported by the Pydantic AI integration:

- [`from_transformers`][pydantic_ai.models.OutlinesModel.from_transformers]
- [`from_llamacpp`][pydantic_ai.models.OutlinesModel.from_llamacpp]
- [`from_mlxlm`][pydantic_ai.models.OutlinesModel.from_mlxlm]
- [`from_sglang`][pydantic_ai.models.OutlinesModel.from_sglang]
- [`from_vllm_offline`][pydantic_ai.models.OutlinesModel.from_vllm_offline]
#### Transformers

```python {test="skip"}
from transformers import AutoModelForCausalLM, AutoTokenizer

from pydantic_ai.models.outlines import OutlinesModel

model = OutlinesModel.from_transformers(
    AutoModelForCausalLM.from_pretrained('microsoft/Phi-3-mini-4k-instruct'),
    AutoTokenizer.from_pretrained('microsoft/Phi-3-mini-4k-instruct')
)
```

#### LlamaCpp

```python {test="skip"}
from llama_cpp import Llama

from pydantic_ai.models.outlines import OutlinesModel

model = OutlinesModel.from_llamacpp(
    Llama.from_pretrained(
        repo_id='TheBloke/Mistral-7B-Instruct-v0.2-GGUF',
        filename='mistral-7b-instruct-v0.2.Q5_K_M.gguf',
    )
)
```

#### MLXLM

```python {test="skip"}
from mlx_lm import load

from pydantic_ai.models.outlines import OutlinesModel

model = OutlinesModel.from_mlxlm(
    *load('mlx-community/TinyLlama-1.1B-Chat-v1.0-4bit')
)
```

#### SGLang

```python {test="skip"}
from pydantic_ai.models.outlines import OutlinesModel

model = OutlinesModel.from_sglang(
    'http://localhost:11434',
    'api_key',
    'meta-llama/Llama-3.1-8B'
)
```

#### vLLM Offline

```python {test="skip"}
from vllm import LLM

from pydantic_ai.models.outlines import OutlinesModel

model = OutlinesModel.from_vllm_offline(
    LLM('microsoft/Phi-3-mini-4k-instruct')
)
```
## Running the model

Once you have initialized an `OutlinesModel`, you can use it with an Agent as with all other Pydantic AI models.

As Outlines is focused on structured output, this provider supports `output_type` through the [`NativeOutput`][pydantic_ai.outputs.NativeOutput] format. There is no need to include information on the required output format in your prompt; instructions based on the `output_type` will be included automatically.

```python {test="skip"}
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

from pydantic_ai import Agent
from pydantic_ai.models.outlines import OutlinesModel
from pydantic_ai.settings import ModelSettings


class Box(BaseModel):
    """Class representing a box"""
    width: int
    height: int
    depth: int
    units: str

model = OutlinesModel.from_transformers(
    AutoModelForCausalLM.from_pretrained('microsoft/Phi-3-mini-4k-instruct'),
    AutoTokenizer.from_pretrained('microsoft/Phi-3-mini-4k-instruct')
)
agent = Agent(model, output_type=Box)

result = agent.run_sync(
    'Give me the dimensions of a box',
    model_settings=ModelSettings(extra_body={'max_new_tokens': 100})
)
print(result.output)  # width=20 height=30 depth=40 units='cm'
```
Outlines does not support tools yet, but support for that feature will be added in the near future.

## Multimodal models

If the model you are running through Outlines and the selected provider support it, you can include images in your prompts using [`ImageUrl`][pydantic_ai.messages.ImageUrl] or [`BinaryImage`][pydantic_ai.messages.BinaryImage]. In that case, the prompt you provide when running the agent should be a list containing a string and one or several images. See the [input documentation](../input.md) for details and examples on using assets in model inputs.

This feature is supported in Outlines for the `SGLang` and `Transformers` models. If you want to run a multimodal model through `transformers`, you must provide a processor instead of a tokenizer as the second argument when initializing the model with the `OutlinesModel.from_transformers` method.
```python {test="skip"}
from datetime import date
from typing import Literal

import torch
from pydantic import BaseModel
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

from pydantic_ai import Agent, ModelSettings
from pydantic_ai.messages import ImageUrl
from pydantic_ai.models.outlines import OutlinesModel

MODEL_NAME = 'Qwen/Qwen2-VL-7B-Instruct'

class Item(BaseModel):
    name: str
    quantity: int | None
    price_per_unit: float | None
    total_price: float | None

class ReceiptSummary(BaseModel):
    store_name: str
    store_address: str
    store_number: int | None
    items: list[Item]
    tax: float | None
    total: float | None
    date: date
    payment_method: Literal['cash', 'credit', 'debit', 'check', 'other']

tf_model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_NAME,
    device_map='auto',
    dtype=torch.bfloat16
)
tf_processor = AutoProcessor.from_pretrained(
    MODEL_NAME,
    device_map='auto'
)
model = OutlinesModel.from_transformers(tf_model, tf_processor)

agent = Agent(model, output_type=ReceiptSummary)

result = agent.run_sync(
    [
        'You are an expert at extracting information from receipts. Please extract the information from the receipt. Be as detailed as possible, do not miss any information',
        ImageUrl('https://raw.githubusercontent.com/dottxt-ai/outlines/refs/heads/main/docs/examples/images/trader-joes-receipt.jpg')
    ],
    model_settings=ModelSettings(extra_body={'max_new_tokens': 1000})
)
print(result.output)
# store_name="Trader Joe's"
# store_address='401 Bay Street, San Francisco, CA 94133'
# store_number=0
# items=[
#   Item(name='BANANA EACH', quantity=7, price_per_unit=0.23, total_price=1.61),
#   Item(name='BAREBELLS CHOCOLATE DOUG', quantity=1, price_per_unit=2.29, total_price=2.29),
#   Item(name='BAREBELLS CREAMY CRISP', quantity=1, price_per_unit=2.29, total_price=2.29),
#   Item(name='BAREBELLS CHOCOLATE DOUG', quantity=1, price_per_unit=2.29, total_price=2.29),
#   Item(name='BAREBELLS CARAMEL CASHEW', quantity=2, price_per_unit=2.29, total_price=4.58),
#   Item(name='BAREBELLS CREAMY CRISP', quantity=1, price_per_unit=2.29, total_price=2.29),
#   Item(name='T SPINDRIFT ORANGE MANGO 8', quantity=1, price_per_unit=7.49, total_price=7.49),
#   Item(name='T Bottle Deposit', quantity=8, price_per_unit=0.05, total_price=0.4),
#   Item(name='MILK ORGANIC GALLON WHOL', quantity=1, price_per_unit=6.79, total_price=6.79),
#   Item(name='CLASSIC GREEK SALAD', quantity=1, price_per_unit=3.49, total_price=3.49),
#   Item(name='COBB SALAD', quantity=1, price_per_unit=5.99, total_price=5.99),
#   Item(name='PEPPER BELL RED XL EACH', quantity=1, price_per_unit=1.29, total_price=1.29),
#   Item(name='BAG FEE.', quantity=1, price_per_unit=0.25, total_price=0.25),
#   Item(name='BAG FEE.', quantity=1, price_per_unit=0.25, total_price=0.25)]
# tax=7.89
# total=41.98
# date='2023-04-01'
# payment_method='credit'
```
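For a local image file you could use `BinaryImage` in place of `ImageUrl`. A minimal sketch reusing the `agent` above (the `receipt.jpg` path is hypothetical, and this assumes `BinaryImage` takes raw bytes plus a media type, mirroring `BinaryContent`):

```python {test="skip"}
from pathlib import Path

from pydantic_ai.messages import BinaryImage

# Hypothetical local file: wrap its raw bytes together with their media type.
image = BinaryImage(data=Path('receipt.jpg').read_bytes(), media_type='image/jpeg')
result = agent.run_sync(
    ['Extract the information from the receipt.', image],
    model_settings=ModelSettings(extra_body={'max_new_tokens': 1000})
)
print(result.output)
```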

docs/models/overview.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -10,6 +10,7 @@ Pydantic AI is model-agnostic and has built-in support for multiple model provid
 * [Cohere](cohere.md)
 * [Bedrock](bedrock.md)
 * [Hugging Face](huggingface.md)
+* [Outlines](outlines.md)
 
 ## OpenAI-compatible Providers
 
```
docs/thinking.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -98,3 +98,9 @@ Thinking is supported by the `command-a-reasoning-08-2025` model. It does not ne
 
 Text output inside `<think>` tags is automatically converted to [`ThinkingPart`][pydantic_ai.messages.ThinkingPart] objects.
 You can customize the tags using the [`thinking_tags`][pydantic_ai.profiles.ModelProfile.thinking_tags] field on the [model profile](models/openai.md#model-profile).
+
+## Outlines
+
+Some local models run through Outlines include a thinking part delimited by tags in their text output. Pydantic AI separates this thinking part from the final answer automatically, without the need to specifically enable it. The default thinking tags are `"<think>"` and `"</think>"`; if your model uses different tags, you can specify them in the [model profile](models/openai.md#model-profile) using the [`thinking_tags`][pydantic_ai.profiles.ModelProfile.thinking_tags] field.
+
+Outlines currently does not support thinking along with structured output. If you provide an `output_type`, the model's text output will not contain a thinking part with the associated tags, and you may experience degraded performance.
```
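As a minimal sketch of overriding the tags (assumptions, not from this commit: a model that emits `<reasoning>` tags, and that `OutlinesModel` accepts the `profile` keyword argument common to Pydantic AI models):

```python {test="skip"}
import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer

from pydantic_ai.models.outlines import OutlinesModel
from pydantic_ai.profiles import ModelProfile

# Assumption: this model wraps its reasoning in <reasoning>...</reasoning> tags.
outlines_model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained('erwanf/gpt2-mini'),
    AutoTokenizer.from_pretrained('erwanf/gpt2-mini')
)
model = OutlinesModel(
    outlines_model,
    profile=ModelProfile(thinking_tags=('<reasoning>', '</reasoning>'))
)
```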

mkdocs.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -34,6 +34,7 @@ nav:
     - models/groq.md
     - models/mistral.md
     - models/huggingface.md
+    - models/outlines.md
   - Tools & Toolsets:
     - tools.md
     - tools-advanced.md
@@ -149,6 +150,7 @@ nav:
     - api/models/huggingface.md
     - api/models/instrumented.md
     - api/models/mistral.md
+    - api/models/outlines.md
     - api/models/test.md
     - api/models/function.md
     - api/models/fallback.md
```
