Add OutlinesModel to run Transformers, Llama.cpp, MLXLM, SGLang and vLLM via Outlines
#2623
Conversation
```python
)
model_settings_dict = dict(model_settings) if model_settings else {}
if isinstance(self.model, OutlinesAsyncBaseModel):
    response: str = await self.model(prompt, output_type, None, **model_settings_dict)
```
I wouldn't expect the keys in our ModelSettings to map to valid kwargs in outlines. Should we do some mapping explicitly?
Outlines does not standardize inference arguments so it would need to be model specific.
Copying in what I wrote on Slack:
Another option would be to map the existing fields on ModelSettings if we know what their equivalent is for the model they're using (e.g. max_tokens , frequency), and tell them to put additional properties in extra_body : https://ai.pydantic.dev/api/settings/#pydantic_ai.settings.ModelSettings.extra_body (which can take an arbitrary dict). I'd like using extra_body better than passing the entire ModelSettings and hitting errors for unsupported keys
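A minimal sketch of what that mapping could look like, assuming a per-backend translation table (the backend kwarg names below, like `max_new_tokens` for transformers, are illustrative assumptions, not Outlines' actual API):

```python
# Hypothetical sketch: translate known Pydantic AI ModelSettings fields to
# backend-specific kwargs, forward extra_body verbatim, and drop keys the
# backend has no equivalent for. Target kwarg names are assumptions.
SETTINGS_MAP = {
    'transformers': {'max_tokens': 'max_new_tokens', 'temperature': 'temperature'},
    'llama_cpp': {'max_tokens': 'max_tokens', 'temperature': 'temperature'},
}

def map_settings(backend: str, settings: dict) -> dict:
    """Translate known ModelSettings fields; pass extra_body through as-is."""
    table = SETTINGS_MAP[backend]
    kwargs: dict = {}
    for key, value in settings.items():
        if key == 'extra_body':
            kwargs.update(value)  # arbitrary backend-specific options
        elif key in table:
            kwargs[table[key]] = value
        # keys with no backend equivalent are silently dropped
    return kwargs

print(map_settings('transformers', {'max_tokens': 100, 'extra_body': {'top_k': 40}}))
# -> {'max_new_tokens': 100, 'top_k': 40}
```

This keeps the documented `ModelSettings` fields working where they have an equivalent, while `extra_body` remains the escape hatch for anything backend-specific.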
outlines_example.py (Outdated)
```python
chat_template = '{% for message in messages %}{{ message.role }}: {{ message.content }}{% endfor %}'
hf_tokenizer.chat_template = chat_template

model = OutlinesModel.transformers(hf_model, hf_tokenizer, settings=ModelSettings(max_new_tokens=100))
```
Related to my comment below, we should try to use and support the existing fields on ModelSettings
pydantic_ai_slim/pyproject.toml (Outdated)
```toml
bedrock = ["boto3>=1.39.0"]
huggingface = ["huggingface-hub[inference]>=0.33.5"]
outlines = ["outlines>=1.0.0, <1.3.0"]
outlines-transformers = ["outlines[transformers]>=1.0.0, <1.3.0"]
```
Why is this necessary? For tests? We should at least document it then.
I'd rather add it as an optional group to pydantic-ai, which should be installed as part of the all-extras CI test
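Concretely, the optional group on the `pydantic-ai` meta-package could look something like this sketch (the group name and pinned extra are illustrative, not the merged layout):

```toml
# Hypothetical sketch: expose Outlines as an opt-in extra on pydantic-ai,
# so transformers is only pulled in when users ask for it (and in the
# all-extras CI job), rather than by default.
[project.optional-dependencies]
outlines = ["pydantic-ai-slim[outlines-transformers]=={{ version }}"]
```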
pyproject.toml (Outdated)
```diff
 [tool.hatch.metadata.hooks.uv-dynamic-versioning]
 dependencies = [
-    "pydantic-ai-slim[openai,vertexai,google,groq,anthropic,mistral,cohere,bedrock,huggingface,cli,mcp,evals,ag-ui,retries,temporal,logfire]=={{ version }}",
+    "pydantic-ai-slim[openai,vertexai,google,groq,anthropic,mistral,cohere,bedrock,huggingface,outlines-transformers,cli,mcp,evals,ag-ui,retries,temporal,logfire]=={{ version }}",
```
I'd rather not include transformers by default
tests/models/test_outlines.py (Outdated)
```python
async def test_request_streaming(outlines_model: 'Transformers') -> None:
    # The transformers model does not support streaming,
```
Can we test with another model that does support streaming?
We can test with llama_cpp
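A streaming test along those lines might be shaped like the following sketch, with a stub async generator standing in for the llama_cpp-backed model (the stub and its chunk values are placeholders, not the real fixture or Outlines API):

```python
import asyncio

# Stub standing in for a streaming Outlines llama_cpp model that yields
# text chunks; the real test would use a llama_cpp model fixture instead.
async def fake_stream(prompt: str):
    for chunk in ['Paris', ' is', ' the', ' capital.']:
        yield chunk

async def run_streaming_test() -> str:
    # Accumulate streamed chunks the way a streaming model test would.
    pieces = []
    async for chunk in fake_stream('What is the capital of France?'):
        pieces.append(chunk)
    return ''.join(pieces)

result = asyncio.run(run_streaming_test())
print(result)  # -> Paris is the capital.
```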
tests/models/test_outlines.py (Outdated)
```python
result = await agent.run('What is the capital of Germany?', message_history=result.all_messages())
assert len(result.output) > 0
all_messages = result.all_messages()
assert len(all_messages) == 4
```
Let's use inline-snapshot instead of these explicit assertions: `assert result.all_messages() == snapshot()`, then run the test to get the snapshot filled in. You can replace any variable values with `IsStr` (see examples across the test suite)
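The idea behind the `IsStr` placeholder can be illustrated with a minimal stand-in matcher (this is not the inline-snapshot or dirty-equals API itself, just the comparison pattern those libraries rely on):

```python
# Minimal stand-in for a dirty-equals-style IsStr matcher: an object that
# compares equal to any string, so a snapshot can pin the message structure
# while ignoring variable values like timestamps or generated content.
class IsStr:
    def __eq__(self, other):
        return isinstance(other, str)

snapshot_like = {'role': 'user', 'content': IsStr()}
actual = {'role': 'user', 'content': 'What is the capital of Germany?'}
assert actual == snapshot_like  # structure matches, content is "any string"
```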
tests/test_examples.py (Outdated)
```python
    vertex_provider_auth: None,
):
    # Skip Outlines examples if loading the transformers model fails
    if 'from transformers import' in example.source and 'from pydantic_ai.models.outlines import' in example.source:
```
I don't love this -- would it be enough to add `test="ci_only"` to the example so it's only run in CI, where all extras will be installed?
Title changed from "OutlinesModel to run Transformers, Llama.cpp, MLXLM, SGLang and vLLM" to "OutlinesModel to run Transformers, Llama.cpp, MLXLM, SGLang and vLLM via Outlines"
@RobinPicard Heh, we're hitting HuggingFace rate limits: Is there a way to not do any live requests?
I've made 2 modifications to fix the remaining issues @DouweM
@RobinPicard Thanks for all your work and patience here; merged! 🎉
This PR is a minimal proposal of an interface for the integration of Outlines in Pydantic AI.
The idea is that there would be a single `OutlinesModel` that takes an Outlines model instance as an argument at initialization and is then in charge of interacting with its interface. The class has class methods (`transformers`, `llama_cpp`...) to avoid having to first create the Outlines model oneself, instead directly providing the arguments required for each model.
The proposal only supports JSON schema as an output type, as Pydantic AI is currently focused on that type.
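The interface shape described above can be sketched with stand-in classes (the wrapper classes and constructor signatures below are assumptions based on this description, not the merged API):

```python
# Stand-ins for the Outlines model wrappers; the real classes live in outlines.
class TransformersModel:
    def __init__(self, hf_model, hf_tokenizer):
        self.hf_model, self.hf_tokenizer = hf_model, hf_tokenizer

class LlamaCppModel:
    def __init__(self, llama_instance):
        self.llama_instance = llama_instance

class OutlinesModel:
    """Single model class wrapping any Outlines model instance."""
    def __init__(self, outlines_model):
        self.model = outlines_model

    # Convenience constructors so users don't have to build the
    # Outlines model themselves first.
    @classmethod
    def transformers(cls, hf_model, hf_tokenizer):
        return cls(TransformersModel(hf_model, hf_tokenizer))

    @classmethod
    def llama_cpp(cls, llama_instance):
        return cls(LlamaCppModel(llama_instance))

model = OutlinesModel.transformers('gpt2-model', 'gpt2-tokenizer')
print(type(model.model).__name__)  # -> TransformersModel
```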