Add OutlinesModel to run Transformers, Llama.cpp, MLXLM, SGLang and vLLM via Outlines
#2623
Conversation
```python
)
model_settings_dict = dict(model_settings) if model_settings else {}
if isinstance(self.model, OutlinesAsyncBaseModel):
    response: str = await self.model(prompt, output_type, None, **model_settings_dict)
```
I wouldn't expect the keys in our ModelSettings to map to valid kwargs in outlines. Should we do some mapping explicitly?
Outlines does not standardize inference arguments so it would need to be model specific.
Copying in what I wrote on Slack:
Another option would be to map the existing fields on ModelSettings if we know what their equivalent is for the model they're using (e.g. max_tokens , frequency), and tell them to put additional properties in extra_body : https://ai.pydantic.dev/api/settings/#pydantic_ai.settings.ModelSettings.extra_body (which can take an arbitrary dict). I'd like using extra_body better than passing the entire ModelSettings and hitting errors for unsupported keys
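A minimal sketch of what that mapping could look like, assuming a per-backend translation table (the backend kwarg names below, like `max_new_tokens` for transformers, are illustrative assumptions, not Outlines' actual API):

```python
# Hypothetical sketch: translate known Pydantic AI ModelSettings fields to
# backend-specific kwargs, forward extra_body verbatim, and drop keys the
# backend has no equivalent for. Target kwarg names are assumptions.
SETTINGS_MAP = {
    'transformers': {'max_tokens': 'max_new_tokens', 'temperature': 'temperature'},
    'llama_cpp': {'max_tokens': 'max_tokens', 'temperature': 'temperature'},
}

def map_settings(backend: str, settings: dict) -> dict:
    """Translate known ModelSettings fields; pass extra_body through as-is."""
    table = SETTINGS_MAP[backend]
    kwargs: dict = {}
    for key, value in settings.items():
        if key == 'extra_body':
            kwargs.update(value)  # arbitrary backend-specific options
        elif key in table:
            kwargs[table[key]] = value
        # keys with no backend equivalent are silently dropped
    return kwargs

print(map_settings('transformers', {'max_tokens': 100, 'extra_body': {'top_k': 40}}))
# -> {'max_new_tokens': 100, 'top_k': 40}
```

This keeps the documented `ModelSettings` fields working where they have an equivalent, while `extra_body` remains the escape hatch for anything backend-specific.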
outlines_example.py (Outdated)
```python
chat_template = '{% for message in messages %}{{ message.role }}: {{ message.content }}{% endfor %}'
hf_tokenizer.chat_template = chat_template

model = OutlinesModel.transformers(hf_model, hf_tokenizer, settings=ModelSettings(max_new_tokens=100))
```
Related to my comment below, we should try to use and support the existing fields on ModelSettings
pydantic_ai_slim/pyproject.toml (Outdated)
```toml
bedrock = ["boto3>=1.39.0"]
huggingface = ["huggingface-hub[inference]>=0.33.5"]
outlines = ["outlines>=1.0.0, <1.3.0"]
outlines-transformers = ["outlines[transformers]>=1.0.0, <1.3.0"]
```
Why is this necessary? For tests? We should at least document it then.
I'd rather add it as an optional group to pydantic-ai, which should be installed as part of the all-extras CI test
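Concretely, the optional group on the `pydantic-ai` meta-package could look something like this sketch (the group name and pinned extra are illustrative, not the merged layout):

```toml
# Hypothetical sketch: expose Outlines as an opt-in extra on pydantic-ai,
# so transformers is only pulled in when users ask for it (and in the
# all-extras CI job), rather than by default.
[project.optional-dependencies]
outlines = ["pydantic-ai-slim[outlines-transformers]=={{ version }}"]
```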
pyproject.toml (Outdated)
```diff
 [tool.hatch.metadata.hooks.uv-dynamic-versioning]
 dependencies = [
-    "pydantic-ai-slim[openai,vertexai,google,groq,anthropic,mistral,cohere,bedrock,huggingface,cli,mcp,evals,ag-ui,retries,temporal,logfire]=={{ version }}",
+    "pydantic-ai-slim[openai,vertexai,google,groq,anthropic,mistral,cohere,bedrock,huggingface,outlines-transformers,cli,mcp,evals,ag-ui,retries,temporal,logfire]=={{ version }}",
```
I'd rather not include transformers by default
tests/models/test_outlines.py (Outdated)
```python
async def test_request_streaming(outlines_model: 'Transformers') -> None:
    # The transformers model does not support streaming,
```
Can we test with another model that does support streaming?
We can test with llama_cpp
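A streaming test along those lines might be shaped like the following sketch, with a stub async generator standing in for the llama_cpp-backed model (the stub and its chunk values are placeholders, not the real fixture or Outlines API):

```python
import asyncio

# Stub standing in for a streaming Outlines llama_cpp model that yields
# text chunks; the real test would use a llama_cpp model fixture instead.
async def fake_stream(prompt: str):
    for chunk in ['Paris', ' is', ' the', ' capital.']:
        yield chunk

async def run_streaming_test() -> str:
    # Accumulate streamed chunks the way a streaming model test would.
    pieces = []
    async for chunk in fake_stream('What is the capital of France?'):
        pieces.append(chunk)
    return ''.join(pieces)

result = asyncio.run(run_streaming_test())
print(result)  # -> Paris is the capital.
```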
tests/models/test_outlines.py (Outdated)
```python
result = await agent.run('What is the capital of Germany?', message_history=result.all_messages())
assert len(result.output) > 0
all_messages = result.all_messages()
assert len(all_messages) == 4
```
Let's use inline-snapshot instead of these explicit assertions: `assert result.all_messages() == snapshot()`, then run the test to get the snapshot filled in. You can replace any variable values with `IsStr` (see examples across the test suite)
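The idea behind the `IsStr` placeholder can be illustrated with a minimal stand-in matcher (this is not the inline-snapshot or dirty-equals API itself, just the comparison pattern those libraries rely on):

```python
# Minimal stand-in for a dirty-equals-style IsStr matcher: an object that
# compares equal to any string, so a snapshot can pin the message structure
# while ignoring variable values like timestamps or generated content.
class IsStr:
    def __eq__(self, other):
        return isinstance(other, str)

snapshot_like = {'role': 'user', 'content': IsStr()}
actual = {'role': 'user', 'content': 'What is the capital of Germany?'}
assert actual == snapshot_like  # structure matches, content is "any string"
```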
tests/test_examples.py (Outdated)
```python
    vertex_provider_auth: None,
):
    # Skip Outlines examples if loading the transformers model fails
    if 'from transformers import' in example.source and 'from pydantic_ai.models.outlines import' in example.source:
```
I don't love this -- would it be enough to add `test="ci_only"` to the example so it's only run in CI, where all extras will be installed?
Title changed from "OutlinesModel to run Transformers, Llama.cpp, MLXLM, SGLang and vLLM" to "OutlinesModel to run Transformers, Llama.cpp, MLXLM, SGLang and vLLM via Outlines"
@RobinPicard Heh, we're hitting HuggingFace rate limits: Is there a way to not do any live requests?
I've made 2 modifications to fix the remaining issues @DouweM
@RobinPicard Thanks for all your work and patience here; merged! 🎉
This PR is a minimal proposal of an interface for the integration of Outlines in Pydantic AI.
The idea is that there would be a single `OutlinesModel` that takes an Outlines model instance as an argument at initialization and is then in charge of interacting with its interface. The class has class methods (`transformers`, `llama_cpp`...) to avoid having to first create the Outlines model oneself, instead directly providing the arguments required for each model.
The proposal only supports JSON schema as an output type, as Pydantic AI is currently focused on that type.
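The interface shape described above can be sketched with stand-in classes (the wrapper classes and constructor signatures below are assumptions based on this description, not the merged API):

```python
# Stand-ins for the Outlines model wrappers; the real classes live in outlines.
class TransformersModel:
    def __init__(self, hf_model, hf_tokenizer):
        self.hf_model, self.hf_tokenizer = hf_model, hf_tokenizer

class LlamaCppModel:
    def __init__(self, llama_instance):
        self.llama_instance = llama_instance

class OutlinesModel:
    """Single model class wrapping any Outlines model instance."""
    def __init__(self, outlines_model):
        self.model = outlines_model

    # Convenience constructors so users don't have to build the
    # Outlines model themselves first.
    @classmethod
    def transformers(cls, hf_model, hf_tokenizer):
        return cls(TransformersModel(hf_model, hf_tokenizer))

    @classmethod
    def llama_cpp(cls, llama_instance):
        return cls(LlamaCppModel(llama_instance))

model = OutlinesModel.transformers('gpt2-model', 'gpt2-tokenizer')
print(type(model.model).__name__)  # -> TransformersModel
```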