@dmontagu
Contributor

@dmontagu dmontagu commented Oct 24, 2025

Started this in collaboration with @DouweM. I'd like to ensure consensus on the API design before adding the remaining providers, Logfire instrumentation, docs, and tests.

This is inspired by the approach in haiku.rag, though we adapted it to be a bit closer to how the Agent APIs are used (and how you can override model, settings, etc.).

Closes #58

Example:

import asyncio

from pydantic_ai.embeddings import Embedder

embedder = Embedder("openai:text-embedding-3-small")


async def main():
    result = await embedder.embed_documents(["hello", "world"])
    print(result)
    # (IsList, snapshot, and IsDatetime are testing helpers, but you get the point)
    # EmbeddingResult(
    #     embeddings=IsList(
    #         IsList(
    #             snapshot(0.01681816205382347),
    #             snapshot(-0.05579638481140137),
    #             snapshot(0.005661087576299906),
    #             length=1536,
    #         ),
    #         IsList(
    #             snapshot(-0.010592407546937466),
    #             snapshot(-0.03599696233868599),
    #             snapshot(0.030227113515138626),
    #             length=1536,
    #         ),
    #         length=2,
    #     ),
    #     inputs=['hello', 'world'],
    #     input_type='document',
    #     usage=RequestUsage(input_tokens=2),
    #     model_name='text-embedding-3-small',
    #     timestamp=IsDatetime(),
    #     provider_name='openai',
    # )



if __name__ == "__main__":
    asyncio.run(main())

To do:

from pydantic_ai.models.instrumented import InstrumentationSettings
from pydantic_ai.providers import infer_provider

KnownEmbeddingModelName = TypeAliasType(
Collaborator

Add a test like this one to verify this is up to date:

def test_known_model_names(): # pragma: lax no cover

@github-actions

github-actions bot commented Oct 24, 2025

Docs Preview

commit: 6878484
Preview URL: https://c72e1fd7-pydantic-ai-previews.pydantic.workers.dev

@ggozad

ggozad commented Oct 29, 2025

Thanks for starting this and please do let me know if you need help :)
I went through it quickly, and it looks like a great start!

One thing you might want to support from the start is having max_context_length and encoding as part of the EmbeddingSettings.

Embedding models have a limit of how many tokens of input they can handle. Most providers will raise (openai.BadRequestError iirc for OpenAI, vLLM will return an ugly 500 omg) and then some will say nothing (looking at you Ollama) and just truncate the input so that it fits.

All this is well explained here

I would not necessarily truncate like in the cookbook and still just raise, but I would be grateful to have available from the model side the max_context_length and the encoding so that as a library I can quickly check if a chunk of text fits or not.
Even better if I could get the number of tokens used for some text by a given embedding model.

The only difficulty I see with this is that not all providers expose the tokenizers, for example Ollama does not. But still, would be nice to have it for the providers that do support it, as it's a crucial step when you are trying to chunk a document for embedding.

In haiku.rag, my focus is local models, and as I mentioned, Ollama, the popular choice, does not expose a way to tokenize text. So I just do the dumb thing and guesstimate the tokens, hoping they are not going to be all that different from some OpenAI model's encoder: I use tiktoken (which you would probably also want to use to support this) with gpt-4o as a "close" model to get an estimate. But I am sure we can do better than this here.

Edit: I am not suggesting that calling embed should calculate the tokens needed on every call. But I imagine that whoever uses Pydantic AI to embed would also need to go through the process of chunking some large text, unless they only deal with embedding queries or simple sentences. So it would be a missed opportunity not to have support for that.
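
To make the request concrete, here is a rough sketch of the kind of pre-flight check and chunking a caller could do if the model exposed its token limit. The names `rough_token_count` and `split_to_fit`, and the 4-characters-per-token heuristic, are assumptions for illustration only and are not part of Pydantic AI. A real tokenizer (tiktoken, for providers where it applies) would be passed in as `count_tokens`.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Crude stand-in tokenizer: assumes roughly 4 characters per token."""
    return -(-len(text) // 4)  # ceiling division


def split_to_fit(
    text: str,
    max_input_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> list[str]:
    """Greedily pack whitespace-separated words into chunks within the limit."""
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        if count_tokens(word) > max_input_tokens:
            # Raise instead of silently truncating, as suggested above.
            raise ValueError(f"a single word exceeds the limit: {word!r}")
        candidate = " ".join([*current, word])
        if current and count_tokens(candidate) > max_input_tokens:
            # The next word would overflow: close this chunk, start a new one.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because the counter is injected, the same function works with an exact tokenizer where one exists and with an estimate where it does not (the Ollama case).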

@gvanrossum gvanrossum left a comment

I would like to be able to comment on the API, but there are no tests showing how to call it.

@DouweM
Collaborator

DouweM commented Nov 14, 2025

@gvanrossum I'll make some progress on the PR today, but this is the API as it stands today:

import asyncio

from pydantic_ai.embeddings import Embedder

embedder = Embedder("openai:text-embedding-3-large")


async def main():
    result = await embedder.embed("Hello, world!")
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

With Azure OpenAI you currently have to create the model and provider manually, but we'll make Embedder('azure:text-embedding-3-large') work as well:

import asyncio

from pydantic_ai.embeddings import Embedder
from pydantic_ai.embeddings.openai import OpenAIEmbeddingModel
from pydantic_ai.providers.azure import AzureProvider

model = OpenAIEmbeddingModel("text-embedding-3-large", provider=AzureProvider())

embedder = Embedder(model)


async def main():
    result = await embedder.embed("Hello, world!")
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

@gvanrossum

Nice. Do you have a bulk API too? That's essential for typeagent.

@DouweM
Collaborator

DouweM commented Nov 14, 2025

@gvanrossum Yep, the embed method is overloaded to take either a str and return list[float], or take Sequence[str] and return list[list[float]], so it's the same method for single and bulk usage. (I'm aware str is itself a Sequence[str], but type checkers appear to handle the overloads correctly.)
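
As an illustration of that overload pattern (not the actual Pydantic AI implementation), here is a synchronous sketch. `embed` and `_toy_vector` are hypothetical stand-ins. The point is that the `str` overload is declared first, and the runtime branch checks `isinstance(inputs, str)` before treating the input as a sequence.

```python
from collections.abc import Sequence
from typing import Union, overload


def _toy_vector(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding call."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


@overload
def embed(inputs: str) -> list[float]: ...
@overload
def embed(inputs: Sequence[str]) -> list[list[float]]: ...
def embed(
    inputs: Union[str, Sequence[str]],
) -> Union[list[float], list[list[float]]]:
    # A str is itself a Sequence[str], so it has to be checked first.
    if isinstance(inputs, str):
        return _toy_vector(inputs)
    return [_toy_vector(text) for text in inputs]


single = embed("hello")          # one vector
bulk = embed(["hello", "world"])  # one vector per input
```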

@DouweM
Collaborator

DouweM commented Nov 15, 2025

@gvanrossum In case you'd like to give it a try pre-release, I've made some progress today, including support for Embedder('azure:...').

@DouweM DouweM changed the title Draft implementation of support for embeddings APIs Support embeddings models Nov 18, 2025
@DouweM
Collaborator

DouweM commented Nov 21, 2025

Unfortunately I haven't managed to get to this this week. Next week should be better.

@tomaarsen

Following this PR now!
The modifications re. count_tokens and max_input_tokens look solid on the SentenceTransformer side; they match what I'd have done. Exciting!

  • Tom Aarsen

@stuartaxonHO

It might be nice to be able to do this as a single function call so you don't always need to create the embedder ahead of time, but I'm not sure if this fits with the rest of pydantic.ai?

@DouweM
Collaborator

DouweM commented Dec 1, 2025

@stuartaxonHO I personally don't think it's worth adding a helper function when the "verbose" version is just await Embedder('openai:text-embedding-3-large').embed_documents(['hello', 'world'])

@stuartaxonHO

Think I was spoiled by the litellm version embeddings("somemodel", ["sometext", "some more text"]), you're right there's not much in it though.

@tomaarsen

I like @DouweM 's current approach, as initializing the embedder ahead of time will always reduce embedding latency.

  • Tom Aarsen

@DouweM
Collaborator

DouweM commented Dec 1, 2025

@tomaarsen Note that I moved the SentenceTransformers initialization out of SentenceTransformersModel.__init__ as it may download the model from Hugging Face, which would be unexpected to happen from a sync method in Pydantic AI. So now it's only initialized/downloaded on first use of one of the async embed methods.
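
For readers unfamiliar with the pattern, a minimal sketch of deferring a blocking load to the first async call might look like this. `LazyLocalEmbedder`, `_load_blocking`, and `load_count` are hypothetical names for illustration. The actual SentenceTransformers integration may be structured differently.

```python
import asyncio
import time


class LazyLocalEmbedder:
    """Hypothetical embedder that defers an expensive load until first use."""

    def __init__(self, model_name: str) -> None:
        # Cheap and synchronous: only records configuration, no download here.
        self.model_name = model_name
        self._load_task = None
        self.load_count = 0

    def _load_blocking(self):
        # Stand-in for a slow, blocking load (for example a model download).
        time.sleep(0.01)
        self.load_count += 1
        return lambda text: [float(len(text))]

    async def _get_model(self):
        # There is no await between the check and the assignment, so
        # concurrent callers all end up sharing a single load task.
        if self._load_task is None:
            self._load_task = asyncio.ensure_future(
                asyncio.to_thread(self._load_blocking)
            )
        return await self._load_task

    async def embed(self, texts: list[str]) -> list[list[float]]:
        model = await self._get_model()
        return [model(text) for text in texts]


async def main() -> int:
    embedder = LazyLocalEmbedder("some-local-model")
    assert embedder.load_count == 0  # nothing has been loaded yet
    await asyncio.gather(embedder.embed(["a"]), embedder.embed(["bb"]))
    return embedder.load_count


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running the load in a worker thread keeps the event loop responsive while the model is being fetched.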

class CohereEmbeddingSettings(EmbeddingSettings, total=False):
    """Settings used for a Cohere embedding model request."""

    # ALL FIELDS MUST BE `cohere_` PREFIXED SO YOU CAN MERGE THEM WITH OTHER MODELS.
Contributor Author

If this is the case, should we just make cohere a top-level field? Are there any fields that are shared between providers? I guess this isn't that important.

Collaborator

@dmontagu I'd prefer for it to be consistent with ModelSettings where we took this prefix route
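
A small sketch of why the prefix convention makes flat merging safe. All field names here (`dimensions`, `cohere_truncate`, `openai_user`) and the helper `settings_for` are hypothetical and only illustrate the idea. They are not taken from the PR.

```python
from typing import TypedDict


class BaseSettings(TypedDict, total=False):
    dimensions: int  # hypothetical field shared by all providers


class CohereSettings(BaseSettings, total=False):
    cohere_truncate: str  # hypothetical provider-specific field


class OpenAISettings(BaseSettings, total=False):
    openai_user: str  # hypothetical provider-specific field


PROVIDER_PREFIXES = ("cohere_", "openai_")


def merge(*layers: dict) -> dict:
    """Later layers win; prefixed keys from different providers never collide."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


def settings_for(provider_prefix: str, settings: dict) -> dict:
    """Keep shared keys plus the keys carrying this provider's prefix."""
    return {
        key: value
        for key, value in settings.items()
        if key.startswith(provider_prefix) or not key.startswith(PROVIDER_PREFIXES)
    }


cohere: CohereSettings = {"dimensions": 256, "cohere_truncate": "END"}
openai: OpenAISettings = {"dimensions": 512, "openai_user": "u1"}
merged = merge(cohere, openai)
```

With a nested `cohere` field instead, merging two settings objects would need a deep merge. With prefixes, a plain dict update is enough.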

@dmontagu
Contributor Author

Do we need to include any utilities for actually doing queries on embeddings? I guess it's mostly going to be vector manipulation like numpy stuff or otherwise integrating with vector DBs, and maybe we don't want that in the library (yet?), but it seems like an obvious need for anyone working with these. I'm fine if we just start with the API wrappers but it feels like something that could definitely merit inclusion at some point

@ggozad

ggozad commented Dec 11, 2025

> Do we need to include any utilities for actually doing queries on embeddings? I guess it's mostly going to be vector manipulation like numpy stuff or otherwise integrating with vector DBs, and maybe we don't want that in the library (yet?), but it seems like an obvious need for anyone working with these. I'm fine if we just start with the API wrappers but it feels like something that could definitely merit inclusion at some point

As your typical user, I can tell you that I do not need (and would not expect) Pydantic AI to do the similarity calculations or any of the vector stuff. This would typically happen at the vector DB level or, for custom needs, in user code.

@gvanrossum

Same here.

@stuaxo

stuaxo commented Dec 12, 2025

This might already be covered, but: one of the really annoying things is how all the low level APIs return data in different formats (though it's understandable there).

LiteLLM helps the user by translating everything into OpenAI format.

Langchain doesn't, and everyone doing an embedding has to write code to work out where to get the floats from. It would be good if pydantic.ai can avoid this by converting to a common format.

It's great being able to call models of any name and provider, but if the user then also has to fiddle with formats on the other end some of the utility is lost.

@DouweM
Collaborator

DouweM commented Dec 12, 2025

> It would be good if pydantic.ai can avoid this by converting to a common format.

@stuaxo We return an EmbeddingResult with embeddings: list[list[float]]
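
To show what consuming that common shape looks like in user code, here is a dependency-free sketch that ranks document vectors against a query vector by cosine similarity. `cosine_similarity` and `rank` are illustrative helpers written for this comment, not functions provided by Pydantic AI, and the vectors are made-up toy values.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: list[float], documents: list[list[float]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the `embeddings` field of a result.
documents = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank([1.0, 0.0], documents)
```

Since every provider yields the same `list[list[float]]`, this code does not need to know which model produced the vectors.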

# Conflicts:
#	pydantic_ai_slim/pyproject.toml
@dmontagu
Contributor Author

@ggozad @gvanrossum makes sense, appreciate the feedback


Development

Successfully merging this pull request may close these issues.

Vector search and embeddings API

9 participants