feat: expose agents as openai compatible endpoint with FastAPI #3320
Conversation
@dutchfarao Nice work! @Kludex Please have a look.
@dutchfarao and I are sitting at a bar here trying to fix the CI. We're frustrated and giving up haha; the CI should use markers instead of this try_import stuff 😆 Would like your support here!
The 20 force pushes were an indication.
Resolved the CI, having a clear mind helps ;)
I'll stop my review.
This is not a good first step. Please never open a 6k-line PR on a repository without discussing it first.
I think the first step would be to implement what my PR (#2041) proposed, and then expand the idea to the Responses API, if needed.
```python
except ImportError as _import_error:  # pragma: no cover
    raise ImportError(
        'Please install the `openai` package to enable the fastapi openai compatible endpoint, '
        'you can use the `openai` and `fastapi` optional group — `pip install "pydantic-ai-slim[openai,fastapi]"`'
    ) from _import_error
```
I don't think the extra should be called `fastapi`. The idea is to expose the chat completions/responses endpoints, so maybe `chat-completions`.
This module shouldn't be called `fastapi`. The underlying package doesn't matter.
We are open to better name suggestions! Internally we called this something different.
```python
from pydantic_ai.fastapi.registry import AgentRegistry
from pydantic_ai.settings import ModelSettings


logger = logging.getLogger(__name__)
```
There's no need for this.
```python
except Exception as e:
    logger.error(f'Error creating completion: {e}')
    raise
```
The error is shown anyway, so there's no need for this handling.
Also, in other code sources you may want `logger.exception('Error when creating completion')`, since that logs the exception (with traceback) automatically.
Suggested change:

```diff
-except Exception as e:
-    logger.error(f'Error creating completion: {e}')
-    raise
```
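For illustration, a minimal sketch of the `logger.exception` form; `create_completion` here is a hypothetical stand-in, not code from this PR:

```python
import logging

logger = logging.getLogger(__name__)


async def create_completion() -> None:  # hypothetical stand-in for the real call
    raise RuntimeError('boom')


async def handle_request() -> None:
    try:
        await create_completion()
    except Exception:
        # logger.exception logs at ERROR level and includes the active
        # traceback automatically; no need to interpolate the exception.
        logger.exception('Error when creating completion')
        raise
```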
```python
*args: tuple[Any],
**kwargs: tuple[Any],
```
This is wrong. The annotation on `*args` should describe each positional element, and the one on `**kwargs` each value, not the whole tuple. That said, this is a properly typed library, so we can't include this as is.
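For context, a short sketch of the conventional annotations (the function itself is hypothetical): the declared type applies per positional element and per keyword value, so the containers come out as `tuple[...]` and `dict[str, ...]`:

```python
from typing import Any


def forward(*args: Any, **kwargs: Any) -> None:
    # The annotation types each element/value; the containers themselves are:
    positional: tuple[Any, ...] = args
    keyword: dict[str, Any] = kwargs
    print(positional, keyword)
```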
```python
except Exception as e:
    logger.error(f'Error in responses: {e}', exc_info=True)
    raise HTTPException(
        status_code=500,
        detail=ErrorResponse(
            error=ErrorObject(
                type='internal_server_error',
                message=str(e),
            ),
        ).model_dump(),
    )
```
No need again.
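For context, one way to drop the per-route try/except entirely is a single application-level handler; a sketch assuming FastAPI's `exception_handler` hook and the response shape used in this PR:

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


@app.exception_handler(Exception)
async def internal_error_handler(request: Request, exc: Exception) -> JSONResponse:
    # One central place to turn uncaught exceptions into a 500 response,
    # instead of repeating try/except in every route handler.
    return JSONResponse(
        status_code=500,
        content={'error': {'type': 'internal_server_error', 'message': str(exc)}},
    )
```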
```python
@self.get('/v1/models', response_model=ModelsResponse)
async def get_models() -> ModelsResponse:  # type: ignore
    try:
        return await self.models_api.list_models()
    except Exception as e:
        logger.error(f'Error listing models: {e}', exc_info=True)
        raise HTTPException(
            status_code=500,
            detail=ErrorResponse(
                error=ErrorObject(
                    type='internal_server_error',
                    message=f'Error retrieving models: {str(e)}',
                ),
            ).model_dump(),
        )


@self.get('/v1/models' + '/{model_id}', response_model=Model)
async def get_model(model_id: str) -> Model:  # type: ignore
    try:
        return await self.models_api.get_model(model_id)
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f'Error fetching model info: {e}', exc_info=True)
        raise HTTPException(
            status_code=500,
            detail=ErrorResponse(
                error=ErrorObject(
                    type='internal_server_error',
                    message=f'Error retrieving model: {str(e)}',
                ),
            ).model_dump(),
        )
```
Is this necessary?
```python
logger = logging.getLogger(__name__)


class AgentRegistry:
```
I don't think we should work on a registry basis.
I think the Agent object should expose an ASGI application that you can serve.
If you need multiple agents to become apps/endpoints, you can mount them together.
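A minimal sketch of the mounting idea; `to_asgi()` is a hypothetical method name used for illustration, not an existing pydantic-ai API:

```python
from fastapi import FastAPI
from pydantic_ai import Agent

support_agent = Agent('openai:gpt-4o')
billing_agent = Agent('openai:gpt-4o-mini')

app = FastAPI()

# Each agent exposes its own ASGI sub-application; mounting gives each
# one its own subpath. `to_asgi()` is a hypothetical factory name.
app.mount('/support', support_agent.to_asgi())
app.mount('/billing', billing_agent.to_asgi())
```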
Mounting means you create a subpath per model. Doing it on a registry basis allows you to have a single endpoint with N agents (models), with the OpenAI spec used to select the model.
I see your point.
I don't think we want to work with the registry design anyway. A factory that creates an ASGI application would be ideal:
```python
def expose_agents(
    agents: dict[str, Agent],
    *,
    chat_completions_url: str | None = '/v1/chat/completions',
    responses_url: str | None = None,
) -> ASGIApp: ...
```

I'm not sure the above is the best name for the function, or that the parameters are the best design, but the idea is to simplify the logic here.
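A hypothetical usage sketch of that factory, under the assumption that `expose_agents` existed as proposed (it is not part of this PR):

```python
import uvicorn
from pydantic_ai import Agent

app = expose_agents(  # hypothetical factory from the comment above
    {
        'support': Agent('openai:gpt-4o'),
        'research': Agent('openai:gpt-4o-mini'),
    },
    chat_completions_url='/v1/chat/completions',
    responses_url=None,
)

if __name__ == '__main__':
    uvicorn.run(app)
```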
I am not sure it simplifies things. The registry can return the full list of unique models, we also distinguish chat-completions-only from responses-only models, and additionally we make it simple to derive the name from the agent:
```python
agent_registry = (
    AgentRegistry()
    # auto-derives the name from the Agent instance, or you can override it
    .register_completions_agent(completion_agent)
    .register_responses_agent(responses_agent)
)

app = FastAPI(
    title="LLM Agent API",
    description="OpenAI-compatible API with pydantic-ai backend",
    version="1.0.0",
    lifespan=lifespan,
)
app.include_router(AgentAPIRouter(agent_registry=agent_registry))
```

```python
openai_client_for_chat = AsyncOpenAI(
    base_url=fake_openai_base,
    api_key='test-key',
    http_client=DefaultAioHttpClient(),
)
```
Why use `DefaultAioHttpClient`?
It can indeed be removed for the test; we were simply using aiohttp as our default client for normal usage.
Actually, it is needed for the mocking.
We do not need to use aioresponses. We can use VCR.
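For reference, a cassette-based test could look roughly like this, assuming the `pytest-recording` plugin (a pytest wrapper around VCR.py) and a hypothetical registered agent name:

```python
import pytest
from openai import AsyncOpenAI


@pytest.mark.vcr()  # records the HTTP exchange to a cassette once, replays it afterwards
async def test_chat_completion() -> None:
    client = AsyncOpenAI(base_url='http://localhost:8000/v1', api_key='test-key')
    response = await client.chat.completions.create(
        model='completion_agent',  # hypothetical agent name registered in the app
        messages=[{'role': 'user', 'content': 'Hello!'}],
    )
    assert response.choices
```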
| "pip>=25.2", | ||
| "genai-prices>=0.0.28", | ||
| "mcp-run-python>=0.0.20", | ||
| "pytest-asyncio>=1.2.0", |
Why? The whole codebase uses anyio.
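For context, the anyio style already used across the codebase needs no extra dependency; a minimal sketch:

```python
import pytest

# anyio ships its own pytest plugin, so pytest-asyncio isn't needed.
pytestmark = pytest.mark.anyio


async def test_list_models() -> None:
    # hypothetical body; the point is only the marker style
    assert True
```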
@ion-elgreco I think the idea of this PR is good, but I think we really need to strip it down first, as I mentioned in the comment above.
I'm fine with splitting the Responses and Chat Completions APIs and stacking them as separate PRs. But it won't make a huge difference, since it's a contained code change.
This PR adds the ability to register pydantic-ai agents and expose them as an OpenAI-compatible endpoint. The registry allows you to register agents that can be exposed as the `/v1/chat/completions` and/or `/v1/responses` endpoint. The `/v1/models` endpoint derives all the unique models, by name, that exist in the `AgentRegistry`.

Streaming support for the Responses API has not been implemented yet; it can be done in a follow-up PR.
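For illustration, a sketch of how a standard OpenAI client could talk to the exposed app, assuming it is served locally and that an agent was registered under the name `completion_agent` (both assumptions):

```python
import asyncio

from openai import AsyncOpenAI

# Any OpenAI-compatible client works; registered agent names act as model ids.
client = AsyncOpenAI(base_url='http://localhost:8000/v1', api_key='unused')


async def main() -> None:
    models = await client.models.list()  # served by /v1/models
    print([m.id for m in models.data])

    chat = await client.chat.completions.create(  # served by /v1/chat/completions
        model='completion_agent',
        messages=[{'role': 'user', 'content': 'Hi!'}],
    )
    print(chat.choices[0].message.content)


asyncio.run(main())
```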
Related: `to_chat_completions` method (#2041)