Replies: 1 comment
We had a conversation on Discord, and it turns out that most problems in the OpenAI inference provider when using compatible servers stem from the use of the litellm mixin. The only thing that will probably not get resolved is the automatic registration of OpenAI's predefined models.
-
As far as I could see (and test), the current implementation of `remote::openai` doesn't support OpenAI-compatible inference providers, even though a `base_url` config option was added in PR #2919. Adding support in `llama_stack/providers/remote/inference/openai` for any OpenAI-compatible provider would be really useful. For example, I use ollama's OpenAI-compatible endpoints instead of the native ones because tool calling works better there.

As I see it, there are some limitations in the current provider:

- `base_url` is not used for authentication, so even if a different URL is passed, the default `api.openai.com` is used for authentication.
- There is no way to pass additional parameters to litellm's `completion`, which is necessary in some cases.

I have some code working to add support for all this (missing tests) in my branch: https://github.com/Akrog/llama-stack/tree/openai-others
This code is backward compatible in the sense that it will try to retrieve the existing models from OpenAI and, if it cannot, default to the ones we have configured today.
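As a rough illustration of that fallback, here is a minimal sketch using the `openai` Python client; it is not the code from the branch, and the default model list is just a placeholder:

```python
# Sketch of the fallback described above, not the actual branch code:
# try to list models from the OpenAI-compatible endpoint and fall back
# to the statically configured list if the request fails.
from openai import OpenAI

DEFAULT_MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholder for today's predefined list


def discover_models(base_url: str, api_key: str) -> list[str]:
    client = OpenAI(base_url=base_url, api_key=api_key)
    try:
        # GET {base_url}/models on the compatible server
        return [model.id for model in client.models.list()]
    except Exception:
        # Server unreachable or endpoint not implemented: keep the configured defaults
        return DEFAULT_MODELS
```

Against ollama's `http://localhost:11434/v1` endpoint this should return the locally pulled models, while an unreachable server falls back to the defaults.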
It allows limiting the models that can be used to a subset of those that exist on the server, defining the embeddings metadata, and passing additional parameters to litellm's `completion`. For example, I use it like this for ollama's OpenAI endpoint when I only want to expose the `llama3.1:8b` model:
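Roughly along these lines (a hypothetical sketch of the run.yaml provider entry; the option names beyond `base_url` and `api_key` are illustrative, not necessarily the ones used in the branch):

```yaml
providers:
  inference:
    - provider_id: openai
      provider_type: remote::openai
      config:
        base_url: http://localhost:11434/v1   # ollama's OpenAI-compatible endpoint
        api_key: ${env.OPENAI_API_KEY}        # ollama ignores the key, but one is expected
        # Illustrative names for the new options, not necessarily the real ones:
        allowed_models:
          - llama3.1:8b
        litellm_extra_params:
          timeout: 120
```

With a restriction like that in place, only `llama3.1:8b` would be registered and exposed through the stack.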
A limitation with my current code is that if we are using a custom service and the REST API to list the models is down, it will not register any of the models. I believe this should be improved to have more reasonable behavior, like blindly accepting whatever models have been configured. I didn't want to spend more time on this before hearing whether this is something the project is interested in.