Replies: 1 comment
We had a conversation on Discord, and it turns out that most problems in the OpenAI inference provider when using compatible servers stem from the use of the litellm mixin. The only thing that will probably not get resolved is the automatic registration of OpenAI's predefined models.
-
As far as I could see (and test), the current implementation of `remote::openai` doesn't support OpenAI-compatible inference providers, even though a `base_url` config option was added in PR #2919. Adding support in `llama_stack/providers/remote/inference/openai` for any OpenAI-compatible provider would be really useful. For example, I use ollama's OpenAI-compatible endpoints instead of the native ones because tool calling works better there.

As I see it, there are some limitations in the current provider:

- `base_url` is not used for authentication, so even if a different URL is passed, the default `api.openai.com` is used for authentication.
- There is no way to pass additional parameters to litellm's `completion`, which is necessary in some cases.

I have some code working to add support for all this (missing tests) in my branch: https://github.com/Akrog/llama-stack/tree/openai-others
This code is backward compatible in the sense that it will try to retrieve the existing models from OpenAI and, if it cannot, default to the ones we have configured today.
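As a rough illustration of that fallback, here is a minimal sketch using the `openai` Python client; it is not the code from the branch, and the default model list is just a placeholder:

```python
# Sketch of the fallback described above, not the actual branch code:
# try to list models from the OpenAI-compatible endpoint and fall back
# to the statically configured list if the request fails.
from openai import OpenAI

DEFAULT_MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholder for today's predefined list


def discover_models(base_url: str, api_key: str) -> list[str]:
    client = OpenAI(base_url=base_url, api_key=api_key)
    try:
        # GET {base_url}/models on the compatible server
        return [model.id for model in client.models.list()]
    except Exception:
        # Server unreachable or endpoint not implemented: keep the configured defaults
        return DEFAULT_MODELS
```

Against ollama's `http://localhost:11434/v1` endpoint this should return the locally pulled models, while an unreachable server falls back to the defaults.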
It allows limiting the models that can be used to a subset of those that exist on the server, defining the embeddings metadata, and passing additional parameters to litellm's `completion`. For example, I use it like this for ollama's OpenAI endpoint when I only want to expose the `llama3.1:8b` model:
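Roughly along these lines (a hypothetical sketch of the run.yaml provider entry; the option names beyond `base_url` and `api_key` are illustrative, not necessarily the ones used in the branch):

```yaml
providers:
  inference:
    - provider_id: openai
      provider_type: remote::openai
      config:
        base_url: http://localhost:11434/v1   # ollama's OpenAI-compatible endpoint
        api_key: ${env.OPENAI_API_KEY}        # ollama ignores the key, but one is expected
        # Illustrative names for the new options, not necessarily the real ones:
        allowed_models:
          - llama3.1:8b
        litellm_extra_params:
          timeout: 120
```

With a restriction like that in place, only `llama3.1:8b` would be registered and exposed through the stack.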
A limitation with my current code is that if we are using a custom service and the REST API to list the models is down, it will not register any of the models. I believe this should be improved to have more reasonable behavior, like blindly accepting whatever models have been configured. I didn't want to spend more time on this before hearing whether this is something the project is interested in.