
Conversation

@cdoern
Contributor

@cdoern cdoern commented Oct 23, 2025

What does this PR do?

Extract API definitions, models, and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server.

see: #2978 and #2978 (comment)

Motivation

External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to:

  • Install only the type definitions they need without server dependencies
  • Avoid version conflicts with the installed llama-stack package
  • Be versioned and released independently

This enables us to re-enable external provider module tests that were previously blocked by these import conflicts.

Changes

  • Created llama-stack-api package with minimal dependencies (pydantic, jsonschema)
  • Moved APIs, models, providers datatypes, strong_typing, and schema_utils
  • Updated all imports from llama_stack.* to llama_stack_spec.*
  • Configured local editable install for development workflow
  • Updated linting and type-checking configuration for both packages

Notes

the provider.utils.vector_io utility is used heavily in the api.vector_io pkg, so I needed to move it. I wonder if it'd be better to relocate those utils elsewhere, or if it makes sense for the utils for all providers to live in this pkg so external providers can take advantage of them. cc @franciscojavierarceo

core.telemetry was moved to llama_stack_api because it is directly used by the apis (trace_protocol) cc @ehhuang @iamemilio

Next Steps

  • Publish llama-stack-spec to PyPI
  • Update external provider dependencies
  • Re-enable external provider module tests

relates to #3237

Test Plan

Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 23, 2025
@cdoern cdoern force-pushed the api-pkg branch 6 times, most recently from 431d00f to 2477826 Compare October 24, 2025 17:05
@cdoern cdoern marked this pull request as ready for review October 24, 2025 17:08
@cdoern
Contributor Author

cdoern commented Oct 24, 2025

marking this as ready for review to get some eyes on this! I appreciate any and all feedback here and can provide more context if necessary!

Contributor

@raghotham raghotham left a comment


still wondering if we need to publish a new package - it adds to management overhead.

Would you be open to experimenting with just creating llama-stack[api] with the dependencies below, to see what the developer experience would be like for external providers?

    "pydantic>=2.11.9",
    "jsonschema",
    "opentelemetry-sdk>=1.30.0",
    "opentelemetry-exporter-otlp-proto-http>=1.30.0",

@cdoern
Contributor Author

cdoern commented Oct 24, 2025

@raghotham sure, let me try in a separate branch. I think an extra like llama-stack[api] might work, but I am interested to see how/if we can package only a subset of the code here. I imagine an extra should be able to just point to llama_stack_spec as the src dir, so it might work perfectly; let me try it out.

@ashwinb
Contributor

ashwinb commented Oct 24, 2025

FWIW I'd prefer llama-stack-api to llama-stack-spec from a naming standpoint.

@cdoern
Contributor Author

cdoern commented Oct 27, 2025

rebasing just to keep up to date. Once I am back from PTO I will re-work this to try the extras approach.

@cdoern
Contributor Author

cdoern commented Oct 27, 2025

@raghotham , @ashwinb , @franciscojavierarceo, @leseb regarding extras vs separate package, here are my findings on why a separate pkg is desirable and actually the only option as opposed to an extra:

Extras vs. Separate Package for llama-stack-api:

Python extras control which dependencies get installed, not which source code is included in the package.

If we used llama-stack[api], running pip install llama-stack[api] would:

  • Install all code from the llama_stack/ directory (server, CLI, all providers)
  • Just add extra dependencies specified in the [project.optional-dependencies] section

This defeats the goal of providing a lightweight API-only package for external providers.
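To make that concrete, here is a hypothetical sketch of what the llama-stack[api] extra would look like under the standard [project.optional-dependencies] mechanism (names and pins assumed for illustration). Note there is nowhere in this table to name source files; the extra only adds dependencies on top of the full package:

```toml
[project]
name = "llama-stack"

# An extra maps a name to *additional dependencies* only.
# pip install "llama-stack[api]" still ships all of llama_stack/
# (server, CLI, providers), plus these packages.
[project.optional-dependencies]
api = [
    "pydantic>=2.11.9",
    "jsonschema",
]
```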

Why the Separate Package Approach is Correct

The current llama-stack-spec package (which I can rename to llama-stack-api) achieves the separation we need:

  1. True code isolation: External providers can install only llama-stack-spec and get just the API definitions, not the server/CLI/provider implementations
  2. No import conflicts: Different namespaces (llama_stack_spec.* vs llama_stack.*) prevent overriding the installed llama-stack package
  3. Independent versioning: API specs can be versioned separately from the server

Why the Separate Directory is Required

  • Each package needs its own pyproject.toml with different dependencies and configuration
  • They have different package names on PyPI and different module paths
  • You cannot use extras to include only a subset of source files from a package

This pattern is standard in the Python ecosystem (examples: boto3 vs boto3-stubs, mypy vs mypy-extensions).

As a concrete action in the meantime I can rename this to llama-stack-api. Let me know if this makes sense!

@cdoern cdoern changed the title feat: split API and provider specs into separate llama-stack-spec pkg feat: split API and provider specs into separate llama-stack-api pkg Oct 27, 2025
@cdoern cdoern force-pushed the api-pkg branch 7 times, most recently from b8a7a20 to d1df2e4 Compare October 28, 2025 00:29
@mattf
Collaborator

mattf commented Oct 28, 2025

stack has a public api with versioning and lifecycle - /v1, /v1alpha, /v1beta. great, hard work.

behind those public apis is a bunch of fast moving implementation code.

@cdoern does this effectively expose that implementation code with a version for use by other projects?

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

@mattf , the proposal here is to publish llama-stack-api which would expose the API as you have described.

This code though is already exposed in llama-stack and is imported already by external providers. This PR just moves that code to a separately published package that we can, if we choose, version and release separately from llama-stack. That gives us a new level of flexibility for fast changes, since consumers of llama-stack-api can pin to a previous version.

Without this change, external providers will need to keep importing llama-stack as a whole which makes it super hard for those consumers to ship external providers in production use cases.

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

I wonder if we could do this a bit more piecemeal as a set of stacked PRs. Right now this feels like moving way too much out perhaps. I am going to think about it a bit more though.

@mattf
Collaborator

mattf commented Oct 28, 2025

> @mattf , the proposal here is to publish llama-stack-api which would expose the API as you have described.
>
> This code though is already exposed in llama-stack and is imported already by external providers. This PR just moves that code to a separately published package that if we choose we can version and release separately from llama-stack. This actually (if we choose) gives us a new level of flexibility for fast changes since the consumers of llama-stack-api can pin to a previous version.
>
> Without this change, external providers will need to keep importing llama-stack as a whole which makes it super hard for those consumers to ship external providers in production use cases.

this will move the conflict from the llama-stack package to the llama-stack-api package and open a new category of issues where a user's llama-stack version isn't compatible with their llama-stack-api version.

correct me if i'm wrong, i see four approaches for external providers -

  1. freeze the llama stack version that is supported by the provider (recommended for production deployments)
  2. keep the external provider in sync with llama stack releases (maintenance burden is proportional to amount of internal apis used by provider)
  3. implement external provider without using internal llama stack apis and utils (recommended)
  4. expand the versioning, stability guarantees from the public api to the internal implementation and utils

consider (3) - a responses provider would be implemented against the public chat, prompt, conversations, file api endpoints, instead of against the internal provider apis.

we can make (3) simpler by passing a LlamaStackClient to the external providers instead of internal provider apis.
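As an illustration of what (3) could look like, here is a minimal sketch; the names InferenceClient, ResponsesProvider, and StubClient are hypothetical, not real llama-stack APIs. The provider depends only on a narrow client protocol that is injected at startup, rather than importing llama-stack internals:

```python
from typing import Protocol


class InferenceClient(Protocol):
    """The only surface the provider sees: a public-API client."""

    def chat_completion(self, model: str, messages: list[dict]) -> dict: ...


class ResponsesProvider:
    """External provider built against the public API, not internal utils."""

    def __init__(self, client: InferenceClient) -> None:
        self.client = client

    def create_response(self, model: str, prompt: str) -> str:
        # Calls out through the injected client; in production this would be
        # a LlamaStackClient hitting the public HTTP endpoints.
        result = self.client.chat_completion(
            model, [{"role": "user", "content": prompt}]
        )
        return result["content"]


class StubClient:
    """Test double standing in for a real client during provider tests."""

    def chat_completion(self, model: str, messages: list[dict]) -> dict:
        return {"content": f"echo: {messages[-1]['content']}"}
```

Because the provider only depends on the protocol, it can also run as a separate process behind the existing passthrough-provider concept, and its tests can inject a stub instead of standing up a stack.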

i recommend we do not do the package split.

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

@mattf

responding to each option:

> 1. freeze the llama stack version that is supported by the provider (recommended for production deployments)

This would be ideal, but is probably not realistic given the current rate of development of LLS.

> 2. keep the external provider in sync with llama stack releases (maintenance burden is proportional to amount of internal apis used by provider)

same reason as above

> 3. implement external provider without using internal llama stack apis and utils (recommended)

I have a question on this: what do you mean by "public chat, prompt, conversations, file api endpoints, instead of against the internal provider apis"? There might be a differentiation I am missing here, but this PR moves llama_stack.apis..., moving things like inference.py, responses.py, etc. Are these the "public" APIs you are referring to? If so, this is what I am aiming to move here. There are providers.datatypes and providers.utils I am moving, but those are a consequence of the fact that we allow external providers. Those providers need access to these datatypes in order to conform to our process of validating APIs, providers, and standing up a stack.

Option 3 would be great, but I imagine it would require a re-architecture of how we register external providers and our APIs in general during stack standup. The reason external providers are built around our internal provider datatypes is that it is the only way we can verify they are valid implementations and that the stack will work.

Would it make more sense to instead standardize our provider datatypes, giving the external provider ecosystem a solid foundation on which folks can build providers? Otherwise, if we keep these types "internal", it's really a sneaky way of allowing breaking changes to publicly consumed API types that we really should have some backwards compatibility on.

> 4. expand the versioning, stability guarantees from the public api to the internal implementation and utils

This is kind of what I describe above in my response to 3. Folks are already consuming providers.Datatypes.... Expanding the stability guarantee seems like the best option. Having a separate package to ship the provider datatypes + the public APIs seems reasonable, because otherwise folks are requiring llama-stack, which has a whole host of other dependencies, just to implement their external providers, APIs, etc.

I see the version incompatibility issue between llama-stack and llama-stack-api and understand the issues that inherently come with that. But given that we already ensure backwards compatibility for the public API, there should be a pretty solid upgrade path between z-streams of llama-stack-api and llama-stack. We would also need to guarantee provider datatypes between versions, though. As mentioned above, people are already consuming those types and have had a bumpy ride between z-streams, so we probably need to ensure them!

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

If the alternative here is to standardize:

  1. our provider datatypes (RemoteProviderSpec, InlineProviderSpec, ProviderSpec) between z-streams
  2. our public API between z-streams
  3. our external provider registration process between z-streams

we can also go that route @mattf !

I do think though that a package with less deps than the entire server of llama-stack would make more sense.

Additionally, the issue of the llama-stack server being on a different version than an external provider will still be problematic, especially when it comes to testing these providers. Presumably our llama-stack-api package should be compatible with an entire llama-stack Y-stream.

@mattf
Collaborator

mattf commented Oct 28, 2025

> @mattf
>
> responding to each option:
>
> 1. freeze the llama stack version that is supported by the provider (recommended for production deployments)
>
> This would be ideal, but is probably not realistic given the current rate of development of LLS
>
> 2. keep the external provider in sync with llama stack releases (maintenance burden is proportional to amount of internal apis used by provider)
>
> same reason as above

that's the crux. the internal development speed of stack should not be limited by the development speed of external providers.

it'd be very reasonable for a production system to standardize on 0.3.0 and make sure all the supported external providers are compatible.

there's value in that productization of llama stack.

> 3. implement external provider without using internal llama stack apis and utils (recommended)
>
> I have a question on this: what do you mean by "public chat, prompt, conversations, file api endpoints, instead of against the internal provider apis"? There might be a differentiation I am missing here, but this PR moves `llama_stack.apis...` moving things like `inference.py`, `responses.py`, etc. Are these the "public" APIs you are referring to? If so, this is what I am aiming to move here. There are `providers.datatypes` and `providers.utils` I am moving but those are a consequence of the fact that we allow external providers. Those providers need access to these datatypes in order to conform to our process of validating APIs, providers, and standing up a stack.
>
> Option 3 would be great but I imagine would require a re-architecture of how we register external providers and our APIs in general during stack standup. The reason external providers are based around our internal provider datatypes is because that is the only way we can verify they are valid implementations and the stack will work.
>
> Would it make more sense to instead standardize our Provider Datatypes enabling the External Provider ecosystem to have a solid foundation on which folks can build providers. Otherwise, if we keep these types "internal" it's really a sneaky way of allowing breaking changes to publicly consumed API types that we really should have some backwards compatibility on.

the public apis i'm referring to are the ones you just painstakingly worked to define, e.g http://localhost:8321/v1/completions etc.

the other side of giving an external provider a LlamaStackClient is that the external provider can be a separate process. stack already has the concept of a passthrough provider.

> 4. expand the versioning, stability guarantees from the public api to the internal implementation and utils
>
> This is kind of what I describe above in my response to 3. Folks are already consuming `providers.Datatypes...`. Expanding the stability guarantee seems like the best option. Having a separate package to ship the provider datatypes + the public APIs seems reasonable because otherwise folks are requiring `llama-stack` which has a whole host of other dependencies just to implement their external providers, APIs, etc.
>
> I see the version incompatibility issue between llama-stack and llama-stack-api and understand the issues that inherently come with that. But I think given we ensure backwards compatibility for the public API already, there should be a pretty solid upgrade path between z-streams of llama-stack-api and llama-stack. But yes, we'd need to ensure provider datatypes between versions as well. As mentioned above though, people are already consuming those types and have had a bumpy ride between z-streams so we probably need to ensure them!

i don't think we should do (4).

it should be possible to achieve fast internal development and stability for external providers.

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

I think an extension "ABI" is a legitimate need if you want to enable extensions. However, the approach the PR takes may need to be thought through. We have to be careful to expose absolutely minimal things; this PR seems to move way too many things potentially.

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

I guess we need to see what some of the external providers look like. What "extension points" are they using? Without that information we'd be designing this in a vacuum.

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

@ashwinb that is a fair point. I can likely decrease the surface of this down to the following:

  1. the public API definitions, e.g. https://github.com/llamastack/llama-stack/tree/main/src/llama_stack/apis; these routes are what the external provider needs to implement: https://github.com/trustyai-explainability/llama-stack-provider-lmeval/blob/80f137a903ed961d3ca4e5ba78ff07f2a9b0f64e/src/llama_stack_provider_lmeval/lmeval.py#L1199
  2. the datatypes in providers.datatypes, such as class ProviderSpec(BaseModel); the ProviderSpec type is a necessity for get_provider_spec

Anything else I moved is a dependent of those two. For example, strong_typing in its entirety is primarily used by the APIs via @json_schema_type, etc. The other issue here is the usage of telemetry in @trace_protocol, which is found in all of the APIs; trace_protocol imports llama_stack.core.telemetry.tracing as well. This is likely a decoupling issue we need to fix.
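As a sketch of that extension point, an external provider module would expose something like the following. The dataclass here is a simplified stdlib stand-in (the real ProviderSpec is a pydantic BaseModel with more fields), and the field names and values are illustrative, not the exact llama-stack API:

```python
from dataclasses import dataclass, field


@dataclass
class ProviderSpec:
    # Simplified stand-in for llama_stack's ProviderSpec; real field
    # names and types may differ.
    api: str
    provider_type: str
    module: str
    pip_packages: list[str] = field(default_factory=list)


def get_provider_spec() -> ProviderSpec:
    # What an external eval provider such as lmeval might advertise so the
    # stack can validate and register it at standup.
    return ProviderSpec(
        api="eval",
        provider_type="remote::lmeval",
        module="llama_stack_provider_lmeval",
        pip_packages=["lm-eval"],
    )
```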

The provider utilities I moved are ones used directly in the API pkg. This is probably something to fix rather than to move. Most of these were from the vector_io API. I can work on fixing this in a precursor PR to this one if that'd make sense.

My intended scope here was just the apis pkg and the provider datatypes. So I guess we could scope some pre-work to unwind anything else that got pulled in here as a requirement that we don't want in the new sub-package?

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

@cdoern I think that list is quite reasonable and you are exactly right that there's a bunch of things we should fix so we don't need to move as much into the extension support.

@ruivieira

ruivieira commented Oct 28, 2025

@cdoern (with my provider developer hat on) I like the idea, especially isolating external providers from internal changes in z-stream releases (if the llama-stack-api doesn't change). But external providers should make sure to only have a llama-stack-api dependency. Having dependencies on both internal llama-stack and llama-stack-api would only make the problem even worse...
This might also simplify provider upstream testing.

@cdoern cdoern force-pushed the api-pkg branch 2 times, most recently from d67be1a to 29a322c Compare October 28, 2025 23:23
Extract API definitions, models, and provider specifications into a
standalone llama-stack-api package that can be published to PyPI
independently of the main llama-stack server.

Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

Changes

- Created llama-stack-api package with minimal dependencies (pydantic, jsonschema)
- Moved APIs, models, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Preserved git history using git mv for moved files
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages
- Rebased on top of upstream src/ layout changes

Testing

Package builds successfully and can be imported independently.
All pre-commit hooks pass with expected exclusions maintained.

Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests

Signed-off-by: Charlie Doern <[email protected]>