
Conversation

@cdoern
Contributor

@cdoern cdoern commented Oct 23, 2025

What does this PR do?

Extract API definitions, models, and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server.

see: #2978 and #2978 (comment)

Motivation

External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to:

  • Install only the type definitions they need without server dependencies
  • Avoid version conflicts with the installed llama-stack package
  • Be versioned and released independently

This enables us to re-enable external provider module tests that were previously blocked by these import conflicts.

Changes

  • Created llama-stack-api package with minimal dependencies (pydantic, jsonschema)
  • Moved APIs, models, providers datatypes, strong_typing, and schema_utils
  • Updated all imports from llama_stack.* to llama_stack_spec.*
  • Configured local editable install for development workflow
  • Updated linting and type-checking configuration for both packages

Notes

the provider.utils.vector_io utility is used heavily in the api.vector_io pkg, so I needed to move it. I wonder if it'd be better to relocate those utils elsewhere, or if it makes sense for the utils for all providers to live in this pkg so external providers can take advantage of them. cc @franciscojavierarceo

core.telemetry was moved to llama_stack_api because it is directly used by the apis (trace_protocol) cc @ehhuang @iamemilio

Next Steps

  • Publish llama-stack-spec to PyPI
  • Update external provider dependencies
  • Re-enable external provider module tests

relates to #3237

Test Plan

Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 23, 2025
@cdoern cdoern force-pushed the api-pkg branch 6 times, most recently from 431d00f to 2477826 Compare October 24, 2025 17:05
@cdoern cdoern marked this pull request as ready for review October 24, 2025 17:08
@cdoern
Contributor Author

cdoern commented Oct 24, 2025

marking this as ready for review to get some eyes on this! I appreciate any and all feedback here and can provide more context if necessary!

Contributor

@raghotham raghotham left a comment


still wondering if we need to publish a new package - it adds to management overhead.

Would you be open to experimenting with just creating llama-stack[api] with the dependencies below, to see what the developer experience would be like for external providers?

    "pydantic>=2.11.9",
    "jsonschema",
    "opentelemetry-sdk>=1.30.0",
    "opentelemetry-exporter-otlp-proto-http>=1.30.0",

@cdoern
Contributor Author

cdoern commented Oct 24, 2025

@raghotham sure, let me try in a separate branch. I think an extra like llama-stack[api] might work, but I am interested to see how/if we can package only a subset of the code here. I imagine an extra should be able to just point to llama_stack_spec as the src dir, so it might work perfectly; let me try it out.

@ashwinb
Contributor

ashwinb commented Oct 24, 2025

FWIW I'd prefer llama-stack-api to llama-stack-spec from a naming standpoint.

@cdoern
Contributor Author

cdoern commented Oct 27, 2025

rebasing just to keep up to date. Once I am back from PTO I will re-work this to try the extras approach.

@cdoern
Contributor Author

cdoern commented Oct 27, 2025

@raghotham , @ashwinb , @franciscojavierarceo, @leseb regarding extras vs separate package, here are my findings on why a separate pkg is desirable and actually the only option as opposed to an extra:

Extras vs. Separate Package for llama-stack-api:

Python extras control which dependencies get installed, not which source code is included in the package.

If we used llama-stack[api], running pip install llama-stack[api] would:

  • Install all code from the llama_stack/ directory (server, CLI, all providers)
  • Just add extra dependencies specified in the [project.optional-dependencies] section

This defeats the goal of providing a lightweight API-only package for external providers.
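To make that concrete, here is a hypothetical sketch of what the llama-stack[api] extra would look like under the standard [project.optional-dependencies] mechanism (names and pins assumed for illustration). Note there is nowhere in this table to name source files; the extra only adds dependencies on top of the full package:

```toml
[project]
name = "llama-stack"

# An extra maps a name to *additional dependencies* only.
# pip install "llama-stack[api]" still ships all of llama_stack/
# (server, CLI, providers), plus these packages.
[project.optional-dependencies]
api = [
    "pydantic>=2.11.9",
    "jsonschema",
]
```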

Why the Separate Package Approach is Correct

The current llama-stack-spec package (which I can rename to llama-stack-api) achieves the separation we need:

  1. True code isolation: External providers can install only llama-stack-spec and get just the API definitions, not the server/CLI/provider implementations
  2. No import conflicts: Different namespaces (llama_stack_spec.* vs llama_stack.*) prevent overriding the installed llama-stack package
  3. Independent versioning: API specs can be versioned separately from the server

Why the Separate Directory is Required

  • Each package needs its own pyproject.toml with different dependencies and configuration
  • They have different package names on PyPI and different module paths
  • You cannot use extras to include only a subset of source files from a package

This pattern is standard in the Python ecosystem (examples: boto3 vs boto3-stubs, mypy vs mypy-extensions).

As a concrete action in the meantime I can rename this to llama-stack-api. Let me know if this makes sense!

@cdoern cdoern changed the title feat: split API and provider specs into separate llama-stack-spec pkg feat: split API and provider specs into separate llama-stack-api pkg Oct 27, 2025
@cdoern cdoern force-pushed the api-pkg branch 7 times, most recently from b8a7a20 to d1df2e4 Compare October 28, 2025 00:29
@mattf
Collaborator

mattf commented Oct 28, 2025

stack has a public api with versioning and lifecycle - /v1, /v1alpha, /v1beta. great, hard work.

behind those public apis is a bunch of fast moving implementation code.

@cdoern does this effectively expose that implementation code with a version for use by other projects?

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

@mattf , the proposal here is to publish llama-stack-api which would expose the API as you have described.

This code though is already exposed in llama-stack and is imported already by external providers. This PR just moves that code to a separately published package that we can, if we choose, version and release separately from llama-stack. That gives us a new level of flexibility for fast changes, since consumers of llama-stack-api can pin to a previous version.

Without this change, external providers will need to keep importing llama-stack as a whole which makes it super hard for those consumers to ship external providers in production use cases.

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

I wonder if we could do this a bit more piecemeal as a set of stacked PRs. Right now this feels like moving way too much out perhaps. I am going to think about it a bit more though.

@mattf
Collaborator

mattf commented Oct 28, 2025

> @mattf , the proposal here is to publish llama-stack-api which would expose the API as you have described.
>
> This code though is already exposed in llama-stack and is imported already by external providers. This PR just moves that code to a separately published package that if we choose we can version and release separately from llama-stack. This actually (if we choose) gives us a new level of flexibility for fast changes since the consumers of llama-stack-api can pin to a previous version.
>
> Without this change, external providers will need to keep importing llama-stack as a whole which makes it super hard for those consumers to ship external providers in production use cases.

this will move the conflict from the llama-stack package to the llama-stack-api package and open a new category of issues where a user's llama-stack version isn't compatible with their llama-stack-api version.

correct me if i'm wrong, i see four approaches for external providers -

  1. freeze the llama stack version that is supported by the provider (recommended for production deployments)
  2. keep the external provider in sync with llama stack releases (maintenance burden is proportional to amount of internal apis used by provider)
  3. implement external provider without using internal llama stack apis and utils (recommended)
  4. expand the versioning, stability guarantees from the public api to the internal implementation and utils

consider (3) - a responses provider would be implemented against the public chat, prompt, conversations, file api endpoints, instead of against the internal provider apis.

we can make (3) simpler by passing a LlamaStackClient to the external providers instead of internal provider apis.
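As an illustration of what (3) could look like, here is a minimal sketch; the names InferenceClient, ResponsesProvider, and StubClient are hypothetical, not real llama-stack APIs. The provider depends only on a narrow client protocol that is injected at startup, rather than importing llama-stack internals:

```python
from typing import Protocol


class InferenceClient(Protocol):
    """The only surface the provider sees: a public-API client."""

    def chat_completion(self, model: str, messages: list[dict]) -> dict: ...


class ResponsesProvider:
    """External provider built against the public API, not internal utils."""

    def __init__(self, client: InferenceClient) -> None:
        self.client = client

    def create_response(self, model: str, prompt: str) -> str:
        # Calls out through the injected client; in production this would be
        # a LlamaStackClient hitting the public HTTP endpoints.
        result = self.client.chat_completion(
            model, [{"role": "user", "content": prompt}]
        )
        return result["content"]


class StubClient:
    """Test double standing in for a real client during provider tests."""

    def chat_completion(self, model: str, messages: list[dict]) -> dict:
        return {"content": f"echo: {messages[-1]['content']}"}
```

Because the provider only depends on the protocol, it can also run as a separate process behind the existing passthrough-provider concept, and its tests can inject a stub instead of standing up a stack.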

i recommend we do not do the package split.

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

@mattf

responding to each option:

> 1. freeze the llama stack version that is supported by the provider (recommended for production deployments)

This would be ideal, but is probably not realistic given the current rate of development of LLS.

> 2. keep the external provider in sync with llama stack releases (maintenance burden is proportional to amount of internal apis used by provider)

same reason as above

> 3. implement external provider without using internal llama stack apis and utils (recommended)

I have a question on this: what do you mean by "public chat, prompt, conversations, file api endpoints, instead of against the internal provider apis"? There might be a differentiation I am missing here, but this PR moves llama_stack.apis..., moving things like inference.py, responses.py, etc. Are these the "public" APIs you are referring to? If so, this is what I am aiming to move here. There are providers.datatypes and providers.utils I am moving, but those are a consequence of the fact that we allow external providers. Those providers need access to these datatypes in order to conform to our process of validating APIs, providers, and standing up a stack.

Option 3 would be great, but I imagine it would require a re-architecture of how we register external providers and our APIs in general during stack standup. The reason external providers are built around our internal provider datatypes is that it is the only way we can verify they are valid implementations and that the stack will work.

Would it make more sense to instead standardize our provider datatypes, giving the external provider ecosystem a solid foundation on which folks can build providers? Otherwise, if we keep these types "internal", it's really a sneaky way of allowing breaking changes to publicly consumed API types that we really should have some backwards compatibility on.

> 4. expand the versioning, stability guarantees from the public api to the internal implementation and utils

This is kind of what I describe above in my response to 3. Folks are already consuming providers.Datatypes.... Expanding the stability guarantee seems like the best option. Having a separate package to ship the provider datatypes + the public APIs seems reasonable, because otherwise folks are requiring llama-stack, which has a whole host of other dependencies, just to implement their external providers, APIs, etc.

I see the version incompatibility issue between llama-stack and llama-stack-api and understand the issues that inherently come with that. But given that we already ensure backwards compatibility for the public API, there should be a pretty solid upgrade path between z-streams of llama-stack-api and llama-stack. We would also need to guarantee provider datatypes between versions, though. As mentioned above, people are already consuming those types and have had a bumpy ride between z-streams, so we probably need to ensure them!

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

If the alternative here is to standardize:

  1. our provider datatypes (RemoteProviderSpec, InlineProviderSpec, ProviderSpec) between z-streams
  2. our public API between z-streams
  3. our external provider registration process between z-streams

we can also go that route @mattf !

I do think though that a package with less deps than the entire server of llama-stack would make more sense.

Additionally, the issue of the llama-stack server being on a different version than an external provider will still be problematic, especially when it comes to testing these providers. Presumably our llama-stack-api package should be compatible with an entire llama-stack Y-stream.

@mattf
Collaborator

mattf commented Oct 28, 2025

> @mattf
>
> responding to each option:
>
> 1. freeze the llama stack version that is supported by the provider (recommended for production deployments)
>
> This would be ideal, but is probably not realistic given the current rate of development of LLS
>
> 2. keep the external provider in sync with llama stack releases (maintenance burden is proportional to amount of internal apis used by provider)
>
> same reason as above

that's the crux. the internal development speed of stack should not be limited by the development speed of external providers.

it'd be very reasonable for a production system to standardize on 0.3.0 and make sure all the supported external providers are compatible.

there's value in that productization of llama stack.

> 3. implement external provider without using internal llama stack apis and utils (recommended)
>
> I have a question on this: what do you mean by "public chat, prompt, conversations, file api endpoints, instead of against the internal provider apis"? There might be a differentiation I am missing here, but this PR moves `llama_stack.apis...` moving things like `inference.py`, `responses.py`, etc. Are these the "public" APIs you are referring to? If so, this is what I am aiming to move here. There are `providers.datatypes` and `providers.utils` I am moving but those are a consequence of the fact that we allow external providers. Those providers need access to these datatypes in order to conform to our process of validating APIs, providers, and standing up a stack.
>
> Option 3 would be great but I imagine would require a re-architecture of how we register external providers and our APIs in general during stack standup. The reason external providers are based around our internal provider datatypes is because that is the only way we can verify they are valid implementations and the stack will work.
>
> Would it make more sense to instead standardize our Provider Datatypes enabling the External Provider ecosystem to have a solid foundation on which folks can build providers. Otherwise, if we keep these types "internal" it's really a sneaky way of allowing breaking changes to publicly consumed API types that we really should have some backwards compatibility on.

the public apis i'm referring to are the ones you just painstakingly worked to define, e.g http://localhost:8321/v1/completions etc.

the other side of giving an external provider a LlamaStackClient is that the external provider can be a separate process. stack already has the concept of a passthrough provider.

> 4. expand the versioning, stability guarantees from the public api to the internal implementation and utils
>
> This is kind of what I describe above in my response to 3. Folks are already consuming `providers.Datatypes...`. Expanding the stability guarantee seems like the best option. Having a separate package to ship the provider datatypes + the public APIs seems reasonable because otherwise folks are requiring `llama-stack` which has a whole host of other dependencies just to implement their external providers, APIs, etc.
>
> I see the version incompatibility issue between llama-stack and llama-stack-api and understand the issues that inherently come with that. But I think given we ensure backwards compatibility for the public API already, there should be a pretty solid upgrade path between z-streams of llama-stack-api and llama-stack. But yes, we'd need to ensure provider datatypes between versions as well. As mentioned above though, people are already consuming those types and have had a bumpy ride between z-streams so we probably need to ensure them!

i don't think we should do (4).

it should be possible to achieve fast internal development and stability for external providers.

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

I think an extension "ABI" is a legitimate need if you want to enable extensions. However, the approach the PR takes may need to be thought through. We have to be careful to expose absolutely minimal things; this PR seems to move way too many things potentially.

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

I guess we need to see what some of the external providers look like. What "extension points" are they using? Without that information we'd be designing this in a vacuum.

@cdoern
Contributor Author

cdoern commented Oct 28, 2025

@ashwinb that is a fair point. I can likely decrease the surface of this down to the following:

  1. the public API definitions, e.g. https://github.com/llamastack/llama-stack/tree/main/src/llama_stack/apis; these routes are what the external provider needs to implement: https://github.com/trustyai-explainability/llama-stack-provider-lmeval/blob/80f137a903ed961d3ca4e5ba78ff07f2a9b0f64e/src/llama_stack_provider_lmeval/lmeval.py#L1199
  2. the datatypes in providers.datatypes, such as class ProviderSpec(BaseModel); the ProviderSpec type is a necessity for get_provider_spec

Anything else I moved is a dependent of those two. For example, strong_typing in its entirety is primarily used by the APIs via @json_schema_type, etc. The other issue here is the usage of telemetry in @trace_protocol, which is found in all of the APIs; trace_protocol imports llama_stack.core.telemetry.tracing as well. This is likely a decoupling issue we need to fix.
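As a sketch of that extension point, an external provider module would expose something like the following. The dataclass here is a simplified stdlib stand-in (the real ProviderSpec is a pydantic BaseModel with more fields), and the field names and values are illustrative, not the exact llama-stack API:

```python
from dataclasses import dataclass, field


@dataclass
class ProviderSpec:
    # Simplified stand-in for llama_stack's ProviderSpec; real field
    # names and types may differ.
    api: str
    provider_type: str
    module: str
    pip_packages: list[str] = field(default_factory=list)


def get_provider_spec() -> ProviderSpec:
    # What an external eval provider such as lmeval might advertise so the
    # stack can validate and register it at standup.
    return ProviderSpec(
        api="eval",
        provider_type="remote::lmeval",
        module="llama_stack_provider_lmeval",
        pip_packages=["lm-eval"],
    )
```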

The provider utilities I moved are ones used directly in the API pkg. This is probably something to fix rather than to move. Most of these were from the vector_io API. I can work on fixing this in a precursor PR to this one if that'd make sense.

My intended scope here was just the apis pkg and the provider datatypes. So I guess we could scope some pre-work to unwind anything else that got pulled in here as a requirement that we don't want in the new sub-package?

@ashwinb
Contributor

ashwinb commented Oct 28, 2025

@cdoern I think that list is quite reasonable and you are exactly right that there's a bunch of things we should fix so we don't need to move as much into the extension support.

@ruivieira

ruivieira commented Oct 28, 2025

@cdoern (with my provider developer hat on) I like the idea, especially isolating external providers from internal changes in z-stream releases (if the llama-stack-api doesn't change). But external providers should make sure to only have a llama-stack-api dependency. Having dependencies on both internal llama-stack and llama-stack-api would only make the problem even worse...
This might also simplify provider upstream testing.

@cdoern cdoern force-pushed the api-pkg branch 2 times, most recently from d67be1a to 29a322c Compare October 28, 2025 23:23
Extract API definitions, models, and provider specifications into a
standalone llama-stack-api package that can be published to PyPI
independently of the main llama-stack server.

Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

Changes

- Created llama-stack-api package with minimal dependencies (pydantic, jsonschema)
- Moved APIs, models, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Preserved git history using git mv for moved files
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages
- Rebased on top of upstream src/ layout changes

Testing

Package builds successfully and can be imported independently.
All pre-commit hooks pass with expected exclusions maintained.

Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests

Signed-off-by: Charlie Doern <[email protected]>