11 changes: 7 additions & 4 deletions .github/workflows/python-build-test.yml
@@ -30,13 +30,16 @@ jobs:
           activate-environment: true
           version: 0.7.6
 
+      - name: Build Llama Stack Spec package
+        working-directory: src/llama-stack-api
+        run: uv build
+
       - name: Build Llama Stack package
-        run: |
-          uv build
+        run: uv build
 
-      - name: Install Llama Stack package
+      - name: Install Llama Stack package (with spec from local build)
        run: |
-          uv pip install dist/*.whl
+          uv pip install --find-links src/llama-stack-api/dist dist/*.whl
 
      - name: Verify Llama Stack package
        run: |
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -42,7 +42,7 @@ repos:
     hooks:
       - id: ruff
         args: [ --fix ]
-        exclude: ^src/llama_stack/strong_typing/.*$
+        exclude: ^(src/llama_stack/strong_typing/.*|src/llama-stack-api/llama_stack_api/strong_typing/.*)$
       - id: ruff-format
 
 - repo: https://github.com/adamchainz/blacken-docs
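Since the updated exclude pattern folds the old and new `strong_typing` locations into one alternation, a quick sanity check of what it matches may help; this is an illustrative sketch, and the sample paths are assumptions:

```python
# Illustrative check of the updated ruff exclude regex from the hunk above.
import re

exclude = re.compile(
    r"^(src/llama_stack/strong_typing/.*|src/llama-stack-api/llama_stack_api/strong_typing/.*)$"
)

assert exclude.match("src/llama_stack/strong_typing/core.py")  # old location, still excluded
assert exclude.match("src/llama-stack-api/llama_stack_api/strong_typing/core.py")  # new package location
assert not exclude.match("src/llama_stack/core/resolver.py")  # everything else is still linted
```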
6 changes: 3 additions & 3 deletions docs/docs/api-overview.md
@@ -8,7 +8,7 @@ The Llama Stack provides a comprehensive set of APIs organized by stability level
 
 These APIs are fully tested, documented, and stable. They follow semantic versioning principles and maintain backward compatibility within major versions. Recommended for production applications.
 
-[**Browse Stable APIs →**](./api/llama-stack-specification)
+[**Browse Stable APIs →**](./api/llama-stack-apiification)
 
 **Key Features:**
 - ✅ Backward compatibility guaranteed
@@ -24,7 +24,7 @@ These APIs are fully tested, documented, and stable. They follow semantic versioning
 
 These APIs include v1alpha and v1beta endpoints that are feature-complete but may undergo changes based on feedback. Great for exploring new capabilities and providing feedback.
 
-[**Browse Experimental APIs →**](./api-experimental/llama-stack-specification-experimental-apis)
+[**Browse Experimental APIs →**](./api-experimental/llama-stack-apiification-experimental-apis)
 
 **Key Features:**
 - 🧪 Latest features and capabilities
@@ -40,7 +40,7 @@ These APIs include v1alpha and v1beta endpoints that are feature-complete but may
 
 These APIs are deprecated and will be removed in future versions. They are provided for migration purposes and to help transition to newer, stable alternatives.
 
-[**Browse Deprecated APIs →**](./api-deprecated/llama-stack-specification-deprecated-apis)
+[**Browse Deprecated APIs →**](./api-deprecated/llama-stack-apiification-deprecated-apis)
 
 **Key Features:**
 - ⚠️ Will be removed in future versions
2 changes: 1 addition & 1 deletion docs/docs/building_applications/index.mdx
@@ -80,4 +80,4 @@ Build production-ready systems with:
 - **[Getting Started](/docs/getting_started/quickstart)** - Basic setup and concepts
 - **[Providers](/docs/providers/)** - Available AI service providers
 - **[Distributions](/docs/distributions/)** - Pre-configured deployment packages
-- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation
+- **[API Reference](/docs/api/llama-stack-apiification)** - Complete API documentation
2 changes: 1 addition & 1 deletion docs/docs/building_applications/playground.mdx
@@ -295,4 +295,4 @@ llama stack run meta-reference
 - **[Agents](./agent)** - Building intelligent agents
 - **[RAG (Retrieval Augmented Generation)](./rag)** - Knowledge-enhanced applications
 - **[Evaluations](./evals)** - Comprehensive evaluation framework
-- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation
+- **[API Reference](/docs/api/llama-stack-apiification)** - Complete API documentation
8 changes: 4 additions & 4 deletions docs/docs/concepts/apis/external.mdx
@@ -58,7 +58,7 @@ External APIs must expose a `available_providers()` function in their module that
 
 ```python
 # llama_stack_api_weather/api.py
-from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
+from llama_stack_api.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
 
 
 def available_providers() -> list[ProviderSpec]:
@@ -79,7 +79,7 @@ A Protocol class like so:
 # llama_stack_api_weather/api.py
 from typing import Protocol
 
-from llama_stack.schema_utils import webmethod
+from llama_stack_api.schema_utils import webmethod
 
 
 class WeatherAPI(Protocol):
@@ -151,12 +151,12 @@ __all__ = ["WeatherAPI", "available_providers"]
 # llama-stack-api-weather/src/llama_stack_api_weather/weather.py
 from typing import Protocol
 
-from llama_stack.providers.datatypes import (
+from llama_stack_api.providers.datatypes import (
     Api,
     ProviderSpec,
     RemoteProviderSpec,
 )
-from llama_stack.schema_utils import webmethod
+from llama_stack_api.schema_utils import webmethod
 
 
 def available_providers() -> list[ProviderSpec]:
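Taken together, these hunks imply an external API module shaped roughly like the following minimal sketch under the new `llama_stack_api` import paths. The `@webmethod` route, the provider names, the `RemoteProviderSpec` field values, and the `Api.weather` member are illustrative assumptions, not the docs page's exact example:

```python
# llama_stack_api_weather/weather.py — hedged sketch; names and field values are assumptions.
from typing import Protocol

from llama_stack_api.providers.datatypes import Api, ProviderSpec, RemoteProviderSpec
from llama_stack_api.schema_utils import webmethod


class WeatherAPI(Protocol):
    """A hypothetical external API for weather data."""

    @webmethod(route="/weather/locations", method="GET")
    async def get_available_locations(self) -> dict[str, list[str]]:
        """List the locations for which forecasts are available."""
        ...


def available_providers() -> list[ProviderSpec]:
    return [
        RemoteProviderSpec(
            api=Api.weather,  # assumes the external "weather" API has been registered on the Api enum
            adapter_type="kaze",  # hypothetical remote adapter name
            pip_packages=["llama-stack-provider-kaze"],
            module="llama_stack_provider_kaze",
            config_class="llama_stack_provider_kaze.KazeProviderConfig",
        )
    ]


__all__ = ["WeatherAPI", "available_providers"]
```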
2 changes: 1 addition & 1 deletion docs/docs/distributions/building_distro.mdx
@@ -65,7 +65,7 @@ external_providers_dir: /workspace/providers.d
 Inside `providers.d/custom_ollama/provider.py`, define `get_provider_spec()` so the CLI can discover dependencies:
 
 ```python
-from llama_stack.providers.datatypes import ProviderSpec
+from llama_stack_api.providers.datatypes import ProviderSpec
 
 
 def get_provider_spec() -> ProviderSpec:
@@ -80,7 +80,7 @@ container_image: custom-vector-store:latest # optional
 All providers must contain a `get_provider_spec` function in their `provider` module. This is a standardized structure that Llama Stack expects and is necessary for getting things such as the config class. The `get_provider_spec` method returns a structure identical to the `adapter`. An example function may look like:
 
 ```python
-from llama_stack.providers.datatypes import (
+from llama_stack_api.providers.datatypes import (
     ProviderSpec,
     Api,
     RemoteProviderSpec,
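The hunk cuts off before the function body. A hedged completion consistent with the imports might look like the sketch below; the concrete field values are assumptions for illustration, not the guide's actual example:

```python
from llama_stack_api.providers.datatypes import (
    ProviderSpec,
    Api,
    RemoteProviderSpec,
)


def get_provider_spec() -> ProviderSpec:
    # Illustrative values for a custom vector-store provider; adjust to your package.
    return RemoteProviderSpec(
        api=Api.vector_io,
        adapter_type="custom_vector_store",  # hypothetical adapter name
        pip_packages=["my-vector-store-client"],  # hypothetical dependency
        module="llama_stack_custom_vector_store",
        config_class="llama_stack_custom_vector_store.config.CustomVectorStoreConfig",
    )
```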
4 changes: 2 additions & 2 deletions docs/docs/providers/vector_io/inline_sqlite-vec.mdx
@@ -153,7 +153,7 @@ description: |
     Example using RAGQueryConfig with different search modes:
 
     ```python
-    from llama_stack.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker
+    from llama_stack_api.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker
 
     # Vector search
     config = RAGQueryConfig(mode="vector", max_chunks=5)
@@ -358,7 +358,7 @@ Two ranker types are supported:
 Example using RAGQueryConfig with different search modes:
 
 ```python
-from llama_stack.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker
+from llama_stack_api.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker
 
 # Vector search
 config = RAGQueryConfig(mode="vector", max_chunks=5)
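The visible hunks stop at the vector-search example, but the imports also pull in the two rankers that the "Two ranker types are supported" passage refers to. A hedged sketch of the remaining modes follows; the parameter names `impact_factor` and `alpha` follow the upstream docs but should be treated as assumptions:

```python
from llama_stack_api.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker

# Keyword search
config = RAGQueryConfig(mode="keyword", max_chunks=5)

# Hybrid search with reciprocal rank fusion (RRF)
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=RRFRanker(impact_factor=60.0),  # higher values favor top-ranked chunks
)

# Hybrid search with a weighted combination of vector and keyword scores
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=WeightedRanker(alpha=0.5),  # alpha weights vector vs. keyword scores
)
```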
2 changes: 1 addition & 1 deletion docs/openapi_generator/generate.py
@@ -16,7 +16,7 @@
 import fire
 import ruamel.yaml as yaml
 
-from llama_stack.apis.version import LLAMA_STACK_API_V1  # noqa: E402
+from llama_stack_api.apis.version import LLAMA_STACK_API_V1  # noqa: E402
 from llama_stack.core.stack import LlamaStack  # noqa: E402
 
 from .pyopenapi.options import Options  # noqa: E402
14 changes: 7 additions & 7 deletions docs/openapi_generator/pyopenapi/generator.py
@@ -16,10 +16,10 @@
 
 from fastapi import UploadFile
 
-from llama_stack.apis.datatypes import Error
-from llama_stack.strong_typing.core import JsonType
-from llama_stack.strong_typing.docstring import Docstring, parse_type
-from llama_stack.strong_typing.inspection import (
+from llama_stack_api.apis.datatypes import Error
+from llama_stack_api.strong_typing.core import JsonType
+from llama_stack_api.strong_typing.docstring import Docstring, parse_type
+from llama_stack_api.strong_typing.inspection import (
     is_generic_list,
     is_type_optional,
     is_type_union,
@@ -28,15 +28,15 @@
     unwrap_optional_type,
     unwrap_union_types,
 )
-from llama_stack.strong_typing.name import python_type_to_name
-from llama_stack.strong_typing.schema import (
+from llama_stack_api.strong_typing.name import python_type_to_name
+from llama_stack_api.strong_typing.schema import (
     get_schema_identifier,
     JsonSchemaGenerator,
     register_schema,
     Schema,
     SchemaOptions,
 )
-from llama_stack.strong_typing.serialization import json_dump_string, object_to_json
+from llama_stack_api.strong_typing.serialization import json_dump_string, object_to_json
 from pydantic import BaseModel
 
 from .operations import (
8 changes: 4 additions & 4 deletions docs/openapi_generator/pyopenapi/operations.py
@@ -11,19 +11,19 @@
 from dataclasses import dataclass
 from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple, Union
 
-from llama_stack.apis.version import LLAMA_STACK_API_V1, LLAMA_STACK_API_V1BETA, LLAMA_STACK_API_V1ALPHA
+from llama_stack_api.apis.version import LLAMA_STACK_API_V1, LLAMA_STACK_API_V1BETA, LLAMA_STACK_API_V1ALPHA
 
 from termcolor import colored
 
-from llama_stack.strong_typing.inspection import get_signature
+from llama_stack_api.strong_typing.inspection import get_signature
 
 from typing import get_origin, get_args
 
 from fastapi import UploadFile
 from fastapi.params import File, Form
 from typing import Annotated
 
-from llama_stack.schema_utils import ExtraBodyField
+from llama_stack_api.schema_utils import ExtraBodyField
 
 
 def split_prefix(
@@ -197,7 +197,7 @@ def _get_defining_class(member_fn: str, derived_cls: type) -> type:
     "Find the class in which a member function is first defined in a class inheritance hierarchy."
 
     # This import must be dynamic here
-    from llama_stack.apis.tools import RAGToolRuntime, ToolRuntime
+    from llama_stack_api.apis.tools import RAGToolRuntime, ToolRuntime
 
     # iterate in reverse member resolution order to find most specific class first
     for cls in reversed(inspect.getmro(derived_cls)):
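For context on what `_get_defining_class` is doing with the reversed MRO, here is a self-contained illustration (not the generator's exact code):

```python
import inspect


def defining_class(member_fn: str, derived_cls: type) -> type | None:
    """Return the class in an inheritance hierarchy that first defines member_fn."""
    # Walking the MRO in reverse starts at the root (object) and moves toward
    # derived_cls, so the first class carrying the attribute in its own
    # __dict__ is the one that originally introduced it.
    for cls in reversed(inspect.getmro(derived_cls)):
        if member_fn in vars(cls):
            return cls
    return None


class Base:
    def ping(self) -> str:
        return "pong"


class Child(Base):
    pass


assert defining_class("ping", Child) is Base
```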
2 changes: 1 addition & 1 deletion docs/openapi_generator/pyopenapi/specification.py
@@ -9,7 +9,7 @@
 from dataclasses import dataclass
 from typing import Any, ClassVar, Dict, List, Optional, Union
 
-from llama_stack.strong_typing.schema import JsonType, Schema, StrictJsonType
+from llama_stack_api.strong_typing.schema import JsonType, Schema, StrictJsonType
 
 URL = str
 
4 changes: 2 additions & 2 deletions docs/openapi_generator/pyopenapi/utility.py
@@ -11,8 +11,8 @@
 from typing import Any, List, Optional, TextIO, Union, get_type_hints, get_origin, get_args
 
 from pydantic import BaseModel
-from llama_stack.strong_typing.schema import object_to_json, StrictJsonType
-from llama_stack.strong_typing.inspection import is_unwrapped_body_param
+from llama_stack_api.strong_typing.schema import object_to_json, StrictJsonType
+from llama_stack_api.strong_typing.inspection import is_unwrapped_body_param
 from llama_stack.core.resolver import api_protocol_map
 
 from .generator import Generator
26 changes: 16 additions & 10 deletions pyproject.toml
@@ -30,6 +30,7 @@ dependencies = [
     "httpx",
     "jinja2>=3.1.6",
     "jsonschema",
+    "llama-stack-api", # API and provider specifications (local dev via tool.uv.sources)
     "llama-stack-client>=0.3.0",
     "openai>=2.5.0",
     "prompt-toolkit",
@@ -182,7 +183,7 @@ install-wheel-from-presigned = "llama_stack.cli.scripts.run:install_wheel_from_presigned"
 
 [tool.setuptools.packages.find]
 where = ["src"]
-include = ["llama_stack", "llama_stack.*"]
+include = ["llama_stack", "llama_stack.*", "llama-stack-api", "llama-stack-api.*"]
 
 [[tool.uv.index]]
 name = "pytorch-cpu"
@@ -192,6 +193,7 @@ explicit = true
 [tool.uv.sources]
 torch = [{ index = "pytorch-cpu" }]
 torchvision = [{ index = "pytorch-cpu" }]
+llama-stack-api = [{ path = "src/llama-stack-api", editable = true }]
 
 [tool.ruff]
 line-length = 120
@@ -258,8 +260,8 @@ unfixable = [
 ] # Using import * is acceptable (or at least tolerated) in an __init__.py of a package API
 
 [tool.mypy]
-mypy_path = ["src"]
-packages = ["llama_stack"]
+mypy_path = ["src", "src/llama-stack-api"]
+packages = ["llama_stack", "llama_stack_api"]
 plugins = ['pydantic.mypy']
 disable_error_code = []
 warn_return_any = true
@@ -281,16 +283,18 @@ exclude = [
     "^src/llama_stack/core/store/registry\\.py$",
     "^src/llama_stack/core/utils/exec\\.py$",
     "^src/llama_stack/core/utils/prompt_for_config\\.py$",
-    "^src/llama_stack/models/llama/llama3/interface\\.py$",
-    "^src/llama_stack/models/llama/llama3/tokenizer\\.py$",
-    "^src/llama_stack/models/llama/llama3/tool_utils\\.py$",
+    # Moved to llama-stack-api but still excluded
+    "^src/llama-stack-api/llama_stack_api/models/llama/llama3/interface\\.py$",
+    "^src/llama-stack-api/llama_stack_api/models/llama/llama3/tokenizer\\.py$",
+    "^src/llama-stack-api/llama_stack_api/models/llama/llama3/tool_utils\\.py$",
+    "^src/llama-stack-api/llama_stack_api/models/llama/llama3/generation\\.py$",
+    "^src/llama-stack-api/llama_stack_api/models/llama/llama3/multimodal/model\\.py$",
+    "^src/llama-stack-api/llama_stack_api/models/llama/llama4/",
+    "^src/llama-stack-api/llama_stack_api/core/telemetry/telemetry\\.py$",
     "^src/llama_stack/providers/inline/agents/meta_reference/",
     "^src/llama_stack/providers/inline/datasetio/localfs/",
     "^src/llama_stack/providers/inline/eval/meta_reference/eval\\.py$",
     "^src/llama_stack/providers/inline/inference/meta_reference/inference\\.py$",
-    "^src/llama_stack/models/llama/llama3/generation\\.py$",
-    "^src/llama_stack/models/llama/llama3/multimodal/model\\.py$",
-    "^src/llama_stack/models/llama/llama4/",
     "^src/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers\\.py$",
     "^src/llama_stack/providers/inline/post_training/common/validator\\.py$",
     "^src/llama_stack/providers/inline/safety/code_scanner/",
@@ -339,7 +343,9 @@ exclude = [
     "^src/llama_stack/providers/utils/telemetry/dataset_mixin\\.py$",
     "^src/llama_stack/providers/utils/telemetry/trace_protocol\\.py$",
     "^src/llama_stack/providers/utils/telemetry/tracing\\.py$",
-    "^src/llama_stack/strong_typing/auxiliary\\.py$",
+    "^src/llama-stack-api/llama_stack_api/core/telemetry/trace_protocol\\.py$",
+    "^src/llama-stack-api/llama_stack_api/core/telemetry/tracing\\.py$",
+    "^src/llama-stack-api/llama_stack_api/strong_typing/auxiliary\\.py$",
     "^src/llama_stack/distributions/template\\.py$",
 ]
 
9 changes: 4 additions & 5 deletions scripts/generate_prompt_format.py
@@ -14,11 +14,10 @@
 from pathlib import Path
 
 import fire
-
-from llama_stack.apis.common.errors import ModelNotFoundError
-from llama_stack.models.llama.llama3.generation import Llama3
-from llama_stack.models.llama.llama4.generation import Llama4
-from llama_stack.models.llama.sku_list import resolve_model
+from llama_stack_api.apis.common.errors import ModelNotFoundError
+from llama_stack_api.models.llama.llama3.generation import Llama3
+from llama_stack_api.models.llama.llama4.generation import Llama4
+from llama_stack_api.models.llama.sku_list import resolve_model
 
 THIS_DIR = Path(__file__).parent.resolve()
 
5 changes: 3 additions & 2 deletions scripts/provider_codegen.py
@@ -22,7 +22,7 @@ def get_api_docstring(api_name: str) -> str | None:
     """Extract docstring from the API protocol class."""
     try:
         # Import the API module dynamically
-        api_module = __import__(f"llama_stack.apis.{api_name}", fromlist=[api_name.title()])
+        api_module = __import__(f"llama_stack_api.apis.{api_name}", fromlist=[api_name.title()])
 
         # Get the main protocol class (usually capitalized API name)
         protocol_class_name = api_name.title()
@@ -83,8 +83,9 @@ def get_config_class_info(config_class_path: str) -> dict[str, Any]:
         # this string replace is ridiculous
         field_type = field_type.replace("typing.", "").replace("Optional[", "").replace("]", "")
         field_type = field_type.replace("Annotated[", "").replace("FieldInfo(", "").replace(")", "")
-        field_type = field_type.replace("llama_stack.apis.inference.inference.", "")
+        field_type = field_type.replace("llama_stack_api.apis.inference.inference.", "")
         field_type = field_type.replace("llama_stack.providers.", "")
+        field_type = field_type.replace("llama_stack_api.providers.", "")
 
         default_value = field.default
         if field.default_factory is not None:
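The `__import__(..., fromlist=[...])` call in the first hunk is what forces Python to return the leaf submodule rather than the top-level package. An equivalent, arguably clearer sketch using `importlib`; the module path mirrors the hunk above and the helper name is assumed for illustration:

```python
import importlib


def get_protocol_class(api_name: str) -> type | None:
    # importlib.import_module returns the leaf module directly, so no fromlist
    # trick is needed; e.g. "inference" resolves to llama_stack_api.apis.inference.
    module = importlib.import_module(f"llama_stack_api.apis.{api_name}")
    return getattr(module, api_name.title(), None)
```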