forked from llamastack/llama-stack
ntsh #4
     Open
      
      
derekhiggins wants to merge 422 commits into main from ntsh
Conversation
  
    
    
  
  
    
104d6a7 to ace97da (Compare)
  
…lamastack#3438)

Bumps [next](https://github.com/vercel/next.js) from 15.3.3 to 15.5.3.

Release notes (sourced from [next's releases](https://github.com/vercel/next.js/releases)):

**v15.5.3** (backporting bug fixes; does **not** include all pending features/changes on canary)
- fix: validation return types of pages API routes (#83069)
- fix: relative paths in dev in validator.ts (#83073)
- fix: remove `satisfies` keyword from type validation to preserve old TS compatibility (#83071)
- Credits: huge thanks to @bgub for helping!

**v15.5.2** (backporting bug fixes)
- fix: disable unknownatrules lint rule entirely (#83059)
- revert: add ?dpl to fonts in /_next/static/media (#83062)
- Credits: huge thanks to @bgub and @ztanner for helping!

**v15.5.1** (backporting bug fixes)
- fix: aliased navigations should apply scroll handling (#82900)
- Turbopack: fix invalid NFT entry with file behind symlink (#82887)
- fix: typesafe linking to route handlers and pages API routes (#82858)
- fix: change "noUnknownAtRules" to "warn" for Biome (#82974)
- fix: add path normalization to getRelativePath for Windows (#82918)
- feat: add typesafety with config.typedRoutes to redirect() and permanentRedirect() (#82860)
- fix: avoid importing types that will be unused (#82856)
- fix: update the config.api.responseLimit type (#82852)
- fix: update validation return types (#82854)
- Credits: huge thanks to @bgub, @mischnic, and @ztanner for helping!

**v15.5.1-canary.39**
- [metadata] change the metadata routes params to promises (#83560)

… (truncated)

Commits: `07d1cbc` v15.5.3; `db56d77` [backport] fix: validation return types of pages API routes (#83069) (#83580); `7a80623` [backport] fix: relative paths in dev in validator.ts (#83073) (#83190); `fddaeb8` [backport] fix: remove `satisfies` keyword from type validation to preserve o…; `497ec6a` v15.5.2; `bc72f41` [backport] revert: add ?dpl to fonts in `/_next/static/media` (#83062) (#83066); `c8faf68` [backport] fix: disable unknownatrules lint rule entirely (#83059) (#83060); `cc68ced` v15.5.1; `1ce9857` [backport] fix: update validation return types (#82854) (#83027); `b93c894` [backport] fix: update the config.api.responseLimit type (#82852) (#83028); additional commits viewable in the [compare view](https://github.com/vercel/next.js/compare/v15.3.3...v15.5.3).

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options (comment on this PR): `@dependabot rebase`, `@dependabot recreate`, `@dependabot merge`, `@dependabot squash and merge`, `@dependabot cancel merge`, `@dependabot reopen`, `@dependabot close`, `@dependabot show <dependency name> ignore conditions`, `@dependabot ignore this major version`, `@dependabot ignore this minor version`, `@dependabot ignore this dependency`.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…lama_stack/ui (llamastack#3437)

Bumps [@radix-ui/react-select](https://github.com/radix-ui/primitives) from 2.2.5 to 2.2.6.

Commits: see the full diff in the [compare view](https://github.com/radix-ui/primitives/commits).

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. The standard Dependabot commands and options apply.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…lamastack#3442)

# What does this PR do?

The @required_args decorator in openai-python is masking the async nature of the {AsyncCompletions, chat.AsyncCompletions}.create methods; see openai/openai-python#996. This means two things:

0. we cannot use iscoroutine in the recorder to detect async vs non-async
1. our mocks are inappropriately introducing identifiable async behavior

For (0), we update the iscoroutine check with detection of /v1/models, which is the only non-async function we mock and record. For (1), we could leave everything as is and assume (0) will catch errors. To be defensive, we update the unit tests to mock below the create methods, allowing the true openai-python create() methods to be tested.
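To illustrate the masking, here is a minimal, self-contained sketch using a simplified stand-in for the decorator (not openai-python's actual implementation):

```python
import asyncio
import functools
import inspect


def required_args(*names):
    """Simplified stand-in: a sync wrapper that hides the coroutine function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)  # returns a coroutine, but wrapper itself is sync
        return wrapper
    return decorator


class AsyncCompletions:
    @required_args("model")
    async def create(self, model: str) -> str:
        return f"completion for {model}"


# Static inspection sees a plain function, so a recorder can't rely on it...
print(inspect.iscoroutinefunction(AsyncCompletions.create))  # False

# ...even though calling it still produces a coroutine at runtime.
coro = AsyncCompletions().create(model="m")
print(asyncio.iscoroutine(coro))  # True
print(asyncio.run(coro))          # "completion for m"
```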
…nchmark resources in Llama Stack (llamastack#3371)

# What does this PR do?

This PR provides functionality for users to unregister ScoringFn and Benchmark resources for the `scoring` and `eval` APIs.

Closes llamastack#3051

## Test Plan

Updated integration and unit tests via the CI workflow.
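For reference, exercising the new unregister paths from a client would look roughly like this (a hedged sketch that assumes the unregister methods mirror the existing register calls; the IDs are placeholders):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Assumed method names, mirroring register(); IDs are placeholders.
client.scoring_functions.unregister(scoring_fn_id="my-scoring-fn")  # scoring API
client.benchmarks.unregister(benchmark_id="my-benchmark")           # eval API
```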
…tack#3417)

# What does this PR do?

Adds dynamic model support to TGI, and adds a new overwrite_completion_id feature to OpenAIMixin to deal with TGI always returning id="".

## Test Plan

tgi: `docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference --model-id Qwen/Qwen3-0.6B`

stack: `TGI_URL=http://localhost:8080 uv run llama stack build --image-type venv --distro ci-tests --run`

test: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --setup tgi --subdirs inference --pattern openai`
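The id workaround boils down to something like this sketch (illustrative names, not the actual OpenAIMixin code):

```python
import uuid


def overwrite_completion_id(response, enabled: bool):
    """TGI returns id="" on completions; synthesize an id so downstream
    consumers that key on it keep working. Illustrative, not the real mixin."""
    if enabled and not response.id:
        response.id = f"chatcmpl-{uuid.uuid4().hex}"
    return response
```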
# What does this PR do?
* use a logger
* update the distro to add the Files API, otherwise it won't start since it is a dependency of vector
* clarify project_id and api_key requirements
* disable openai compatible calls since the endpoint returns 404
* disable text_inference structured format tests
* fixed openai client initialization
## Test Plan
Execute text_inference:
```
WATSONX_API_KEY=... WATSONX_PROJECT_ID=... python -m llama_stack.core.server.server llama_stack/distributions/watsonx/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -vvvv -ra --text-model watsonx/meta-llama/llama-3-3-70b-instruct tests/integration/inference/test_text_inference.py
============================================= test session starts ==============================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 20 items
tests/integration/inference/test_text_inference.py::test_text_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [  5%]
tests/integration/inference/test_text_inference.py::test_text_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 10%]
tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] XFAIL [ 15%]
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] XFAIL [ 20%]
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] XFAIL [ 25%]
tests/integration/inference/test_text_inference.py::test_text_completion_structured_output[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:structured_output] SKIPPED (structured output) [ 30%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 35%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 40%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 45%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 50%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 55%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 60%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output] SKIPPED (structured output) [ 65%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling_tools_absent-True] PASSED [ 70%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:text_then_tool] XFAIL [ 75%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 80%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 85%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling_tools_absent-False] PASSED [ 90%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_then_answer] XFAIL [ 95%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:array_parameter] XFAIL [100%]
=========================================== short test summary info ============================================
SKIPPED [2] tests/integration/inference/test_text_inference.py:49: Model watsonx/meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support json_schema structured output
XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] - remote::watsonx doesn't support 'stop' parameter yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] - remote::watsonx doesn't support log probs yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] - remote::watsonx doesn't support log probs yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:text_then_tool] - Not tested for non-llama4 models yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_then_answer] - Not tested for non-llama4 models yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:array_parameter] - Not tested for non-llama4 models yet
============================ 12 passed, 2 skipped, 6 xfailed, 14 warnings in 36.88s ============================
```
---------
Signed-off-by: Sébastien Han <[email protected]>
# What does this PR do?

This document outlines the different API stability levels, how to enforce them, and next steps.

## Next Steps

Following the adoption of this document, all existing APIs should follow the enforcement protocol.

Relates to llamastack#3237

Signed-off-by: Charlie Doern <[email protected]>
# What does this PR do?

Pin to the latest pydantic version, 2.11.9, as sometimes we pick up an older version and fail to start the container in GitHub Actions: https://github.com/llamastack/llama-stack-ops/actions/runs/17750263127

Closes llamastack#3461

## Test Plan

Tested locally with the following commands to start a container.

Build the container: `llama stack build --distro starter --image-type container`

Start the container: `docker run -d -p 8321:8321 --name llama-stack-test distribution-starter:0.2.21`

Check health: http://localhost:8321/v1/health

Couldn't repro with the older version (`2.8.2`), but with pydantic `2.11.9` the container starts. Per https://pypi.org/project/pydantic/#history, 2.11.9 is the latest version.
…dapter (llamastack#3458)

# What does this PR do?

Adds embedding and dynamic model support to the Together inference adapter:

- updated to use OpenAIMixin
- workarounds for Together API quirks
- recordings for the together suite when subdirs=inference, pattern=openai

## Test Plan

```
$ TOGETHER_API_KEY=_NONE_ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup together --subdirs inference --pattern openai
...
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:completion:sanity]
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.121s
PASSED [  2%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:completion:suffix] SKIPPED [  4%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:completion:sanity] PASSED [  6%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-1] SKIPPED [  8%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free] SKIPPED [ 10%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_01] PASSED [ 12%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] PASSED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] SKIPPED [ 17%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 19%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 21%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free] SKIPPED [ 23%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 25%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 27%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 29%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 31%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 34%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 36%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 38%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 40%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 42%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-0] SKIPPED [ 46%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_02] PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] SKIPPED [ 53%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [ 57%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 59%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 61%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 63%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 65%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 68%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 70%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 72%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 74%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 76%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 78%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_01] PASSED [ 80%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] PASSED [ 82%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] SKIPPED [ 85%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 87%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 89%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_02] PASSED [ 91%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] PASSED [ 93%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] SKIPPED [ 95%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [ 97%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [100%]
============================================ 30 passed, 17 skipped, 50 deselected, 4 warnings in 21.96s =============================================
```
# What does this PR do?

Modified the code in registry.py. The key changes are:

1. Removed the `return False` statement.
2. Added a warning log message that includes the object type, identifier, and provider_id for better debugging.
3. The method now continues with the registration process instead of returning early.

---------

Co-authored-by: Omar Abdelwahab <[email protected]>
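In outline, the change behaves like this sketch (helper and field names are illustrative, not the exact registry.py code):

```python
import logging

logger = logging.getLogger(__name__)


def register_object(registry: dict, obj_type: str, identifier: str, provider_id: str, obj) -> None:
    if (obj_type, identifier) in registry:
        # Before: `return False` here silently dropped the registration.
        # Now: warn with type/identifier/provider_id for debugging, then continue.
        logger.warning(
            "%s '%s' (provider '%s') is already registered; continuing with registration",
            obj_type, identifier, provider_id,
        )
    registry[(obj_type, identifier)] = obj
```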
Add default value for PR_HEAD_REPO to prevent 'unbound variable' error when no PR exists for a branch. Signed-off-by: Derek Higgins <[email protected]>
# What does this PR do?
Fixes this warning in llama stack build:
```bash
WARNING  2025-09-15 15:29:02,197 llama_stack.core.distribution:149 core: Failed to import module prompts: No module named
         'llama_stack.providers.registry.prompts'
```
## Test Plan
Test added
---------
Signed-off-by: Francisco Javier Arceo <[email protected]>
# What does this PR do?

* Updates documentation links from readthedocs to llamastack.github.io

## Test Plan

* Manual testing
…mastack#3472)

# What does this PR do?

When registering a dataset for NVIDIA, the DatasetsRoutingTable expects `nvidia` to be passed via the `provider_id` ([here](https://github.com/llamastack/llama-stack/blob/main/llama_stack/core/routing_tables/datasets.py#L61)). This PR fixes a notebook to correctly use `provider_id`.

Closes llamastack#3308

## Test Plan

Manually execute the notebook steps to verify the dataset is registered.

Co-authored-by: Jash Gulabrai <[email protected]>
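The corrected registration call looks roughly like this (a sketch; the dataset fields are placeholders and the exact register() signature may differ across client versions):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The DatasetsRoutingTable keys on provider_id, so it must be "nvidia" here.
client.datasets.register(
    dataset_id="my-eval-dataset",               # placeholder
    provider_id="nvidia",
    url={"uri": "hf://datasets/placeholder"},   # placeholder source
)
```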
# What does this PR do?

Updates the qdrant provider's convert_id function to use a FIPS-validated cryptographic hashing function, so that llama-stack is considered to be `Designed for FIPS`.

The standard library `uuid.uuid5()` function uses SHA-1 under the hood, which is not FIPS-validated. This commit uses an approach similar to the one merged in llamastack#3423.

Closes llamastack#3476.

## Test Plan

Unit tests from scripts/unit-tests.sh were run to verify that the tests pass.

A small test script can display the data flow:

```python
import hashlib
import uuid

# Input
_id = "chunk_abc123"
print(_id)

# Step 1: Format and encode
hash_input = f"qdrant_id:{_id}".encode()
print(hash_input)
# Result: b'qdrant_id:chunk_abc123'

# Step 2: SHA-256 hash
sha256_hash = hashlib.sha256(hash_input).hexdigest()
print(sha256_hash)
# Result: "184893a6eafeaac487cb9166351e8625b994d50f3456d8bc6cea32a014a27151"

# Step 3: Create UUID from first 32 chars
uuid_string = str(uuid.UUID(sha256_hash[:32]))
print(uuid_string)
# sha256_hash[:32] = "184893a6eafeaac487cb9166351e8625"
# Final result: "184893a6-eafe-aac4-87cb-9166351e8625"
```

Signed-off-by: Doug Edgar <[email protected]>
…lamastack#3388)

# What does this PR do?

*Add dynamic authentication token forwarding support for the vLLM provider.*

This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough.

- Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:
- Static: VLLM_API_TOKEN env var or config.api_token
- Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token

All existing functionality is preserved while adding the new dynamic capabilities.

## Test Plan

```
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Authorization: Bearer my-dynamic-token" \
  -H "X-LlamaStack-Provider-Data: {\"vllm_api_token\": \"Bearer my-dynamic-token\", \"vllm_url\": \"http://dynamic-server:8000\"}" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

---------

Signed-off-by: Akram Ben Aissi <[email protected]>
# What does this PR do?

This replaces the static model listing for any provider using OpenAIMixin. Currently:

- anthropic
- azure openai
- gemini
- groq
- llama-api
- nvidia
- openai
- sambanova
- tgi
- vertexai
- vllm
- not changed: together has its own impl

## Test Plan

- new unit tests
- manual for llama-api, openai, groq, gemini:

```
for provider in llama-openai-compat openai groq gemini; do
  uv run llama stack build --image-type venv --providers inference=remote::$provider --run &
  uv run --with llama-stack-client llama-stack-client models list | grep Total
done
```

results (17 sep 2025):
- llama-api: 4
- openai: 86
- groq: 21
- gemini: 66

closes llamastack#3467
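Conceptually, dynamic listing just asks the provider's OpenAI-compatible `/v1/models` endpoint at runtime instead of shipping a hard-coded table; a minimal sketch (endpoint URL and key are placeholders):

```python
from openai import OpenAI

# Any OpenAI-compatible provider endpoint works here; URL and key are placeholders.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

available = [m.id for m in client.models.list()]
print(f"Total: {len(available)} models")
```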
…-compat functions (llamastack#3395)

# What does this PR do?

Update the Ollama inference provider to use OpenAIMixin for the openai-compat endpoints.

## Test Plan

ci
# What does this PR do?
The rag-runtime tool requires the files API as a dependency, but the NVIDIA
distribution was missing the files provider configuration. Thus, when
running:
```
llama stack build --distro nvidia --image-type venv
```
And then:
```
llama stack run {path_to_distribution_config} --image-type venv
```
It would raise an error:
```
RuntimeError: Failed to resolve 'tool_runtime' provider 'rag-runtime' of type 'inline::rag-runtime': required dependency 'files' is not available. Please add a 'files' provider to your configuration or check if the provider is properly configured.
```
This PR fixes the issue by adding the missing files provider to the NVIDIA
distribution.
## Test Plan
N/A
# What does this PR do?

Currently `RemoteProviderSpec` has an `AdapterSpec` embedded in it. Remove `AdapterSpec`, and put its leftover fields into `RemoteProviderSpec`. Additionally, many of the fields were duplicated between `InlineProviderSpec` and `RemoteProviderSpec`; move these to `ProviderSpec` so they are shared. Fix up the distro codegen to use `RemoteProviderSpec` directly rather than `remote_provider_spec`, which took an `AdapterSpec` and returned a full provider spec.

## Test Plan

Existing distro tests should pass.

Signed-off-by: Charlie Doern <[email protected]>
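Schematically, the consolidation looks like this (field names are illustrative, not the exact llama-stack definitions):

```python
from pydantic import BaseModel


class ProviderSpec(BaseModel):
    api: str
    provider_type: str
    pip_packages: list[str] = []  # formerly duplicated on both subclasses


class InlineProviderSpec(ProviderSpec):
    module: str


class RemoteProviderSpec(ProviderSpec):
    # fields that used to live on the embedded AdapterSpec
    adapter_type: str
    module: str
```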
# What does this PR do?

As shown in llamastack#3421, we can scale the stack to handle more RPS with k8s replicas. This PR enables a multi-process stack with `uvicorn --workers` so that we can achieve the same scaling without being in k8s.

To achieve that, we refactor main to split out the app construction logic. This method needs to be non-async. We created a new `Stack` class to house impls and have a `start()` method to be called in lifespan to start background tasks, instead of starting them in the old `construct_stack`. This way we avoid having to manage an event loop manually.

## Test Plan

CI

> uv run --with llama-stack python -m llama_stack.core.server.server benchmarking/k8s-benchmark/stack_run_config.yaml

works.

> LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv run uvicorn llama_stack.core.server.server:create_app --port 8321 --workers 4

works.
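The shape of the refactor, as a hedged sketch (a synchronous factory each uvicorn worker can call, with background tasks deferred to the lifespan hook; not the actual server code):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


class Stack:
    """Houses impls; construction is synchronous so every worker can build it."""

    def __init__(self, config_path: str):
        self.config_path = config_path  # impls would be resolved here, no event loop needed

    async def start(self) -> None:
        """Start background tasks on the worker's own event loop."""


def create_app(config_path: str = "run.yaml") -> FastAPI:
    stack = Stack(config_path)

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        await stack.start()  # background tasks begin once the worker's loop exists
        yield

    return FastAPI(lifespan=lifespan)
```

Because each worker process builds its own app and starts its own background tasks inside lifespan, nothing has to share an event loop across processes.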
# What does this PR do?

This PR fixes a blocking issue in the detailed RAG tutorial where the code fails with a 400 Bad Request error. The root cause is that recent versions of Llama Stack ignore the client-generated vector_db_id and assign a new server-side ID. The tutorial was not updated to reflect this, causing the rag_tool.insert call to fail.

This change updates the code to capture the authoritative ID from the .identifier attribute of the register() method's response. This ensures the tutorial code runs successfully and reflects the current API behavior.

## Test Plan

The fix can be verified by running the Python code snippet from the detailed tutorial page.

Run the original code (before this change): the script fails with a 400 Bad Request error on the rag_tool.insert step.

Run the updated code (after this change): the script runs successfully to completion.

Co-authored-by: Adam Young <[email protected]>
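The essence of the tutorial fix, as a sketch (model name and documents are placeholders):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The server may ignore a client-generated ID, so capture the authoritative
# one from the registration response instead.
response = client.vector_dbs.register(
    vector_db_id="my_documents",          # may be replaced server-side
    embedding_model="all-MiniLM-L6-v2",   # placeholder
)
vector_db_id = response.identifier        # server-assigned, authoritative

documents: list = []  # placeholder: the tutorial's RAGDocument list
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)
```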
…_stack/ui (llamastack#3789)

Bumps [@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom) from 19.2.0 to 19.2.1.

Commits: see the full diff in the [compare view](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom).

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. The standard Dependabot commands and options apply.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…lamastack#3791)

Bumps [eslint](https://github.com/eslint/eslint) from 9.26.0 to 9.37.0.

Release notes (sourced from [eslint's releases](https://github.com/eslint/eslint/releases)):

**v9.37.0**

Features
- `39f7fb4` feat: `preserve-caught-error` should recognize all static "cause" keys (#20163) (Pixel998)
- `f81eabc` feat: support TS syntax in `no-restricted-imports` (#19562) (Nitin Kumar)

Bug Fixes
- `a129cce` fix: correct `no-loss-of-precision` false positives for leading zeros (#20164) (Francesco Trotta)
- `09e04fc` fix: add missing AST token types (#20172) (Pixel998)
- `861c6da` fix: correct `ESLint` typings (#20122) (Pixel998)

Documentation
- `b950359` docs: fix typos across the docs (#20182) (루밀LuMir)
- `42498a2` docs: improve ToC accessibility by hiding non-semantic character (#20181) (Percy Ma)
- `29ea092` docs: Update README (GitHub Actions Bot)
- `5c97a04` docs: show `availableUntil` in deprecated rule banner (#20170) (Pixel998)
- `90a71bf` docs: update `README` files to add badge and instructions (#20115) (루밀LuMir)
- `1603ae1` docs: update references from `master` to `main` (#20153) (루밀LuMir)

Chores
- `afe8a13` chore: update `@eslint/js` dependency to version 9.37.0 (#20183) (Francesco Trotta)
- `abee4ca` chore: package.json update for `@eslint/js` release (Jenkins)
- `fc9381f` chore: fix typos in comments (#20175) (overlookmotel)
- `e1574a2` chore: unpin jiti (#20173) (renovate[bot])
- `e1ac05e` refactor: mark `ESLint.findConfigFile()` as `async`, add missing docs (#20157) (Pixel998)
- `347906d` chore: update eslint (#20149) (renovate[bot])
- `0cb5897` test: remove tmp dir created for circular fixes in multithread mode test (#20146) (Milos Djermanovic)
- `bb99566` ci: pin `jiti` to version 2.5.1 (#20151) (Pixel998)
- `177f669` perf: improve worker count calculation for `"auto"` concurrency (#20067) (Francesco Trotta)
- `448b57b` chore: Mark deprecated formatting rules as available until v11.0.0 (#20144) (Milos Djermanovic)

**v9.36.0**

Features
- `47afcf6` feat: correct `preserve-caught-error` edge cases (#20109) (Francesco Trotta)

Bug Fixes
- `75b74d8` fix: add missing rule option types (#20127) (ntnyq)
- `1c0d850` fix: update `eslint-all.js` to use `Object.freeze` for `rules` object (#20116) (루밀LuMir)
- `7d61b7f` fix: add missing scope types to `Scope.type` (#20110) (Pixel998)
- `7a670c3` fix: correct rule option typings in `rules.d.ts` (#20084) (Pixel998)

Documentation
- `b73ab12` docs: update examples to use `defineConfig` (#20131) (sethamus)
- `31d9392` docs: fix typos (#20118) (Pixel998)
- `c7f861b` docs: Update README (GitHub Actions Bot)
- `6b0c08b` docs: Update README (GitHub Actions Bot)
- `91f97c5` docs: Update README (GitHub Actions Bot)

Chores
- `12411e8` chore: upgrade `@eslint/js`@9.36.0 (#20139) (Milos Djermanovic)
- `488cba6` chore: package.json update for `@eslint/js` release (Jenkins)

… (truncated)

Commits: `d5d1bdf` 9.37.0; `94865ff` Build: changelog update for 9.37.0; `afe8a13` chore: update `@eslint/js` dependency to version 9.37.0 (#20183); `abee4ca` chore: package.json update for `@eslint/js` release; `b950359` docs: fix typos across the docs (#20182); `42498a2` docs: improve ToC accessibility by hiding non-semantic character (#20181); `fc9381f` chore: fix typos in comments (#20175); `e1574a2` chore: unpin jiti (#20173); `29ea092` docs: Update README; `a129cce` fix: correct `no-loss-of-precision` false positives for leading zeros (#20164); additional commits viewable in the [compare view](https://github.com/eslint/eslint/compare/v9.26.0...v9.37.0).

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. The standard Dependabot commands and options apply.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…a_stack/ui (llamastack#3792)

Bumps [framer-motion](https://github.com/motiondivision/motion) from 12.23.12 to 12.23.24.

Changelog (sourced from [framer-motion's changelog](https://github.com/motiondivision/motion/blob/main/CHANGELOG.md)):

**[12.23.24] 2025-10-10** — Fixed
- Ensure that when a component remounts, it continues to fire animations even when `initial={false}`.

**[12.23.23] 2025-10-10** — Added
- Exporting `PresenceChild` and `PopChild` type for internal use.

**[12.23.22] 2025-09-25** — Added
- Exporting `HTMLElements` and `useComposedRefs` type for internal use.

**[12.23.21] 2025-09-24** — Fixed
- Fixing main-thread `scroll` with animations that contain `delay`.

**[12.23.20] 2025-09-24** — Fixed
- Suppress non-animatable value warning for instant animations.

**[12.23.19] 2025-09-23** — Fixed
- Remove support for changing `ref` prop.

**[12.23.18] 2025-09-19** — Fixed
- `<motion />` components now support changing `ref` prop.

**[12.23.17] 2025-09-19** — Fixed
- Ensure `animate()` `onComplete` only fires once, when all values are complete.

**[12.23.16] 2025-09-19**

… (truncated)

Commits: `b5df740` v12.23.24; `808ebce` Updating changelog; `237eee2` v12.23.23; `834965c` Updating changelog; `4069086` Update README.md; `6da6b61` Update README.md with new sponsor links; `e366831` Update README.md; `7796f4f` Update Gold section with new links and images; `d1bb937` Update sponsor section in README.md; `97fba16` Update sponsorship logos in README; additional commits viewable in the [compare view](https://github.com/motiondivision/motion/compare/v12.23.12...v12.23.24).

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. The standard Dependabot commands and options apply.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…tack/ui (llamastack#3788) Bumps [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) from 0.542.0 to 0.545.0.

Release notes highlights (sourced from [lucide-react's releases](https://github.com/lucide-icons/lucide/releases)):
- 0.545.0: changed the `flame`, `combine`, and `building-2` icons; arcified the `square-m` icon; added a `motorbike` icon; bumped vite from 6.3.5 to 6.3.6 and devalue from 5.1.1 to 5.3.2
- 0.544.0: updated lucide-static documentation about raw string imports; added an `ev-charger` icon
- 0.543.0: changed the `church`, `calendar-cog`, `panel-top-bottom-dashed`, and `message-square-quote` icons plus list/text and derived icons; optimised the `bug` icons; added metadata tags for `messages-square`, `ship`, and `id-card-lanyard`; put the preview-comment x-ray at top when more than 7 icons change; added a new package for Flutter; added a `house-heart` icon; bumped astro from 5.5.2 to 5.13.2

Full changelog: [compare view](https://github.com/lucide-icons/lucide/compare/0.542.0...0.545.0)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…3796) # What does this PR do?

Remove usage of the deprecated `Message` type from the Safety APIs.

## Test Plan

CI
…lamastack#3794) Applies the same pattern from llamastack#3777 to the embeddings and vector_stores.create() endpoints. This should _not_ be a breaking change since (a) our tests were already passing the `extra_body` parameter to the backend, but (b) the backend probably wasn't extracting the parameters correctly. This PR fixes that. Updated APIs: `openai_embeddings()`, `openai_create_vector_store()`, `openai_create_vector_store_file_batch()`

## Test Plan

Added unit tests
Fixed the CI job to check the correct directory for file changes. Artifacts are now stored in multiple directories, not just ./tests/integration/recordings. Signed-off-by: Derek Higgins <[email protected]>
llamastack#3802) # What does this PR do?

Two main changes:
1. Removes the `provider_id` requirement in calls to vector stores, and
2. Removes the "register first embedding model" logic; the embedding model id is now required on vector store creation.

This simplifies the UX for OpenAI to:

```python
vs = client.vector_stores.create(
    name="my_citations_db",
    extra_body={
        "embedding_model": "ollama/nomic-embed-text:latest",
    }
)
```

## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <[email protected]>
# What does this PR do?

This commit migrates the authentication system from python-jose to PyJWT to eliminate the dependency on the archived rsa package. The migration includes:

- Refactored OAuth2TokenAuthProvider to use PyJWT's PyJWKClient for clean JWKS handling
- Removed the manual JWKS fetching, caching, and key-extraction logic in favor of PyJWT's built-in functionality

The new implementation is cleaner, more maintainable, and follows PyJWT best practices while maintaining full backward compatibility.

## Test Plan

Unit tests. Auth CI.

---------

Signed-off-by: Sébastien Han <[email protected]>
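For illustration, a minimal sketch (not the PR's exact code) of token validation with PyJWT's `PyJWKClient`; the function name and the JWKS URL, audience, and issuer parameters are assumed stand-ins for the provider's configuration:

```python
import jwt
from jwt import PyJWKClient


def validate_token(token: str, jwks_uri: str, audience: str, issuer: str) -> dict:
    # PyJWKClient fetches and caches the JWKS and selects the signing key
    # matching the token's "kid" header, replacing manual JWKS handling.
    signing_key = PyJWKClient(jwks_uri).get_signing_key_from_jwt(token)
    # jwt.decode verifies the signature and standard claims in one call.
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=audience,
        issuer=issuer,
    )
```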
# What does this PR do?

This PR fixes issues with the WatsonX provider so it works correctly with LiteLLM. The main problem was that WatsonX requests failed because the provider data validator didn't properly handle the API key and project ID. This was fixed by updating the WatsonXProviderDataValidator and ensuring the provider data is loaded correctly. The openai_chat_completion method was also updated to match the behavior of other providers while adding WatsonX-specific fields like project_id. It still calls `await super().openai_chat_completion.__func__(self, params)` to keep the existing setup and tracing logic. After these changes, WatsonX requests run correctly.

## Test Plan

The changes were tested by running chat completion requests and confirming that credentials and project parameters are passed correctly. I tested with my WatsonX credentials using the CLI with `uv run llama-stack-client inference chat-completion --session`.

---------

Signed-off-by: Sébastien Han <[email protected]>
Co-authored-by: Sébastien Han <[email protected]>
…mbed-text-v1.5 in Llama Stack (llamastack#3183) # What does this PR do?

The purpose of this PR is to replace Llama Stack's default embedding model with nomic-embed-text-v1.5. These are the key reasons why the Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:

1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes many data sets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications.
2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models, but also that there are many, many models of relatively modest size with much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way.

More discussion can be found [here](llamastack#2418)

Closes llamastack#2418

## Test Plan

1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI workflow

---------

Signed-off-by: Sébastien Han <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <[email protected]>
Co-authored-by: Sébastien Han <[email protected]>
…3807) # What does this PR do?

Updates CONTRIBUTING.md with the following changes:
- Use Python 3.12 (and why)
- Use pre-commit==4.3.0
- Recommend running pre-commit with -v to get detailed information about why a hook is failing, as in the sketch after this list
- Instruct users to go to the docs/ directory before rebuilding the docs (the build doesn't work otherwise)

Signed-off-by: Bill Murdock <[email protected]>
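A minimal sketch of the recommended workflow under those assumptions; the exact commands live in CONTRIBUTING.md:

```bash
# Pin the recommended pre-commit version (illustrative; your setup may use uv).
pip install pre-commit==4.3.0

# Run all hooks; -v prints detailed output explaining why a failing hook failed.
pre-commit run --all-files -v
```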
) # What does this PR do?

As discussed on Discord, we do not need to reinvent the wheel for telemetry. Instead we'll lean into the canonical OTEL stack. Logs/traces/metrics will still be sent via OTEL; they just won't be stored on, or queried through, the Stack. This is the first of many PRs to remove the telemetry API from the Stack.

1) Removed webmethod decorators to remove them from the API spec
2) Removed tests, as @iamemilio is adding them on OTEL directly

## Test Plan
…ric embedding models for NVIDIA Inference Provider (llamastack#3804) # What does this PR do?

Previously, the NVIDIA inference provider implemented a custom `openai_embeddings` method with a hardcoded `input_type="query"` parameter, which is required by NVIDIA asymmetric embedding models ([llamastack#3205](https://github.com/llamastack/llama-stack/pull/3205)). Recently, an `extra_body` parameter was added to the embeddings API ([llamastack#3794](https://github.com/llamastack/llama-stack/pull/3794)). This PR therefore updates the NVIDIA inference provider to use the base `OpenAIMixin.openai_embeddings` method instead and pass `input_type` through the `extra_body` parameter for asymmetric embedding models.

## Test Plan

Run the following command for each `embedding_model`: `nvidia/llama-3.2-nv-embedqa-1b-v2`, `nvidia/nv-embedqa-e5-v5`, `nvidia/nv-embedqa-mistral-7b-v2`, and `snowflake/arctic-embed-l`:

```
pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model={embedding_model} --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" --inference-mode=record
```
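As a usage illustration (assumed, not taken from the PR): with the generic path, a client can pass `input_type` itself via `extra_body`. The local server URL below is an example, and the model id is one of those listed in the test plan:

```python
from openai import OpenAI

# Assumes a locally running Llama Stack server exposing the OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")

resp = client.embeddings.create(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    input=["What is Llama Stack?"],
    # Asymmetric models use "query" for search queries and "passage" for documents.
    extra_body={"input_type": "query"},
)
print(len(resp.data[0].embedding))
```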
…ck#3803) # What does this PR do?

Enables automatic embedding model detection for vector stores by using a `default_configured` boolean that can be defined in the `run.yaml`.

## Test Plan

- Unit tests
- Integration tests
- Simple example below:

Spin up the stack:

```bash
uv run llama stack build --distro starter --image-type venv --run
```

Then test with OpenAI's client:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```

Previously you needed:

```python
vs = client.vector_stores.create(
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
    }
)
```

The `extra_body` is now unnecessary.

---------

Signed-off-by: Francisco Javier Arceo <[email protected]>
…stack#3811) # What does this PR do?

Support reading the embedding model and dimensions from metadata for vector stores.

## Test Plan

Unit tests
…metadata keys (llamastack#3813) # Add support for Google Gemini `gemini-embedding-001` embedding model and correctly register the model type

MR message created with the assistance of Claude-4.5-sonnet

This resolves llamastack#3755

## What does this PR do?

This PR adds support for the `gemini-embedding-001` Google embedding model in the llama-stack Gemini provider. This model provides high-dimensional embeddings (3072 dimensions) compared to the existing `text-embedding-004` model (768 dimensions). Older embedding models (such as text-embedding-004) will be deprecated soon according to Google ([link](https://developers.googleblog.com/en/gemini-embedding-available-gemini-api/)).

## Problem

The Gemini provider only supported the `text-embedding-004` embedding model. The newer `gemini-embedding-001` model, which provides higher-dimensional embeddings for improved semantic representation, was not available through llama-stack.

## Solution

This PR consists of three commits that implement the feature, fix the model registration, and enable embedding generation:

### Commit 1: Initial addition of gemini-embedding-001

Added metadata for `gemini-embedding-001` to the `embedding_model_metadata` dictionary:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # NEW
}
```

**Issue discovered:** The model was not being registered correctly because the dictionary keys didn't match the model IDs returned by Gemini's API.

### Commit 2: Fix model ID matching with `models/` prefix

Updated both dictionary keys to include the `models/` prefix to match Gemini's OpenAI-compatible API response format:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},  # UPDATED
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # UPDATED
}
```

**Root cause:** Gemini's OpenAI-compatible API returns model IDs with the `models/` prefix (e.g., `models/text-embedding-004`). The `OpenAIMixin.list_models()` method directly matches these IDs against the `embedding_model_metadata` dictionary keys. Without the prefix, the models were being registered as LLMs instead of embedding models.

### Commit 3: Fix embedding generation for providers without usage stats

Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented embedding generation for providers (like Gemini) that don't return usage statistics:

```python
# Before (lines 351-354):
usage = OpenAIEmbeddingUsage(
    prompt_tokens=response.usage.prompt_tokens,  # ← crashed with AttributeError
    total_tokens=response.usage.total_tokens,
)

# After (lines 351-362):
if response.usage:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=response.usage.prompt_tokens,
        total_tokens=response.usage.total_tokens,
    )
else:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=0,  # default when not provided
        total_tokens=0,   # default when not provided
    )
```

**Impact:** This fix enables embedding generation for **all** Gemini embedding models, not just the newly added one.

## Changes

### Modified Files

**`llama_stack/providers/remote/inference/gemini/gemini.py`**
- Line 17: Updated the `text-embedding-004` key to `models/text-embedding-004`
- Line 18: Added `models/gemini-embedding-001` with correct metadata

**`llama_stack/providers/utils/inference/openai_mixin.py`**
- Lines 351-362: Added a null check for `response.usage` to handle providers without usage statistics

## Key Technical Details

### Model ID Matching Flow

1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. The API returns model IDs like `models/text-embedding-004` and `models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata := self.embedding_model_metadata.get(provider_model_id)`
4. If matched, the model is registered as `model_type: "embedding"` with metadata; otherwise it is registered as `model_type: "llm"`

### Why Both Keys Needed the Prefix

The `text-embedding-004` model was already working because there was likely separate configuration or manual registration handling it. For auto-discovery to work correctly for **both** models, both keys must match the API's model ID format exactly.

## How to test this PR

Verified the changes by:

1. **Model auto-discovery**: Started the llama-stack server and confirmed models are auto-discovered from the Gemini API
2. **Model registration**: Confirmed both embedding models are correctly registered and visible

```bash
curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'
```

**Results:**
- ✅ `gemini/models/text-embedding-004` - 768 dimensions - `model_type: "embedding"`
- ✅ `gemini/models/gemini-embedding-001` - 3072 dimensions - `model_type: "embedding"`

3. **Before fix (Commit 1)**: Models appeared as `model_type: "llm"` without embedding metadata
4. **After fix (Commit 2)**: Models correctly identified as `model_type: "embedding"` with proper metadata
5. **Generate embeddings**: Verified embedding generation works

```bash
curl -X POST http://localhost:8325/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
  jq '.data[0].embedding | length'
```
…lamastack#3810) This PR updates the Conversation item related types and improves a couple of critical parts of the implementation:

- it creates a streaming output item for the final assistant message output by the model. Until now we only added content parts and included that message in the final response.
- it rewrites the conversation update code completely to account for items other than messages (tool calls, outputs, etc.)

## Test Plan

Used the test script from llamastack/llama-stack-client-python#281 for this:

```
TEST_API_BASE_URL=http://localhost:8321/v1 \
  pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```
…ck#3521) Fixed a KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr, and updated RAG memory to handle the different document_id locations. Added a test for missing document_id scenarios.

# What does this PR do?

Fixes a KeyError crash in `/v1/vector-io/insert` when chunks are missing `document_id` fields. The API was failing even though `document_id` is optional according to the schema.

Closes llamastack#3494

## Test Plan

**Before fix:**
- POST to `/v1/vector-io/insert` with chunks → 500 KeyError
- Happened regardless of where `document_id` was placed

**After fix:**
- Same request works fine → 200 OK
- Tested with Postman using the FAISS backend
- Added a unit test covering missing `document_id` scenarios
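A minimal sketch of the safe-extraction pattern the fix describes; the helper name and attribute layout are assumptions, not the PR's exact code:

```python
def extract_document_id(chunk) -> str | None:
    # Prefer chunk_metadata.document_id, then metadata["document_id"];
    # return None instead of raising KeyError when neither is present.
    chunk_meta = getattr(chunk, "chunk_metadata", None)
    doc_id = getattr(chunk_meta, "document_id", None) if chunk_meta else None
    if doc_id is None:
        metadata = getattr(chunk, "metadata", None) or {}
        doc_id = metadata.get("document_id")
    return doc_id
```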
# What does this PR do? ## Test Plan <img width="1506" height="653" alt="image" src="https://github.com/user-attachments/assets/6c28b8e8-effe-41ab-8e31-72482c05662d" />
…ack#3817) We were generating "FunctionToolCall" items even for MCP (and file-search, etc.) server-side calls, which caused ID mismatches and other inconsistencies galore.
…ack#3808) # What does this PR do?

- Removed the sqlite sink from the telemetry config
- Removed related code
- Updated the docs related to telemetry

## Test Plan

CI
…lamastack#3819) Handle a base case when no stored messages exist because no Response call has been made.

## Test Plan

```
./scripts/integration-tests.sh --stack-config server:ci-tests \
   --suite responses --inference-mode record-if-missing --pattern test_conversation_responses
```
# What does this PR do?

Closed the previous PR due to merge conflicts with multiple PRs; this PR addresses all comments from llamastack#3768 (sorry for carrying them over to this one).

## Test Plan

Added unit tests and integration tests
Wanted to re-enable Responses CI, but it seems to hang for some reason due to some interaction with conversations_store or responses_store.

## Test Plan

```
# library client
./scripts/integration-tests.sh --stack-config ci-tests --suite responses

# server
./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses
```
It's unused.

Signed-off-by: Derek Higgins <[email protected]>