LCORE-1452: Update llama stack to 0.5.x in rag-content#142
syedriko merged 1 commit into lightspeed-core:main from
Conversation
fee2ca9 to
26dd750
Compare
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough
This PR updates project dependencies, including llama-stack from 0.4.x to 0.5.0, and adds sentence-embedding model training infrastructure. It replaces
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
26dd750 to
6824415
Compare
Actionable comments posted: 1
🧹 Nitpick comments (3)
requirements-build.txt (1)
122-136: Two versions of setuptools-scm present.
The build requirements include both `setuptools-scm==10.0.4` and `setuptools-scm==9.2.2`. While this file is auto-generated by pybuild-deps based on different package requirements, having multiple versions could cause build inconsistencies if not handled properly by the build system.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements-build.txt` around lines 122 - 136, The requirements file lists two conflicting pins for the same package (setuptools-scm==10.0.4 and setuptools-scm==9.2.2); deduplicate by choosing a single compatible version (prefer the newer setuptools-scm==10.0.4) and remove the other pin from requirements-build.txt, or replace both exact pins with a single compatible constraint (e.g., setuptools-scm>=9.2.2,<11) to satisfy dependent packages; after editing the setuptools-scm entries, re-run your dependency resolution/generation step (pybuild-deps/pip-compile) and verify the build still succeeds to ensure no downstream package requires the older exact version.
scripts/augment_wheel_hashes.py (2)
61-62: Redundant condition in wheel matching.
The condition `"py3-none-any" in fn` already matches any filename containing that substring, making `fn.endswith("-py3-none-any.whl")` redundant.
- elif "py3-none-any" in fn or fn.endswith("-py3-none-any.whl"):
+ elif "py3-none-any" in fn:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/augment_wheel_hashes.py` around lines 61 - 62, The condition in the wheel matching branch is redundant: replace the combined check in the branch that currently uses '"py3-none-any" in fn or fn.endswith("-py3-none-any.whl")' (where the variables fn, best, and u are used) with a single, precise test (e.g. keep only '"py3-none-any" in fn' or only 'fn.endswith("-py3-none-any.whl")') so the elif becomes a single-condition match and still appends (1, u) to best.
36-46: Consider handling URLError for network failures.
Currently, only `HTTPError` with code 404 is caught. Network issues like timeouts or DNS failures will raise `urllib.error.URLError` and crash the script. For a build script, you may want to either catch and log these gracefully or add retry logic.
- except urllib.error.HTTPError as e:
+ except urllib.error.URLError as e:
+     if isinstance(e, urllib.error.HTTPError) and e.code == 404:
+         return None
+     raise
-     if e.code == 404:
-         return None
-     raise
Alternatively, if network failures should be fatal (current behavior), document this in a docstring.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/augment_wheel_hashes.py` around lines 36 - 46, The load_pypi_release function only handles urllib.error.HTTPError 404 but will crash on network failures (urllib.error.URLError); update load_pypi_release to catch urllib.error.URLError in addition to HTTPError, and either (a) implement simple retry logic with exponential backoff (e.g., 3 attempts with increasing sleep) around urllib.request.urlopen and log the URLError on each failure before retrying, returning None after exhausting retries, or (b) if you prefer not to retry, catch URLError, log a clear message and return None; reference the load_pypi_release function and the urllib.request.urlopen call when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/augment_wheel_hashes.py`:
- Around line 1-134: Run black on scripts/augment_wheel_hashes.py to fix
formatting (e.g., run `black scripts/augment_wheel_hashes.py`) and add concise
docstrings for the six public functions parse_pkg_header, load_pypi_release,
linux_cp312_x86_64_wheel_sha256, augment_block, augment_file, and main
describing purpose, args and return types; ensure each docstring is a short
triple-quoted string immediately under the def line to satisfy pydocstyle D103
and then re-run CI checks.
---
Nitpick comments:
In `@requirements-build.txt`:
- Around line 122-136: The requirements file lists two conflicting pins for the
same package (setuptools-scm==10.0.4 and setuptools-scm==9.2.2); deduplicate by
choosing a single compatible version (prefer the newer setuptools-scm==10.0.4)
and remove the other pin from requirements-build.txt, or replace both exact pins
with a single compatible constraint (e.g., setuptools-scm>=9.2.2,<11) to satisfy
dependent packages; after editing the setuptools-scm entries, re-run your
dependency resolution/generation step (pybuild-deps/pip-compile) and verify the
build still succeeds to ensure no downstream package requires the older exact
version.
In `@scripts/augment_wheel_hashes.py`:
- Around line 61-62: The condition in the wheel matching branch is redundant:
replace the combined check in the branch that currently uses '"py3-none-any" in
fn or fn.endswith("-py3-none-any.whl")' (where the variables fn, best, and u are
used) with a single, precise test (e.g. keep only '"py3-none-any" in fn' or only
'fn.endswith("-py3-none-any.whl")') so the elif becomes a single-condition match
and still appends (1, u) to best.
- Around line 36-46: The load_pypi_release function only handles
urllib.error.HTTPError 404 but will crash on network failures
(urllib.error.URLError); update load_pypi_release to catch urllib.error.URLError
in addition to HTTPError, and either (a) implement simple retry logic with
exponential backoff (e.g., 3 attempts with increasing sleep) around
urllib.request.urlopen and log the URLError on each failure before retrying,
returning None after exhausting retries, or (b) if you prefer not to retry,
catch URLError, log a clear message and return None; reference the
load_pypi_release function and the urllib.request.urlopen call when making the
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5c35d4e9-bbbb-4ba6-b494-80c7dbf319b6
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/augment_wheel_hashes.py
- scripts/konflux_requirements.sh
🧹 Nitpick comments (1)
scripts/augment_wheel_hashes.py (1)
36-48: Consider logging when PyPI requests fail for non-404 errors.
The function silently re-raises non-404 HTTP errors, which is acceptable since the calling script (konflux_requirements.sh) runs with set -ex. However, a brief log message before re-raising could aid debugging in CI logs.
💡 Optional: Add logging for failed requests
  def load_pypi_release(name: str, version: str) -> dict | None:
      norm = name.replace("_", "-").lower()
      url = f"https://pypi.org/pypi/{norm}/{version}/json"
      req = urllib.request.Request(
          url, headers={"User-Agent": "augment-wheel-hashes (konflux)"}
      )
      try:
          with urllib.request.urlopen(req, timeout=120) as r:
              return json.load(r)
      except urllib.error.HTTPError as e:
          if e.code == 404:
              return None
+         print(f"PyPI request failed for {name}=={version}: {e}", file=sys.stderr)
          raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/augment_wheel_hashes.py` around lines 36 - 48, In load_pypi_release, add a brief log before re-raising non-404 HTTP errors so CI logs show the failed PyPI request; catch urllib.error.HTTPError in the existing except block, and when e.code != 404 call the module logger (e.g., logging.getLogger(__name__)) to log a concise message that includes the requested URL, the HTTP status (e.code) and the exception message/traceback, then re-raise the exception so existing behavior is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/augment_wheel_hashes.py`:
- Around line 36-48: In load_pypi_release, add a brief log before re-raising
non-404 HTTP errors so CI logs show the failed PyPI request; catch
urllib.error.HTTPError in the existing except block, and when e.code != 404 call
the module logger (e.g., logging.getLogger(__name__)) to log a concise message
that includes the requested URL, the HTTP status (e.code) and the exception
message/traceback, then re-raise the exception so existing behavior is
preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 48ebe7c8-0e6b-4dd0-b396-9e97f92fbd1c
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/augment_wheel_hashes.py
- scripts/konflux_requirements.sh
✅ Files skipped from review due to trivial changes (1)
- requirements.hashes.wheel.pypi.txt
🚧 Files skipped from review as they are similar to previous changes (5)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- requirements.hashes.wheel.txt
- scripts/konflux_requirements.sh
- pyproject.toml
6824415 to
2c7fb8d
Compare
2c7fb8d to
6afbceb
Compare
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 50-52: The upgrade to llama-stack 0.5.0 introduces a breaking
change where the Embeddings API rejects explicit nulls; update any calls to
client.embeddings.create() to never pass explicit null/None for optional fields
by building the payload conditionally (filter out keys with None) before calling
embeddings.create(), and similarly ensure client.vector_stores.create() and any
RAGDocument construction (llama_stack_api.rag_tool.RAGDocument) do not include
null fields; also validate AsyncLlamaStackAsLibraryClient initialization and run
the library-client mode tests because the 0.5.0 FastAPI router refactor can
change client behavior—add/adjust unit tests that cover embeddings.create() and
vector store creation to assert no nulls are sent and fix calling code
accordingly.
In `@requirements-build.txt`:
- Around line 122-136: The build requirements file contains conflicting hard
pins for the same packages (e.g., setuptools-scm, setuptools, hatchling) which
breaks the single-file consumption in the build pipeline; update the generation
step rather than hand-editing the file: inspect and fix the upstream source
(requirements.source.txt and the pybuild-deps compile invocation) to ensure
unique resolved pins, or adjust dependency resolution rules so the generator
deduplicates and selects a single coherent version for setuptools-scm,
setuptools, and hatchling; re-run pybuild-deps compile to produce a consistent
requirements-build.txt used by Containerfile and scripts/konflux_requirements.sh
and verify the final file contains only one pin per package.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0fdd3a52-f905-4de6-91d3-f8f2cfb9396e
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/augment_wheel_hashes.py
- scripts/konflux_requirements.sh
✅ Files skipped from review due to trivial changes (1)
- scripts/augment_wheel_hashes.py
🚧 Files skipped from review as they are similar to previous changes (6)
- requirements.hashes.wheel.pypi.txt
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- requirements.hashes.wheel.txt
- scripts/konflux_requirements.sh
- requirements.hashes.source.txt
"llama-stack==0.5.0",
"llama-stack-api==0.5.0",
"llama-stack-client==0.5.0",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether tests cover the exact llama-stack API touchpoints used by runtime code.
rg -n -C3 --type=py 'AsyncLlamaStackAsLibraryClient|rag_tool\.RAGDocument|vector_stores\.create\s*\(|embeddings\.create\s*\('
Repository: lightspeed-core/rag-content
Length of output: 4480
🌐 Web query:
`What are the documented breaking changes (if any) between llama-stack 0.4.x and 0.5.0 for:
- llama_stack.core.library_client.AsyncLlamaStackAsLibraryClient
- llama_stack_api.rag_tool.RAGDocument
- client.vector_stores.create(...)
- client.embeddings.create(...)
Please include official release notes or API docs links.`
💡 Result:
No documented breaking changes specific to the queried components between llama-stack 0.4.x and 0.5.0. The 0.5.0 release includes general breaking changes from a major architectural refactoring to use FastAPI routers across all APIs, which may indirectly affect client usage:
- Post-Training API endpoints restructured (path params) #4606
- Embeddings API rejects explicit null for optional fields #4644
- Safety API provider interface changed to request objects #4643
- Meta-Reference GPU inference provider removed #4828
- Scope-based endpoint authorization removed #4734
For client.embeddings.create(...): The Embeddings API change (#4644) is a breaking change - explicit null values for optional fields are now rejected.
For the library client (llama_stack.core.library_client.AsyncLlamaStackAsLibraryClient): Early development fixes predate 0.5.0, but the FastAPI refactor may impact internal routing used by the library client mode.
No specific mentions found for:
- llama_stack.core.library_client.AsyncLlamaStackAsLibraryClient breaking changes
- llama_stack_api.rag_tool.RAGDocument changes (RAG tool docs reference it in pre-0.5 contexts, with migration notes to vector stores)
- client.vector_stores.create(...) breaking changes (Vector stores API evolved with OpenAI compatibility enhancements pre-0.5)
Official sources:
- Release notes: https://github.com/llamastack/llama-stack/blob/main/RELEASE_NOTES.md
- Releases page: https://github.com/llamastack/llama-stack/releases
- PyPI versions confirm 0.5.0 released 2026-02-05
Citations:
- 1: https://github.com/llamastack/llama-stack/blob/main/RELEASE_NOTES.md
- 2: https://github.com/meta-llama/llama-stack/releases
- 3: https://pypi.org/project/llama-stack/
Verify embeddings.create() compatibility with llama-stack 0.5.0 null handling change.
Lines 50–52 upgrade llama-stack to 0.5.0. Runtime code uses four llama-stack APIs:
- AsyncLlamaStackAsLibraryClient (line 357)
- llama_stack_api.rag_tool.RAGDocument (line 360)
- client.vector_stores.create() (lines 473–576)
- client.embeddings.create() (line 537)
Tests exist for client/document instantiation, but llama-stack 0.5.0 includes a documented breaking change (#4644): the Embeddings API now rejects explicit null values for optional fields. Line 537 calls embeddings.create() with parameters that may be affected. Additionally, the 0.5.0 FastAPI router refactor may impact the library client mode beyond what is currently documented. Pre-merge tests did not fully pass—review test coverage for these specific APIs before merging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` around lines 50 - 52, The upgrade to llama-stack 0.5.0
introduces a breaking change where the Embeddings API rejects explicit nulls;
update any calls to client.embeddings.create() to never pass explicit null/None
for optional fields by building the payload conditionally (filter out keys with
None) before calling embeddings.create(), and similarly ensure
client.vector_stores.create() and any RAGDocument construction
(llama_stack_api.rag_tool.RAGDocument) do not include null fields; also validate
AsyncLlamaStackAsLibraryClient initialization and run the library-client mode
tests because the 0.5.0 FastAPI router refactor can change client
behavior—add/adjust unit tests that cover embeddings.create() and vector store
creation to assert no nulls are sent and fix calling code accordingly.
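The conditional-payload pattern the prompt describes is straightforward; a minimal sketch, with hypothetical field names standing in for whatever the call site actually passes:

```python
def drop_nones(**kwargs) -> dict:
    """Build a request payload that omits optional fields set to None."""
    return {k: v for k, v in kwargs.items() if v is not None}


# Hypothetical embeddings call site: None-valued optionals never reach the API.
payload = drop_nones(
    model="all-mpnet-base-v2",  # assumed model name, for illustration only
    input=["hello world"],
    dimensions=None,  # optional field the caller left unset
)
# client.embeddings.create(**payload)  # only model and input are sent
```

The same helper can wrap the vector_stores.create() and RAGDocument call sites so no explicit nulls are serialized anywhere.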
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/dedupe_requirements_build.py (1)
61-78: ⚠️ Potential issue | 🟠 Major
Don't auto-resolve conflicting build pins with max().
When pybuild-deps emits foo==A and foo==B, that is a real constraint mismatch. Keeping the numerically highest version hides the conflict and can leave the build image with a version some backend does not accept. This helper should only merge exact duplicates, or fail generation when versions differ.
Suggested change
  def dedupe_pin_blocks(blocks: list[list[str]]) -> list[str]:
-     """Pick the highest version and merge ``# via`` comment lines from all blocks."""
-
-     def version_key(b: list[str]) -> Version:
-         parsed = parse_pin_line(b[0])
-         assert parsed is not None
-         return Version(parsed[1])
-
-     winner = max(blocks, key=version_key)
+     """Merge duplicate blocks for the same exact pin."""
+     parsed_versions: list[str] = []
+     for block in blocks:
+         parsed = parse_pin_line(block[0])
+         assert parsed is not None
+         parsed_versions.append(parsed[1])
+     versions = sorted(set(parsed_versions), key=Version)
+     if len(versions) != 1:
+         raise ValueError(f"conflicting build pins: {versions}")
+
+     winner = blocks[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/dedupe_requirements_build.py` around lines 61 - 78, The current dedupe_pin_blocks uses max() to pick a "winner" version which silently hides conflicting pins; change dedupe_pin_blocks (and its inner version_key / use of parse_pin_line) to first parse all block versions and check they are all identical, and if not raise a clear exception (e.g., ValueError) indicating the conflicting versions for the package instead of choosing the highest; only when all parsed versions match should you proceed to merge the "# via" tail lines and return [winner_line, *merged_tail]; also ensure parse_pin_line failures are handled deterministically (fail fast) rather than asserting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/konflux_requirements.sh`:
- Around line 85-88: The PyPI wheel lockfile isn't being architecture-augmented:
after generating the PyPI lockfile (variable WHEEL_HASH_FILE_PYPI) with uv pip
compile, call the same augmenter used for RHOAI by running
augment_wheel_hashes.py on WHEEL_HASH_FILE_PYPI (i.e., invoke
scripts/augment_wheel_hashes.py "$WHEEL_HASH_FILE_PYPI") so the PyPI lockfile
receives aarch64-specific manylinux hashes like requirements.hashes.wheel.txt
does.
---
Duplicate comments:
In `@scripts/dedupe_requirements_build.py`:
- Around line 61-78: The current dedupe_pin_blocks uses max() to pick a "winner"
version which silently hides conflicting pins; change dedupe_pin_blocks (and its
inner version_key / use of parse_pin_line) to first parse all block versions and
check they are all identical, and if not raise a clear exception (e.g.,
ValueError) indicating the conflicting versions for the package instead of
choosing the highest; only when all parsed versions match should you proceed to
merge the "# via" tail lines and return [winner_line, *merged_tail]; also ensure
parse_pin_line failures are handled deterministically (fail fast) rather than
asserting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 883cad03-a7f9-4113-b2c6-1b6a6c3b00ab
📒 Files selected for processing (3)
- requirements-build.txt
- scripts/dedupe_requirements_build.py
- scripts/konflux_requirements.sh
# RHOAI-only --generate-hashes can omit the hash for the Linux cp312 manylinux wheel pip actually installs.
uv pip compile "$WHEEL_FILE" --refresh --generate-hashes --index-url $RHOAI_INDEX_URL --python-version 3.12 --python-platform x86_64-unknown-linux-gnu --emit-index-url --no-deps --no-annotate > "$WHEEL_HASH_FILE"
uv run python scripts/augment_wheel_hashes.py "$WHEEL_HASH_FILE"
uv pip compile "$WHEEL_FILE_PYPI" --refresh --generate-hashes --python-version 3.12 --python-platform x86_64-unknown-linux-gnu --emit-index-url --no-deps --no-annotate > "$WHEEL_HASH_FILE_PYPI"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '=== wheel lockfile consumers ===\n'
rg -n -C2 'requirements\.hashes\.wheel(\.pypi)?\.txt'
printf '\n=== architecture/platform references ===\n'
rg -n -C2 '(aarch64|arm64|x86_64|amd64|linux/(arm64|amd64)|python-platform|platforms?)'
Repository: lightspeed-core/rag-content
Length of output: 50384
🏁 Script executed:
# Check if there are any aarch64-specific lock generation or platform handling
rg -n 'aarch64.*lock|lock.*aarch64|python-platform.*aarch64|--platform.*aarch64' scripts/
rg -n 'separate|arm64|aarch64' scripts/konflux_requirements.sh
# Check if PyPI wheels naturally include aarch64 wheels or if augmentation is critical
head -30 requirements.hashes.wheel.pypi.txt
Repository: lightspeed-core/rag-content
Length of output: 2161
🏁 Script executed:
# Check for CI/build config showing builder architecture constraints
find . -name "*.yml" -o -name "*.yaml" | head -20
rg -l 'workflow|pipeline|build' --type yaml | head -10
rg -n 'aarch64|arm64|x86.64|arch.*=|platform.*=' .tekton* .github* 2>/dev/null | head -20
# Check Containerfile to see if it's multi-arch
grep -n 'FROM\|RUN\|COPY' Containerfile | head -30
Repository: lightspeed-core/rag-content
Length of output: 2546
The PyPI wheel lockfile lacks architecture-aware hash augmentation for aarch64 builders.
The pipeline builds for both x86_64 and aarch64 (confirmed in .tekton/rag-tool-push.yaml lines 28–29, 64 and .tekton/rag-tool-pull-request.yaml lines 31–32, 68), but augment_wheel_hashes.py is invoked only for requirements.hashes.wheel.txt (RHOAI), not for requirements.hashes.wheel.pypi.txt (PyPI). Both lockfiles are compiled with --python-platform x86_64-unknown-linux-gnu and installed on all builder architectures via the shared Containerfile.
When aarch64 builders install from requirements.hashes.wheel.pypi.txt, they will not have aarch64-specific manylinux wheel hashes that the augmenter would normally provide. Apply the augmentation step to the PyPI lockfile as well, or generate a separate aarch64-specific lockfile path for multi-arch deployments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/konflux_requirements.sh` around lines 85 - 88, The PyPI wheel
lockfile isn't being architecture-augmented: after generating the PyPI lockfile
(variable WHEEL_HASH_FILE_PYPI) with uv pip compile, call the same augmenter
used for RHOAI by running augment_wheel_hashes.py on WHEEL_HASH_FILE_PYPI (i.e.,
invoke scripts/augment_wheel_hashes.py "$WHEEL_HASH_FILE_PYPI") so the PyPI
lockfile receives aarch64-specific manylinux hashes like
requirements.hashes.wheel.txt does.
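As a rough illustration of what such an augmentation pass does, a hypothetical helper (not the actual augment_wheel_hashes.py code) can select the aarch64 manylinux wheel digests from the file list PyPI's JSON API returns, so they can be appended alongside the x86_64 hashes:

```python
def aarch64_wheel_sha256(files: list[dict]) -> list[str]:
    """Return sha256 digests for cp312 aarch64 manylinux wheels in a PyPI file list."""
    out = []
    for f in files:
        fn = f["filename"]
        if (fn.endswith(".whl") and "manylinux" in fn
                and "aarch64" in fn and "cp312" in fn):
            out.append(f["digests"]["sha256"])
    return out


# Illustrative file list shaped like PyPI's /pypi/<name>/<version>/json "urls" entries.
files = [
    {"filename": "foo-1.0-cp312-cp312-manylinux_2_17_x86_64.whl",
     "digests": {"sha256": "aaa"}},
    {"filename": "foo-1.0-cp312-cp312-manylinux_2_17_aarch64.whl",
     "digests": {"sha256": "bbb"}},
]
# Only the aarch64 wheel's digest is selected.
```

Applying the same pass to both lockfiles keeps the shared Containerfile installable on every builder architecture.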
1b618b6 to
6afbceb
Compare
Actionable comments posted: 3
🧹 Nitpick comments (3)
embeddings_model/train_script.py (3)
246-276: Complex iteration logic - consider adding clarifying comments.
The Dataset.__iter__ method has nuanced behavior:
- Small datasets (< 10M lines) are cached in memory and shuffled each epoch
- Large datasets are streamed directly, re-reading the file each epoch
The condition `while dataset is None or len(dataset) == 0` on line 251 handles the case where the dataset was too large to cache. Consider adding a brief comment explaining this behavior for maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 246 - 276, The __iter__ method's control flow around dataset, max_dataset_size and the loop "while dataset is None or len(dataset) == 0" is subtle and needs a concise clarifying comment: update train_script.py to add a brief comment above that while-loop explaining that small files (<= max_dataset_size) are cached into the list `dataset` and then shuffled each epoch, while large files set `dataset = None` to indicate streaming mode so the file is re-read on each pass; mention that `self.filepath` is re-opened for streaming and that `dataset is None` is the signal to stop caching and switch to direct streaming.
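The caching/streaming behavior described above can be paraphrased in a standalone sketch. Names and the size threshold are illustrative; this is not the script's actual class:

```python
import random


class LineDataset:
    """Cache small files in memory and reshuffle per epoch; stream large ones."""

    def __init__(self, filepath: str, max_dataset_size: int = 10_000_000):
        self.filepath = filepath
        self.max_dataset_size = max_dataset_size

    def __iter__(self):
        # dataset == [] while caching, a full list once cached,
        # or None as the signal that the file is too large and must be streamed.
        dataset: list[str] | None = []
        while True:
            if dataset:  # cached: shuffle and replay without touching disk
                random.shuffle(dataset)
                yield from dataset
                continue
            with open(self.filepath) as f:  # first pass, or streaming mode
                for line in f:
                    yield line
                    if dataset is not None:
                        dataset.append(line)
                        if len(dataset) > self.max_dataset_size:
                            dataset = None  # too large: switch to streaming
```

The iterator never terminates (each pass is one epoch), which matches a producer process that feeds a queue until killed.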
18-21: Duplicate import of torch.
torch is imported twice (lines 18 and 21). Remove the duplicate.
♻️ Proposed fix
  import tqdm
  import torch
  from torch import nn
  from torch.utils.data import DataLoader
- import torch
  import torch_xla
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 18 - 21, The file currently imports torch twice; remove the redundant "import torch" duplicate so there is only one top-level "import torch" alongside "from torch import nn" and "from torch.utils.data import DataLoader"; update the import block in train_script.py to have a single torch import and ensure there are no other repeated imports of the same symbol.
334-339: Process cleanup issues with fork and kill().
Two concerns with process management:
- start_method='fork' can be problematic with XLA/TPU as it may inherit file descriptors and locks that cause issues.
- p.kill() sends SIGKILL, which doesn't allow the producer process to clean up. Consider using p.terminate() (sends SIGTERM) followed by p.join(timeout) for graceful shutdown.
🔧 Proposed fix for graceful shutdown
- print("It might be that not all processes exit automatically. In that case you must manually kill this process.")
- print("With 'pkill python' you can kill all remaining python processes")
- p.kill()
+ p.terminate()
+ p.join(timeout=5)
+ if p.is_alive():
+     print("Producer did not terminate gracefully, forcing kill")
+     p.kill()
+     p.join()
  exit()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 334 - 339, The current spawn and cleanup code uses xmp.spawn(..., start_method='fork') and p.kill(), which can cause inherited FDs/locks on XLA/TPU and forces unclean shutdown; change start_method to 'spawn' (or remove the explicit fork) when calling xmp.spawn(train_function, args=(args, queue), nprocs=args.nprocs, start_method=...), and replace p.kill() with a graceful shutdown sequence: call p.terminate(), then p.join(timeout) to wait for cleanup, and only call p.kill() if p.is_alive() after the timeout; ensure this update references the xmp.spawn call, train_function, and the producer process variable p.
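The terminate-then-kill escalation suggested here looks roughly like this, with a plain multiprocessing.Process standing in for the data producer:

```python
import multiprocessing
import time


def producer() -> None:
    """Stand-in for the data producer: runs until asked to stop."""
    while True:
        time.sleep(0.1)


def shutdown(p: multiprocessing.Process, timeout: float = 5.0) -> None:
    """SIGTERM first so the child can clean up; SIGKILL only as a last resort."""
    p.terminate()        # sends SIGTERM
    p.join(timeout)
    if p.is_alive():
        print("Producer did not terminate gracefully, forcing kill")
        p.kill()         # sends SIGKILL
        p.join()


if __name__ == "__main__":
    p = multiprocessing.Process(target=producer)
    p.start()
    shutdown(p)
```

SIGTERM lets atexit handlers and finally blocks in the child run, so queues and file descriptors are released before the parent exits.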
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@embeddings_model/README.md`:
- Around line 107-114: Fix the spelling typos in the README sentences: change
"developped" to "developed" in the sentence starting "We developped this
model...", change "intented" to "intended" and "ouptuts" to "outputs" in the
sentence beginning "Our model is intented to be used..." so the documentation
reads correctly; update all occurrences of these misspellings in the README
content around those paragraphs (the two sentences shown in the diff).
In `@embeddings_model/train_script.py`:
- Around line 1-6: The CI is failing due to Black formatting and missing
docstrings: run the formatter (uv tool run black
embeddings_model/train_script.py) to fix style, and add clear
pydocstyle-compliant docstrings for the module and the public classes/functions:
add a module-level docstring at top, a class docstring and method docstrings for
AutoModelForSentenceEmbedding, docstrings for train_function and produce_data
explaining parameters/returns and side effects, and docstrings for RedditDataset
and Dataset (including __init__ and __getitem__/__len__ where applicable);
ensure docstrings follow the project's pydocstyle conventions (summary line,
description, parameter and return sections).
- Around line 299-301: Replace the blocking input() prompt with a CLI flag to
allow non-interactive runs: add a boolean argument like --overwrite or --force
to the argparse setup (e.g., parser.add_argument('--overwrite',
action='store_true')), then change the check around args.output to abort with a
non-zero exit (or raise) when the folder exists and args.overwrite is False; if
args.overwrite is True, proceed and optionally remove/empty the folder before
writing. Update references to args.output and the argument parser accordingly.
---
Nitpick comments:
In `@embeddings_model/train_script.py`:
- Around line 246-276: The __iter__ method's control flow around dataset,
max_dataset_size and the loop "while dataset is None or len(dataset) == 0" is
subtle and needs a concise clarifying comment: update train_script.py to add a
brief comment above that while-loop explaining that small files (<=
max_dataset_size) are cached into the list `dataset` and then shuffled each
epoch, while large files set `dataset = None` to indicate streaming mode so the
file is re-read on each pass; mention that `self.filepath` is re-opened for
streaming and that `dataset is None` is the signal to stop caching and switch to
direct streaming.
- Around line 18-21: The file currently imports torch twice; remove the
redundant "import torch" duplicate so there is only one top-level "import torch"
alongside "from torch import nn" and "from torch.utils.data import DataLoader";
update the import block in train_script.py to have a single torch import and
ensure there are no other repeated imports of the same symbol.
- Around line 334-339: The current spawn and cleanup code uses xmp.spawn(...,
start_method='fork') and p.kill(), which can cause inherited FDs/locks on
XLA/TPU and forces unclean shutdown; change start_method to 'spawn' (or remove
the explicit fork) when calling xmp.spawn(train_function, args=(args, queue),
nprocs=args.nprocs, start_method=...), and replace p.kill() with a graceful
shutdown sequence: call p.terminate(), then p.join(timeout) to wait for cleanup,
and only call p.kill() if p.is_alive() after the timeout; ensure this update
references the xmp.spawn call, train_function, and the producer process variable
p.
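The terminate/join/kill escalation suggested here can be sketched with the standard `multiprocessing` module (the `producer` target and the timeout value are illustrative; in the script, `p` is the data-producer process):

```python
import multiprocessing as mp
import time


def producer():
    """Stand-in for the data-producing worker (runs until terminated)."""
    while True:
        time.sleep(0.1)


def shutdown(p, timeout=10.0):
    """Stop a producer process gracefully, escalating only if needed."""
    p.terminate()      # polite stop (SIGTERM): lets cleanup handlers run
    p.join(timeout)    # wait up to `timeout` seconds for a clean exit
    if p.is_alive():   # grace period expired: force-kill as a last resort
        p.kill()
        p.join()
```

Calling `p.kill()` unconditionally skips the grace period entirely, which is why the review asks for `terminate()` plus a bounded `join()` first.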
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d8498fb0-6d7f-425d-b9e8-360c66ce93e6
📒 Files selected for processing (15)
- Containerfile
- embeddings_model/.gitattributes
- embeddings_model/1_Pooling/config.json
- embeddings_model/README.md
- embeddings_model/config.json
- embeddings_model/config_sentence_transformers.json
- embeddings_model/data_config.json
- embeddings_model/modules.json
- embeddings_model/sentence_bert_config.json
- embeddings_model/special_tokens_map.json
- embeddings_model/tokenizer.json
- embeddings_model/tokenizer_config.json
- embeddings_model/train_script.py
- embeddings_model/version.txt
- embeddings_model/vocab.txt
✅ Files skipped from review due to trivial changes (9)
- embeddings_model/version.txt
- embeddings_model/1_Pooling/config.json
- embeddings_model/sentence_bert_config.json
- embeddings_model/tokenizer_config.json
- embeddings_model/.gitattributes
- embeddings_model/config.json
- embeddings_model/modules.json
- embeddings_model/config_sentence_transformers.json
- embeddings_model/special_tokens_map.json
embeddings_model/README.md (outdated)
We developped this model during the
[Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104),
organized by Hugging Face. We developped this model as part of the project:
[Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Googles Flax, JAX, and Cloud team member about efficient deep learning frameworks.

## Intended uses

Our model is intented to be used as a sentence and short paragraph encoder. Given an input text, it ouptuts a vector which captures
Minor typos in documentation.
A few spelling errors flagged by static analysis:
- Line 107: "developped" → "developed"
- Line 114: "intented" → "intended"
- Line 114: "ouptuts" → "outputs"
📝 Suggested fixes
```diff
-We developped this model during the
+We developed this model during the

-Our model is intented to be used as a sentence and short paragraph encoder. Given an input text, it ouptuts a vector which captures
+Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures
```

🧰 Tools
+Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures 🧰 Tools
🪛 LanguageTool
[grammar] ~114-~114: Ensure spelling is correct
Context: ...eworks. ## Intended uses Our model is intented to be used as a sentence and short para...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~114-~114: Ensure spelling is correct
Context: ...agraph encoder. Given an input text, it ouptuts a vector which captures the semantic i...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@embeddings_model/README.md` around lines 107 - 114, Fix the spelling typos in
the README sentences: change "developped" to "developed" in the sentence
starting "We developped this model...", change "intented" to "intended" and
"ouptuts" to "outputs" in the sentence beginning "Our model is intented to be
used..." so the documentation reads correctly; update all occurrences of these
misspellings in the README content around those paragraphs (the two sentences
shown in the diff).
embeddings_model/train_script.py (outdated)
| """ | ||
| Train script for a single file | ||
|
|
||
| Need to set the TPU address first: | ||
| export XRT_TPU_CONFIG="localservice;0;localhost:51011" | ||
| """ |
Address pipeline failures: Black formatting and missing docstrings.
The CI pipeline reports failures that need to be addressed:
- Black formatting: Run `uv tool run black embeddings_model/train_script.py` to fix formatting issues.
- Pydocstyle: Add docstrings to public classes and functions:
  - `AutoModelForSentenceEmbedding` class and its methods
  - `train_function`
  - `produce_data`
  - `RedditDataset` and `Dataset` classes
📝 Example docstring for the module
"""
-Train script for a single file
+Train script for sentence embedding models on TPU/XLA.
-Need to set the TPU address first:
-export XRT_TPU_CONFIG="localservice;0;localhost:51011"
+This script trains sentence embedding models using a contrastive learning
+objective on TPU hardware via XLA. Requires TPU address configuration:
+
+ export XRT_TPU_CONFIG="localservice;0;localhost:51011"
"""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """ | |
| Train script for a single file | |
| Need to set the TPU address first: | |
| export XRT_TPU_CONFIG="localservice;0;localhost:51011" | |
| """ | |
| """ | |
| Train script for sentence embedding models on TPU/XLA. | |
| This script trains sentence embedding models using a contrastive learning | |
| objective on TPU hardware via XLA. Requires TPU address configuration: | |
| export XRT_TPU_CONFIG="localservice;0;localhost:51011" | |
| """ |
🧰 Tools
🪛 GitHub Actions: Black
[error] 1-1: Black formatting check failed (run: 'uv tool run black --check .'). 1 file would be reformatted: embeddings_model/train_script.py. Run 'uv tool run black .' or 'uv tool run black --write embeddings_model/train_script.py' to fix.
🪛 GitHub Actions: Pydocstyle
[error] 1-1: pydocstyle: D400 First line should end with a period (not 'e')
[error] 246-1: pydocstyle: D105 Missing docstring in magic method __iter__
[error] pydocstyle failed (exit code 1) due to docstring style violations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@embeddings_model/train_script.py` around lines 1 - 6, The CI is failing due
to Black formatting and missing docstrings: run the formatter (uv tool run black
embeddings_model/train_script.py) to fix style, and add clear
pydocstyle-compliant docstrings for the module and the public classes/functions:
add a module-level docstring at top, a class docstring and method docstrings for
AutoModelForSentenceEmbedding, docstrings for train_function and produce_data
explaining parameters/returns and side effects, and docstrings for RedditDataset
and Dataset (including __init__ and __getitem__/__len__ where applicable);
ensure docstrings follow the project's pydocstyle conventions (summary line,
description, parameter and return sections).
embeddings_model/train_script.py (outdated)
```python
if os.path.exists(args.output):
    print("Output folder already exists.")
    input("Continue?")
```
Blocking input() call prevents automated execution.
The input("Continue?") call blocks execution and requires manual intervention, which prevents running this script in automated or CI environments. Consider using a CLI flag (e.g., --force or --overwrite) to control this behavior.
🔧 Proposed fix
Add a CLI argument:
```diff
 parser.add_argument('output')
+parser.add_argument('--overwrite', action='store_true', help="Overwrite existing output folder without prompting")
 args = parser.parse_args()
```

Then update the check:

```diff
 if os.path.exists(args.output):
-    print("Output folder already exists.")
-    input("Continue?")
+    if not args.overwrite:
+        print("Output folder already exists. Use --overwrite to continue anyway.")
+        sys.exit(1)
```

🧰 Tools
+ sys.exit(1)🧰 Tools
🪛 GitHub Actions: Pydocstyle
[error] pydocstyle failed (exit code 1) due to docstring style violations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@embeddings_model/train_script.py` around lines 299 - 301, Replace the
blocking input() prompt with a CLI flag to allow non-interactive runs: add a
boolean argument like --overwrite or --force to the argparse setup (e.g.,
parser.add_argument('--overwrite', action='store_true')), then change the check
around args.output to abort with a non-zero exit (or raise) when the folder
exists and args.overwrite is False; if args.overwrite is True, proceed and
optionally remove/empty the folder before writing. Update references to
args.output and the argument parser accordingly.
tisnik
left a comment
what are those +33066 new lines?
5c81bd5 to 6afbceb (compare)
I'm going to need to fix "LCORE-1538: Add embedding model metadata to the rag-content repo" or the image built in Konflux is not self-sufficient. The Ansible folks asked for this. I moved that into a separate PR: #144.
Description
Upgrade llama stack dependency in LCORE to 0.5.0 to match the rest of Lightspeed Core.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes
New Features
Dependencies
Chores