LCORE-1452: Update llama stack to 0.5.x in rag-content#142
syedriko merged 1 commit into lightspeed-core:main from
Conversation
fee2ca9 to
26dd750
Compare
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough
This PR updates project dependencies, including llama-stack from 0.4.x to 0.5.0, and adds sentence-embedding model training infrastructure. It replaces
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
26dd750 to
6824415
Compare
Actionable comments posted: 1
🧹 Nitpick comments (3)
requirements-build.txt (1)
122-136: Two versions of setuptools-scm present.
The build requirements include both `setuptools-scm==10.0.4` and `setuptools-scm==9.2.2`. While this file is auto-generated by pybuild-deps based on different package requirements, having multiple versions could cause build inconsistencies if not handled properly by the build system.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements-build.txt` around lines 122 - 136, The requirements file lists two conflicting pins for the same package (setuptools-scm==10.0.4 and setuptools-scm==9.2.2); deduplicate by choosing a single compatible version (prefer the newer setuptools-scm==10.0.4) and remove the other pin from requirements-build.txt, or replace both exact pins with a single compatible constraint (e.g., setuptools-scm>=9.2.2,<11) to satisfy dependent packages; after editing the setuptools-scm entries, re-run your dependency resolution/generation step (pybuild-deps/pip-compile) and verify the build still succeeds to ensure no downstream package requires the older exact version.
scripts/augment_wheel_hashes.py (2)
61-62: Redundant condition in wheel matching.
The condition `"py3-none-any" in fn` already matches any filename containing that substring, making `fn.endswith("-py3-none-any.whl")` redundant.
- elif "py3-none-any" in fn or fn.endswith("-py3-none-any.whl"):
+ elif "py3-none-any" in fn:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/augment_wheel_hashes.py` around lines 61 - 62, The condition in the wheel matching branch is redundant: replace the combined check in the branch that currently uses '"py3-none-any" in fn or fn.endswith("-py3-none-any.whl")' (where the variables fn, best, and u are used) with a single, precise test (e.g. keep only '"py3-none-any" in fn' or only 'fn.endswith("-py3-none-any.whl")') so the elif becomes a single-condition match and still appends (1, u) to best.
36-46: Consider handling URLError for network failures.
Currently, only `HTTPError` with code 404 is caught. Network issues like timeouts or DNS failures will raise `urllib.error.URLError` and crash the script. For a build script, you may want to either catch and log these gracefully or add retry logic.
- except urllib.error.HTTPError as e:
+ except urllib.error.URLError as e:
+     if isinstance(e, urllib.error.HTTPError) and e.code == 404:
+         return None
+     raise
-     if e.code == 404:
-         return None
-     raise
Alternatively, if network failures should be fatal (current behavior), document this in a docstring.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/augment_wheel_hashes.py` around lines 36 - 46, The load_pypi_release function only handles urllib.error.HTTPError 404 but will crash on network failures (urllib.error.URLError); update load_pypi_release to catch urllib.error.URLError in addition to HTTPError, and either (a) implement simple retry logic with exponential backoff (e.g., 3 attempts with increasing sleep) around urllib.request.urlopen and log the URLError on each failure before retrying, returning None after exhausting retries, or (b) if you prefer not to retry, catch URLError, log a clear message and return None; reference the load_pypi_release function and the urllib.request.urlopen call when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/augment_wheel_hashes.py`:
- Around line 1-134: Run black on scripts/augment_wheel_hashes.py to fix
formatting (e.g., run `black scripts/augment_wheel_hashes.py`) and add concise
docstrings for the six public functions parse_pkg_header, load_pypi_release,
linux_cp312_x86_64_wheel_sha256, augment_block, augment_file, and main
describing purpose, args and return types; ensure each docstring is a short
triple-quoted string immediately under the def line to satisfy pydocstyle D103
and then re-run CI checks.
---
Nitpick comments:
In `@requirements-build.txt`:
- Around line 122-136: The requirements file lists two conflicting pins for the
same package (setuptools-scm==10.0.4 and setuptools-scm==9.2.2); deduplicate by
choosing a single compatible version (prefer the newer setuptools-scm==10.0.4)
and remove the other pin from requirements-build.txt, or replace both exact pins
with a single compatible constraint (e.g., setuptools-scm>=9.2.2,<11) to satisfy
dependent packages; after editing the setuptools-scm entries, re-run your
dependency resolution/generation step (pybuild-deps/pip-compile) and verify the
build still succeeds to ensure no downstream package requires the older exact
version.
In `@scripts/augment_wheel_hashes.py`:
- Around line 61-62: The condition in the wheel matching branch is redundant:
replace the combined check in the branch that currently uses '"py3-none-any" in
fn or fn.endswith("-py3-none-any.whl")' (where the variables fn, best, and u are
used) with a single, precise test (e.g. keep only '"py3-none-any" in fn' or only
'fn.endswith("-py3-none-any.whl")') so the elif becomes a single-condition match
and still appends (1, u) to best.
- Around line 36-46: The load_pypi_release function only handles
urllib.error.HTTPError 404 but will crash on network failures
(urllib.error.URLError); update load_pypi_release to catch urllib.error.URLError
in addition to HTTPError, and either (a) implement simple retry logic with
exponential backoff (e.g., 3 attempts with increasing sleep) around
urllib.request.urlopen and log the URLError on each failure before retrying,
returning None after exhausting retries, or (b) if you prefer not to retry,
catch URLError, log a clear message and return None; reference the
load_pypi_release function and the urllib.request.urlopen call when making the
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5c35d4e9-bbbb-4ba6-b494-80c7dbf319b6
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/augment_wheel_hashes.py
- scripts/konflux_requirements.sh
🧹 Nitpick comments (1)
scripts/augment_wheel_hashes.py (1)
36-48: Consider logging when PyPI requests fail for non-404 errors.
The function silently re-raises non-404 HTTP errors, which is acceptable since the calling script (konflux_requirements.sh) runs with set -ex. However, a brief log message before re-raising could aid debugging in CI logs.
💡 Optional: Add logging for failed requests
  def load_pypi_release(name: str, version: str) -> dict | None:
      norm = name.replace("_", "-").lower()
      url = f"https://pypi.org/pypi/{norm}/{version}/json"
      req = urllib.request.Request(
          url, headers={"User-Agent": "augment-wheel-hashes (konflux)"}
      )
      try:
          with urllib.request.urlopen(req, timeout=120) as r:
              return json.load(r)
      except urllib.error.HTTPError as e:
          if e.code == 404:
              return None
+         print(f"PyPI request failed for {name}=={version}: {e}", file=sys.stderr)
          raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/augment_wheel_hashes.py` around lines 36 - 48, In load_pypi_release, add a brief log before re-raising non-404 HTTP errors so CI logs show the failed PyPI request; catch urllib.error.HTTPError in the existing except block, and when e.code != 404 call the module logger (e.g., logging.getLogger(__name__)) to log a concise message that includes the requested URL, the HTTP status (e.code) and the exception message/traceback, then re-raise the exception so existing behavior is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/augment_wheel_hashes.py`:
- Around line 36-48: In load_pypi_release, add a brief log before re-raising
non-404 HTTP errors so CI logs show the failed PyPI request; catch
urllib.error.HTTPError in the existing except block, and when e.code != 404 call
the module logger (e.g., logging.getLogger(__name__)) to log a concise message
that includes the requested URL, the HTTP status (e.code) and the exception
message/traceback, then re-raise the exception so existing behavior is
preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 48ebe7c8-0e6b-4dd0-b396-9e97f92fbd1c
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/augment_wheel_hashes.py
- scripts/konflux_requirements.sh
✅ Files skipped from review due to trivial changes (1)
- requirements.hashes.wheel.pypi.txt
🚧 Files skipped from review as they are similar to previous changes (5)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- requirements.hashes.wheel.txt
- scripts/konflux_requirements.sh
- pyproject.toml
6824415 to
2c7fb8d
Compare
2c7fb8d to
6afbceb
Compare
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 50-52: The upgrade to llama-stack 0.5.0 introduces a breaking
change where the Embeddings API rejects explicit nulls; update any calls to
client.embeddings.create() to never pass explicit null/None for optional fields
by building the payload conditionally (filter out keys with None) before calling
embeddings.create(), and similarly ensure client.vector_stores.create() and any
RAGDocument construction (llama_stack_api.rag_tool.RAGDocument) do not include
null fields; also validate AsyncLlamaStackAsLibraryClient initialization and run
the library-client mode tests because the 0.5.0 FastAPI router refactor can
change client behavior—add/adjust unit tests that cover embeddings.create() and
vector store creation to assert no nulls are sent and fix calling code
accordingly.
In `@requirements-build.txt`:
- Around line 122-136: The build requirements file contains conflicting hard
pins for the same packages (e.g., setuptools-scm, setuptools, hatchling) which
breaks the single-file consumption in the build pipeline; update the generation
step rather than hand-editing the file: inspect and fix the upstream source
(requirements.source.txt and the pybuild-deps compile invocation) to ensure
unique resolved pins, or adjust dependency resolution rules so the generator
deduplicates and selects a single coherent version for setuptools-scm,
setuptools, and hatchling; re-run pybuild-deps compile to produce a consistent
requirements-build.txt used by Containerfile and scripts/konflux_requirements.sh
and verify the final file contains only one pin per package.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0fdd3a52-f905-4de6-91d3-f8f2cfb9396e
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/augment_wheel_hashes.py
- scripts/konflux_requirements.sh
✅ Files skipped from review due to trivial changes (1)
- scripts/augment_wheel_hashes.py
🚧 Files skipped from review as they are similar to previous changes (6)
- requirements.hashes.wheel.pypi.txt
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- requirements.hashes.wheel.txt
- scripts/konflux_requirements.sh
- requirements.hashes.source.txt
"llama-stack==0.5.0",
"llama-stack-api==0.5.0",
"llama-stack-client==0.5.0",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether tests cover the exact llama-stack API touchpoints used by runtime code.
rg -n -C3 --type=py 'AsyncLlamaStackAsLibraryClient|rag_tool\.RAGDocument|vector_stores\.create\s*\(|embeddings\.create\s*\('
Repository: lightspeed-core/rag-content
Length of output: 4480
🌐 Web query:
`What are the documented breaking changes (if any) between llama-stack 0.4.x and 0.5.0 for:
- llama_stack.core.library_client.AsyncLlamaStackAsLibraryClient
- llama_stack_api.rag_tool.RAGDocument
- client.vector_stores.create(...)
- client.embeddings.create(...)
Please include official release notes or API docs links.`
💡 Result:
No documented breaking changes specific to the queried components between llama-stack 0.4.x and 0.5.0. The 0.5.0 release includes general breaking changes from a major architectural refactoring to use FastAPI routers across all APIs, which may indirectly affect client usage:
- Post-Training API endpoints restructured (path params) #4606
- Embeddings API rejects explicit null for optional fields #4644
- Safety API provider interface changed to request objects #4643
- Meta-Reference GPU inference provider removed #4828
- Scope-based endpoint authorization removed #4734
For client.embeddings.create(...): The Embeddings API change (#4644) is a breaking change - explicit null values for optional fields are now rejected.
For the library client (llama_stack.core.library_client.AsyncLlamaStackAsLibraryClient): Early development fixes predate 0.5.0, but the FastAPI refactor may impact internal routing used by the library client mode.
No specific mentions found for:
- llama_stack.core.library_client.AsyncLlamaStackAsLibraryClient breaking changes
- llama_stack_api.rag_tool.RAGDocument changes (RAG tool docs reference it in pre-0.5 contexts, with migration notes to vector stores)
- client.vector_stores.create(...) breaking changes (Vector stores API evolved with OpenAI compatibility enhancements pre-0.5)
Official sources:
- Release notes: https://github.com/llamastack/llama-stack/blob/main/RELEASE_NOTES.md
- Releases page: https://github.com/llamastack/llama-stack/releases
- PyPI versions confirm 0.5.0 released 2026-02-05
Citations:
- 1: https://github.com/llamastack/llama-stack/blob/main/RELEASE_NOTES.md
- 2: https://github.com/meta-llama/llama-stack/releases
- 3: https://pypi.org/project/llama-stack/
Verify embeddings.create() compatibility with llama-stack 0.5.0 null handling change.
Lines 50–52 upgrade llama-stack to 0.5.0. Runtime code uses four llama-stack APIs:
- AsyncLlamaStackAsLibraryClient (line 357)
- llama_stack_api.rag_tool.RAGDocument (line 360)
- client.vector_stores.create() (lines 473–576)
- client.embeddings.create() (line 537)
Tests exist for client/document instantiation, but llama-stack 0.5.0 includes a documented breaking change (#4644): the Embeddings API now rejects explicit null values for optional fields. Line 537 calls embeddings.create() with parameters that may be affected. Additionally, the 0.5.0 FastAPI router refactor may impact the library client mode beyond what is currently documented. Pre-merge tests did not fully pass—review test coverage for these specific APIs before merging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` around lines 50 - 52, The upgrade to llama-stack 0.5.0
introduces a breaking change where the Embeddings API rejects explicit nulls;
update any calls to client.embeddings.create() to never pass explicit null/None
for optional fields by building the payload conditionally (filter out keys with
None) before calling embeddings.create(), and similarly ensure
client.vector_stores.create() and any RAGDocument construction
(llama_stack_api.rag_tool.RAGDocument) do not include null fields; also validate
AsyncLlamaStackAsLibraryClient initialization and run the library-client mode
tests because the 0.5.0 FastAPI router refactor can change client
behavior—add/adjust unit tests that cover embeddings.create() and vector store
creation to assert no nulls are sent and fix calling code accordingly.
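The conditional-payload pattern the prompt describes is straightforward; a minimal sketch, with hypothetical field names standing in for whatever the call site actually passes:

```python
def drop_nones(**kwargs) -> dict:
    """Build a request payload that omits optional fields set to None."""
    return {k: v for k, v in kwargs.items() if v is not None}


# Hypothetical embeddings call site: None-valued optionals never reach the API.
payload = drop_nones(
    model="all-mpnet-base-v2",  # assumed model name, for illustration only
    input=["hello world"],
    dimensions=None,  # optional field the caller left unset
)
# client.embeddings.create(**payload)  # only model and input are sent
```

The same helper can wrap the vector_stores.create() and RAGDocument call sites so no explicit nulls are serialized anywhere.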
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/dedupe_requirements_build.py (1)
61-78: ⚠️ Potential issue | 🟠 Major
Don't auto-resolve conflicting build pins with max().
When pybuild-deps emits foo==A and foo==B, that is a real constraint mismatch. Keeping the numerically highest version hides the conflict and can leave the build image with a version some backend does not accept. This helper should only merge exact duplicates, or fail generation when versions differ.
Suggested change
  def dedupe_pin_blocks(blocks: list[list[str]]) -> list[str]:
-     """Pick the highest version and merge ``# via`` comment lines from all blocks."""
-
-     def version_key(b: list[str]) -> Version:
-         parsed = parse_pin_line(b[0])
-         assert parsed is not None
-         return Version(parsed[1])
-
-     winner = max(blocks, key=version_key)
+     """Merge duplicate blocks for the same exact pin."""
+     parsed_versions: list[str] = []
+     for block in blocks:
+         parsed = parse_pin_line(block[0])
+         assert parsed is not None
+         parsed_versions.append(parsed[1])
+     versions = sorted(set(parsed_versions), key=Version)
+     if len(versions) != 1:
+         raise ValueError(f"conflicting build pins: {versions}")
+
+     winner = blocks[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/dedupe_requirements_build.py` around lines 61 - 78, The current dedupe_pin_blocks uses max() to pick a "winner" version which silently hides conflicting pins; change dedupe_pin_blocks (and its inner version_key / use of parse_pin_line) to first parse all block versions and check they are all identical, and if not raise a clear exception (e.g., ValueError) indicating the conflicting versions for the package instead of choosing the highest; only when all parsed versions match should you proceed to merge the "# via" tail lines and return [winner_line, *merged_tail]; also ensure parse_pin_line failures are handled deterministically (fail fast) rather than asserting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/konflux_requirements.sh`:
- Around line 85-88: The PyPI wheel lockfile isn't being architecture-augmented:
after generating the PyPI lockfile (variable WHEEL_HASH_FILE_PYPI) with uv pip
compile, call the same augmenter used for RHOAI by running
augment_wheel_hashes.py on WHEEL_HASH_FILE_PYPI (i.e., invoke
scripts/augment_wheel_hashes.py "$WHEEL_HASH_FILE_PYPI") so the PyPI lockfile
receives aarch64-specific manylinux hashes like requirements.hashes.wheel.txt
does.
---
Duplicate comments:
In `@scripts/dedupe_requirements_build.py`:
- Around line 61-78: The current dedupe_pin_blocks uses max() to pick a "winner"
version which silently hides conflicting pins; change dedupe_pin_blocks (and its
inner version_key / use of parse_pin_line) to first parse all block versions and
check they are all identical, and if not raise a clear exception (e.g.,
ValueError) indicating the conflicting versions for the package instead of
choosing the highest; only when all parsed versions match should you proceed to
merge the "# via" tail lines and return [winner_line, *merged_tail]; also ensure
parse_pin_line failures are handled deterministically (fail fast) rather than
asserting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 883cad03-a7f9-4113-b2c6-1b6a6c3b00ab
📒 Files selected for processing (3)
- requirements-build.txt
- scripts/dedupe_requirements_build.py
- scripts/konflux_requirements.sh
# RHOAI-only --generate-hashes can omit the hash for the Linux cp312 manylinux wheel pip actually installs.
uv pip compile "$WHEEL_FILE" --refresh --generate-hashes --index-url $RHOAI_INDEX_URL --python-version 3.12 --python-platform x86_64-unknown-linux-gnu --emit-index-url --no-deps --no-annotate > "$WHEEL_HASH_FILE"
uv run python scripts/augment_wheel_hashes.py "$WHEEL_HASH_FILE"
uv pip compile "$WHEEL_FILE_PYPI" --refresh --generate-hashes --python-version 3.12 --python-platform x86_64-unknown-linux-gnu --emit-index-url --no-deps --no-annotate > "$WHEEL_HASH_FILE_PYPI"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '=== wheel lockfile consumers ===\n'
rg -n -C2 'requirements\.hashes\.wheel(\.pypi)?\.txt'
printf '\n=== architecture/platform references ===\n'
rg -n -C2 '(aarch64|arm64|x86_64|amd64|linux/(arm64|amd64)|python-platform|platforms?)'
Repository: lightspeed-core/rag-content
Length of output: 50384
🏁 Script executed:
# Check if there are any aarch64-specific lock generation or platform handling
rg -n 'aarch64.*lock|lock.*aarch64|python-platform.*aarch64|--platform.*aarch64' scripts/
rg -n 'separate|arm64|aarch64' scripts/konflux_requirements.sh
# Check if PyPI wheels naturally include aarch64 wheels or if augmentation is critical
head -30 requirements.hashes.wheel.pypi.txt
Repository: lightspeed-core/rag-content
Length of output: 2161
🏁 Script executed:
# Check for CI/build config showing builder architecture constraints
find . -name "*.yml" -o -name "*.yaml" | head -20
rg -l 'workflow|pipeline|build' --type yaml | head -10
rg -n 'aarch64|arm64|x86.64|arch.*=|platform.*=' .tekton* .github* 2>/dev/null | head -20
# Check Containerfile to see if it's multi-arch
grep -n 'FROM\|RUN\|COPY' Containerfile | head -30
Repository: lightspeed-core/rag-content
Length of output: 2546
The PyPI wheel lockfile lacks architecture-aware hash augmentation for aarch64 builders.
The pipeline builds for both x86_64 and aarch64 (confirmed in .tekton/rag-tool-push.yaml lines 28–29, 64 and .tekton/rag-tool-pull-request.yaml lines 31–32, 68), but augment_wheel_hashes.py is invoked only for requirements.hashes.wheel.txt (RHOAI), not for requirements.hashes.wheel.pypi.txt (PyPI). Both lockfiles are compiled with --python-platform x86_64-unknown-linux-gnu and installed on all builder architectures via the shared Containerfile.
When aarch64 builders install from requirements.hashes.wheel.pypi.txt, they will not have aarch64-specific manylinux wheel hashes that the augmenter would normally provide. Apply the augmentation step to the PyPI lockfile as well, or generate a separate aarch64-specific lockfile path for multi-arch deployments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/konflux_requirements.sh` around lines 85 - 88, The PyPI wheel
lockfile isn't being architecture-augmented: after generating the PyPI lockfile
(variable WHEEL_HASH_FILE_PYPI) with uv pip compile, call the same augmenter
used for RHOAI by running augment_wheel_hashes.py on WHEEL_HASH_FILE_PYPI (i.e.,
invoke scripts/augment_wheel_hashes.py "$WHEEL_HASH_FILE_PYPI") so the PyPI
lockfile receives aarch64-specific manylinux hashes like
requirements.hashes.wheel.txt does.
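As a rough illustration of what such an augmentation pass does, a hypothetical helper (not the actual augment_wheel_hashes.py code) can select the aarch64 manylinux wheel digests from the file list PyPI's JSON API returns, so they can be appended alongside the x86_64 hashes:

```python
def aarch64_wheel_sha256(files: list[dict]) -> list[str]:
    """Return sha256 digests for cp312 aarch64 manylinux wheels in a PyPI file list."""
    out = []
    for f in files:
        fn = f["filename"]
        if (fn.endswith(".whl") and "manylinux" in fn
                and "aarch64" in fn and "cp312" in fn):
            out.append(f["digests"]["sha256"])
    return out


# Illustrative file list shaped like PyPI's /pypi/<name>/<version>/json "urls" entries.
files = [
    {"filename": "foo-1.0-cp312-cp312-manylinux_2_17_x86_64.whl",
     "digests": {"sha256": "aaa"}},
    {"filename": "foo-1.0-cp312-cp312-manylinux_2_17_aarch64.whl",
     "digests": {"sha256": "bbb"}},
]
# Only the aarch64 wheel's digest is selected.
```

Applying the same pass to both lockfiles keeps the shared Containerfile installable on every builder architecture.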
1b618b6 to
6afbceb
Compare
Actionable comments posted: 3
🧹 Nitpick comments (3)
embeddings_model/train_script.py (3)
246-276: Complex iteration logic - consider adding clarifying comments.
The Dataset.__iter__ method has nuanced behavior:
- Small datasets (< 10M lines) are cached in memory and shuffled each epoch
- Large datasets are streamed directly, re-reading the file each epoch
The condition `while dataset is None or len(dataset) == 0` on line 251 handles the case where the dataset was too large to cache. Consider adding a brief comment explaining this behavior for maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 246 - 276, The __iter__ method's control flow around dataset, max_dataset_size and the loop "while dataset is None or len(dataset) == 0" is subtle and needs a concise clarifying comment: update train_script.py to add a brief comment above that while-loop explaining that small files (<= max_dataset_size) are cached into the list `dataset` and then shuffled each epoch, while large files set `dataset = None` to indicate streaming mode so the file is re-read on each pass; mention that `self.filepath` is re-opened for streaming and that `dataset is None` is the signal to stop caching and switch to direct streaming.
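The caching/streaming behavior described above can be paraphrased in a standalone sketch. Names and the size threshold are illustrative; this is not the script's actual class:

```python
import random


class LineDataset:
    """Cache small files in memory and reshuffle per epoch; stream large ones."""

    def __init__(self, filepath: str, max_dataset_size: int = 10_000_000):
        self.filepath = filepath
        self.max_dataset_size = max_dataset_size

    def __iter__(self):
        # dataset == [] while caching, a full list once cached,
        # or None as the signal that the file is too large and must be streamed.
        dataset: list[str] | None = []
        while True:
            if dataset:  # cached: shuffle and replay without touching disk
                random.shuffle(dataset)
                yield from dataset
                continue
            with open(self.filepath) as f:  # first pass, or streaming mode
                for line in f:
                    yield line
                    if dataset is not None:
                        dataset.append(line)
                        if len(dataset) > self.max_dataset_size:
                            dataset = None  # too large: switch to streaming
```

The iterator never terminates (each pass is one epoch), which matches a producer process that feeds a queue until killed.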
18-21: Duplicate import of torch.
torch is imported twice (lines 18 and 21). Remove the duplicate.
♻️ Proposed fix
  import tqdm
  import torch
  from torch import nn
  from torch.utils.data import DataLoader
- import torch
  import torch_xla
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 18 - 21, The file currently imports torch twice; remove the redundant "import torch" duplicate so there is only one top-level "import torch" alongside "from torch import nn" and "from torch.utils.data import DataLoader"; update the import block in train_script.py to have a single torch import and ensure there are no other repeated imports of the same symbol.
334-339: Process cleanup issues with fork and kill().
Two concerns with process management:
- start_method='fork' can be problematic with XLA/TPU as it may inherit file descriptors and locks that cause issues.
- p.kill() sends SIGKILL, which doesn't allow the producer process to clean up. Consider using p.terminate() (sends SIGTERM) followed by p.join(timeout) for graceful shutdown.
🔧 Proposed fix for graceful shutdown
- print("It might be that not all processes exit automatically. In that case you must manually kill this process.")
- print("With 'pkill python' you can kill all remaining python processes")
- p.kill()
+ p.terminate()
+ p.join(timeout=5)
+ if p.is_alive():
+     print("Producer did not terminate gracefully, forcing kill")
+     p.kill()
+     p.join()
  exit()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 334 - 339, The current spawn and cleanup code uses xmp.spawn(..., start_method='fork') and p.kill(), which can cause inherited FDs/locks on XLA/TPU and forces unclean shutdown; change start_method to 'spawn' (or remove the explicit fork) when calling xmp.spawn(train_function, args=(args, queue), nprocs=args.nprocs, start_method=...), and replace p.kill() with a graceful shutdown sequence: call p.terminate(), then p.join(timeout) to wait for cleanup, and only call p.kill() if p.is_alive() after the timeout; ensure this update references the xmp.spawn call, train_function, and the producer process variable p.
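The terminate-then-kill escalation suggested here looks roughly like this, with a plain multiprocessing.Process standing in for the data producer:

```python
import multiprocessing
import time


def producer() -> None:
    """Stand-in for the data producer: runs until asked to stop."""
    while True:
        time.sleep(0.1)


def shutdown(p: multiprocessing.Process, timeout: float = 5.0) -> None:
    """SIGTERM first so the child can clean up; SIGKILL only as a last resort."""
    p.terminate()        # sends SIGTERM
    p.join(timeout)
    if p.is_alive():
        print("Producer did not terminate gracefully, forcing kill")
        p.kill()         # sends SIGKILL
        p.join()


if __name__ == "__main__":
    p = multiprocessing.Process(target=producer)
    p.start()
    shutdown(p)
```

SIGTERM lets atexit handlers and finally blocks in the child run, so queues and file descriptors are released before the parent exits.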
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@embeddings_model/README.md`:
- Around line 107-114: Fix the spelling typos in the README sentences: change
"developped" to "developed" in the sentence starting "We developped this
model...", change "intented" to "intended" and "ouptuts" to "outputs" in the
sentence beginning "Our model is intented to be used..." so the documentation
reads correctly; update all occurrences of these misspellings in the README
content around those paragraphs (the two sentences shown in the diff).
In `@embeddings_model/train_script.py`:
- Around line 1-6: The CI is failing due to Black formatting and missing
docstrings: run the formatter (uv tool run black
embeddings_model/train_script.py) to fix style, and add clear
pydocstyle-compliant docstrings for the module and the public classes/functions:
add a module-level docstring at top, a class docstring and method docstrings for
AutoModelForSentenceEmbedding, docstrings for train_function and produce_data
explaining parameters/returns and side effects, and docstrings for RedditDataset
and Dataset (including __init__ and __getitem__/__len__ where applicable);
ensure docstrings follow the project's pydocstyle conventions (summary line,
description, parameter and return sections).
- Around line 299-301: Replace the blocking input() prompt with a CLI flag to
allow non-interactive runs: add a boolean argument like --overwrite or --force
to the argparse setup (e.g., parser.add_argument('--overwrite',
action='store_true')), then change the check around args.output to abort with a
non-zero exit (or raise) when the folder exists and args.overwrite is False; if
args.overwrite is True, proceed and optionally remove/empty the folder before
writing. Update references to args.output and the argument parser accordingly.
---
Nitpick comments:
In `@embeddings_model/train_script.py`:
- Around line 246-276: The __iter__ method's control flow around dataset,
max_dataset_size and the loop "while dataset is None or len(dataset) == 0" is
subtle and needs a concise clarifying comment: update train_script.py to add a
brief comment above that while-loop explaining that small files (<=
max_dataset_size) are cached into the list `dataset` and then shuffled each
epoch, while large files set `dataset = None` to indicate streaming mode so the
file is re-read on each pass; mention that `self.filepath` is re-opened for
streaming and that `dataset is None` is the signal to stop caching and switch to
direct streaming.
- Around line 18-21: The file currently imports torch twice; remove the
redundant "import torch" duplicate so there is only one top-level "import torch"
alongside "from torch import nn" and "from torch.utils.data import DataLoader";
update the import block in train_script.py to have a single torch import and
ensure there are no other repeated imports of the same symbol.
- Around line 334-339: The current spawn and cleanup code uses xmp.spawn(...,
start_method='fork') and p.kill(), which can cause inherited FDs/locks on
XLA/TPU and forces unclean shutdown; change start_method to 'spawn' (or remove
the explicit fork) when calling xmp.spawn(train_function, args=(args, queue),
nprocs=args.nprocs, start_method=...), and replace p.kill() with a graceful
shutdown sequence: call p.terminate(), then p.join(timeout) to wait for cleanup,
and only call p.kill() if p.is_alive() after the timeout; ensure this update
references the xmp.spawn call, train_function, and the producer process variable
p.
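The terminate/join/kill escalation suggested here can be sketched with the standard `multiprocessing` module (the `producer` target and the timeout value are illustrative; in the script, `p` is the data-producer process):

```python
import multiprocessing as mp
import time


def producer():
    """Stand-in for the data-producing worker (runs until terminated)."""
    while True:
        time.sleep(0.1)


def shutdown(p, timeout=10.0):
    """Stop a producer process gracefully, escalating only if needed."""
    p.terminate()      # polite stop (SIGTERM): lets cleanup handlers run
    p.join(timeout)    # wait up to `timeout` seconds for a clean exit
    if p.is_alive():   # grace period expired: force-kill as a last resort
        p.kill()
        p.join()
```

Calling `p.kill()` unconditionally skips the grace period entirely, which is why the review asks for `terminate()` plus a bounded `join()` first.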
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d8498fb0-6d7f-425d-b9e8-360c66ce93e6
📒 Files selected for processing (15)
- Containerfile
- embeddings_model/.gitattributes
- embeddings_model/1_Pooling/config.json
- embeddings_model/README.md
- embeddings_model/config.json
- embeddings_model/config_sentence_transformers.json
- embeddings_model/data_config.json
- embeddings_model/modules.json
- embeddings_model/sentence_bert_config.json
- embeddings_model/special_tokens_map.json
- embeddings_model/tokenizer.json
- embeddings_model/tokenizer_config.json
- embeddings_model/train_script.py
- embeddings_model/version.txt
- embeddings_model/vocab.txt
✅ Files skipped from review due to trivial changes (9)
- embeddings_model/version.txt
- embeddings_model/1_Pooling/config.json
- embeddings_model/sentence_bert_config.json
- embeddings_model/tokenizer_config.json
- embeddings_model/.gitattributes
- embeddings_model/config.json
- embeddings_model/modules.json
- embeddings_model/config_sentence_transformers.json
- embeddings_model/special_tokens_map.json
embeddings_model/README.md (outdated)
We developped this model during the
[Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104),
organized by Hugging Face. We developped this model as part of the project:
[Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Googles Flax, JAX, and Cloud team member about efficient deep learning frameworks.

## Intended uses

Our model is intented to be used as a sentence and short paragraph encoder. Given an input text, it ouptuts a vector which captures
Minor typos in documentation.
A few spelling errors flagged by static analysis:
- Line 107: "developped" → "developed"
- Line 114: "intented" → "intended"
- Line 114: "ouptuts" → "outputs"
📝 Suggested fixes
```diff
-We developped this model during the
+We developed this model during the

-Our model is intented to be used as a sentence and short paragraph encoder. Given an input text, it ouptuts a vector which captures
+Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures
```

🧰 Tools
+Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures 🧰 Tools
🪛 LanguageTool
[grammar] ~114-~114: Ensure spelling is correct
Context: ...eworks. ## Intended uses Our model is intented to be used as a sentence and short para...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~114-~114: Ensure spelling is correct
Context: ...agraph encoder. Given an input text, it ouptuts a vector which captures the semantic i...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@embeddings_model/README.md` around lines 107 - 114, Fix the spelling typos in
the README sentences: change "developped" to "developed" in the sentence
starting "We developped this model...", change "intented" to "intended" and
"ouptuts" to "outputs" in the sentence beginning "Our model is intented to be
used..." so the documentation reads correctly; update all occurrences of these
misspellings in the README content around those paragraphs (the two sentences
shown in the diff).
embeddings_model/train_script.py (outdated)
| """ | ||
| Train script for a single file | ||
|
|
||
| Need to set the TPU address first: | ||
| export XRT_TPU_CONFIG="localservice;0;localhost:51011" | ||
| """ |
Address pipeline failures: Black formatting and missing docstrings.
The CI pipeline reports failures that need to be addressed:
- Black formatting: Run `uv tool run black embeddings_model/train_script.py` to fix formatting issues.
- Pydocstyle: Add docstrings to public classes and functions:
  - `AutoModelForSentenceEmbedding` class and its methods
  - `train_function`
  - `produce_data`
  - `RedditDataset` and `Dataset` classes
📝 Example docstring for the module
"""
-Train script for a single file
+Train script for sentence embedding models on TPU/XLA.
-Need to set the TPU address first:
-export XRT_TPU_CONFIG="localservice;0;localhost:51011"
+This script trains sentence embedding models using a contrastive learning
+objective on TPU hardware via XLA. Requires TPU address configuration:
+
+ export XRT_TPU_CONFIG="localservice;0;localhost:51011"
"""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """ | |
| Train script for a single file | |
| Need to set the TPU address first: | |
| export XRT_TPU_CONFIG="localservice;0;localhost:51011" | |
| """ | |
| """ | |
| Train script for sentence embedding models on TPU/XLA. | |
| This script trains sentence embedding models using a contrastive learning | |
| objective on TPU hardware via XLA. Requires TPU address configuration: | |
| export XRT_TPU_CONFIG="localservice;0;localhost:51011" | |
| """ |
🧰 Tools
🪛 GitHub Actions: Black
[error] 1-1: Black formatting check failed (run: 'uv tool run black --check .'). 1 file would be reformatted: embeddings_model/train_script.py. Run 'uv tool run black .' or 'uv tool run black --write embeddings_model/train_script.py' to fix.
🪛 GitHub Actions: Pydocstyle
[error] 1-1: pydocstyle: D400 First line should end with a period (not 'e')
[error] 246-1: pydocstyle: D105 Missing docstring in magic method __iter__
[error] pydocstyle failed (exit code 1) due to docstring style violations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@embeddings_model/train_script.py` around lines 1 - 6, The CI is failing due
to Black formatting and missing docstrings: run the formatter (uv tool run black
embeddings_model/train_script.py) to fix style, and add clear
pydocstyle-compliant docstrings for the module and the public classes/functions:
add a module-level docstring at top, a class docstring and method docstrings for
AutoModelForSentenceEmbedding, docstrings for train_function and produce_data
explaining parameters/returns and side effects, and docstrings for RedditDataset
and Dataset (including __init__ and __getitem__/__len__ where applicable);
ensure docstrings follow the project's pydocstyle conventions (summary line,
description, parameter and return sections).
embeddings_model/train_script.py (outdated)
```python
if os.path.exists(args.output):
    print("Output folder already exists.")
    input("Continue?")
```
Blocking input() call prevents automated execution.
The input("Continue?") call blocks execution and requires manual intervention, which prevents running this script in automated or CI environments. Consider using a CLI flag (e.g., --force or --overwrite) to control this behavior.
🔧 Proposed fix
Add a CLI argument:
```diff
 parser.add_argument('output')
+parser.add_argument('--overwrite', action='store_true', help="Overwrite existing output folder without prompting")
 args = parser.parse_args()
```

Then update the check:

```diff
 if os.path.exists(args.output):
-    print("Output folder already exists.")
-    input("Continue?")
+    if not args.overwrite:
+        print("Output folder already exists. Use --overwrite to continue anyway.")
+        sys.exit(1)
```

🧰 Tools
+ sys.exit(1)🧰 Tools
🪛 GitHub Actions: Pydocstyle
[error] pydocstyle failed (exit code 1) due to docstring style violations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@embeddings_model/train_script.py` around lines 299 - 301, Replace the
blocking input() prompt with a CLI flag to allow non-interactive runs: add a
boolean argument like --overwrite or --force to the argparse setup (e.g.,
parser.add_argument('--overwrite', action='store_true')), then change the check
around args.output to abort with a non-zero exit (or raise) when the folder
exists and args.overwrite is False; if args.overwrite is True, proceed and
optionally remove/empty the folder before writing. Update references to
args.output and the argument parser accordingly.
tisnik
left a comment
what are those +33066 new lines?
5c81bd5 to 6afbceb (compare)
I'm going to need to fix "LCORE-1538: Add embedding model metadata to the rag-content repo" or the image built in Konflux is not self-sufficient. The Ansible folks asked for this. I moved that into a separate PR: #144.
Description
Upgrade llama stack dependency in LCORE to 0.5.0 to match the rest of Lightspeed Core.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes
New Features
Dependencies
Chores