
Add pytorch LLM#4

Merged
kmgowda merged 16 commits into main from
kmg-release-2026-2
Jan 31, 2026

Conversation

@kmgowda
Owner

@kmgowda kmgowda commented Jan 15, 2026

Summary by CodeRabbit

  • New Features

    • Local PyTorch LLM for on-device inference, optional training, and performance analyses (throughput, latency, memory, percentiles)
    • AI backend lifecycle support (open/close) for cleaner startup/shutdown
  • Documentation

    • Added venv and conda environment setup instructions and a comprehensive PyTorch LLM README
  • Chores

    • Updated Python requirements and bumped version to 3.26.2.0

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
The training fails with a memory error:
Error during training: MPS backend out of memory (MPS allocated: 181.43 GiB, other allocations: 107.86 MiB, max allowed: 182.78 GiB). Tried to allocate 1.53 GiB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

The training fails with the following error on macOS with 128 GB memory:
RuntimeError: MPS backend out of memory (MPS allocated: 178.54 GiB, other allocations: 3.84 GiB, max allowed: 182.78 GiB).
Tried to allocate 506.25 MiB on private pool.
Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
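The environment-variable workaround named in the error message can be applied by exporting it before launching the run; a minimal shell sketch (the commented launch command is illustrative, not the project's actual CLI):

```shell
# Lift the MPS allocator's upper memory limit, as the error message suggests.
# PyTorch itself warns that disabling the watermark may cause system failure.
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
echo "PYTORCH_MPS_HIGH_WATERMARK_RATIO=${PYTORCH_MPS_HIGH_WATERMARK_RATIO}"
# python -m sbk_charts ...   # hypothetical launch command, run in the same shell
```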

@coderabbitai

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

Adds a local PyTorch LLM backend, lifecycle hooks (open/close) on AI interfaces, Excel color constants and a default timeout, dependency updates and README setup additions, and bumps the package version.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation & Setup<br>`README.md`, `src/custom_ai/pytorch_llm/README.md` | Added venv and conda setup blocks to README.md; new comprehensive README for the PyTorch LLM backend. |
| Dependencies<br>`requirements.txt` | Updated requirements: downgraded huggingface_hub (1.2.3 → 0.34.0), added torch==2.9.1, transformers==4.57.3, accelerate>=0.20.0; minor formatting change. |
| New PyTorch LLM Backend<br>`src/custom_ai/pytorch_llm/pytorch_llm.py` | New PyTorchLLM class implementing local HF model loading, device/dtype handling, generation, optional per-output training & persistence, analysis methods, CLI arg wiring, open/close lifecycle, and destructor. |
| AI Interface & Formatting<br>`src/ai/sbk_ai.py` | Added DEFAULT_TIMEOUT_SECONDS and ExcelColors constants; replaced inline hex colors with constants; added open(self, args) and close(self, args) delegating to the active backend; CLI timeout parsing adjusted. |
| AI Interface Placeholder<br>`src/genai/genai.py` | Added placeholder open(self, args) and close(self, args) on SbkGenAI and minor prompt text tweak. |
| Integration / Invocation<br>`src/main/sbk_charts.py` | Calls ch.open(args) before AI performance augmentation and ch.close(args) afterwards. |
| Versioning<br>`src/version/sbk_version.py` | Bumped __sbk_version__ from "3.26.1.0" to "3.26.2.0". |

Sequence Diagram

sequenceDiagram
    participant CLI as Application / CLI
    participant SbkAI as SbkAI Instance
    participant PyLLM as PyTorchLLM Backend
    participant Model as Local Model (HF / CUDA / MPS / CPU)

    CLI->>SbkAI: open(args)
    SbkAI->>PyLLM: open(args)
    PyLLM->>Model: _initialize_model() / load weights to device

    CLI->>SbkAI: generate(prompt)
    SbkAI->>PyLLM: _generate_text(prompt)
    PyLLM->>Model: tokenize & run generation (sampling params)
    Model-->>PyLLM: tokens / logits
    PyLLM->>PyLLM: _clean_generated_text()
    PyLLM-->>SbkAI: (success, text)
    SbkAI-->>CLI: return generated text

    CLI->>SbkAI: close(args)
    SbkAI->>PyLLM: close(args)
    PyLLM->>Model: optional _train_on_output() / _save_model()
    PyLLM->>Model: free resources

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hop where tensors softly glow,
I open gates and let prompts flow.
I whisper to weights, they hum and grow,
I close the loop — save seeds I sow.
Tiny rabbit bounds: infer, train, go!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.76%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add pytorch LLM' accurately reflects the main addition in this changeset: a new PyTorchLLM backend implementation with supporting documentation, dependencies, and integration points. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch kmg-release-2026-2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 131-132: The README currently pins a non-existent Python version
in the conda create command ("conda create -n conda-sbk-charts python=3.14 -y");
update that command to use a released interpreter (e.g., python=3.12 or
python=3.13) so environment creation succeeds, and optionally note a range like
"python>=3.12,<3.14" or "python=3.12" for clarity.

In `@requirements.txt`:
- Around line 13-14: The requirements.txt entry for transformers is pinned to a
non-existent version "transformers~=4.57.3"; update that token to a valid
release (e.g., change "transformers~=4.57.3" to "transformers~=4.57.4" or a
later valid version) while leaving "torch~=2.9.1" unchanged so pip can resolve
packages correctly.

In `@src/custom_ai/pytorch_llm/pytorch_llm.py`:
- Around line 100-105: The call to AutoModelForCausalLM.from_pretrained is using
the wrong keyword argument (dtype) so the dtype is ignored; change the parameter
name to torch_dtype when calling
AutoModelForCausalLM.from_pretrained(saved_model_dir, ...) and pass the same
dtype variable as torch_dtype; apply the identical change to the second
occurrence of the same call (the block around the lines referenced 116-121) so
both places use torch_dtype instead of dtype.
- Around line 100-105: Add a new boolean CLI flag (e.g., --pt-trust-remote-code)
to the existing argument parser with default False and store it in a variable
(e.g., pt_trust_remote_code); then replace the hardcoded trust_remote_code=True
in every model load call (the AutoModelForCausalLM.from_pretrained and any other
.from_pretrained usage in pytorch_llm.py) to use that variable
(trust_remote_code=pt_trust_remote_code) so users must explicitly opt in to
remote code execution.
- Around line 476-478: The autocast device selection in _train_on_example
currently only checks for 'cuda' and falls back to 'cpu'; update it to mirror
_train_on_output by including 'mps' detection: when forming the torch.autocast
call inside _train_on_example, set device_type to 'mps' if 'mps' is in
str(self.device), otherwise 'cuda' if 'cuda' is in str(self.device), else 'cpu',
and keep passing model_dtype as dtype so mixed-precision works correctly on
Apple Silicon.
- Around line 191-195: In open(), _is_initialized is being set to True
unconditionally after calling _initialize_model(); change the logic so you
capture the return value of _initialize_model() and only set
self._is_initialized = True when that call indicates success (e.g., returned
True), otherwise leave it False and handle the failure (log/error/raise or
return a failure status) so retries are possible; update the open() method to
use the boolean result from _initialize_model() and ensure callers can
observe/initiate retry behavior.
- Around line 279-281: The attention_mask is incorrectly cast to the model's
floating dtype; instead, keep it as an integer tensor and only move it to the
model device (or explicitly cast to torch.long). Update the code that currently
does inputs['attention_mask'] = inputs['attention_mask'].to(dtype=model_dtype)
to something like inputs['attention_mask'] =
inputs['attention_mask'].to(device=model.device, dtype=torch.long) (or preserve
original integer dtype while moving device), and make the same change in the
_train_on_example function so 'attention_mask' remains an integer (long) tensor
rather than being converted to model_dtype.
- Around line 440-493: The _train_on_example method returns inconsistent types
(float/None in docstring but (False, str) on exception); change the except block
to return None (or re-raise the exception) and log the error instead of
returning a tuple, and update the method docstring to clearly state it returns
float or None on failure; specifically modify the except block in
_train_on_example to call your logger (e.g., self.logger.error or similar) with
the exception message and then return None, removing the (False, str) tuple.
- Line 123: The call self.model = self.model.to(self.device) is redundant and
harmful when using device_map="auto" because the model is already placed across
devices; remove that unconditional .to(...) or guard it so it only runs when
device_map is not "auto" (e.g., check the loaded config/argument for device_map
before applying .to). Update the model-loading path where self.model is assigned
to avoid forcing .to(self.device) when device_map=="auto", leaving placement to
the loader to preserve model parallelism.
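The retry-friendly open() described in the prompt above can be sketched as follows. This is a hedged sketch, not the project's actual implementation: _initialize_model here is a stand-in stub for the real model loader, and the class name is illustrative.

```python
# Minimal sketch: open() only marks the backend initialized when
# _initialize_model() reports success, so a failed load can be retried.
class PyTorchLLMSketch:
    def __init__(self):
        self._is_initialized = False

    def _initialize_model(self) -> bool:
        # Stand-in for the real model-loading logic; returns success/failure.
        return True

    def open(self, args=None) -> bool:
        if self._is_initialized:
            return True
        ok = self._initialize_model()
        self._is_initialized = bool(ok)  # stays False on failure, enabling retry
        return self._is_initialized

backend = PyTorchLLMSketch()
print(backend.open())  # → True
```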
🧹 Nitpick comments (2)
README.md (1)

130-130: Add language specifiers to fenced code blocks.

Per markdownlint MD040, fenced code blocks should have a language specified for proper syntax highlighting.

Proposed fix
-```
+```bash
 # Create a new conda environment with Python 3.14 or higher
-```
+```bash
 conda deactivate

Also applies to: 149-149

src/custom_ai/pytorch_llm/pytorch_llm.py (1)

334-334: Remove unused variable was_training.

The variable was_training is assigned but never used, as flagged by Ruff F841.

Proposed fix
-        was_training = self.model.training
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffa1190 and e3d801a.

📒 Files selected for processing (7)
  • README.md
  • requirements.txt
  • src/ai/sbk_ai.py
  • src/custom_ai/pytorch_llm/__init__.py
  • src/custom_ai/pytorch_llm/pytorch_llm.py
  • src/genai/genai.py
  • src/main/sbk_charts.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/genai/genai.py (2)
src/custom_ai/pytorch_llm/pytorch_llm.py (2)
  • open (191-195)
  • close (198-219)
src/ai/sbk_ai.py (2)
  • open (194-196)
  • close (198-200)
src/ai/sbk_ai.py (4)
src/custom_ai/pytorch_llm/pytorch_llm.py (3)
  • parse_args (179-188)
  • open (191-195)
  • close (198-219)
src/genai/genai.py (3)
  • parse_args (55-56)
  • open (58-59)
  • close (61-62)
src/custom_ai/lm_studio/lm_studio.py (1)
  • parse_args (98-103)
src/custom_ai/ollama/ollama.py (1)
  • parse_args (113-119)
src/main/sbk_charts.py (3)
src/custom_ai/pytorch_llm/pytorch_llm.py (2)
  • open (191-195)
  • close (198-219)
src/ai/sbk_ai.py (2)
  • open (194-196)
  • close (198-200)
src/genai/genai.py (2)
  • open (58-59)
  • close (61-62)
src/custom_ai/pytorch_llm/pytorch_llm.py (2)
src/genai/genai.py (13)
  • SbkGenAI (25-356)
  • add_args (52-53)
  • parse_args (55-56)
  • open (58-59)
  • get_model_description (65-73)
  • get_throughput_analysis (327-336)
  • get_throughput_prompt (75-122)
  • get_latency_analysis (339-348)
  • get_latency_prompt (124-215)
  • get_total_mb_analysis (351-352)
  • get_total_mb_prompt (218-247)
  • get_percentile_histogram_analysis (355-356)
  • get_percentile_histogram_prompt (249-323)
src/ai/sbk_ai.py (3)
  • add_args (141-168)
  • parse_args (171-192)
  • open (194-196)
🪛 markdownlint-cli2 (0.18.1)
README.md

130-130: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


149-149: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.11)
src/genai/genai.py

58-59: SbkGenAI.open is an empty method in an abstract base class, but has no abstract decorator

(B027)


61-62: SbkGenAI.close is an empty method in an abstract base class, but has no abstract decorator

(B027)

src/custom_ai/pytorch_llm/pytorch_llm.py

133-133: Consider moving this statement to an else block

(TRY300)


135-135: Do not catch blind exception: Exception

(BLE001)


136-136: Use explicit conversion flag

Replace with conversion flag

(RUF010)


191-191: Unused method argument: args

(ARG002)


198-198: Unused method argument: args

(ARG002)


222-222: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


243-243: Consider moving this statement to an else block

(TRY300)


245-245: Do not catch blind exception: Exception

(BLE001)


246-246: Use explicit conversion flag

Replace with conversion flag

(RUF010)


314-314: Consider moving this statement to an else block

(TRY300)


316-316: Do not catch blind exception: Exception

(BLE001)


317-317: Use explicit conversion flag

Replace with conversion flag

(RUF010)


334-334: Local variable was_training is assigned to but never used

Remove assignment to unused variable was_training

(F841)


389-389: Consider moving this statement to an else block

(TRY300)


391-391: Do not catch blind exception: Exception

(BLE001)


392-392: Use explicit conversion flag

Replace with conversion flag

(RUF010)


490-490: Consider moving this statement to an else block

(TRY300)


492-492: Do not catch blind exception: Exception

(BLE001)


493-493: Use explicit conversion flag

Replace with conversion flag

(RUF010)


500-500: Do not catch blind exception: Exception

(BLE001)


501-501: Use explicit conversion flag

Replace with conversion flag

(RUF010)


508-508: Do not catch blind exception: Exception

(BLE001)


509-509: Use explicit conversion flag

Replace with conversion flag

(RUF010)


516-516: Do not catch blind exception: Exception

(BLE001)


517-517: Use explicit conversion flag

Replace with conversion flag

(RUF010)


524-524: Do not catch blind exception: Exception

(BLE001)


525-525: Use explicit conversion flag

Replace with conversion flag

(RUF010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (5)
requirements.txt (1)

8-8: Verify huggingface_hub version compatibility.

The huggingface_hub was downgraded from 1.2.3 to 0.34.0. Ensure this version is compatible with the transformers version being used, as transformers may have minimum version requirements for huggingface_hub.

src/genai/genai.py (1)

58-62: LGTM - Optional lifecycle hooks.

The open() and close() methods are intentionally non-abstract to serve as optional lifecycle hooks. Backends like PyTorchLLM can override them for resource initialization and cleanup (e.g., model loading, training on close), while other backends can use the default no-op implementation. The Ruff B027 warning can be safely ignored as this is the intended design.

src/ai/sbk_ai.py (2)

54-54: LGTM - Clean timeout constant extraction.

Good refactor to extract DEFAULT_TIMEOUT_SECONDS as a module-level constant, making the default value explicit and reusable. The int() conversion in parse_args properly handles string input from command-line arguments.

Also applies to: 138-138, 187-187


194-200: LGTM - Lifecycle delegation pattern.

The open() and close() methods correctly delegate to the active AI backend instance when present. This enables proper resource management for backends like PyTorchLLM that need initialization and cleanup.

src/custom_ai/pytorch_llm/pytorch_llm.py (1)

37-37: No changes needed. openai/gpt-oss-20b is a legitimate Hugging Face model—OpenAI's open-weight GPT-OSS model (~21B parameters) released under Apache 2.0 license and available on Hugging Face in multiple formats.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +131 to +132
# Create a new conda environment with Python 3.14 or higher
conda create -n conda-sbk-charts python=3.14 -y

⚠️ Potential issue | 🟡 Minor

Python 3.14 does not exist yet.

Python 3.14 is not released (current stable versions are 3.12/3.13). This will cause conda environment creation to fail. Consider using Python 3.12 or 3.13.

Proposed fix
-# Create a new conda environment with Python 3.14 or higher
-conda create -n conda-sbk-charts python=3.14 -y
+# Create a new conda environment with Python 3.12 or higher
+conda create -n conda-sbk-charts python=3.12 -y
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# Create a new conda environment with Python 3.14 or higher
-conda create -n conda-sbk-charts python=3.14 -y
+# Create a new conda environment with Python 3.12 or higher
+conda create -n conda-sbk-charts python=3.12 -y
🤖 Prompt for AI Agents
In `@README.md` around lines 131 - 132, The README currently pins a non-existent
Python version in the conda create command ("conda create -n conda-sbk-charts
python=3.14 -y"); update that command to use a released interpreter (e.g.,
python=3.12 or python=3.13) so environment creation succeeds, and optionally
note a range like "python>=3.12,<3.14" or "python=3.12" for clarity.

Comment on lines +13 to +14
torch~=2.9.1
transformers~=4.57.3

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check latest available versions of torch and transformers on PyPI
echo "Checking torch versions:"
curl -s https://pypi.org/pypi/torch/json | jq -r '.releases | keys | .[-15:]' | sort -V

echo -e "\nChecking transformers versions:"
curl -s https://pypi.org/pypi/transformers/json | jq -r '.releases | keys | .[-15:]' | sort -V

Repository: kmgowda/sbk-charts

Length of output: 469


Fix invalid transformers version in requirements.txt.

Line 14 specifies transformers~=4.57.3, but this version does not exist on PyPI. Available versions jump from 4.9.2 to 4.57.4. Update to transformers~=4.57.4 or later.

The torch~=2.9.1 version is valid and exists on PyPI.

🤖 Prompt for AI Agents
In `@requirements.txt` around lines 13 - 14, The requirements.txt entry for
transformers is pinned to a non-existent version "transformers~=4.57.3"; update
that token to a valid release (e.g., change "transformers~=4.57.3" to
"transformers~=4.57.4" or a later valid version) while leaving "torch~=2.9.1"
unchanged so pip can resolve packages correctly.

Comment on lines +100 to +105
            self.model = AutoModelForCausalLM.from_pretrained(
                saved_model_dir,
                device_map="auto",
                dtype=dtype,
                trust_remote_code=True
            )

⚠️ Potential issue | 🟠 Major

Wrong parameter name: use torch_dtype instead of dtype.

The AutoModelForCausalLM.from_pretrained() method expects torch_dtype, not dtype. This will cause the dtype setting to be silently ignored.

Proposed fix
             self.model = AutoModelForCausalLM.from_pretrained(
                 saved_model_dir,
                 device_map="auto",
-                dtype=dtype,
+                torch_dtype=dtype,
                 trust_remote_code=True
             )

Apply the same fix at lines 116-121.

Also applies to: 116-121

🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 100 - 105, The call to
AutoModelForCausalLM.from_pretrained is using the wrong keyword argument (dtype)
so the dtype is ignored; change the parameter name to torch_dtype when calling
AutoModelForCausalLM.from_pretrained(saved_model_dir, ...) and pass the same
dtype variable as torch_dtype; apply the identical change to the second
occurrence of the same call (the block around the lines referenced 116-121) so
both places use torch_dtype instead of dtype.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the file exists and get its context
fd src/custom_ai/pytorch_llm/pytorch_llm.py

Repository: kmgowda/sbk-charts

Length of output: 44


🏁 Script executed:

# Search for pytorch_llm files
fd pytorch_llm

# Also search for files containing AutoModelForCausalLM to understand context
rg AutoModelForCausalLM --type py -l

Repository: kmgowda/sbk-charts

Length of output: 170


🏁 Script executed:

# Check file size
wc -l src/custom_ai/pytorch_llm/pytorch_llm.py

# Read the file to understand context
cat -n src/custom_ai/pytorch_llm/pytorch_llm.py

Repository: kmgowda/sbk-charts

Length of output: 24290


Add configurable --pt-trust-remote-code flag to control arbitrary code execution.

The code hardcodes trust_remote_code=True when loading models (lines 100-105, 106-109, 112-115, 116-121), which enables arbitrary code execution from model repositories. Add a command-line flag --pt-trust-remote-code to the existing argument parser that defaults to False, allowing users to explicitly opt-in to this security-critical feature only when needed.

🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 100 - 105, Add a new
boolean CLI flag (e.g., --pt-trust-remote-code) to the existing argument parser
with default False and store it in a variable (e.g., pt_trust_remote_code); then
replace the hardcoded trust_remote_code=True in every model load call (the
AutoModelForCausalLM.from_pretrained and any other .from_pretrained usage in
pytorch_llm.py) to use that variable (trust_remote_code=pt_trust_remote_code) so
users must explicitly opt in to remote code execution.
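The opt-in flag described in the prompt above could be wired with argparse along these lines. The flag name --pt-trust-remote-code comes from the review suggestion; the parser wiring is a hedged sketch, not the project's actual add_args method.

```python
import argparse

# Sketch: a boolean flag defaulting to False so remote code execution is opt-in.
parser = argparse.ArgumentParser()
parser.add_argument("--pt-trust-remote-code", action="store_true", default=False,
                    help="Allow Hugging Face model repos to run custom code (off by default)")

args = parser.parse_args([])                    # flag absent: stays False
print(args.pt_trust_remote_code)                # → False
args = parser.parse_args(["--pt-trust-remote-code"])
print(args.pt_trust_remote_code)                # → True
# The parsed value would then replace the hardcoded argument in every load call:
# trust_remote_code=args.pt_trust_remote_code
```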

                trust_remote_code=True
            )

            self.model = self.model.to(self.device)

⚠️ Potential issue | 🟠 Major

Redundant .to(device) after device_map="auto".

When using device_map="auto", the model is automatically placed on appropriate devices. Calling .to(self.device) afterwards can cause issues with model parallelism and is unnecessary.

Proposed fix
-            self.model = self.model.to(self.device)
+            # Note: device_map="auto" handles device placement automatically
🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` at line 123, The call self.model =
self.model.to(self.device) is redundant and harmful when using device_map="auto"
because the model is already placed across devices; remove that unconditional
.to(...) or guard it so it only runs when device_map is not "auto" (e.g., check
the loaded config/argument for device_map before applying .to). Update the
model-loading path where self.model is assigned to avoid forcing
.to(self.device) when device_map=="auto", leaving placement to the loader to
preserve model parallelism.

Comment on lines +279 to +281
            # Ensure attention mask is the same dtype as the model
            if 'attention_mask' in inputs:
                inputs['attention_mask'] = inputs['attention_mask'].to(dtype=model_dtype)

⚠️ Potential issue | 🟠 Major

Incorrect dtype for attention_mask.

Attention masks should remain as integer tensors (long/int), not be cast to the model's floating-point dtype. This can cause errors or incorrect behavior.

Proposed fix
-            # Ensure attention mask is the same dtype as the model
-            if 'attention_mask' in inputs:
-                inputs['attention_mask'] = inputs['attention_mask'].to(dtype=model_dtype)
+            # Note: attention_mask should remain as integer type, no conversion needed

The same issue exists in _train_on_example at lines 472-474.

🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 279 - 281, The
attention_mask is incorrectly cast to the model's floating dtype; instead, keep
it as an integer tensor and only move it to the model device (or explicitly cast
to torch.long). Update the code that currently does inputs['attention_mask'] =
inputs['attention_mask'].to(dtype=model_dtype) to something like
inputs['attention_mask'] = inputs['attention_mask'].to(device=model.device,
dtype=torch.long) (or preserve original integer dtype while moving device), and
make the same change in the _train_on_example function so 'attention_mask'
remains an integer (long) tensor rather than being converted to model_dtype.

Comment on lines +440 to +493
def _train_on_example(self, prompt: str, target: str) -> None:
    """Train the model on a single example.

    Args:
        prompt: The input prompt
        target: The target response

    Returns:
        float: The loss value if successful, None otherwise
    """
    try:
        # Combine prompt and target with separator
        text = f"{prompt}{target}{self.tokenizer.eos_token}"

        # Get model's dtype for consistent typing
        model_dtype = next(self.model.parameters()).dtype

        # Tokenize the input
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            padding="max_length",
            max_length=self.max_length,
            truncation=True
        )

        # Move inputs to the correct device and type
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        # Create labels (shifted input) and ensure correct dtype
        labels = inputs["input_ids"].clone()

        # Ensure attention mask is the same dtype as the model
        if 'attention_mask' in inputs:
            inputs['attention_mask'] = inputs['attention_mask'].to(dtype=model_dtype)

        # Forward pass with autocast for mixed precision training
        with torch.autocast(device_type='cuda' if 'cuda' in str(self.device) else 'cpu',
                            dtype=model_dtype):
            outputs = self.model(
                input_ids=inputs["input_ids"],
                attention_mask=inputs.get("attention_mask", None),
                labels=labels
            )

        # Backward pass and optimize
        loss = outputs.loss
        if loss is not None:
            loss.backward()
            return loss.item()
        return None

    except Exception as e:
        return False, f"Error in text generation: {str(e)}"

⚠️ Potential issue | 🟡 Minor

Inconsistent return type in _train_on_example.

The docstring states the method returns float or None, but line 493 returns a tuple (False, str). This type inconsistency can cause issues for callers.

Proposed fix
         except Exception as e:
-            return False, f"Error in text generation: {str(e)}"
+            print(f"Error in training: {e!r}")
+            return None
🧰 Tools
🪛 Ruff (0.14.11)

490-490: Consider moving this statement to an else block

(TRY300)


492-492: Do not catch blind exception: Exception

(BLE001)


493-493: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 440 - 493, The
_train_on_example method returns inconsistent types (float/None in docstring but
(False, str) on exception); change the except block to return None (or re-raise
the exception) and log the error instead of returning a tuple, and update the
method docstring to clearly state it returns float or None on failure;
specifically modify the except block in _train_on_example to call your logger
(e.g., self.logger.error or similar) with the exception message and then return
None, removing the (False, str) tuple.

Comment on lines +476 to +478
            # Forward pass with autocast for mixed precision training
            with torch.autocast(device_type='cuda' if 'cuda' in str(self.device) else 'cpu',
                                dtype=model_dtype):

⚠️ Potential issue | 🟡 Minor

Missing MPS device handling in _train_on_example autocast.

The _train_on_example method doesn't handle 'mps' device type in autocast, unlike _train_on_output (line 292). This inconsistency could cause issues on Apple Silicon.

Proposed fix
-            with torch.autocast(device_type='cuda' if 'cuda' in str(self.device) else 'cpu', 
+            device_type = 'mps' if 'mps' in str(self.device) else 'cuda' if 'cuda' in str(self.device) else 'cpu'
+            with torch.autocast(device_type=device_type,
                               dtype=model_dtype):
📝 Committable suggestion


Suggested change
-            # Forward pass with autocast for mixed precision training
-            with torch.autocast(device_type='cuda' if 'cuda' in str(self.device) else 'cpu',
-                                dtype=model_dtype):
+            # Forward pass with autocast for mixed precision training
+            device_type = 'mps' if 'mps' in str(self.device) else 'cuda' if 'cuda' in str(self.device) else 'cpu'
+            with torch.autocast(device_type=device_type,
+                                dtype=model_dtype):
🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 476 - 478, The
autocast device selection in _train_on_example currently only checks for 'cuda'
and falls back to 'cpu'; update it to mirror _train_on_output by including 'mps'
detection: when forming the torch.autocast call inside _train_on_example, set
device_type to 'mps' if 'mps' is in str(self.device), otherwise 'cuda' if 'cuda'
is in str(self.device), else 'cpu', and keep passing model_dtype as dtype so
mixed-precision works correctly on Apple Silicon.

Comment on lines 73 to +76
    ch.parse_args(args)
    ch.open(args)
    ch.add_performance_details()
    ch.close(args)

⚠️ Potential issue | 🟠 Major

Wrap open/close lifecycle in try/finally for exception safety.

If add_performance_details() raises an exception, close() will never be called. This is particularly important for the PyTorchLLM backend where close() handles training completion and model persistence.

🛠️ Suggested fix
     ch.parse_args(args)
-    ch.open(args)
-    ch.add_performance_details()
-    ch.close(args)
+    ch.open(args)
+    try:
+        ch.add_performance_details()
+    finally:
+        ch.close(args)
📝 Committable suggestion


Suggested change
     ch.parse_args(args)
-    ch.open(args)
-    ch.add_performance_details()
-    ch.close(args)
+    ch.open(args)
+    try:
+        ch.add_performance_details()
+    finally:
+        ch.close(args)
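The exception-safety argument can be demonstrated with a stand-in handler (a minimal sketch; `DummyCharts` is hypothetical and only records whether `close()` ran):

```python
class DummyCharts:
    """Stand-in for the charts handler; records whether close() ran."""
    def __init__(self):
        self.closed = False

    def open(self, args):
        pass

    def add_performance_details(self):
        # Simulate a failure mid-analysis.
        raise RuntimeError("analysis failed")

    def close(self, args):
        self.closed = True


ch = DummyCharts()
ch.open(None)
try:
    try:
        ch.add_performance_details()
    finally:
        ch.close(None)  # runs even though the analysis raised
except RuntimeError:
    pass
```

Without the `try/finally`, the simulated failure would skip `close()` entirely, which for the PyTorchLLM backend would skip training completion and model persistence.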

Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py`:
- Around line 485-493: The method currently calls loss.backward() but never
updates parameters; after loss.backward() call in the function containing
loss.backward() add a call to optimizer.step() and then optimizer.zero_grad()
(or zero gradients before the backward as per existing pattern) to perform the
weight update; also fix the except block that returns (False, str) so the
function adheres to its docstring by returning None on error (or re-raising) and
optionally logging the exception instead of returning a tuple—update references
to loss.backward(), optimizer.step(), optimizer.zero_grad(), and the except
block return to ensure a consistent float|None return type.
♻️ Duplicate comments (5)
src/custom_ai/pytorch_llm/pytorch_llm.py (5)

100-105: Wrong parameter name: use torch_dtype instead of dtype.

The AutoModelForCausalLM.from_pretrained() method expects torch_dtype, not dtype. This causes the dtype setting to be silently ignored, and the model will load with default precision.

Proposed fix
             self.model = AutoModelForCausalLM.from_pretrained(
                 saved_model_dir,
                 device_map="auto",
-                dtype=dtype,
+                torch_dtype=dtype,
                 trust_remote_code=True
             )

Apply the same fix at lines 116-121.


123-123: Redundant .to(device) after device_map="auto".

When using device_map="auto", the model is automatically distributed across devices. Calling .to(self.device) can break model parallelism and cause device mismatch errors for large models.

Proposed fix
-            self.model = self.model.to(self.device)
+            # Note: device_map="auto" handles device placement automatically

191-196: Bug: _is_initialized set to True regardless of initialization result.

If _initialize_model() returns False (initialization failed), _is_initialized is still set to True on line 195, masking the failure and preventing retry attempts.

Proposed fix
 def open(self, args):
     # Reinitialize model if needed
     if not self._is_initialized:
-        self._initialize_model()
-        self._is_initialized = True
+        if not self._initialize_model():
+            print("Warning: Model initialization failed in open()")
+            return

279-281: Incorrect dtype for attention_mask.

Attention masks should remain as integer tensors (torch.long), not be cast to the model's floating-point dtype. This can cause errors or incorrect behavior during forward pass.

Proposed fix
-            # Ensure attention mask is the same dtype as the model
-            if 'attention_mask' in inputs:
-                inputs['attention_mask'] = inputs['attention_mask'].to(dtype=model_dtype)
+            # Note: attention_mask should remain as integer type, no conversion needed

The same issue exists in _train_on_example at lines 472-474.


476-478: Missing MPS device handling in autocast.

Unlike _train_on_output (line 292), this method doesn't handle 'mps' device type in autocast, causing inconsistent behavior on Apple Silicon.

Proposed fix
-            with torch.autocast(device_type='cuda' if 'cuda' in str(self.device) else 'cpu', 
+            device_type = 'mps' if 'mps' in str(self.device) else 'cuda' if 'cuda' in str(self.device) else 'cpu'
+            with torch.autocast(device_type=device_type,
                               dtype=model_dtype):
🧹 Nitpick comments (4)
src/custom_ai/pytorch_llm/pytorch_llm.py (1)

334-334: Unused variable was_training.

The variable was_training is assigned but never used. This appears to be leftover code from an incomplete implementation to restore training mode after generation.

Either remove it or implement the restoration logic:

Option 1: Remove unused variable
-        was_training = self.model.training
Option 2: Implement training mode restoration
         was_training = self.model.training
         try:
+            self.model.eval()
             # ... generation code ...
+        finally:
+            if was_training:
+                self.model.train()
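Option 2 can be exercised without loading a model by using a stand-in that tracks only the training flag (a sketch; `TinyModel` and `generate_with_mode_restore` are illustrative names, not PR code):

```python
class TinyModel:
    """Stand-in for an nn.Module: tracks only the training flag."""
    def __init__(self):
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False


def generate_with_mode_restore(model):
    was_training = model.training
    try:
        model.eval()          # inference should run in eval mode
        return "generated"
    finally:
        if was_training:
            model.train()     # restore the caller's mode
```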
src/custom_ai/pytorch_llm/README.md (3)

128-140: Python code should use Python code block, not bash.

The model download example contains Python code but is wrapped in a bash code block. This affects syntax highlighting and copy-paste usability.

Proposed fix
 1. Download a model from Hugging Face Hub:
-   ```bash
+   ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

175-184: Add language specifier to fenced code block.

The directory structure code block should specify a language (e.g., text or plaintext) for consistent formatting.

Proposed fix
-```
+```text
 src/
 └── custom_ai/
     └── pytorch_llm/
         ├── __init__.py
         ├── pytorch_llm.py      # Main implementation
         └── README.md           # This document
 ```

184-187: Add blank line before License heading.

Missing blank line before the ## License heading affects Markdown rendering consistency.

Proposed fix
         └── README.md           # This document
+
 ## License

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between e3d801a6a028eb836baa8cff6e72abbaa17048c2 and 37af43e01b81b45b042ff6d7c98fdcb12cd0fd39.

</details>

<details>
<summary>⛔ Files ignored due to path filters (2)</summary>

* `samples/charts/.DS_Store` is excluded by `!**/.DS_Store`
* `samples/charts/sbk-file-rocksdb-read-pytorch-llm.xlsx` is excluded by `!**/*.xlsx`

</details>

<details>
<summary>📒 Files selected for processing (3)</summary>

* `src/custom_ai/pytorch_llm/README.md`
* `src/custom_ai/pytorch_llm/pytorch_llm.py`
* `src/version/sbk_version.py`

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧬 Code graph analysis (1)</summary>

<details>
<summary>src/custom_ai/pytorch_llm/pytorch_llm.py (2)</summary><blockquote>

<details>
<summary>src/genai/genai.py (14)</summary>

* `SbkGenAI` (25-356)
* `add_args` (52-53)
* `parse_args` (55-56)
* `open` (58-59)
* `close` (61-62)
* `get_model_description` (65-73)
* `get_throughput_analysis` (327-336)
* `get_throughput_prompt` (75-122)
* `get_latency_analysis` (339-348)
* `get_latency_prompt` (124-215)
* `get_total_mb_analysis` (351-352)
* `get_total_mb_prompt` (218-247)
* `get_percentile_histogram_analysis` (355-356)
* `get_percentile_histogram_prompt` (249-323)

</details>
<details>
<summary>src/ai/sbk_ai.py (4)</summary>

* `add_args` (141-168)
* `parse_args` (171-192)
* `open` (194-196)
* `close` (198-200)

</details>

</blockquote></details>

</details><details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

<details>
<summary>src/custom_ai/pytorch_llm/README.md</summary>

177-177: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>
<details>
<summary>🪛 Ruff (0.14.11)</summary>

<details>
<summary>src/custom_ai/pytorch_llm/pytorch_llm.py</summary>

133-133: Consider moving this statement to an `else` block

(TRY300)

---

135-135: Do not catch blind exception: `Exception`

(BLE001)

---

136-136: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

191-191: Unused method argument: `args`

(ARG002)

---

198-198: Unused method argument: `args`

(ARG002)

---

222-222: PEP 484 prohibits implicit `Optional`

Convert to `T | None`

(RUF013)

---

243-243: Consider moving this statement to an `else` block

(TRY300)

---

245-245: Do not catch blind exception: `Exception`

(BLE001)

---

246-246: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

314-314: Consider moving this statement to an `else` block

(TRY300)

---

316-316: Do not catch blind exception: `Exception`

(BLE001)

---

317-317: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

334-334: Local variable `was_training` is assigned to but never used

Remove assignment to unused variable `was_training`

(F841)

---

389-389: Consider moving this statement to an `else` block

(TRY300)

---

391-391: Do not catch blind exception: `Exception`

(BLE001)

---

392-392: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

490-490: Consider moving this statement to an `else` block

(TRY300)

---

492-492: Do not catch blind exception: `Exception`

(BLE001)

---

493-493: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

500-500: Do not catch blind exception: `Exception`

(BLE001)

---

501-501: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

508-508: Do not catch blind exception: `Exception`

(BLE001)

---

509-509: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

516-516: Do not catch blind exception: `Exception`

(BLE001)

---

517-517: Use explicit conversion flag

Replace with conversion flag

(RUF010)

---

524-524: Do not catch blind exception: `Exception`

(BLE001)

---

525-525: Use explicit conversion flag

Replace with conversion flag

(RUF010)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)</summary>

* GitHub Check: build

</details>

<details>
<summary>🔇 Additional comments (6)</summary><blockquote>

<details>
<summary>src/version/sbk_version.py (1)</summary><blockquote>

`11-11`: **LGTM!**

Version bump to `3.26.1.1` is appropriate for the new PyTorch LLM backend addition.

</blockquote></details>
<details>
<summary>src/custom_ai/pytorch_llm/pytorch_llm.py (4)</summary><blockquote>

`198-220`: **LGTM on training lifecycle in `close()`.**

The training flow correctly sets the model to training mode, iterates over outputs, and conditionally saves the model. The unused `args` parameter matches the interface contract from `SbkGenAI`.

---

`495-526`: **LGTM on analysis methods.**

The analysis methods correctly delegate to prompt generation and text generation, following the interface pattern from `SbkGenAI`. Exception handling provides graceful error reporting.

---

`527-534`: **LGTM on destructor cleanup.**

The `__del__` method properly cleans up model and tokenizer resources, and clears CUDA cache when available.

---

`36-46`: This comment is incorrect. The `openai/gpt-oss-20b` model is a legitimate, real Hugging Face model—OpenAI's open-weight 21B-parameter language model released under Apache 2.0 and available on the Hugging Face Hub. Users will not encounter download failures; the model can be loaded using `AutoModelForCausalLM.from_pretrained()` as intended.

The README acknowledges that this default model consumes over 200GB of RAM, which may be impractical for many setups. If resource constraints are a concern, users can specify alternative models via the `--pt-model` flag (e.g., `mistralai/Mistral-7B-v0.1`).



> Likely an incorrect or invalid review comment.

</blockquote></details>
<details>
<summary>src/custom_ai/pytorch_llm/README.md (1)</summary><blockquote>

`19-19`: **Verify the default model claim.**

The documentation mentions `openai/gpt-oss-20b` consuming more than 200GB RAM. If this model doesn't exist on Hugging Face (as noted in the code review), this claim may be misleading. Consider updating with a verified model and accurate memory requirements.

</blockquote></details>

</blockquote></details>

<sub>✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.</sub>

</details>


Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py`:
- Around line 56-61: Update the class/module docstring to reflect the real
default model: replace the mention of "gpt2" with the actual DEFAULT_MODEL value
("openai/gpt-oss-20b") so the docstring matches the constant DEFAULT_MODEL used
in this file (refer to DEFAULT_MODEL in src/custom_ai/pytorch_llm/pytorch_llm.py
and the surrounding class/docstring near the top of the file).
- Around line 357-358: The max_new_tokens calculation currently uses
max(DEFAULT_MAX_LENGTH, inputs['input_ids'].shape[1]) which can let
input+generation exceed the model context; change the calculation in the method
where max_new_tokens is set (referencing max_new_tokens, DEFAULT_MAX_LENGTH,
inputs['input_ids'] and self.max_length) to compute max_new_tokens = max(256,
self.max_length - inputs['input_ids'].shape[1]) (ensure the subtraction is
clamped so it never goes below 256 and result is an int) so that input tokens +
generated tokens never exceed the model context window.
♻️ Duplicate comments (7)
src/custom_ai/pytorch_llm/pytorch_llm.py (7)

100-105: Wrong parameter name: use torch_dtype instead of dtype.

The AutoModelForCausalLM.from_pretrained() method expects torch_dtype, not dtype. This will cause the dtype setting to be silently ignored, and the model will load with default precision.

Proposed fix
             self.model = AutoModelForCausalLM.from_pretrained(
                 saved_model_dir,
                 device_map="auto",
-                dtype=dtype,
+                torch_dtype=dtype,
                 trust_remote_code=True
             )

Apply the same fix at lines 116-121.


123-123: Redundant .to(device) after device_map="auto".

When using device_map="auto", the model is automatically distributed across available devices. Calling .to(self.device) can break model parallelism or cause errors when the model spans multiple GPUs.

Proposed fix
-            self.model = self.model.to(self.device)
+            # Note: device_map="auto" handles device placement automatically

142-177: Add configurable --pt-trust-remote-code flag.

The code hardcodes trust_remote_code=True when loading models, which enables arbitrary code execution from model repositories. Add a CLI flag that defaults to False, allowing users to explicitly opt-in.

Proposed fix
         parser.add_argument(
             "--pt-top-p",
             type=float,
             help=f"Top-p sampling parameter (default: {DEFAULT_TOP_P})",
             default=DEFAULT_TOP_P
         )
+        parser.add_argument(
+            "--pt-trust-remote-code",
+            action="store_true",
+            help="Trust remote code when loading models (default: False, security risk)",
+            default=False
+        )

Then update parse_args to store self.trust_remote_code = args.pt_trust_remote_code and use it in _initialize_model.
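The opt-in behavior of `action="store_true"` can be verified in isolation (a sketch using only the standard library; the flag name follows the suggested fix):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--pt-trust-remote-code",
    action="store_true",
    help="Trust remote code when loading models (default: False, security risk)",
)

# Omitting the flag leaves it False; the user must opt in explicitly.
safe = parser.parse_args([])
opted_in = parser.parse_args(["--pt-trust-remote-code"])
```

`_initialize_model` would then pass `trust_remote_code=self.trust_remote_code` instead of a hardcoded `True`.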


191-195: Bug: _is_initialized set to True regardless of initialization result.

If _initialize_model() returns False, _is_initialized is still set to True on line 195, masking the failure and preventing retry attempts.

Proposed fix
     def open(self, args):
         # Reinitialize model if needed
         if not self._is_initialized:
-            self._initialize_model()
-            self._is_initialized = True
+            if not self._initialize_model():
+                print("Warning: Model initialization failed in open()")
+                return False
+        return True

279-281: Incorrect dtype for attention_mask.

Attention masks should remain as integer tensors (long/int), not be cast to the model's floating-point dtype. This can cause errors or incorrect behavior during attention computation.

Proposed fix
-            # Ensure attention mask is the same dtype as the model
-            if 'attention_mask' in inputs:
-                inputs['attention_mask'] = inputs['attention_mask'].to(dtype=model_dtype)
+            # Note: attention_mask should remain as integer type, no conversion needed

484-492: Critical: Missing optimizer.step() - training is ineffective.

loss.backward() computes gradients but optimizer.step() is never called, so weights are never updated. This makes the training completely non-functional.

Also, line 492 returns (False, str) but the docstring states it returns float or None.

Proposed fix
             # Backward pass and optimize
             loss = outputs.loss
             if loss is not None:
                 loss.backward()
+                if not hasattr(self, 'optimizer'):
+                    self.optimizer = torch.optim.AdamW(self.model.parameters(), lr=5e-5)
+                self.optimizer.step()
+                self.optimizer.zero_grad()
                 return loss.item()
             return None
             
         except Exception as e:
-            return False, f"Error in text generation: {str(e)}"
+            print(f"Error in training: {e!r}")
+            return None

471-477: Attention mask dtype and missing MPS handling.

Two issues in this block:

  1. Line 472-473: Attention mask incorrectly cast to float dtype
  2. Line 476: Missing 'mps' device type handling (unlike _train_on_output at line 292)
Proposed fix
-            # Ensure attention mask is the same dtype as the model
-            if 'attention_mask' in inputs:
-                inputs['attention_mask'] = inputs['attention_mask'].to(dtype=model_dtype)
+            # Note: attention_mask should remain as integer type
             
             # Forward pass with autocast for mixed precision training
-            with torch.autocast(device_type='cuda' if 'cuda' in str(self.device) else 'cpu', 
+            device_type = 'mps' if 'mps' in str(self.device) else 'cuda' if 'cuda' in str(self.device) else 'cpu'
+            with torch.autocast(device_type=device_type,
                               dtype=model_dtype):
🧹 Nitpick comments (1)
src/custom_ai/pytorch_llm/pytorch_llm.py (1)

222-222: Update type hint to explicit Optional or union syntax.

Per PEP 484, implicit Optional is prohibited. The type hint should be explicit.

Proposed fix
-    def _save_model(self, output_dir: str = None) -> bool:
+    def _save_model(self, output_dir: str | None = None) -> bool:
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37af43e and 02d02f3.

📒 Files selected for processing (1)
  • src/custom_ai/pytorch_llm/pytorch_llm.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/custom_ai/pytorch_llm/pytorch_llm.py (2)
src/genai/genai.py (14)
  • SbkGenAI (25-356)
  • add_args (52-53)
  • parse_args (55-56)
  • open (58-59)
  • close (61-62)
  • get_model_description (65-73)
  • get_throughput_analysis (327-336)
  • get_throughput_prompt (75-122)
  • get_latency_analysis (339-348)
  • get_latency_prompt (124-215)
  • get_total_mb_analysis (351-352)
  • get_total_mb_prompt (218-247)
  • get_percentile_histogram_analysis (355-356)
  • get_percentile_histogram_prompt (249-323)
src/ai/sbk_ai.py (4)
  • add_args (141-168)
  • parse_args (171-192)
  • open (194-196)
  • close (198-200)
🪛 Ruff (0.14.11)
src/custom_ai/pytorch_llm/pytorch_llm.py

133-133: Consider moving this statement to an else block

(TRY300)


135-135: Do not catch blind exception: Exception

(BLE001)


136-136: Use explicit conversion flag

Replace with conversion flag

(RUF010)


191-191: Unused method argument: args

(ARG002)


198-198: Unused method argument: args

(ARG002)


222-222: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


243-243: Consider moving this statement to an else block

(TRY300)


245-245: Do not catch blind exception: Exception

(BLE001)


246-246: Use explicit conversion flag

Replace with conversion flag

(RUF010)


314-314: Consider moving this statement to an else block

(TRY300)


316-316: Do not catch blind exception: Exception

(BLE001)


317-317: Use explicit conversion flag

Replace with conversion flag

(RUF010)


388-388: Consider moving this statement to an else block

(TRY300)


390-390: Do not catch blind exception: Exception

(BLE001)


391-391: Use explicit conversion flag

Replace with conversion flag

(RUF010)


489-489: Consider moving this statement to an else block

(TRY300)


491-491: Do not catch blind exception: Exception

(BLE001)


492-492: Use explicit conversion flag

Replace with conversion flag

(RUF010)


499-499: Do not catch blind exception: Exception

(BLE001)


500-500: Use explicit conversion flag

Replace with conversion flag

(RUF010)


507-507: Do not catch blind exception: Exception

(BLE001)


508-508: Use explicit conversion flag

Replace with conversion flag

(RUF010)


515-515: Do not catch blind exception: Exception

(BLE001)


516-516: Use explicit conversion flag

Replace with conversion flag

(RUF010)


523-523: Do not catch blind exception: Exception

(BLE001)


524-524: Use explicit conversion flag

Replace with conversion flag

(RUF010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (6)
src/custom_ai/pytorch_llm/pytorch_llm.py (6)

36-46: LGTM on configuration defaults.

The device auto-detection logic correctly prioritizes CUDA, then MPS, then CPU. The default parameters are reasonable for a causal language model.


198-219: LGTM on close method logic.

The training loop and model saving flow are well-structured. The early break on None loss appropriately handles training failures.


396-410: LGTM on model description.

Clear and informative description output with appropriate initialization check.


412-437: LGTM on text cleaning utility.

The regex-based cleanup for repeated punctuation is well-implemented with proper character escaping.


494-524: LGTM on analysis entry points.

All four analysis methods follow a consistent pattern delegating to _generate_text with appropriate error handling.


526-533: LGTM on resource cleanup.

The destructor properly checks for attribute existence before deletion and clears CUDA cache when available.


Comment on lines +357 to +358
# Calculate max_new_tokens, ensuring it's at least DEFAULT_MAX_LENGTH
max_new_tokens = max(DEFAULT_MAX_LENGTH, inputs['input_ids'].shape[1])

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n src/custom_ai/pytorch_llm/pytorch_llm.py | head -50

Repository: kmgowda/sbk-charts

Length of output: 1902


🏁 Script executed:

# Find DEFAULT_MAX_LENGTH definition
rg "DEFAULT_MAX_LENGTH" src/custom_ai/pytorch_llm/pytorch_llm.py -B 2 -A 2

Repository: kmgowda/sbk-charts

Length of output: 905


🏁 Script executed:

# Look at the _generate_text method context around lines 357-358
cat -n src/custom_ai/pytorch_llm/pytorch_llm.py | sed -n '320,390p'

Repository: kmgowda/sbk-charts

Length of output: 3726


🏁 Script executed:

# Check if self.max_length is used in the class
rg "self\.max_length" src/custom_ai/pytorch_llm/pytorch_llm.py -B 1 -A 1

Repository: kmgowda/sbk-charts

Length of output: 881


Fix max_new_tokens calculation to respect model context window.

The current logic max(DEFAULT_MAX_LENGTH, inputs['input_ids'].shape[1]) causes the total sequence length to exceed the model's context window. If the input is 100 tokens, max_new_tokens becomes 2048, resulting in 2148 total tokens (exceeding the 2048 context limit). Use instead:

max_new_tokens = max(256, self.max_length - inputs['input_ids'].shape[1])

This ensures the generated tokens + input tokens fit within the model's context window while maintaining a minimum generation length.

🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 357 - 358, The
max_new_tokens calculation currently uses max(DEFAULT_MAX_LENGTH,
inputs['input_ids'].shape[1]) which can let input+generation exceed the model
context; change the calculation in the method where max_new_tokens is set
(referencing max_new_tokens, DEFAULT_MAX_LENGTH, inputs['input_ids'] and
self.max_length) to compute max_new_tokens = max(256, self.max_length -
inputs['input_ids'].shape[1]) (ensure the subtraction is clamped so it never
goes below 256 and result is an int) so that input tokens + generated tokens
never exceed the model context window.
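The suggested clamping can be written as a standalone helper (a sketch; the function name is illustrative, and the 256-token floor comes from the review suggestion, not the PR code):

```python
def compute_max_new_tokens(context_window: int, input_len: int, floor: int = 256) -> int:
    """Clamp generation length so input + new tokens fit the context window.

    Falls back to `floor` tokens when the input alone (nearly) fills the window.
    """
    return max(floor, context_window - input_len)
```

Note the floor means a pathologically long input can still overflow the window; callers that need a hard guarantee would also truncate the input to `context_window - floor` tokens.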

Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py`:
- Around line 239-252: close currently assumes self.model is initialized and
calls self.model.train(), which can raise AttributeError if _initialize_model()
in open() failed and self.model is None; update close to guard that call by
checking self.model is not None (and optionally log or skip training when model
is missing) before calling self.model.train(), and likewise skip invoking
_train_on_output or iterating self.output_list when self.model is None or
train_mode is True but model is uninitialized; reference the close method, the
self.model attribute, train_mode flag, and _train_on_output/output_list to
locate where to add the guard.
- Around line 74-75: The instance-level list output_list on PyTorchLLM is never
cleared (it is appended to in _generate_text and reused across open()/close()
cycles), so modify the open() method to reset/clear self.output_list (e.g., set
to [] or .clear()) when initializing the session (alongside setting
self._is_initialized = True) to ensure previous session outputs are not carried
into future _generate_text or close() operations.
♻️ Duplicate comments (1)
src/custom_ai/pytorch_llm/pytorch_llm.py (1)

480-533: Multiple issues already flagged in past reviews.

This method has several known issues (missing optimizer.step(), inconsistent return type, attention_mask dtype, missing MPS handling) that were identified in previous review iterations. Please address those comments.

🧹 Nitpick comments (3)
src/custom_ai/pytorch_llm/pytorch_llm.py (3)

143-146: Bare except with pass silently swallows all exceptions.

This catches every exception type (including KeyboardInterrupt, SystemExit) and discards it without logging. At minimum, catch a specific exception type or log the failure.

Proposed fix
                             try:
                                 config_file = cached_file(model_name, 'config.json')
-                            except:
-                                pass
+                            except Exception:
+                                config_file = None  # Fall through to fallback logic

263-263: Use explicit Optional type annotation.

Per PEP 484, use Optional[str] or str | None instead of implicit optional via default value.

Proposed fix
-    def _save_model(self, output_dir: str = None) -> bool:
+    def _save_model(self, output_dir: str | None = None) -> bool:

567-574: Consider removing __del__ in favor of explicit cleanup.

__del__ is unreliable for resource management:

  • Not guaranteed to be called (circular references, interpreter shutdown)
  • Can cause issues if exceptions occur during garbage collection
  • Already have close() for explicit cleanup

The close() method is the proper place for cleanup. Consider moving CUDA cache clearing there and removing __del__, or making PyTorchLLM a context manager.

Proposed approach
 def close(self, args):
     # ... existing training logic ...
+    
+    # Cleanup resources
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()

-def __del__(self):
-    """Clean up resources when the object is destroyed."""
-    if hasattr(self, 'model') and self.model is not None:
-        del self.model
-    if hasattr(self, 'tokenizer') and self.tokenizer is not None:
-        del self.tokenizer
-    if torch.cuda.is_available():
-        torch.cuda.empty_cache()
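The context-manager alternative can be sketched without torch (hypothetical `LLMBackend` class; only the `__enter__`/`__exit__` pattern is the point, with cleanup guaranteed even when the body raises):

```python
class LLMBackend:
    """Context-manager sketch replacing __del__ with explicit cleanup."""
    def __init__(self):
        self.model = object()   # stand-in for the loaded model
        self.closed = False

    def close(self):
        # Release the model reference; a real backend would also
        # clear the CUDA cache here.
        self.model = None
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False  # never swallow exceptions


with LLMBackend() as backend:
    pass  # use backend.model here
# backend.close() has run deterministically at this point
```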

Comment on lines +74 to +75
self._is_initialized = False
self.output_list = []

⚠️ Potential issue | 🟡 Minor

output_list accumulates indefinitely across sessions.

The list is initialized once and appended to in _generate_text but never cleared. If the same PyTorchLLM instance is reused across multiple open()/close() cycles, outputs from previous sessions persist and are re-trained on in subsequent close() calls.

Consider clearing the list in open():

Proposed fix
 def open(self, args):
     # Reinitialize model if needed
+    self.output_list = []  # Clear previous session outputs
     if not self._is_initialized:
🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 74 - 75, The
instance-level list output_list on PyTorchLLM is never cleared (it is appended
to in _generate_text and reused across open()/close() cycles), so modify the
open() method to reset/clear self.output_list (e.g., set to [] or .clear()) when
initializing the session (alongside setting self._is_initialized = True) to
ensure previous session outputs are not carried into future _generate_text or
close() operations.

Comment on lines +239 to +252
def close(self, args):
loss = None
# If in training mode and we have a target, train on the generated output
if self.train_mode:
self.model.train()
print("\n" + "=" * 50)
print("🚀 Starting training on generated output")

for output_text in self.output_list:
print(f"📄 Generated length: {len(output_text)} chars")
loss = self._train_on_output(output_text)
if loss is None:
break
print("=" * 50)

⚠️ Potential issue | 🟠 Major

Potential AttributeError if model initialization failed.

If _initialize_model() failed in open(), self.model remains None. Then close() calls self.model.train() on Line 243, which raises AttributeError: 'NoneType' object has no attribute 'train'.

Proposed fix
 def close(self, args):
     loss = None
     # If in training mode and we have a target, train on the generated output
-    if self.train_mode:
+    if self.train_mode and self.model is not None:
         self.model.train()
🧰 Tools
🪛 Ruff (0.14.13)

239-239: Unused method argument: args

(ARG002)

🤖 Prompt for AI Agents
In `@src/custom_ai/pytorch_llm/pytorch_llm.py` around lines 239 - 252, close
currently assumes self.model is initialized and calls self.model.train(), which
can raise AttributeError if _initialize_model() in open() failed and self.model
is None; update close to guard that call by checking self.model is not None (and
optionally log or skip training when model is missing) before calling
self.model.train(), and likewise skip invoking _train_on_output or iterating
self.output_list when self.model is None or train_mode is True but model is
uninitialized; reference the close method, the self.model attribute, train_mode
flag, and _train_on_output/output_list to locate where to add the guard.
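The guard can be illustrated with stand-ins (a sketch; `Trainer` and `DummyModel` are hypothetical and model only the None-check at issue):

```python
class DummyModel:
    """Stand-in for an nn.Module; records that train() was called."""
    def __init__(self):
        self.mode = None

    def train(self):
        self.mode = "train"


class Trainer:
    """Sketch of the guarded close(): skip training when the model is absent."""
    def __init__(self, model, train_mode=True):
        self.model = model
        self.train_mode = train_mode
        self.output_list = ["sample output"]

    def close(self):
        if self.train_mode and self.model is not None:
            self.model.train()
            return "trained"
        return "skipped"   # initialization failed; nothing to train
```

With the guard, `close()` degrades gracefully instead of raising `AttributeError` on a `None` model.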

Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
Signed-off-by: Keshava Munegowda <keshava.gowda@gmail.com>
@kmgowda kmgowda merged commit 6abf7ce into main Jan 31, 2026
2 checks passed