
Conversation


@acanaveras acanaveras commented Feb 13, 2026

@gwarmstrong This is the Telco team. Per our conversation two weeks ago, we would like to add this tutorial for creating a reasoning model for Telco NOC incidents. Thanks

Summary by CodeRabbit

  • New Features

    • Added comprehensive tutorial for fine-tuning a reasoning model to autonomously diagnose and resolve telecom network incidents using a NOC Reasoning Agent pipeline.
    • Added data preparation, synthetic data generation, model training, and evaluation scripts for incident resolution workflows.
    • Added visualization and analysis tools for reasoning traces and model performance metrics.
  • Documentation

    • Added end-to-end tutorial documenting setup, data processing, training, and evaluation procedures for network incident handling.


@greptile-apps greptile-apps bot left a comment


31 files reviewed, 5 comments


Comment on lines +52 to +56
site/
site/
site/
site/
site/

site/ is duplicated 5 times

Suggested change
-site/
-site/
-site/
-site/
-site/
+site/

@@ -0,0 +1,377 @@
---
date: 2025-09-05

The date 2025-09-05 predates the PR creation date (Feb 2026). Confirm it is the intended publication date, or update it to the actual one.

Comment on lines +132 to +134
skip = True
if skip and os.path.exists(output_path):
os.remove(output_path)

The variable skip is hardcoded to True, which unconditionally deletes the output file on every run. This destroys the resume functionality mentioned on line 136. Either remove this code block or make skip a CLI argument
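A minimal sketch of the CLI-flag approach, assuming argparse is already used in the script; the flag name `--overwrite` is a placeholder:

```python
# Sketch only: gate the destructive delete behind an explicit opt-in flag so the
# resume path (load_processed_indices) keeps working by default.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--output", required=True, help="Path to the output JSONL file")
parser.add_argument(
    "--overwrite",
    action="store_true",
    help="Delete any existing output file and start from scratch instead of resuming",
)
args = parser.parse_args()

if args.overwrite and os.path.exists(args.output):
    os.remove(args.output)
# Otherwise leave the existing file so previously processed rows can be skipped.
```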

Comment on lines +15 to +28
from src.tools import (
ALL_TOOLS,
Check_Alarm_Status,
Check_Apply_Configuration,
Check_Element_Health,
Check_Element_Neighbors,
Check_External_Issues,
Check_Performance,
Check_remote_files,
Create_Ticket,
Execute_Remote_Action,
Orchestration_tool,
Triage_Toolkit_Tool,
)

These imports reference src.tools, but the script is located in recipes/noc-reasoning-agent/scripts/. Check that PYTHONPATH is set correctly before running; otherwise the imports will fail.
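One hedged way to make the imports robust, assuming the `src` package sits somewhere under the recipe directory (adjust `parents[...]` to the real layout), is to prepend that directory to `sys.path` instead of relying on the caller's PYTHONPATH:

```python
# Sketch only: resolve `src.tools` relative to this file rather than the
# environment. The exact parent level depends on where src/ actually lives.
import sys
from pathlib import Path

RECIPE_ROOT = Path(__file__).resolve().parents[1]  # e.g. recipes/noc-reasoning-agent/
if str(RECIPE_ROOT) not in sys.path:
    sys.path.insert(0, str(RECIPE_ROOT))

from src.tools import ALL_TOOLS  # noqa: E402  (import after path setup)
```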

Comment on lines +174 to +204
react_agent = create_react_agent(chat_llm, tools=[], checkpointer=checkpointer)
for index, row in tqdm(unprocessed_df.iterrows(), total=len(unprocessed_df), desc="Processing Rows"):
# print(row)
# exit()

# --- Step 2: Get the row-specific tools using the helper function. ---
# row_tools = get_bound_tools(row)

config = {"configurable": {"thread_id": str(index)}}
input_question = row["input"]
system_message = row["system"]
input_messages = [{"role": "system", "content": system_message}, {"role": "user", "content": input_question}]

try:
# We still invoke one-by-one, but the tool creation is now efficient.

## WE need to fix this. Add system message and tool calling messages
# separate user input messages
TOOLS = {
"Check_Alarm_Status": lambda args: Check_Alarm_Status(**args),
"Check_Element_Neighbors": lambda args: Check_Element_Neighbors(**args),
"Check_Element_Health": lambda args: Check_Element_Health(**args),
"Execute_Remote_Action": lambda args: Execute_Remote_Action(**args),
"Check_External_Issues": lambda args: Check_External_Issues(**args),
"Check_Apply_Configuration": lambda args: Check_Apply_Configuration(**args),
"Check_Performance": lambda args: Check_Performance(**args),
"Create_Ticket": lambda args: Create_Ticket(**args),
"Orchestration_tool": lambda args: Orchestration_tool(**args),
"Triage_Toolkit_Tool": lambda args: Triage_Toolkit_Tool(**args),
"Check_remote_files": lambda args: Check_remote_files(**args),
}

The ReAct agent is created with empty tools=[] on line 174, and tool calls are then mapped manually on lines 192-204. Consider using LangGraph's built-in tool binding (as shown in create_agent_with_tools.py) to simplify this logic.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
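A rough sketch of what the bound-tools version suggested above could look like, assuming the `langgraph.prebuilt.create_react_agent` and `langchain_core.tools.tool` APIs already used in this PR; `make_row_tools` and the single wrapped tool are illustrative placeholders:

```python
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent


def make_row_tools(row):
    """Create row-bound tool wrappers once per incident row."""

    @tool
    def check_alarm_status(site_or_element_id: str) -> str:
        """Return the mocked alarm status stored in this incident row."""
        return Check_Alarm_Status(row, site_or_element_id)

    # ... wrap the remaining domain tools the same way ...
    return [check_alarm_status]


for index, row in unprocessed_df.iterrows():
    tools = make_row_tools(row)
    react_agent = create_react_agent(chat_llm, tools=tools, checkpointer=checkpointer)
    # The agent now dispatches tool calls itself; the manual TOOLS dict goes away.
```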


coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

This PR introduces a complete end-to-end NOC Reasoning Agent recipe featuring documentation, configuration, prompts, Python utility scripts for data preparation, model training coordination, agent construction, evaluation, and visualization. It enables fine-tuning Qwen models to autonomously diagnose and resolve telecom network incidents using structured reasoning and tool integration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation and Ignore Files**<br>`.gitignore`, `recipes/noc-reasoning-agent/.gitignore`, `docs/tutorials/posts/noc-reasoning-agent.md` | Added project-level and recipe-specific ignore rules, and comprehensive end-to-end tutorial covering setup, data processing, synthetic data generation, SFT training, evaluation, and deployment for the NOC reasoning agent. |
| **Configuration Files**<br>`recipes/noc-reasoning-agent/configs/config.ini`, `recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml`, `recipes/noc-reasoning-agent/configs/noc_reasoning_sft_6.yaml` | Defines model download paths, data source mappings, and two distinct SFT dataset preparation pipelines with filtering, deduplication, and quality control steps for incident data processing. |
| **Prompt Definitions**<br>`recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml`, `recipes/noc-reasoning-agent/prompts/prompt_incident.yaml`, `recipes/noc-reasoning-agent/prompts/prompt_reasoning.yaml`, `recipes/noc-reasoning-agent/prompts/shortened_prompt_reasoning.yaml` | Provides role-based system prompts and tool specifications for incident diagnosis, chain-of-thought reasoning, and structured output formatting to guide model behavior during inference. |
| **Agent and Tool Definitions**<br>`recipes/noc-reasoning-agent/scripts/tools.py`, `recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py`, `recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py` | Implements 11 domain-specific tool functions for network diagnostics, and two ReAct agent implementations (sequential and batch-based) that integrate tools with LangChain and HuggingFace models for incident resolution. |
| **Data Filtering and Categorization**<br>`recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py`, `recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py` | Provides filtering utilities for CSV incident data (auto-recovery, soft-solve, problem/close code validation) and incident categorization by keyword matching against close code schemas. |
| **Data Preparation and Transformation**<br>`recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py`, `recipes/noc-reasoning-agent/scripts/utils/reasoning_processes.py`, `recipes/noc-reasoning-agent/scripts/utils/schema_columns.py`, `recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py`, `recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py`, `recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` | Converts incident CSV to JSONL format, maintains fault-category mappings to reasoning processes, defines canonical schema columns, splits data by incident for train/test, merges tool responses, and performs complex reasoning trace extraction and token-aware curriculum compilation for Qwen models. |
| **Pipeline Orchestration**<br>`recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py`, `recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py` | Orchestrates synthetic data generation via Nemo Skills and consolidates data from multiple JSONL sources for ReAct agent training preparation. |
| **Evaluation Utilities**<br>`recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py`, `recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py`, `recipes/noc-reasoning-agent/scripts/evaluation/score.py`, `recipes/noc-reasoning-agent/scripts/utils/token_usage.py` | Computes ROUGE, BERTScore, and LLM-judge metrics for reasoning quality; validates close code predictions; aggregates evaluation scores; and analyzes token usage with histogram visualization and optional correlation analysis. |
| **Visualization Utilities**<br>`recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py`, `recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py`, `recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py` | Enriches JSONL records with CSV metadata, selects representative samples per category, aggregates evaluation metrics, and generates interactive HTML visualizations of incident resolution traces with step-by-step reasoning and work notes. |

Sequence Diagrams

sequenceDiagram
    participant User
    participant DataPrepPipeline as Data Prep Pipeline
    participant SFTConfig as SFT Config
    participant Model as Qwen Model
    participant Training as Training Engine

    User->>DataPrepPipeline: CSV Incidents
    DataPrepPipeline->>DataPrepPipeline: Filter (auto-recover, soft-solve, codes)
    DataPrepPipeline->>DataPrepPipeline: Categorize by Keywords
    DataPrepPipeline->>DataPrepPipeline: Convert to JSONL
    DataPrepPipeline->>DataPrepPipeline: Enrich with Reasoning Processes
    DataPrepPipeline->>SFTConfig: Prepared JSONL
    SFTConfig->>SFTConfig: Apply Deduplication & Filtering
    SFTConfig->>SFTConfig: Shuffle & Downsample
    SFTConfig->>Model: Format as SFT Manifest
    Model->>Training: Tokenize with Qwen Tokenizer
    Training->>Training: Fine-tune with Megatron Backend
    Training->>User: Trained Model
sequenceDiagram
    participant User
    participant Agent as ReAct Agent
    participant ToolExecutor as Tool Executor
    participant Model as LM Model
    participant Tools as Domain Tools
    participant Memory as Memory Saver

    User->>Agent: Incident JSONL Row
    Agent->>Memory: Load Checkpoint (skip if processed)
    Memory->>Agent: Resume State
    Agent->>Model: Format User Message + System Prompt
    Model->>Model: Generate Response (max_tokens, temp, top_p)
    Model->>Agent: Tool Call or Conclusion
    alt Tool Call Detected
        Agent->>ToolExecutor: Parse Tool Name & Arguments
        ToolExecutor->>Tools: Execute Tool (Check_Alarm, Health, etc.)
        Tools->>ToolExecutor: Tool Response
        ToolExecutor->>Agent: Observation
        Agent->>Model: Add Observation to Context
        Model->>Agent: Next Action
    else Conclusion
        Agent->>Memory: Save Final Response
        Agent->>User: Output JSONL with Response
    end
sequenceDiagram
    participant User
    participant EvalPipeline as Evaluation Pipeline
    participant Metrics as Metric Calculators
    participant LLMJudge as LLM Judge
    participant Output as Output JSONL

    User->>EvalPipeline: Agent Responses + Expected Answers
    EvalPipeline->>EvalPipeline: Extract Reasoning & Output
    EvalPipeline->>Metrics: Compute ROUGE (rouge1, rougeL)
    Metrics->>EvalPipeline: ROUGE Scores
    EvalPipeline->>Metrics: Compute BERTScore F1
    Metrics->>EvalPipeline: BERTScore Results
    EvalPipeline->>LLMJudge: Score Final Output (LLM-based)
    LLMJudge->>EvalPipeline: Score + Reasoning
    EvalPipeline->>LLMJudge: Score Reasoning Quality (LLM-based)
    LLMJudge->>EvalPipeline: Score + Reasoning
    EvalPipeline->>Output: Augment Row with All Metrics
    EvalPipeline->>User: Evaluation JSONL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.26% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (2 files):<br>⚔️ `.github/workflows/tests.yml` (content)<br>⚔️ `.gitignore` (content)<br>These conflicts must be resolved before merging into main. | Resolve conflicts locally and push changes to this branch. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Add noc reasoning tutorial' directly aligns with the main objective of adding tutorial material for a NoC reasoning agent, clearly summarizing the primary purpose of the changeset. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py`:
- Around line 132-135: The code currently hardcodes skip = True which causes
deletion of output_path every run and breaks resume logic; change this so skip
is not hardcoded—make it a CLI flag, environment variable, or default False—and
only call os.remove(output_path) when skip is explicitly set by the user; ensure
the block around skip, os.path.exists(output_path), and os.remove(output_path)
is gated by the configurable skip variable and leave
load_processed_indices(output_path) unchanged so resume logic
(load_processed_indices / processed_indices) can work when skip is False.
- Around line 156-157: The call to AutoModelForCausalLM.from_pretrained in
create_agent_with_tools_batch.py is using the wrong parameter name (dtype=) so
the intended precision is ignored; update the call to pass the dtype via
torch_dtype (e.g., torch_dtype=dtype) so the model loads in the specified
bfloat16/float16 precision, ensuring you modify the
AutoModelForCausalLM.from_pretrained(...) invocation accordingly.

In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py`:
- Line 12: The parser.add_argument("--output_file", help="Path to output") can
leave args.output_file as None and cause pd.DataFrame.to_json(args.output_file,
...) to raise a ValueError; update the argument declaration
(parser.add_argument) to either set required=True or provide a sensible default
(e.g., default="evaluation_results.json"), and/or add a guard before writing
that validates args.output_file is not None and raises a clear error message;
ensure references to args.output_file and the DataFrame.to_json call are updated
accordingly.

In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py`:
- Around line 153-166: The code currently uses two regex matches (m and
m_no_args) and then unconditionally calls m.groups(), which creates dead code
and a latent crash; instead, after performing the initial match m =
re.match(r"^\s*([A-Za-z_]\w*)\s*\[(.*)\]\s*$", ... , re.DOTALL), check if m is
truthy and if not immediately return the original string (or handle the error),
and only then call m.groups() to assign tool_name and raw_args; remove the
separate m_no_args branch entirely so empty brackets are handled by the first
regex and m.groups() is always called on a valid match.

In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py`:
- Around line 207-276: The __main__ block contains leftover debug logic that
hardcodes csv_file/model, calls AutoTokenizer directly (unsafe if transformers
missing), and calls exit() before main(), making main() unreachable; remove or
replace the entire if __name__ == "__main__": block so it simply parses
CLI/default args and invokes main(), ensure any tokenizer creation uses the
existing guarded import/creation logic (e.g., reuse the module's
AutoTokenizer-safe factory or wrap AutoTokenizer.from_pretrained in the same
try/except used elsewhere), eliminate hardcoded paths and stray exit() calls,
and wire csv_file, column_name, model, output_csv, output_plot into main()
parameters or argparse so the script runs via main() correctly.

In `@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`:
- Around line 26-71: The function parse_final_reasoning contains an early return
and debug prints that make the intended parsing logic unreachable; remove the
premature return and the print statements and restructure the function to (1)
first find the final "Finish" block using trace_block_match or last_index as a
fallback, (2) set finish_text deterministically (return a clear error string if
neither is found), (3) run the step_pattern regex against that finish_text to
extract Thought/Action/Observation entries, format them into parsed_steps, and
finally return the joined string; refer to parse_final_reasoning,
trace_block_match, last_index, finish_text, step_pattern, and parsed_steps when
making these changes.
- Around line 74-88: The function find_finish_action assigns finish_text only
when last_index != -1 but always returns finish_text, causing UnboundLocalError
when "Finish" isn't found; update find_finish_action to handle the not-found
case by returning a safe default (e.g., None or empty string) or raising a clear
exception, and ensure finish_text is defined in both branches (use last_index,
generation_text, and finish_text to locate/construct the return value or the
fallback).
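For the `find_finish_action` item above, a minimal sketch of the safe-fallback shape (not the PR's exact code; the "Finish" marker handling is assumed from the description):

```python
def find_finish_action(generation_text: str):
    """Return the text from the last 'Finish' marker onward, or None if absent."""
    last_index = generation_text.rfind("Finish")
    if last_index == -1:
        # No Finish block in this trace; let the caller decide how to report it.
        return None
    finish_text = generation_text[last_index:]
    return finish_text
```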
🟠 Major comments (22)
recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-33-34 (1)

33-34: ⚠️ Potential issue | 🟠 Major

Stub silently does nothing — user gets no feedback when using a non-default --llm.

If a user passes --llm with any value other than "qwen2.5-32b-instruct", generate_synthetic_data_oss_gpt is called and silently completes without action. This violates the principle that errors should not pass silently. Either raise NotImplementedError or restrict the --llm choices.

♻️ Suggested fix (option A: raise error)
 def generate_synthetic_data_oss_gpt(args, cluster, num_gpus):
-    pass
+    raise NotImplementedError("OSS GPT generation path is not yet implemented")
♻️ Suggested fix (option B: restrict choices)
-    parser.add_argument("--llm", type=str, default="qwen2.5-32b-instruct", help="The LLM to use for generation")
+    parser.add_argument(
+        "--llm", type=str, default="qwen2.5-32b-instruct",
+        choices=["qwen2.5-32b-instruct"],
+        help="The LLM to use for generation",
+    )

As per coding guidelines, "Follow the Zen of Python principles: … ensure errors never pass silently unless explicitly silenced".

Also applies to: 57-62

recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-43-55 (1)

43-55: ⚠️ Potential issue | 🟠 Major

--gpus argument is defined but never used; --num_gpus duplicates its purpose.

A user setting --gpus 4 would have no effect — only --num_gpus is read (line 54). Additionally, line 55 misleadingly prints "specified via --gpus". Remove the unused --gpus argument.

♻️ Suggested fix
-    # parser.add_argument('--gpus', type=int, default=detect_num_gpus(),
-    #                     help='Number of GPUs to use for generation')
-    parser.add_argument("--gpus", type=int, default=8, help="Number of GPUs to use for generation")
     parser.add_argument("--llm", type=str, default="qwen2.5-32b-instruct", help="The LLM to use for generation")
     parser.add_argument(
         "--num_gpus", type=int, default=8, help="Number of GPUs to use (auto-detected if not specified)"
     )
 
     args = parser.parse_args()
     cluster = "local"
 
     num_gpus = args.num_gpus
-    print(f"Using {num_gpus} GPUs (specified via --gpus)")
+    print(f"Using {num_gpus} GPUs (specified via --num_gpus)")

As per coding guidelines, "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided."

recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py-47-56 (1)

47-56: 🛠️ Refactor suggestion | 🟠 Major

Compute all enrichment before writing the output file.

The current code reads, enriches, and writes line-by-line in a single pass. If an error occurs mid-processing (e.g., malformed JSON on line N), a partial output file is left behind. Separate the read/compute phase from the write phase.

Proposed fix
-enriched_rows = []
-with open(jsonl_path, "r", encoding="utf-8") as fin, open(output_jsonl_path, "w", encoding="utf-8") as fout:
+enriched_rows = []
+with open(jsonl_path, "r", encoding="utf-8") as fin:
     for line in fin:
         if not line.strip():
             continue
         row = json.loads(line)
         row = enrich_row(row, csv_idx)
         enriched_rows.append(row)
-        fout.write(json.dumps(row, ensure_ascii=False) + "\n")
+
+with open(output_jsonl_path, "w", encoding="utf-8") as fout:
+    for row in enriched_rows:
+        fout.write(json.dumps(row, ensure_ascii=False) + "\n")

As per coding guidelines, "perform all computations before re-opening files for writing to avoid accidental data loss if code fails during execution".

recipes/noc-reasoning-agent/scripts/utils/reasoning_processes.py-200-212 (1)

200-212: 🛠️ Refactor suggestion | 🟠 Major

Use direct dictionary access instead of .get() — key is guaranteed to exist.

"Service-off" is defined in PROBLEM_CODE_REASONING_PROCESS at line 2, so .get("Service-off", "") masks a potential future error if the key were removed. Direct access ["Service-off"] will fail fast with a clear KeyError.

♻️ Proposed fix
-DEFAULT_REASONING_PROCESS = PROBLEM_CODE_REASONING_PROCESS.get("Service-off", "")
+DEFAULT_REASONING_PROCESS = PROBLEM_CODE_REASONING_PROCESS["Service-off"]

 SYNTHETIC_FAULT_CATEGORY_REASONING = {
-    "power_ac_failure_recovery": PROBLEM_CODE_REASONING_PROCESS.get("Service-off", ""),
-    "power_dc_rectifier_recovery": PROBLEM_CODE_REASONING_PROCESS.get("Service-off", ""),
-    "power_battery_discharge_response": PROBLEM_CODE_REASONING_PROCESS.get("Service-off", ""),
-    "power_generator_failure_recovery": PROBLEM_CODE_REASONING_PROCESS.get("Service-off", ""),
-    "env_high_temperature_response": PROBLEM_CODE_REASONING_PROCESS.get("Service-off", ""),
-    "env_hvac_fault_recovery": PROBLEM_CODE_REASONING_PROCESS.get("Service-off", ""),
-    "env_water_intrusion_response": PROBLEM_CODE_REASONING_PROCESS.get("Service-off", ""),
+    "power_ac_failure_recovery": PROBLEM_CODE_REASONING_PROCESS["Service-off"],
+    "power_dc_rectifier_recovery": PROBLEM_CODE_REASONING_PROCESS["Service-off"],
+    "power_battery_discharge_response": PROBLEM_CODE_REASONING_PROCESS["Service-off"],
+    "power_generator_failure_recovery": PROBLEM_CODE_REASONING_PROCESS["Service-off"],
+    "env_high_temperature_response": PROBLEM_CODE_REASONING_PROCESS["Service-off"],
+    "env_hvac_fault_recovery": PROBLEM_CODE_REASONING_PROCESS["Service-off"],
+    "env_water_intrusion_response": PROBLEM_CODE_REASONING_PROCESS["Service-off"],
 }

As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present; use direct dictionary access dict[key] instead to allow proper error handling and fail fast with clear errors".

recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-35-42 (1)

35-42: 🛠️ Refactor suggestion | 🟠 Major

File is read and parsed twice; reuse the existing DataFrame.

The JSONL file is loaded identically at lines 9–12 and 36–39, and df from line 15 already holds the same data. Remove the second read and reuse the existing df.

♻️ Proposed fix
-# Load JSONL file into a list
-data = []
-with open(file_path, "r", encoding="utf-8") as f:
-    for line in f:
-        data.append(json.loads(line))
-
-# Convert to DataFrame
-df = pd.DataFrame(data)
-
 # Select only the relevant columns
 columns_to_display = ["expected_answer", "agent_response", "llm_judge_reason"]
recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-121-125 (1)

121-125: ⚠️ Potential issue | 🟠 Major

BERTScore is computed per-row — extremely slow due to repeated model loading.

bert_score() is called once per row (line 122), which reloads the model each time. For a dataset of any meaningful size, this will be orders of magnitude slower than a single batched call. Collect all pairs first and call bert_score() once.

⚡ Sketch of batched approach
# Collect all pairs first
references = []
candidates = []
for index, row in df.iterrows():
    # ... extract generated_reasoning_part and expected content ...
    candidates.append(generated_reasoning_part)
    references.append(conclusion_expected + reasoning_expected)

# Single batched BERTScore call
P, R, F1 = bert_score(candidates, references, lang="en", verbose=True)

# Then iterate again to assemble output rows with F1[i].item()
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-221-293 (1)

221-293: 🛠️ Refactor suggestion | 🟠 Major

File is read twice and contains ~35 lines of commented-out code.

The input file is read manually (lines 224–226) and then again via pd.read_json (line 233). The first read is entirely superseded by the DataFrame path (line 293 overwrites data). Remove the first read. Also, the large commented-out selection_criteria block (lines 246–280) and scattered debug comments should be cleaned up before merging.

recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-28-29 (1)

28-29: ⚠️ Potential issue | 🟠 Major

O(n) membership checks on lists — use sets for train/test partitions.

train_set and test_set are lists, so number in train_set (line 42) and number in test_set (line 57) are O(n) per lookup. With potentially thousands of incidents across multiple iteration files, this becomes a significant bottleneck.

⚡ Proposed fix
-    train_set = incidents[:n_train]
-    test_set = incidents[n_train:]
+    train_set = set(incidents[:n_train])
+    test_set = set(incidents[n_train:])

Also applies to: 42-42, 57-57

recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-192-218 (1)

192-218: ⚠️ Potential issue | 🟠 Major

Unescaped values injected into HTML summary.

Lines 198–199 insert incident_data.get(...) values directly into the HTML without calling esc(), unlike the rest of the template. This is an XSS vector if incident data contains HTML-special characters.

🛡️ Proposed fix
-            <li>Category: {incident_data.get("incident_classification", incident_data.get("category"))}</li>
-            <li>Problem Code: {incident_data.get("fault_category", incident_data.get("u_problem_code"))}</li>
+            <li>Category: {esc(incident_data.get("incident_classification", incident_data.get("category")))}</li>
+            <li>Problem Code: {esc(incident_data.get("fault_category", incident_data.get("u_problem_code")))}</li>
recipes/noc-reasoning-agent/prompts/prompt_incident.yaml-50-51 (1)

50-51: ⚠️ Potential issue | 🟠 Major

\think should be <think> to match the pipeline's expected tag format.

The YAML prompt files use \think, but split_incident_data.py (line 67) injects background = "\n<think>\n" with angle brackets. The training configuration (noc_reasoning_sft.yaml, noc_reasoning_sft_6.yaml) also filters for and expects XML-style <think> and </think> tags (e.g., "<think>thinking</think>summary"). In YAML block scalars, \think is treated as literal text, not an escape sequence, so it won't match the <think> tags injected by the pipeline. This issue also appears in prompt_reasoning.yaml (line 166) and shortened_prompt_reasoning.yaml (line 147).

recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py-9-10 (1)

9-10: ⚠️ Potential issue | 🟠 Major

Relative imports require package initialization; this will fail with ImportError.

from .reasoning_processes import ... and from .schema_columns import ... (lines 9-10) perform relative imports, which require the parent directory to be a Python package (with __init__.py). There is no __init__.py in recipes/noc-reasoning-agent/scripts/utils/, scripts/, or any parent directory. Running the script directly with python create_input_jsonl_from_incidents.py will raise ImportError: attempted relative import with no known parent package.

All other scripts in this repository use absolute imports; switch to those or add __init__.py files to make utils/ and scripts/ proper packages.

recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py-93-98 (1)

93-98: 🛠️ Refactor suggestion | 🟠 Major

Duplicated CSV loading with overly broad except Exception — extract a helper.

The try: read_csv(encoding="latin1") except Exception: read_csv() pattern is repeated four times. The broad except Exception catches everything (e.g., FileNotFoundError, MemoryError) when only encoding errors are expected.

Extract a helper and narrow the exception:

Suggested refactor
+def _load_csv(path: str) -> pd.DataFrame:
+    """Load CSV, falling back from latin1 to default encoding."""
+    try:
+        return pd.read_csv(path, encoding="latin1")
+    except UnicodeDecodeError:
+        return pd.read_csv(path)
+
+
 def filter_auto(input_csv, output_csv):
     """Filter incident data and save results to a new CSV."""
-    try:
-        df = pd.read_csv(input_csv, encoding="latin1")
-    except Exception:
-        df = pd.read_csv(input_csv)
+    df = _load_csv(input_csv)

As per coding guidelines, "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving."

Also applies to: 116-121, 131-135, 147-151

recipes/noc-reasoning-agent/scripts/tools.py-13-20 (1)

13-20: ⚠️ Potential issue | 🟠 Major

Check_Alarm_Status is missing the _safe_tool_val wrapper, unlike every other tool function.

All other tool functions pass through _safe_tool_val() to sanitize NaN/None values, but Check_Alarm_Status returns the raw .get() result directly. If the row contains NaN for the "Check_Alarm_Status" key, .get() returns the NaN value (since the key exists), which will not be JSON-safe.

 def Check_Alarm_Status(row: pd.Series, site_or_element_id: str) -> str:
     ...
-    return row.get("Check_Alarm_Status", "NotApplicable")
+    return _safe_tool_val(row.get("Check_Alarm_Status", "NotApplicable"))
recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py-30-30 (1)

30-30: ⚠️ Potential issue | 🟠 Major

Replace assert with a proper check for runtime validation.

assert statements are stripped in optimized mode (python -O), so this validation would silently disappear. Use a conditional raise instead.

-    assert len(tool_calls) == len(tool_response)
+    if len(tool_calls) != len(tool_response):
+        raise ValueError(
+            f"Mismatched tool_call/tool_response counts: {len(tool_calls)} vs {len(tool_response)}"
+        )

As per coding guidelines, "ensure errors never pass silently unless explicitly silenced."

recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py-28-35 (1)

28-35: ⚠️ Potential issue | 🟠 Major

Silent except Exception: continue swallows all errors during resume loading.

This catches and silently ignores any exception (including unexpected ones like PermissionError, MemoryError, etc.) while parsing the output file. Per the coding guidelines, don't catch exceptions not normally expected. The batch variant (create_agent_with_tools_batch.py line 69) correctly catches only json.JSONDecodeError.

-            except Exception:
-                # ignore malformed lines so we can keep going
-                continue
+            except json.JSONDecodeError:
+                print(f"[WARNING] Skipping malformed line in output file: {line}")

As per coding guidelines, "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving."

recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py-92-211 (1)

92-211: ⚠️ Potential issue | 🟠 Major

Heavy objects recreated on every loop iteration — significant performance waste.

llm_with_tools, MemorySaver(), and create_react_agent(...) are all created inside the per-row loop (lines 209-211). The tool closures are also redefined on every iteration (lines 102-205). Since the model and LLM wrappers don't change between rows, the agent and tool binding should be constructed once before the loop.

The tool functions can capture row via a factory function to avoid the B023 closure issue flagged by static analysis, while still being defined once.

Suggested restructuring (sketch)
+    # Build agent once outside the loop
+    checkpointer = MemorySaver()
+
     for index, row in tqdm(df.iterrows()):
         if index in processed_indices:
             continue
 
         input_question = row.get("input", "")
         print(f"Processing input {index + 1}/{len(df)}")
 
-        `@tool`
-        def Check_Alarm_Status(site_or_element_id: str) -> str:
-            ...
-        # ... all tool definitions ...
-
-        tools = [...]
-        llm_with_tools = chat_llm.bind_tools(tools)
-        checkpointer = MemorySaver()
-        react_agent = create_react_agent(llm_with_tools, tools, checkpointer=checkpointer)
+        # Define row-bound tools using a factory or partial
+        tools = _make_tools(row)
+        llm_with_tools = chat_llm.bind_tools(tools)
+        react_agent = create_react_agent(llm_with_tools, tools, checkpointer=checkpointer)

Note: Reusing the MemorySaver outside the loop is safe; the thread_id in config already isolates per-row state.

recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-10-12 (1)

10-12: ⚠️ Potential issue | 🟠 Major

_incident_id can silently return None, corrupting downstream data.

If neither incident_identifier nor number is present, this returns None. That None is then used as a dictionary key throughout (responses[number], formatted_steps_taken[number], full_steps[number]), silently merging unrelated incidents under a single None key. This is a data integrity issue.

Proposed fix
 def _incident_id(data):
     """Synthetic schema uses incident_identifier; legacy uses number."""
-    return data.get("incident_identifier") or data.get("number")
+    incident_id = data.get("incident_identifier") or data.get("number")
+    if incident_id is None:
+        raise ValueError(f"Missing incident identifier in data: {list(data.keys())}")
+    return incident_id
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-63-69 (1)

63-69: 🛠️ Refactor suggestion | 🟠 Major

Bare Exception catch silently swallows unexpected errors.

Line 66 catches all exceptions and just prints them, masking bugs in the parsing logic. Per coding guidelines, exceptions that are not normally expected should not be caught — let them propagate so issues surface clearly. The outer JSONDecodeError catch on line 68 is fine.

Proposed fix — remove the bare catch
                 except json.JSONDecodeError as e:
                     print(text)
                     print(f"Error decoding JSON: {e}")
-                except Exception as e:
-                    print(f"An unexpected error occurred: {e}")

As per coding guidelines, "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving".

recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-314-323 (1)

314-323: 🛠️ Refactor suggestion | 🟠 Major

Broad except Exception silently discards incidents on any error.

This catches and suppresses all errors — including programming bugs in token_converting or data access issues — and quietly removes the affected incidents from the output. This makes debugging very difficult and violates the fail-fast principle.

Consider catching only the specific expected exception (e.g., KeyError, ValueError) or at minimum, logging the incident data that caused the failure.

Proposed fix
             try:
                 for i in range(len(steps_taken[number])):
                     if steps_taken[number][i]["tool_call"]:
                         steps_taken[number][i]["tool_call"] = token_converting(
                             steps_taken[number][i]["tool_call"], model
                         )
                     steps_taken[number][i]["thinking"] = reasoning_steps[number][steps_taken[number][i]["step_number"]]
-            except Exception as e:
-                print(e)
+            except (KeyError, ValueError) as e:
+                print(f"Error merging steps for incident {number}: {e}")
                 broken_numbers.append(number)

As per coding guidelines, "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving".

recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-445-448 (1)

445-448: ⚠️ Potential issue | 🟠 Major

Case C silently drops intermediate non-tool-call steps — likely a bug.

When a step has no tool_call and is not the final step, step_text is computed but never appended to current_assistant_content. The comment on line 447 states "We already added to current_assistant_content at the top of loop" but no such addition exists. This means intermediate reasoning steps are lost from the conversation context, and their thinking/actions won't appear in subsequent background fields.

If these steps should accumulate context, the content needs to be appended:

Proposed fix (if accumulation is intended)
         # --- CASE C: Intermediate Step (just accumulation) ---
         else:
-            # We already added to current_assistant_content at the top of loop
-            pass
+            # Accumulate intermediate thinking into the conversation context
+            current_assistant_content.append(
+                {"role": "assistant", "content": step_text}
+            )
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-85-87 (1)

85-87: ⚠️ Potential issue | 🟠 Major

Silent data corruption when "final<|message|>" delimiter is missing.

If text does not contain "final<|message|>", rfind returns -1, and the slice becomes text[15:] (i.e., -1 + 16), silently producing garbage data for the thinking content with no warning.

Proposed fix
-            thinking = text[text.rfind("final<|message|>") + len("final<|message|>") :]
+            delimiter = "final<|message|>"
+            idx = text.rfind(delimiter)
+            if idx == -1:
+                print(f"Warning: delimiter '{delimiter}' not found for incident {number}, step {step_number}")
+                continue
+            thinking = text[idx + len(delimiter) :]
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-476-489 (1)

476-489: 🛠️ Refactor suggestion | 🟠 Major

Same broad except Exception pattern — masks bugs during compilation.

Same issue as in merge_reasoning_steps. This silently counts programming errors as "incorrect incidents" and continues, making it hard to distinguish data issues from code bugs.

Proposed fix
-                except Exception as e:
-                    print(f"Error for incident {number}: {e}")
+                except (KeyError, ValueError) as e:
+                    print(f"Error for incident {number}: {e}", flush=True)
                     incorrect_incidents += 1

As per coding guidelines, "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving".

🟡 Minor comments (29)
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-94-94 (1)

94-94: ⚠️ Potential issue | 🟡 Minor

Output key "problem code" contains a space, inconsistent with all other keys.

All other output keys use underscores (e.g., u_region2, short_description). This looks like an accidental deviation. If intentional (matching downstream expectations), please confirm; otherwise rename to "problem_code".

-                "problem code": d1.get("u_problem_code", ""),
+                "problem_code": d1.get("u_problem_code", ""),
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-60-71 (1)

60-71: ⚠️ Potential issue | 🟡 Minor

Same .get("number") and silent-skip concern in file2 loading.

Apply the same fix as suggested for the file1 block above.

Suggested fix
                     d = json.loads(line)
-                    num = d.get("number")
-                    if num:
-                        data2[num] = d.get("generation", "")
-                except json.JSONDecodeError as e:
-                    print(f"Error decoding JSON in file2 on line {line_num}: '{line}'. Error: {e}")
+                    num = d["number"]
+                    data2[num] = d.get("generation", "")
+                except (json.JSONDecodeError, KeyError) as e:
+                    raise ValueError(f"Error processing file2 on line {line_num}: '{line}'. Error: {e}") from e
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-46-57 (1)

46-57: ⚠️ Potential issue | 🟡 Minor

Use direct dictionary access for expected keys instead of .get().

Lines 53 and 67 use d.get("number") then silently skip records where the key is missing. If "number" is a required field in these JSONL files, direct access (d["number"]) would fail fast with a clear KeyError, which is preferable. Additionally, the JSONDecodeError catch-and-print on lines 56–57 (and 70–71) silently drops malformed lines, which can mask data corruption.

Suggested fix for file1 loading
                     d = json.loads(line)
-                    num = d.get("number")
-                    if num:
-                        data1[num] = d
-                except json.JSONDecodeError as e:
-                    print(f"Error decoding JSON in file1 on line {line_num}: '{line}'. Error: {e}")
+                    num = d["number"]
+                    data1[num] = d
+                except (json.JSONDecodeError, KeyError) as e:
+                    raise ValueError(f"Error processing file1 on line {line_num}: '{line}'. Error: {e}") from e

As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present" and "Do not catch exceptions when they are not normally expected to be raised; let code fail with clear errors instead of silently misbehaving."

recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py-60-64 (1)

60-64: ⚠️ Potential issue | 🟡 Minor

Misleading section comment: says solved_category but code uses resolution_method / close_code.

Line 60 says "Bucket by solved_category" but line 63 derives the category from resolution_method or close_code, not solved_category. This will confuse future readers.

Proposed fix
-# ---------- Bucket by solved_category ----------
+# ---------- Bucket by resolution_method / close_code ----------
recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py-32-44 (1)

32-44: ⚠️ Potential issue | 🟡 Minor

Use direct dictionary access for guaranteed keys; validate num is not None.

Two issues in enrich_row:

  1. Line 33: If the row contains neither "incident_identifier" nor "number", num silently becomes None, and the lookup silently fails — producing records with None fields and no indication of a data problem. This should raise an error.

  2. Lines 41-43: info is sourced from csv_idx (built on line 28 from the validated columns time_to_resolve, solved_category, solved_reason), so these keys are guaranteed to exist. Per coding guidelines, use direct dictionary access instead of .get().

Proposed fix
 def enrich_row(row, lookup):
-    num = row.get("incident_identifier", row.get("number"))
+    if "incident_identifier" in row:
+        num = row["incident_identifier"]
+    elif "number" in row:
+        num = row["number"]
+    else:
+        raise KeyError(f"Row missing both 'incident_identifier' and 'number': {row}")
     info = lookup.get(num)
     if info is None:
-        # No match: set to None (or choose sensible defaults)
         row["time_to_resolve"] = None
-        row["solved_category"] = row.get("solved_category")  # preserve if already present
-        row["solved_reason"] = row.get("solved_reason")
+        row["solved_category"] = row.get("solved_category")
+        row["solved_reason"] = row.get("solved_reason")
     else:
-        row["time_to_resolve"] = info.get("time_to_resolve")
-        row["solved_category"] = info.get("solved_category")
-        row["solved_reason"] = info.get("solved_reason")
+        row["time_to_resolve"] = info["time_to_resolve"]
+        row["solved_category"] = info["solved_category"]
+        row["solved_reason"] = info["solved_reason"]
     return row

As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present; use direct dictionary access dict[key] instead to allow proper error handling and fail fast with clear errors" and "ensure errors never pass silently unless explicitly silenced".

.gitignore-52-56 (1)

52-56: ⚠️ Potential issue | 🟡 Minor

Remove duplicate site/ entries.

Five identical site/ ignore patterns were added. Git only needs one entry per pattern, so the duplicates serve no purpose and suggest an accidental copy-paste or merge error.

🧹 Proposed fix to remove duplicates
 .idea
 site/
-site/
-site/
-site/
-site/
recipes/noc-reasoning-agent/configs/config.ini-1-10 (1)

1-10: ⚠️ Potential issue | 🟡 Minor

Config references appear stale relative to the tutorial.

The [download] section lists qwen2.5-14, qwen2.5-32, gpt-oss-120b, and nemotron-49b-1.5, but the tutorial (noc-reasoning-agent.md) trains and evaluates with Qwen3-32B — which is absent here. Similarly, original_data_path points to data/anonymized-Incidents_Last_6_Months.csv while the tutorial instructs users to place data as data/synthetic_incidents.csv. If this config is meant to be consumed by scripts in this recipe, the mismatches could confuse users or break automation.

recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml-37-37 (1)

37-37: ⚠️ Potential issue | 🟡 Minor

Malformed sentence in Create_Ticket description.

"Logs and routes tickets to departments (e.g., Field Ops) via the ticketing system. It (e.g., Jira/ServiceNow)." — the second sentence is a fragment with a dangling "It" and double space. This is part of the prompt the model will see.

docs/tutorials/posts/noc-reasoning-agent.md-1-4 (1)

1-4: ⚠️ Potential issue | 🟡 Minor

Incorrect or backdated publication date.

date: 2025-09-05 predates the PR creation (Feb 2026). If this is intentional for ordering purposes, a comment would help; otherwise, update to the actual publication date.

recipes/noc-reasoning-agent/scripts/utils/reasoning_processes.py-51-51 (1)

51-51: ⚠️ Potential issue | 🟡 Minor

Typo in tool name: Check_Performancel() → Check_Performance().

Extra trailing "l" in the "Degraded Prach" workflow at step 7. This text is injected into prompts and could confuse the model's tool-calling behavior.

docs/tutorials/posts/noc-reasoning-agent.md-352-365 (1)

352-365: ⚠️ Potential issue | 🟡 Minor

Missing expected evaluation results for tested models.

The tutorial says "You should see a meaningful improvement" but doesn't provide concrete accuracy numbers or expected ranges. Including representative results (e.g., baseline vs. fine-tuned close-code accuracy on a reference dataset) would make the tutorial more actionable and help users validate their setup. As per coding guidelines, "When adding new benchmarks, add documentation with example commands for how to run evaluation, expected results for tested models, and any dataset-specific details".

recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml-17-17 (1)

17-17: ⚠️ Potential issue | 🟡 Minor

Stray character in tool description.

Line 17 ends with a dangling "I" after the period: "...cell status). I". This will be seen by the model and may degrade prompt quality.

✏️ Proposed fix
-  - Purpose: Polls to monitor the system, EMS/NM/Telemetry, and retrieve device health metrics (e.g., RU radiation, cell status). I
+  - Purpose: Polls to monitor the system, EMS/NM/Telemetry, and retrieve device health metrics (e.g., RU radiation, cell status).
docs/tutorials/posts/noc-reasoning-agent.md-229-244 (1)

229-244: ⚠️ Potential issue | 🟡 Minor

Typo: "indicence" → "incidence".

--log_dir=/workspace/prepare-sft-data-indicence (line 231) and the equivalent at line 296 both contain the same typo.

✏️ Proposed fix
-    --log_dir=/workspace/prepare-sft-data-indicence \
+    --log_dir=/workspace/prepare-sft-data-incidence \
docs/tutorials/posts/noc-reasoning-agent.md-96-118 (1)

96-118: ⚠️ Potential issue | 🟡 Minor

In-place file overwriting risks data loss.

The second, third, and fourth filter_rows.py calls use the same path for both --input_csv and --output_csv (data/filtered_file.csv). If the script reads the CSV lazily or fails mid-write, data is lost with no backup. Consider using distinct intermediate filenames or documenting that the script fully reads before writing.
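If the in-place pattern is kept, one hedged way to make it safe (a sketch assuming the script already loads the whole CSV eagerly with pandas; `filter_fn` is a placeholder) is to write to a temporary file and atomically replace the original only after the write succeeds:

```python
import os

import pandas as pd


def filter_csv_in_place(csv_path: str, filter_fn) -> None:
    df = pd.read_csv(csv_path)        # fully read before any write happens
    filtered = filter_fn(df)
    tmp_path = csv_path + ".tmp"
    filtered.to_csv(tmp_path, index=False)
    os.replace(tmp_path, csv_path)    # atomic rename; original survives a crash
```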

recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-44-53 (1)

44-53: ⚠️ Potential issue | 🟡 Minor

Missing guard for display columns — will crash with KeyError if columns are absent.

The metrics section (lines 20–24) defensively checks which columns exist, but columns_to_display is accessed directly without validation. If the JSONL doesn't contain one of ["expected_answer", "agent_response", "llm_judge_reason"], this will raise an unguarded KeyError.

🛡️ Proposed fix
 columns_to_display = ["expected_answer", "agent_response", "llm_judge_reason"]
+available_display = [c for c in columns_to_display if c in df.columns]
+if not available_display:
+    print("No display columns found; skipping sample output.")
+else:
-df_subset = df[columns_to_display].head(10)
+    df_subset = df[available_display].head(10)
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-58-62 (1)

58-62: ⚠️ Potential issue | 🟡 Minor

Debug prints left in production code.

Lines 59 (print(row["response"])), 60 (# exit()), and 62 (print("in")) are debug artifacts that will produce excessive output and should be removed.

recipes/noc-reasoning-agent/prompts/prompt_incident.yaml-48-48 (1)

48-48: ⚠️ Potential issue | 🟡 Minor

Close code list contains both "Reset RU" and "RU Reset".

These appear to be duplicates with reversed word order. This could confuse the model during inference. Consider keeping only one canonical form.

recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py-144-144 (1)

144-144: ⚠️ Potential issue | 🟡 Minor

Debug print(required_incidents) left in.

This prints the full list of required incident IDs to stdout, which is noisy and potentially leaks data in production pipelines.

recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-148-149 (1)

148-149: ⚠️ Potential issue | 🟡 Minor

Misleading success message.

Line 149 prints "Evaluations saved to evaluations.jsonl" regardless of the actual output path. Use the actual path:

🐛 Fix
-print("Evaluations saved to evaluations.jsonl")
+print(f"Evaluations saved to {args.output_file}")
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-96-99 (1)

96-99: ⚠️ Potential issue | 🟡 Minor

--preview argument is parsed but never used.

The preview argument (line 97) is accepted but has no effect. Either implement the preview functionality or remove the argument.

As per coding guidelines, "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided."

recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py-254-261 (1)

254-261: ⚠️ Potential issue | 🟡 Minor

Unused pattern parameter – compiled regex patterns are dead code.

find_keyword_matches accepts a pattern argument (the compiled regex) but never uses it. The regex patterns compiled on lines 281-282 are passed into find_keyword_matches on lines 290 and 296 but silently discarded. The function only performs an exact in check against the keywords list.

Either remove the pattern parameter and the regex compilation, or refactor the function to actually use the regex for substring matching (which would change semantics — matching keywords within a field vs. exact match).

Option A: Remove unused pattern (keeps current exact-match behavior)
-def find_keyword_matches(row, pattern, keywords):
+def find_keyword_matches(row, keywords):
     """Finds which specific keywords from a list match within a DataFrame row."""
     # Synthetic schema uses resolution_method (same meaning as close_code)
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
     if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
         return True
 
     return False

And update the call sites:

-    hw_pattern = re.compile("|".join(re.escape(k) for k in hw_keywords), flags=re.IGNORECASE)
-    sw_pattern = re.compile("|".join(re.escape(k) for k in sw_keywords), flags=re.IGNORECASE)
-
     ...
-        hw_matches = find_keyword_matches(row, hw_pattern, hw_keywords)
+        hw_matches = find_keyword_matches(row, hw_keywords)
         ...
-        sw_matches = find_keyword_matches(row, sw_pattern, sw_keywords)
+        sw_matches = find_keyword_matches(row, sw_keywords)

Also applies to: 280-296

recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py-46-46 (1)

46-46: ⚠️ Potential issue | 🟡 Minor

Missing required=True on --input argument.

--input has no default and no required=True, so args.input will be None if omitted, causing a TypeError on Path(args.input) at line 59.

-    parser.add_argument("--input", help="Path to the JSONL file containing inputs and answers")
+    parser.add_argument("--input", required=True, help="Path to the JSONL file containing inputs and answers")

As per coding guidelines, "the code should fail if a required argument is not specified."

recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py-126-126 (1)

126-126: ⚠️ Potential issue | 🟡 Minor

--limit argument lacks type=int, will fail with a confusing error on non-numeric input.

args.limit is a string by default. Line 141 does int(args.limit), which raises ValueError for non-numeric input. Declare the type on the argument:

-    parser.add_argument("--limit", required=False)
+    parser.add_argument("--limit", type=int, required=False, default=None)
recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml-58-58 (1)

58-58: ⚠️ Potential issue | 🟡 Minor

Comment contradicts the value: remove_no_think_tags: false but comment says "Enabled, as requested".

The value is false (disabled), but the comment reads "Enabled, as requested". This is confusing and likely a copy-paste error. If the filter should be enabled, change to true; otherwise, fix the comment.

-  remove_no_think_tags: false         # Enabled, as requested
+  remove_no_think_tags: true          # Enabled, as requested
recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py-180-188 (1)

180-188: ⚠️ Potential issue | 🟡 Minor

Invalid filter_type silently does nothing.

If the user passes --filter_type=foo, none of the elif branches match and the script exits silently with no output. Per coding guidelines, unsupported arguments should cause a failure.

Suggested fix
     elif args.filter_type == "close_codes":
         filter_close_codes(args.input_csv, args.output_csv)
+    else:
+        parser.error(f"Unknown filter_type: {args.filter_type!r}. Choose from: auto, soft_solve, problem_codes, close_codes")

Or better, use choices on the argument itself:

-    parser.add_argument("--filter_type", type=str, default="auto")
+    parser.add_argument("--filter_type", type=str, default="auto",
+                         choices=["auto", "soft_solve", "problem_codes", "close_codes"])

As per coding guidelines, "Avoid silently ignoring unused user-passed parameters; the code should fail if a required argument is not specified or if unsupported arguments are provided."
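
A minimal sketch of the choices approach, with placeholder filter functions standing in for the real ones in filter_rows.py; dispatching through a dict keeps the argument declaration and the branch logic in sync:

import argparse

# Placeholder implementations; the real functions live in filter_rows.py.
FILTERS = {
    "auto": lambda inp, out: print("auto filter:", inp, "->", out),
    "soft_solve": lambda inp, out: print("soft_solve filter:", inp, "->", out),
    "problem_codes": lambda inp, out: print("problem_codes filter:", inp, "->", out),
    "close_codes": lambda inp, out: print("close_codes filter:", inp, "->", out),
}

parser = argparse.ArgumentParser()
parser.add_argument("--input_csv", required=True)
parser.add_argument("--output_csv", required=True)
# choices makes argparse itself reject anything outside the supported set.
parser.add_argument("--filter_type", default="auto", choices=sorted(FILTERS))
args = parser.parse_args()

# Adding a new filter becomes a one-line change to the dict.
FILTERS[args.filter_type](args.input_csv, args.output_csv)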

recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py-22-52 (1)

22-52: ⚠️ Potential issue | 🟡 Minor

get_tools has an inconsistent return type: tuple (None, None) vs. dict.

On the error path (line 50), the function returns (None, None) (a 2-tuple), but on the success path (line 52), it returns matches (a dict). The caller on line 73 assigns the result to a single variable and compares with == (None, None), which works incidentally but is fragile and confusing.

Return None on the empty case and check if matches is None at the call site.

Suggested fix
     if not matches:
-        # print("No tools!")
-        return None, None
-    # print(matches)
+        return None
     return matches

And update the caller:

-                        matches = get_tools(d["initial_background"])
-                        if matches == (None, None):
+                        matches = get_tools(d["initial_background"])
+                        if matches is None:
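
A sketch of the single-return-type version. The substring matching below is a stand-in for whatever the real matcher does; only the Optional contract is the point:

from typing import Optional

def get_tools(initial_background: str) -> Optional[dict]:
    """Return tool matches for an incident, or None when nothing matches."""
    # Stand-in matcher: look for known tool names in the background text.
    known_tools = ["Check_Alarm_Status", "Check_Performance"]  # illustrative subset
    matches = {name: True for name in known_tools if name in initial_background}
    if not matches:
        return None  # one sentinel, one type
    return matches

# The caller checks identity against None instead of comparing to a tuple.
matches = get_tools("LOS alarm raised; run Check_Alarm_Status on the element")
if matches is None:
    print("no tools for this incident")
else:
    print(sorted(matches))
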
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-130-130 (1)

130-130: ⚠️ Potential issue | 🟡 Minor

Typo in user-facing message: "disgarded" → "discarded".

-    print(f"{incorrect_incidents} incidents were not parsed correctly and disgarded.")
+    print(f"{incorrect_incidents} incidents were not parsed correctly and discarded.")
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-46-52 (1)

46-52: ⚠️ Potential issue | 🟡 Minor

Off-by-one in the end_index check: the case where "]" is not found still passes the guard.

When "]" is absent, rfind returns -1, so end_index becomes 0 after the + 1. The guard end_index != -1 on line 52 evaluates to True, allowing a nonsensical slice text[start_index:0] (empty string) that falls through to a JSONDecodeError instead of the intended error message on line 61.

Proposed fix
-                    end_index = text.rfind("]") + 1
+                    end_index_raw = text.rfind("]")
+                    end_index = end_index_raw + 1

-                    if start_index != -1 and end_index != -1:
+                    if start_index != -1 and end_index_raw != -1:
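
A standalone illustration of the pitfall and the corrected guard (the sample text is made up):

text = "no closing bracket here"

# str.rfind returns -1 when the character is absent...
end_index_raw = text.rfind("]")
print(end_index_raw)  # -1

# ...but adding 1 first turns the sentinel into 0, so the != -1 guard can no longer see it.
end_index = end_index_raw + 1
print(end_index != -1)  # True, even though "]" was never found

start_index = text.find("[")
# Guard on the raw rfind result, before the +1 adjustment.
if start_index != -1 and end_index_raw != -1:
    payload = text[start_index:end_index]
else:
    payload = None
print(payload)  # None, as intended for malformed input
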
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-505-511 (1)

505-511: ⚠️ Potential issue | 🟡 Minor

Missing validation: output_dir and output_file can be None for their respective modes.

When parse_types == "compile_reasoning", output_dir is passed to compile_reasoning which calls os.makedirs(output_dir, ...) — this crashes with TypeError if None. Similarly, output_file can be None for steps_extraction. Add upfront validation like the existing check for reasoning_jsonl.

Proposed fix
 def main(jsonl_file, input_file, output_file, parse_types, reasoning_jsonl=None, output_dir=None):
     if parse_types == "steps_extraction":
+        if not output_file:
+            raise ValueError("Please specify an output file by specifying --output")
         prepare_data_for_reasoning_traces(jsonl_file, input_file, output_file)
     elif parse_types == "compile_reasoning":
         if not reasoning_jsonl:
             raise ValueError("Please specify a reasoning jsonl file by specifying --reasoning_jsonl")
+        if not output_dir:
+            raise ValueError("Please specify an output directory by specifying --output_dir")
         compile_reasoning(jsonl_file, input_file, output_dir, reasoning_jsonl)

Comment on lines +132 to +135
    skip = True
    if skip and os.path.exists(output_path):
        os.remove(output_path)
    processed_indices = load_processed_indices(output_path)

⚠️ Potential issue | 🔴 Critical

Hardcoded skip = True deletes the output file on every run, completely breaking resume logic.

Lines 132-134 unconditionally delete the output file before load_processed_indices is called on line 135. This means the resume logic (lines 57-71, 144-148) will never find any previously processed rows — the file is always gone. All the resume infrastructure becomes dead code.

This looks like a debug artifact that was accidentally left in.

-    skip = True
-    if skip and os.path.exists(output_path):
-        os.remove(output_path)
     processed_indices = load_processed_indices(output_path)
📝 Committable suggestion


Suggested change
-    skip = True
-    if skip and os.path.exists(output_path):
-        os.remove(output_path)
-    processed_indices = load_processed_indices(output_path)
+    processed_indices = load_processed_indices(output_path)
🤖 Prompt for AI Agents
In `@recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py` around
lines 132 - 135, The code currently hardcodes skip = True which causes deletion
of output_path every run and breaks resume logic; change this so skip is not
hardcoded—make it a CLI flag, environment variable, or default False—and only
call os.remove(output_path) when skip is explicitly set by the user; ensure the
block around skip, os.path.exists(output_path), and os.remove(output_path) is
gated by the configurable skip variable and leave
load_processed_indices(output_path) unchanged so resume logic
(load_processed_indices / processed_indices) can work when skip is False.
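
One way this could look, sketched with an --overwrite flag; the flag name and the commented call to load_processed_indices are assumptions about the surrounding script:

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--output", required=True)
# Destructive behavior is opt-in: off by default so the resume logic keeps working.
parser.add_argument(
    "--overwrite",
    action="store_true",
    help="Delete any existing output file and start from scratch instead of resuming.",
)
args = parser.parse_args()

output_path = args.output
if args.overwrite and os.path.exists(output_path):
    os.remove(output_path)

# With --overwrite omitted, previously processed rows are picked up as before:
# processed_indices = load_processed_indices(output_path)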

Comment on lines +156 to +157
    model = AutoModelForCausalLM.from_pretrained(
        args.weights_dir, dtype=dtype, device_map=device_map, low_cpu_mem_usage=True

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and read the batch variant file
fd "create_agent_with_tools_batch.py" --type f

Repository: NVIDIA-NeMo/Skills

Length of output: 130


🏁 Script executed:

# Find the non-batch variant for comparison
fd "create_agent_with_tools.py" --type f

Repository: NVIDIA-NeMo/Skills

Length of output: 124


🌐 Web query:

AutoModelForCausalLM.from_pretrained torch_dtype parameter transformers library

💡 Result:

AutoModelForCausalLM.from_pretrained(..., torch_dtype=...) (Hugging Face Transformers) controls the PyTorch floating-point dtype the model is instantiated/loaded into, which can significantly reduce memory usage for large LMs.

Common values:

  • A specific dtype (e.g. torch.float16, torch.bfloat16, torch.float32) to force loading in that dtype. Example: AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).[1]
  • "auto" to automatically choose a dtype based on the checkpoint/model metadata (the docs describe it as deriving dtype automatically, commonly from the weights / config where available). Example: AutoModelForCausalLM.from_pretrained(..., torch_dtype="auto", device_map="auto").[2]

Notes:

  • This is only available for floating dtypes (per the model docs’ “Model Instantiation dtype” section).[1]
  • It’s often paired with device_map="auto" when loading big models to fit across available devices.[2]

Sources:
[1] Transformers “Models” / “Model Instantiation dtype” documentation (examples using torch_dtype=torch.float16 and "auto", and noting floating dtypes only). (huggingface.co)
[2] Transformers “Loading models” guide showing AutoModelForCausalLM.from_pretrained(..., torch_dtype="auto", device_map="auto"). (huggingface.co)



🏁 Script executed:

cat -n recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py | sed -n '150,165p'

Repository: NVIDIA-NeMo/Skills

Length of output: 921


🏁 Script executed:

cat -n recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py | sed -n '70,80p'

Repository: NVIDIA-NeMo/Skills

Length of output: 499


Incorrect parameter name: dtype= should be torch_dtype=.

AutoModelForCausalLM.from_pretrained() uses torch_dtype as the parameter name, not dtype. Using dtype will silently ignore the parameter, and the model will load in the default precision (float32) instead of the intended bfloat16/float16, causing significant memory and performance degradation.

The non-batch variant (create_agent_with_tools.py line 74) correctly uses torch_dtype=torch.float16.

Fix
     model = AutoModelForCausalLM.from_pretrained(
-        args.weights_dir, dtype=dtype, device_map=device_map, low_cpu_mem_usage=True
+        args.weights_dir, torch_dtype=dtype, device_map=device_map, low_cpu_mem_usage=True
     )
📝 Committable suggestion


Suggested change
-    model = AutoModelForCausalLM.from_pretrained(
-        args.weights_dir, dtype=dtype, device_map=device_map, low_cpu_mem_usage=True
+    model = AutoModelForCausalLM.from_pretrained(
+        args.weights_dir, torch_dtype=dtype, device_map=device_map, low_cpu_mem_usage=True
     )
🤖 Prompt for AI Agents
In `@recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py` around
lines 156 - 157, The call to AutoModelForCausalLM.from_pretrained in
create_agent_with_tools_batch.py is using the wrong parameter name (dtype=) so
the intended precision is ignored; update the call to pass the dtype via
torch_dtype (e.g., torch_dtype=dtype) so the model loads in the specified
bfloat16/float16 precision, ensuring you modify the
AutoModelForCausalLM.from_pretrained(...) invocation accordingly.
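
For illustration, a hedged sketch of the corrected load, assuming a transformers version where torch_dtype is the accepted keyword (very recent releases also accept dtype as an alias) and that accelerate is installed for device_map; weights_dir is a placeholder path:

import torch
from transformers import AutoModelForCausalLM

# weights_dir stands in for args.weights_dir from the batch script.
weights_dir = "path/to/finetuned/checkpoint"

model = AutoModelForCausalLM.from_pretrained(
    weights_dir,
    torch_dtype=torch.bfloat16,  # explicit half-precision load, matching the non-batch variant
    device_map="auto",
    low_cpu_mem_usage=True,
)
print(next(model.parameters()).dtype)  # expected: torch.bfloat16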

# Parse arguments for input JSONL path
parser = argparse.ArgumentParser(description="Evaluation Pipeline for Agent Responses")
parser.add_argument("input_jsonl", help="Path to agent_responses.jsonl containing expected_answer and agent_response")
parser.add_argument("--output_file", help="Path to output")

⚠️ Potential issue | 🔴 Critical

--output_file is neither required nor has a default — will crash at line 148.

If --output_file is omitted, args.output_file is None, and pd.DataFrame.to_json(None, ...) will raise a ValueError. Add required=True or a sensible default.

🐛 Proposed fix
-    parser.add_argument("--output_file", help="Path to output")
+    parser.add_argument("--output_file", required=True, help="Path to output JSONL")
📝 Committable suggestion


Suggested change
-    parser.add_argument("--output_file", help="Path to output")
+    parser.add_argument("--output_file", required=True, help="Path to output JSONL")
🤖 Prompt for AI Agents
In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py` at
line 12, The parser.add_argument("--output_file", help="Path to output") can
leave args.output_file as None and cause pd.DataFrame.to_json(args.output_file,
...) to raise a ValueError; update the argument declaration
(parser.add_argument) to either set required=True or provide a sensible default
(e.g., default="evaluation_results.json"), and/or add a guard before writing
that validates args.output_file is not None and raises a clear error message;
ensure references to args.output_file and the DataFrame.to_json call are updated
accordingly.

Comment on lines +153 to +166
    m = re.match(r"^\s*([A-Za-z_]\w*)\s*\[(.*)\]\s*$", str(string), re.DOTALL)

    if not m:
        # Handle case with no arguments, e.g., Check_Time[]
        m_no_args = re.match(r"^\s*([A-Za-z_]\w*)\s*\[\s*\]\s*$", str(string))
        if m_no_args:
            tool_name = m_no_args.group(1)
            raw_args = ""
        else:
            # If it doesn't match the syntax, return original string or raise error
            # returning string allows the LLM to fail gracefully or retry
            return string

    tool_name, raw_args = m.groups()

⚠️ Potential issue | 🔴 Critical

Unreachable branch + latent crash: m.groups() on None if the dead-code path were taken.

The first regex \[(.*)\] with re.DOTALL already matches empty brackets Tool[], making the m_no_args branch (lines 157-160) dead code. More critically, if it were reached (e.g., due to a future regex change), execution falls through to line 166 which calls m.groups() on None — an AttributeError.

Restructure to eliminate the dead path and the latent crash:

Proposed fix
     m = re.match(r"^\s*([A-Za-z_]\w*)\s*\[(.*)\]\s*$", str(string), re.DOTALL)

     if not m:
-        # Handle case with no arguments, e.g., Check_Time[]
-        m_no_args = re.match(r"^\s*([A-Za-z_]\w*)\s*\[\s*\]\s*$", str(string))
-        if m_no_args:
-            tool_name = m_no_args.group(1)
-            raw_args = ""
-        else:
-            # If it doesn't match the syntax, return original string or raise error
-            # returning string allows the LLM to fail gracefully or retry
-            return string
-
-    tool_name, raw_args = m.groups()
+        return string
+
+    tool_name, raw_args = m.groups()
🤖 Prompt for AI Agents
In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around
lines 153 - 166, The code currently uses two regex matches (m and m_no_args) and
then unconditionally calls m.groups(), which creates dead code and a latent
crash; instead, after performing the initial match m =
re.match(r"^\s*([A-Za-z_]\w*)\s*\[(.*)\]\s*$", ... , re.DOTALL), check if m is
truthy and if not immediately return the original string (or handle the error),
and only then call m.groups() to assign tool_name and raw_args; remove the
separate m_no_args branch entirely so empty brackets are handled by the first
regex and m.groups() is always called on a valid match.
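
A self-contained sketch of the restructured parser; treating the bracket contents as JSON is an assumption made for the example, and the real argument handling may differ:

import json
import re

def parse_tool_call(string):
    """Parse strings like Tool_Name[{"arg": 1}] or Tool_Name[] into (name, args)."""
    # One regex, one early return, so m.groups() is only reached on a successful match.
    m = re.match(r"^\s*([A-Za-z_]\w*)\s*\[(.*)\]\s*$", str(string), re.DOTALL)
    if not m:
        # Unrecognized syntax: hand the original text back so the caller can retry.
        return string

    tool_name, raw_args = m.groups()
    raw_args = raw_args.strip()
    args = json.loads(raw_args) if raw_args else {}  # empty brackets mean no arguments
    return tool_name, args

print(parse_tool_call("Check_Time[]"))                        # ('Check_Time', {})
print(parse_tool_call('Check_Alarm_Status[{"node": "A1"}]'))  # ('Check_Alarm_Status', {'node': 'A1'})
print(parse_tool_call("not a tool call"))                     # 'not a tool call'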

Comment on lines +207 to +276
if __name__ == "__main__":
    # === Configuration ===
    # csv_file = "data/top_10_categorized_incidents.csv"  # Path to your CSV
    csv_file = "data/prompt_incident.yaml"
    column_name = "action_chronicle"  # default; overridden after loading data
    model = "openai/gpt-oss-120b"  # Tokenizer model
    output_csv = "outputs/work_notes_token_lengths.csv"
    output_plot = "outputs/work_notes_token_hist.png"

    # === Load Tokenizer ===
    tokenizer = AutoTokenizer.from_pretrained(model)
    print(f"[info] Tokenizer loaded for model: {model}")

    if csv_file.endswith((".yaml", ".yml")):
        with open(csv_file, "r", encoding="utf-8") as f:
            yaml_data = yaml.safe_load(f)

        # You might want to join everything into one big string, or tokenize each value
        if isinstance(yaml_data, dict):
            prompt_text = yaml.dump(yaml_data)
        else:
            prompt_text = str(yaml_data)

        token_count = len(tokenizer.encode(prompt_text, add_special_tokens=False))
        print(f"[info] YAML file {csv_file} contains {token_count} tokens.")
        exit(0)

    # === Load CSV ===
    df = pd.read_csv(csv_file, encoding="utf-8")
    if column_name not in df.columns:
        raise ValueError(f"Column '{column_name}' not found in CSV. Available columns: {df.columns.tolist()}")

    # === Count Tokens ===
    token_lengths = []
    zero_tokens = 0
    for idx, text in tqdm(enumerate(df[column_name]), total=len(df), desc="Tokenizing work_notes"):
        if pd.isna(text):  # Handle NaN values
            token_lengths.append(0)
            zero_tokens += 1
            continue
        # Count tokens for this row
        token_count = len(tokenizer.encode(str(text), add_special_tokens=False))
        token_lengths.append(token_count)

    # Count rows with 0 tokens
    print(f"\n[info] Number of rows with 0 tokens: {zero_tokens}")

    # Add results back to DataFrame
    df["work_notes_token_length"] = token_lengths

    # === Save CSV with token counts ===
    df.to_csv(output_csv, index=False)
    print(f"[info] Token lengths saved to {output_csv}")

    # === Summary Statistics ===
    print("\n=== Token Length Summary ===")
    print(df["work_notes_token_length"].describe())

    # === Plot Histogram ===
    plt.figure(figsize=(10, 6))
    plt.hist(df["work_notes_token_length"], bins=50)
    plt.title(f"Distribution of Token Lengths for {column_name}")
    plt.xlabel("Token Length")
    plt.ylabel("Frequency")
    plt.tight_layout()
    plt.savefig(output_plot, dpi=150)
    plt.close()
    print(f"[info] Histogram saved to {output_plot}")
    exit()
    main()

⚠️ Potential issue | 🔴 Critical

main() is unreachable — debug code left in __main__ block.

exit() on line 275 terminates the process before main() on line 276 is ever called. The entire __main__ block (lines 207–275) appears to be leftover development code with hardcoded paths (data/prompt_incident.yaml, openai/gpt-oss-120b) and duplicated logic. It also bypasses the defensive AutoTokenizer import guard — line 217 will crash with TypeError if transformers is not installed.

Replace the block with a clean entry point:

🐛 Proposed fix
 if __name__ == "__main__":
-    # === Configuration ===
-    # csv_file = "data/top_10_categorized_incidents.csv"  # Path to your CSV
-    csv_file = "data/prompt_incident.yaml"
-    column_name = "action_chronicle"  # default; overridden after loading data
-    model = "openai/gpt-oss-120b"  # Tokenizer model
-    output_csv = "outputs/work_notes_token_lengths.csv"
-    output_plot = "outputs/work_notes_token_hist.png"
-
-    # === Load Tokenizer ===
-    tokenizer = AutoTokenizer.from_pretrained(model)
-    print(f"[info] Tokenizer loaded for model: {model}")
-
-    if csv_file.endswith((".yaml", ".yml")):
-        with open(csv_file, "r", encoding="utf-8") as f:
-            yaml_data = yaml.safe_load(f)
-
-        # You might want to join everything into one big string, or tokenize each value
-        if isinstance(yaml_data, dict):
-            prompt_text = yaml.dump(yaml_data)
-        else:
-            prompt_text = str(yaml_data)
-
-        token_count = len(tokenizer.encode(prompt_text, add_special_tokens=False))
-        print(f"[info] YAML file {csv_file} contains {token_count} tokens.")
-        exit(0)
-
-    # === Load CSV ===
-    df = pd.read_csv(csv_file, encoding="utf-8")
-    if column_name not in df.columns:
-        raise ValueError(f"Column '{column_name}' not found in CSV. Available columns: {df.columns.tolist()}")
-
-    # === Count Tokens ===
-    token_lengths = []
-    zero_tokens = 0
-    for idx, text in tqdm(enumerate(df[column_name]), total=len(df), desc="Tokenizing work_notes"):
-        if pd.isna(text):  # Handle NaN values
-            token_lengths.append(0)
-            zero_tokens += 1
-            continue
-        # Count tokens for this row
-        token_count = len(tokenizer.encode(str(text), add_special_tokens=False))
-        token_lengths.append(token_count)
-
-    # Count rows with 0 tokens
-    print(f"\n[info] Number of rows with 0 tokens: {zero_tokens}")
-
-    # Add results back to DataFrame
-    df["work_notes_token_length"] = token_lengths
-
-    # === Save CSV with token counts ===
-    df.to_csv(output_csv, index=False)
-    print(f"[info] Token lengths saved to {output_csv}")
-
-    # === Summary Statistics ===
-    print("\n=== Token Length Summary ===")
-    print(df["work_notes_token_length"].describe())
-
-    # === Plot Histogram ===
-    plt.figure(figsize=(10, 6))
-    plt.hist(df["work_notes_token_length"], bins=50)
-    plt.title(f"Distribution of Token Lengths for {column_name}")
-    plt.xlabel("Token Length")
-    plt.ylabel("Frequency")
-    plt.tight_layout()
-    plt.savefig(output_plot, dpi=150)
-    plt.close()
-    print(f"[info] Histogram saved to {output_plot}")
-    exit()
-    main()
+    main()
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 237-237: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 242-242: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

🤖 Prompt for AI Agents
In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 207 -
276, The __main__ block contains leftover debug logic that hardcodes
csv_file/model, calls AutoTokenizer directly (unsafe if transformers missing),
and calls exit() before main(), making main() unreachable; remove or replace the
entire if __name__ == "__main__": block so it simply parses CLI/default args and
invokes main(), ensure any tokenizer creation uses the existing guarded
import/creation logic (e.g., reuse the module's AutoTokenizer-safe factory or
wrap AutoTokenizer.from_pretrained in the same try/except used elsewhere),
eliminate hardcoded paths and stray exit() calls, and wire csv_file,
column_name, model, output_csv, output_plot into main() parameters or argparse
so the script runs via main() correctly.
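
Sketched below is one shape the cleaned-up entry point could take; the parameter list handed to main() is hypothetical, since the real main() signature is not shown in this excerpt, and the defaults simply mirror the hardcoded values above:

import argparse

def main(csv_file, column_name, model, output_csv, output_plot):
    # Placeholder body: the real main() would tokenize the column and write the CSV/histogram.
    print(csv_file, column_name, model, output_csv, output_plot)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Token-length statistics for a text column")
    parser.add_argument("--csv_file", required=True, help="Input CSV (or YAML prompt file)")
    parser.add_argument("--column_name", default="action_chronicle")
    parser.add_argument("--model", default="openai/gpt-oss-120b", help="Tokenizer to load")
    parser.add_argument("--output_csv", default="outputs/work_notes_token_lengths.csv")
    parser.add_argument("--output_plot", default="outputs/work_notes_token_hist.png")
    args = parser.parse_args()

    main(args.csv_file, args.column_name, args.model, args.output_csv, args.output_plot)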

Comment on lines +26 to +71
def parse_final_reasoning(generation_text):
    """
    Parses the 'generation' field to extract only the final
    Thought, Action, and Observation steps.
    """
    # 1. Isolate the final, clean reasoning trace
    # This regex looks for the block starting with "Question:" and ending with "Finish[...]"
    trace_block_match = re.search(
        r"Question:.*?(Finish\[.*?\])",
        generation_text,
        re.DOTALL,  # Allows '.' to match newlines
    )

    # Find the last occurrence of "Finish"
    last_index = generation_text.rfind("Finish")

    if last_index != -1:
        finish_text = generation_text[last_index:].strip()
        print(finish_text)
    else:
        print("No 'Finish' found!")

    return finish_text

    if not trace_block_match:
        return "Final reasoning trace not found."

    reasoning_trace = trace_block_match.group(0)

    # 2. Extract each individual Thought, Action, and Observation
    # This regex finds all lines starting with the keywords
    step_pattern = re.compile(
        r"^(Thought|Action|Observation)\s+\d+:\s*(.*)$",
        re.MULTILINE,  # Allows '^' to match the start of each line
    )

    steps = step_pattern.findall(reasoning_trace)

    # 3. Format the results for clarity
    parsed_steps = []
    for step in steps:
        kind = step[0]  # "Thought", "Action", or "Observation"
        content = step[1]  # The text of the step
        parsed_steps.append(f"**{kind}:** {content}")

    return "\n".join(parsed_steps)

⚠️ Potential issue | 🔴 Critical

Dead code: lines 50–71 are unreachable after return on line 48.

parse_final_reasoning always returns at line 48 (or crashes if last_index == -1 due to the same unbound-variable pattern). Everything from line 50 onward is dead code. Additionally, lines 44 and 46 contain debug print statements that should be removed.

Either remove the dead code or restructure the function so the intended logic is reachable.

🤖 Prompt for AI Agents
In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 26 - 71, The function parse_final_reasoning contains an early
return and debug prints that make the intended parsing logic unreachable; remove
the premature return and the print statements and restructure the function to
(1) first find the final "Finish" block using trace_block_match or last_index as
a fallback, (2) set finish_text deterministically (return a clear error string
if neither is found), (3) run the step_pattern regex against that finish_text to
extract Thought/Action/Observation entries, format them into parsed_steps, and
finally return the joined string; refer to parse_final_reasoning,
trace_block_match, last_index, finish_text, step_pattern, and parsed_steps when
making these changes.
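
One possible restructuring, shown as a runnable sketch; the fallback of slicing from the last "Finish" when the Question-to-Finish block is missing is an assumption, not the script's current behavior:

import re

def parse_final_reasoning(generation_text):
    """Find the final reasoning trace, then extract its Thought/Action/Observation steps."""
    # Prefer the block running from "Question:" to the last "Finish[...]".
    trace_block_match = re.search(r"Question:.*(Finish\[.*?\])", generation_text, re.DOTALL)
    if trace_block_match:
        reasoning_trace = trace_block_match.group(0)
    else:
        # Fallback: take everything from the last "Finish" onward, or report failure.
        last_index = generation_text.rfind("Finish")
        if last_index == -1:
            return "Final reasoning trace not found."
        reasoning_trace = generation_text[last_index:]

    # Collect every "Thought N:", "Action N:" and "Observation N:" line.
    step_pattern = re.compile(r"^(Thought|Action|Observation)\s+\d+:\s*(.*)$", re.MULTILINE)
    parsed_steps = [f"**{kind}:** {content}" for kind, content in step_pattern.findall(reasoning_trace)]
    return "\n".join(parsed_steps) if parsed_steps else reasoning_trace.strip()

sample = (
    "Question: link down?\n"
    "Thought 1: check alarms\n"
    "Action 1: Check_Alarm_Status[A1]\n"
    "Observation 1: LOS on port 3\n"
    "Finish[replace the SFP]"
)
print(parse_final_reasoning(sample))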

Comment on lines +74 to +88
def find_finish_action(generation_text):
    # Regex to capture everything inside Finish[ ... ]
    # match = re.search(r"Finish\[(.*?)\]", generation_text, re.DOTALL)

    # finish_action = match.group(1).strip()
    # return finish_action
    last_index = generation_text.rfind("Finish")

    if last_index != -1:
        finish_text = generation_text[last_index:].strip()
        # print(finish_text)
    else:
        print("No 'Finish' found!")

    return finish_text

⚠️ Potential issue | 🔴 Critical

UnboundLocalError when "Finish" is not found.

When generation_text doesn't contain "Finish", finish_text is never assigned, but line 88 returns it unconditionally. This will crash at runtime.

🐛 Proposed fix
 def find_finish_action(generation_text):
-    # Regex to capture everything inside Finish[ ... ]
-    # match = re.search(r"Finish\[(.*?)\]", generation_text, re.DOTALL)
-
-    # finish_action = match.group(1).strip()
-    # return finish_action
     last_index = generation_text.rfind("Finish")
-
     if last_index != -1:
-        finish_text = generation_text[last_index:].strip()
-        # print(finish_text)
+        return generation_text[last_index:].strip()
     else:
-        print("No 'Finish' found!")
-
-    return finish_text
+        return ""
🤖 Prompt for AI Agents
In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 74 - 88, The function find_finish_action assigns finish_text only
when last_index != -1 but always returns finish_text, causing UnboundLocalError
when "Finish" isn't found; update find_finish_action to handle the not-found
case by returning a safe default (e.g., None or empty string) or raising a clear
exception, and ensure finish_text is defined in both branches (use last_index,
generation_text, and finish_text to locate/construct the return value or the
fallback).

