-
Couldn't load subscription status.
- Fork 182
[ROB-1734] Robusta runbook support + better runbook usage #1061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThreads a typed RunbookCatalog and DAL-backed runbook loading through server handlers, conversation/investigation flows, prompt composition (add_runbooks_to_user_prompt), runbook toolsets, templates, mocks, tests, and fixtures; converts Config.get_runbook_catalog to an instance method using self.dal. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Server as API handler
participant Config
participant DAL as SupabaseDal / MockDal
participant Catalog as RunbookCatalog
participant Prompt as add_runbooks_to_user_prompt
participant Investigator as IssueInvestigator
participant LLM
Server->>Config: get_runbook_catalog()
Config->>DAL: get_runbook_catalog()
DAL-->>Config: RunbookCatalog or None
Config-->>Server: runbooks (optional)
Server->>Investigator: investigate_issues(..., runbooks)
Investigator->>Prompt: add_runbooks_to_user_prompt(user_prompt, runbook_catalog=runbooks, issue_instructions, resource_instructions, global_instructions)
Prompt-->>Investigator: formatted_user_prompt
Investigator->>LLM: build messages (system [runbooks_enabled], formatted_user_prompt)
LLM-->>Investigator: response
Investigator-->>Server: result (res.instructions := issue_runbooks)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Potential attention areas:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
holmes/plugins/prompts/investigation_procedure.jinja2 (1)
4-8: Resolve TodoWrite-first vs Runbook-first contradiction.Lines 4-8 mandate starting with TodoWrite unconditionally, while Lines 9-36 mandate fetching/acting on runbooks before any tasks when runbooks_enabled. This can cause the agent to deadlock or violate one rule.
Refactor the top rule to be conditional:
-CRITICAL: For multi-step questions, you MUST start by calling the TodoWrite tool with a `todos` parameter containing an array of task objects. Each task must have: +{% if runbooks_enabled -%} +CRITICAL: For multi-step questions, FIRST fetch relevant runbooks (see below). AFTER you decide they allow continuation, create TodoWrite tasks as needed. Each task must have: +{% else -%} +CRITICAL: For multi-step questions, you MUST start by calling the TodoWrite tool with a `todos` parameter containing an array of task objects. Each task must have: +{% endif %} - `id`: unique identifier (string) - `content`: specific task description (string) - `status`: "pending" for new tasks (string)Also applies to: 9-18, 31-36
holmes/core/tool_calling_llm.py (1)
1209-1216: Fix console message gating (use matched runbooks, not catalog presence).Currently gated on
runbooks, but you printlen(issue_runbooks). This misreports when a catalog exists but no matches (or vice‑versa). Gate onissue_runbooks.- if console and runbooks: - console.print( - f"[bold]Analyzing with {len(issue_runbooks)} runbooks: {issue_runbooks}[/bold]" - ) - elif console: - console.print( - "[bold]No runbooks found for this issue. Using default behaviour. (Add runbooks to guide the investigation.)[/bold]" - ) + if console: + if issue_runbooks: + console.print( + f"[bold]Analyzing with {len(issue_runbooks)} runbook(s): {issue_runbooks}[/bold]" + ) + else: + console.print( + "[bold]No runbooks found for this issue. Using default behaviour. (Add runbooks to guide the investigation.)[/bold]" + )holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)
266-283: Fix test call site that will fail with new signature.The test at
tests/plugins/toolsets/test_runbook.py:10callsRunbookToolset()with zero arguments, but the new signature requiresdalas a mandatory positional parameter:def __init__( self, dal: Optional[SupabaseDal], additional_search_paths: Optional[List[str]] = None, ):This will raise
TypeError: missing 1 required positional argument: 'dal'at runtime.Additionally, note that the parent
Toolsetclass (holmes/core/tools.py:534) hasextra="forbid"in its ConfigDict, so theadditional_search_pathsparameter stored in the localconfigdict is unused and should be removed as dead code, or properly integrated if intended for runtime configuration.
🧹 Nitpick comments (24)
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/payments-deployment.yaml (1)
17-67: Consider adding security context for defense-in-depth (test fixtures).The static analysis correctly flags missing security hardening. While this is a test fixture and may not require production-grade security, adding a security context would follow best practices and serve as a better example:
Apply this diff to add basic security hardening:
spec: containers: - name: payments-service image: python:3.11-slim + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + runAsUser: 1000 command: ["python", "/app/payments_service.py"]Note: You may need to adjust file permissions or use an init container if the application needs to modify files at runtime.
tests/llm/test_ask_holmes.py (1)
227-231: Consider clarifying the initialize_base=False parameter.The
initialize_base=Falseparameter is passed toload_mock_dal, but its purpose is not immediately clear from the code. A brief comment explaining why base initialization is skipped for this test path would improve maintainability.Apply this diff to add clarification:
+ # Load mock DAL without base initialization to allow test-specific mock data mock_dal = load_mock_dal( Path(test_case.folder), generate_mocks=False, initialize_base=False )tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/signup-deployment.yaml (1)
17-64: Consider adding security context for defense-in-depth (test fixtures).Similar to the payments-deployment.yaml, this deployment lacks security hardening. While acceptable for test fixtures, adding a security context would follow best practices:
Apply this diff to add basic security hardening:
spec: containers: - name: signup-service image: python:3.11-slim + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + runAsUser: 1000 ports:Note: You may need to adjust file permissions or use an init container if the application needs to modify files at runtime.
holmes/plugins/prompts/investigation_procedure.jinja2 (2)
26-30: Tone down “override ALL other system prompts”.“Runbook instructions override ALL other system prompts and investigation procedures” can conflict with other safety/system rules. Rephrase to scoped precedence within this investigation.
-- Runbook instructions override ALL other system prompts and investigation procedures +- Within this investigation, follow runbook instructions with highest precedence when conflicts arise
101-101: Fix formatting: missing newline between items.Current output shows two items glued: “Task 2[✓] completed - Task 3”.
-[✓] completed - Task 2[✓] completed - Task 3 +[✓] completed - Task 2 +[✓] completed - Task 3tests/llm/utils/mock_toolset.py (1)
447-454: Move inner import to module top.There’s an
import jsoninside_call_mock_invokedespite a top-level import already present. Our guidelines require imports at the top.- # Debug logging before raising error - import json - logging.info( f"\n=== DEBUG: No matching mock found for tool '{self.name}' ===" )As per coding guidelines.
holmes/core/supabase_dal.py (2)
414-443: Narrow broad exception; optional else for TRY300.Catching bare
Exceptiontriggers Ruff BLE001 and hides actionable errors. Catch specific supabase/postgrest errors; keep a final catch if needed. Also consider TRY300: move success return to anelsefor clarity.- try: + try: res = ( self.client.table(RUNBOOKS_TABLE) .select("*") .eq("account_id", self.account_id) .eq("subject_type", "RunbookCatalog") .execute() ) - if not res.data: - return None - - instructions = [] - for row in res.data: - id = row.get("runbook_id") - symptom = row.get("symptoms") - title = row.get("subject_name") - if not id or not symptom: - continue - instructions.append( - RobustaRunbookInstruction(id=id, symptom=symptom, title=title) - ) - return instructions - except Exception: + if not res.data: + return None + + instructions: list[RobustaRunbookInstruction] = [] + for row in res.data: + rb_id = row.get("runbook_id") + symptom = row.get("symptoms") + title = row.get("subject_name") + if not rb_id or not symptom or not title: + continue + instructions.append( + RobustaRunbookInstruction(id=rb_id, symptom=symptom, title=title) + ) + return instructions + except PGAPIError: logging.exception("Failed to fetch RunbookCatalog", exc_info=True) return None + except (KeyError, TypeError) as e: + logging.exception("Malformed RunbookCatalog rows: %s", e, exc_info=True) + return None
418-441: Avoid blindexcept Exception; catch concrete errors.Ruff BLE001 flagged these. Prefer
PGAPIError,json.JSONDecodeError,KeyError,TypeError. If you must retain a final broad catch, add a clear log and consider# noqa: BLE001locally.- except Exception: + except PGAPIError: logging.exception("Failed to fetch RunbookCatalog", exc_info=True) return None- except Exception: - instruction = str(raw_instruction) + except json.JSONDecodeError: + instruction = str(raw_instruction)As per coding guidelines.
Also applies to: 471-475
server.py (1)
147-154: Consider caching runbook catalog per-processconfig.get_runbook_catalog() is called on several hot endpoints; if it hits filesystem and/or Supabase each time, latency and rate limits may suffer. Cache with TTL or memoize inside Config unless account-scoped invalidation is needed. Safe to defer.
Also applies to: 196-229, 290-301, 331-342
holmes/utils/global_instructions.py (1)
14-21: Neutralize header phrasing (optional)“My instructions to check:” reads first-person; prefer neutral for clarity.
-def _format_instructions_block( - items: List[str], header: str = "My instructions to check:" +def _format_instructions_block( + items: List[str], header: str = "Instructions to check:" ) -> str:holmes/plugins/runbooks/__init__.py (3)
132-141: Remove leading blank line in catalog promptparts starts with [""] causing a leading newline.
- parts: List[str] = [""] + parts: List[str] = []
156-164: Prefer logging.exception and avoid blind exceptsUse logging.exception to capture tracebacks and avoid silent failures; narrow exceptions already handled for JSONDecodeError.
- except json.JSONDecodeError as e: - logging.error(f"Error decoding JSON from {catalogPath}: {e}") + except json.JSONDecodeError: + logging.exception(f"Error decoding JSON from {catalogPath}") return None - except Exception as e: - logging.error( - f"Unexpected error while loading runbook catalog from {catalogPath}: {e}" - ) + except Exception: + logging.exception( + f"Unexpected error while loading runbook catalog from {catalogPath}" + ) return None @@ - except Exception as e: - logging.error(f"Error loading runbooks from Supabase: {e}") + except Exception: + logging.exception("Error loading runbooks from Supabase")Also applies to: 171-174
177-200: Docstring/signature mismatch for search_paths (optional)Docstring says search_paths may be None with a default; signature requires List[str]. Either update docstring or support Optional with default to DEFAULT_RUNBOOK_SEARCH_PATH.
-def get_runbook_by_path( - runbook_relative_path: str, search_paths: List[str] -) -> Optional[str]: +def get_runbook_by_path( + runbook_relative_path: str, search_paths: Optional[List[str]] = None +) -> Optional[str]: @@ - for search_path in search_paths: + for search_path in (search_paths or [DEFAULT_RUNBOOK_SEARCH_PATH]):tests/llm/utils/mock_dal.py (2)
77-89: Harden mock file readers and log tracebacksCatching Exception is fine for test mocks; use logging.exception to preserve context.
- except Exception as e: - logging.warning(f"Failed to read runbook catalog mock file: {e}") + except Exception: + logging.exception("Failed to read runbook catalog mock file") @@ - except Exception as e: - logging.warning(f"Failed to read runbook content mock file: {e}") + except Exception: + logging.exception("Failed to read runbook content mock file") @@ - except Exception as e: - logging.warning(f"Failed to read global instructions mock file: {e}") + except Exception: + logging.exception("Failed to read global instructions mock file")Also applies to: 91-103, 108-119
121-123: Silence ARG002 in mock by accepting kwargs (optional)-def get_workload_issues(self, *args) -> list: +def get_workload_issues(self, *args, **kwargs) -> list: return []holmes/core/tool_calling_llm.py (2)
1225-1226: Align runbooks_enabled with actual availability.Flag should reflect either a provided catalog or matched instructions.
- "runbooks_enabled": True if runbooks else False, + "runbooks_enabled": bool(runbooks or issue_runbooks),
1238-1238: Tidy heading and whitespace in appended issue context.Remove the leading space and use a clearer header.
- user_prompt = f"{user_prompt}\n #This is context from the issue:\n{issue.raw}" + user_prompt = f"{user_prompt}\n# Issue context:\n{issue.raw}"holmes/plugins/toolsets/runbook/runbook_fetcher.py (4)
60-74: Rename tool parametertype→runbook_typeto avoid builtin shadowing.Using
typeis confusing; prefer a clearer name. Update parameter schema and call sites.- "type": ToolParameter( + "runbook_type": ToolParameter( description=f"Type of runbook identifier. Must be one of: {allowed_types_str}", type="string", required=True, ),Also update reads/writes accordingly (see next comments).
81-95: Unusedcontextand message text.
- Prefix
contextwith_to satisfy Ruff ARG002.- Error mentions “link” but param is
runbook_id.- def _invoke(self, params: dict, context: ToolInvokeContext) -> StructuredToolResult: + def _invoke(self, params: dict, _context: ToolInvokeContext) -> StructuredToolResult: @@ - if not runbook_id or not runbook_id.strip(): - err_msg = ( - "Runbook link cannot be empty. Please provide a valid runbook path." - ) + if not runbook_id or not runbook_id.strip(): + err_msg = "Runbook identifier cannot be empty. Provide a UUID or a .md filename."
82-100: Keep param names consistent after renaming.If you adopt
runbook_type, update reads.- runbook_type: str = params.get("type", "") + runbook_type: str = params.get("runbook_type", "")
260-263: Update one‑liner to new params.Uses legacy
link; switch torunbook_idand include runbook type for clarity.- def get_parameterized_one_liner(self, params) -> str: - path: str = params.get("link", "") - return f"{toolset_name_for_one_liner(self.toolset.name)}: Fetch Runbook {path}" + def get_parameterized_one_liner(self, params) -> str: + rb_id: str = params.get("runbook_id", "") + rb_type: str = params.get("runbook_type", params.get("type", "")) # backward-compat + suffix = f" ({rb_type})" if rb_type else "" + return f"{toolset_name_for_one_liner(self.toolset.name)}: Fetch Runbook{suffix} {rb_id}"holmes/core/conversations.py (3)
140-142: DRY up repeatedrunbooks_enabledexpressions.Compute once per function for readability.
- system_prompt = load_and_render_prompt( + runbooks_enabled = bool(runbooks) + system_prompt = load_and_render_prompt( template_path, { @@ - "runbooks_enabled": True if runbooks else False, + "runbooks_enabled": runbooks_enabled, }, )Repeat similarly where used in this function (and below helpers).
Also applies to: 161-162, 195-196
500-501: Same DRY suggestion for workload health.Factor
runbooks_enabled = bool(runbooks)and reuse.Also applies to: 521-522, 555-556
118-129: Guard against oversized prompts when catalogs are large.Adding catalogs and global instructions increases user‑prompt size. We truncate tool messages later, but not user prompts. Consider bounding
runbook_catalog.to_prompt_string()(e.g., top‑N entries) or add a max‑chars cap inadd_runbooks_to_user_prompt.Also applies to: 211-216, 483-489, 571-576
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
holmes/config.py(1 hunks)holmes/core/conversations.py(21 hunks)holmes/core/investigation.py(6 hunks)holmes/core/prompt.py(1 hunks)holmes/core/supabase_dal.py(2 hunks)holmes/core/tool_calling_llm.py(5 hunks)holmes/plugins/prompts/_runbook_instructions.jinja2(1 hunks)holmes/plugins/prompts/generic_ask.jinja2(0 hunks)holmes/plugins/prompts/investigation_procedure.jinja2(6 hunks)holmes/plugins/runbooks/__init__.py(3 hunks)holmes/plugins/toolsets/__init__.py(1 hunks)holmes/plugins/toolsets/runbook/runbook_fetcher.py(6 hunks)holmes/utils/global_instructions.py(1 hunks)server.py(9 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/payments-deployment.yaml(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/secrets.yaml(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/signup-deployment.yaml(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/global_instructions.json(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/issue_data.json(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/resource_instructions.json(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_catalog.json(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_40296b5c-2cb5-41df-b1f5-93441f894c44.json(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_8fe8e24d-6b53-47a6-92a5-b389938fa823.json(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_b2ffd311-f339-46a2-b379-1e53eb0bc1ed.json(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/test_case.yaml(1 hunks)tests/llm/fixtures/test_ask_holmes/162_get_runbooks/toolsets.yaml(1 hunks)tests/llm/test_ask_holmes.py(3 hunks)tests/llm/utils/mock_dal.py(4 hunks)tests/llm/utils/mock_toolset.py(3 hunks)tests/test_runbook_prompt.py(1 hunks)
💤 Files with no reviewable changes (1)
- holmes/plugins/prompts/generic_ask.jinja2
🧰 Additional context used
📓 Path-based instructions (9)
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test files should mirror the source structure under tests/
Files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/payments-deployment.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_40296b5c-2cb5-41df-b1f5-93441f894c44.jsontests/llm/fixtures/test_ask_holmes/162_get_runbooks/test_case.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/signup-deployment.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_catalog.jsontests/llm/fixtures/test_ask_holmes/162_get_runbooks/resource_instructions.jsontests/llm/fixtures/test_ask_holmes/162_get_runbooks/toolsets.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_b2ffd311-f339-46a2-b379-1e53eb0bc1ed.jsontests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/secrets.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/issue_data.jsontests/llm/utils/mock_toolset.pytests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_8fe8e24d-6b53-47a6-92a5-b389938fa823.jsontests/llm/utils/mock_dal.pytests/llm/test_ask_holmes.pytests/test_runbook_prompt.pytests/llm/fixtures/test_ask_holmes/162_get_runbooks/global_instructions.json
tests/llm/**/*.yaml
📄 CodeRabbit inference engine (CLAUDE.md)
tests/llm/**/*.yaml: In eval manifests, ALWAYS use Kubernetes Secrets for scripts rather than inline manifests or ConfigMaps to prevent script exposure via kubectl describe
Eval resources must use neutral names, each test must use a dedicated namespace 'app-', and all pod names must be unique across tests
Files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/payments-deployment.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/test_case.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/signup-deployment.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/toolsets.yamltests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/secrets.yaml
holmes/plugins/prompts/**/*.jinja2
📄 CodeRabbit inference engine (CLAUDE.md)
Prompts must be stored as Jinja2 templates under holmes/plugins/prompts/{name}.jinja2
Files:
holmes/plugins/prompts/investigation_procedure.jinja2holmes/plugins/prompts/_runbook_instructions.jinja2
tests/**/test_case.yaml
📄 CodeRabbit inference engine (CLAUDE.md)
Do NOT put toolset configuration directly in test_case.yaml; keep toolset config in a separate toolsets.yaml
Files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/test_case.yaml
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/plugins/toolsets/__init__.pyholmes/utils/global_instructions.pyholmes/core/supabase_dal.pyholmes/config.pyserver.pyholmes/core/conversations.pytests/llm/utils/mock_toolset.pyholmes/core/investigation.pyholmes/core/prompt.pytests/llm/utils/mock_dal.pytests/llm/test_ask_holmes.pytests/test_runbook_prompt.pyholmes/plugins/toolsets/runbook/runbook_fetcher.pyholmes/core/tool_calling_llm.pyholmes/plugins/runbooks/__init__.py
holmes/plugins/toolsets/**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Toolsets must be located as holmes/plugins/toolsets/{name}.yaml or holmes/plugins/toolsets/{name}/
Files:
holmes/plugins/toolsets/__init__.pyholmes/plugins/toolsets/runbook/runbook_fetcher.py
tests/**/toolsets.yaml
📄 CodeRabbit inference engine (CLAUDE.md)
In eval toolsets.yaml, ALL toolset-specific configuration must be under a 'config' field; do not place toolset config at the top level
Files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/toolsets.yaml
{holmes/plugins/toolsets/**/*.{yaml,yml},tests/**/toolsets.yaml}
📄 CodeRabbit inference engine (CLAUDE.md)
Only the following top-level fields are valid in toolset YAML: enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url (MCP toolsets only)
Files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/toolsets.yaml
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are defined in pyproject.toml; never introduce undefined markers/tags
Files:
tests/llm/utils/mock_toolset.pytests/llm/utils/mock_dal.pytests/llm/test_ask_holmes.pytests/test_runbook_prompt.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: mainred
PR: robusta-dev/holmesgpt#829
File: holmes/plugins/runbooks/runbook-format.prompt.md:11-22
Timestamp: 2025-08-13T05:57:40.420Z
Learning: In holmes/plugins/runbooks/runbook-format.prompt.md, the user (mainred) prefers to keep the runbook step specifications simple without detailed orchestration metadata like step IDs, dependencies, retry policies, timeouts, and storage variables. The current format with Action, Function Description, Parameters, Expected Output, and Success/Failure Criteria is sufficient for their AI agent troubleshooting use case.
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: holmes/plugins/runbooks/CLAUDE.md:0-0
Timestamp: 2025-08-30T18:12:58.187Z
Learning: Applies to holmes/plugins/runbooks/**/*.md : Each runbook must include the four sections with the specified content: 1) Goal, 2) Workflow for [Issue Category] Diagnosis, 3) Synthesize Findings, 4) Recommended Remediation Steps, forming a strictly executable, machine-readable yet human-understandable runbook.
📚 Learning: 2025-08-30T18:12:58.187Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: holmes/plugins/runbooks/CLAUDE.md:0-0
Timestamp: 2025-08-30T18:12:58.187Z
Learning: Applies to holmes/plugins/runbooks/**/*.md : Each runbook must include the four sections with the specified content: 1) Goal, 2) Workflow for [Issue Category] Diagnosis, 3) Synthesize Findings, 4) Recommended Remediation Steps, forming a strictly executable, machine-readable yet human-understandable runbook.
Applied to files:
holmes/plugins/prompts/investigation_procedure.jinja2holmes/plugins/prompts/_runbook_instructions.jinja2
📚 Learning: 2025-08-30T18:12:58.187Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: holmes/plugins/runbooks/CLAUDE.md:0-0
Timestamp: 2025-08-30T18:12:58.187Z
Learning: Applies to holmes/plugins/runbooks/catalog.json : After creating a runbook, add a new entry to holmes/plugins/runbooks/catalog.json to register it for discovery.
Applied to files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_catalog.json
📚 Learning: 2025-08-30T18:12:58.187Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: holmes/plugins/runbooks/CLAUDE.md:0-0
Timestamp: 2025-08-30T18:12:58.187Z
Learning: Applies to holmes/plugins/runbooks/catalog.json : Catalog entries must follow the specified JSON object structure with fields: name, path, description, tags (array).
Applied to files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_catalog.json
📚 Learning: 2025-08-30T18:12:58.187Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: holmes/plugins/runbooks/CLAUDE.md:0-0
Timestamp: 2025-08-30T18:12:58.187Z
Learning: Applies to holmes/plugins/runbooks/catalog.json : Ensure proper JSON formatting when editing catalog.json, including adding a comma after the previous entry if needed and validating JSON before commit.
Applied to files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_catalog.json
📚 Learning: 2025-08-05T00:42:23.792Z
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Applied to files:
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/toolsets.yaml
📚 Learning: 2025-08-13T05:57:40.420Z
Learnt from: mainred
PR: robusta-dev/holmesgpt#829
File: holmes/plugins/runbooks/runbook-format.prompt.md:11-22
Timestamp: 2025-08-13T05:57:40.420Z
Learning: In holmes/plugins/runbooks/runbook-format.prompt.md, the user (mainred) prefers to keep the runbook step specifications simple without detailed orchestration metadata like step IDs, dependencies, retry policies, timeouts, and storage variables. The current format with Action, Function Description, Parameters, Expected Output, and Success/Failure Criteria is sufficient for their AI agent troubleshooting use case.
Applied to files:
holmes/plugins/prompts/_runbook_instructions.jinja2
🧬 Code graph analysis (14)
holmes/plugins/toolsets/__init__.py (2)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)
RunbookToolset(265-292)holmes/config.py (1)
dal(121-124)
holmes/utils/global_instructions.py (3)
holmes/plugins/prompts/__init__.py (1)
load_and_render_prompt(27-54)holmes/plugins/runbooks/__init__.py (4)
RunbookCatalog(114-141)to_prompt_string(44-45)to_prompt_string(110-111)to_prompt_string(132-141)holmes/core/resource_instruction.py (1)
ResourceInstructions(15-17)
holmes/core/supabase_dal.py (2)
holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-54)tests/llm/utils/mock_dal.py (2)
get_runbook_catalog(77-89)get_runbook_content(91-103)
holmes/config.py (3)
tests/llm/utils/mock_dal.py (1)
get_runbook_catalog(77-89)holmes/core/supabase_dal.py (1)
get_runbook_catalog(414-442)holmes/plugins/runbooks/__init__.py (2)
RunbookCatalog(114-141)load_runbook_catalog(144-174)
server.py (5)
holmes/utils/global_instructions.py (1)
add_runbooks_to_user_prompt(37-85)holmes/config.py (2)
get_runbook_catalog(226-229)dal(121-124)tests/llm/utils/mock_dal.py (3)
get_runbook_catalog(77-89)get_resource_instructions(57-75)get_global_instructions_for_account(108-119)holmes/core/supabase_dal.py (3)
get_runbook_catalog(414-442)get_resource_instructions(477-508)get_global_instructions_for_account(510-529)holmes/core/investigation.py (1)
investigate_issues(24-78)
holmes/core/conversations.py (3)
holmes/plugins/runbooks/__init__.py (1)
RunbookCatalog(114-141)holmes/utils/global_instructions.py (2)
Instructions(10-11)add_runbooks_to_user_prompt(37-85)holmes/main.py (1)
ask(170-353)
tests/llm/utils/mock_toolset.py (3)
tests/llm/utils/mock_dal.py (1)
load_mock_dal(129-152)tests/llm/conftest.py (1)
mock_generation_config(51-79)holmes/config.py (1)
dal(121-124)
holmes/core/investigation.py (5)
holmes/config.py (2)
dal(121-124)get_runbook_catalog(226-229)holmes/core/supabase_dal.py (3)
get_resource_instructions(477-508)get_runbook_catalog(414-442)get_global_instructions_for_account(510-529)holmes/plugins/runbooks/__init__.py (1)
RunbookCatalog(114-141)holmes/utils/global_instructions.py (1)
add_runbooks_to_user_prompt(37-85)holmes/core/runbooks.py (1)
get_instructions_for_issue(11-26)
tests/llm/utils/mock_dal.py (4)
holmes/core/supabase_dal.py (4)
SupabaseDal(65-667)get_runbook_catalog(414-442)get_runbook_content(444-475)get_global_instructions_for_account(510-529)holmes/core/resource_instruction.py (1)
ResourceInstructions(15-17)holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-54)holmes/utils/global_instructions.py (1)
Instructions(10-11)
tests/llm/test_ask_holmes.py (3)
holmes/plugins/runbooks/__init__.py (1)
load_runbook_catalog(144-174)tests/llm/utils/mock_dal.py (2)
load_mock_dal(129-152)get_global_instructions_for_account(108-119)holmes/core/conversations.py (1)
build_chat_messages(318-417)
tests/test_runbook_prompt.py (2)
holmes/utils/global_instructions.py (1)
add_runbooks_to_user_prompt(37-85)holmes/plugins/runbooks/__init__.py (3)
to_prompt_string(44-45)to_prompt_string(110-111)to_prompt_string(132-141)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (4)
holmes/core/supabase_dal.py (2)
SupabaseDal(65-667)get_runbook_content(444-475)holmes/core/tools.py (4)
Tool(173-365)ToolParameter(155-161)StructuredToolResult(79-103)StructuredToolResultStatus(52-76)holmes/config.py (1)
dal(121-124)holmes/plugins/runbooks/__init__.py (3)
load_runbook_catalog(144-174)list_available_runbooks(117-118)pretty(47-54)
holmes/core/tool_calling_llm.py (3)
holmes/plugins/runbooks/__init__.py (1)
RunbookCatalog(114-141)holmes/utils/global_instructions.py (2)
Instructions(10-11)add_runbooks_to_user_prompt(37-85)holmes/core/runbooks.py (1)
get_instructions_for_issue(11-26)
holmes/plugins/runbooks/__init__.py (3)
holmes/utils/pydantic_utils.py (2)
RobustaBaseConfig(14-15)load_model_from_file(39-54)holmes/core/supabase_dal.py (2)
SupabaseDal(65-667)get_runbook_catalog(414-442)holmes/config.py (2)
dal(121-124)get_runbook_catalog(226-229)
🪛 Checkov (3.2.334)
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/payments-deployment.yaml
[medium] 1-68: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-68: Minimize the admission of root containers
(CKV_K8S_23)
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/signup-deployment.yaml
[medium] 1-65: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-65: Minimize the admission of root containers
(CKV_K8S_23)
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/secrets.yaml
[medium] 8-9: Basic Auth Credentials
(CKV_SECRET_4)
🪛 Ruff (0.14.0)
holmes/core/supabase_dal.py
439-439: Consider moving this statement to an else block
(TRY300)
471-471: Do not catch blind exception: Exception
(BLE001)
tests/llm/utils/mock_dal.py
87-87: Do not catch blind exception: Exception
(BLE001)
101-101: Do not catch blind exception: Exception
(BLE001)
116-116: Do not catch blind exception: Exception
(BLE001)
121-121: Unused method argument: args
(ARG002)
holmes/plugins/toolsets/runbook/runbook_fetcher.py
32-32: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
81-81: Unused method argument: context
(ARG002)
128-128: Do not catch blind exception: Exception
(BLE001)
129-129: Use explicit conversion flag
Replace with conversion flag
(RUF010)
130-130: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
holmes/plugins/runbooks/__init__.py
157-157: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
159-159: Do not catch blind exception: Exception
(BLE001)
160-162: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
171-171: Do not catch blind exception: Exception
(BLE001)
172-172: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
- GitHub Check: build
🔇 Additional comments (24)
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/resource_instructions.json (1)
1-1: LGTM!Placeholder fixture for resource instructions in the test suite.
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_40296b5c-2cb5-41df-b1f5-93441f894c44.json (1)
1-6: LGTM!Well-structured runbook fixture with comprehensive debugging steps for Pod crashlooping scenarios. The markdown formatting and kubectl command examples are appropriate for the test context.
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/toolsets.yaml (1)
1-31: Verify toolset YAML format against coding guidelines.The file uses a
toolsets:top-level key mapping toolsets toenabledflags. Per the coding guidelines, valid top-level fields for toolset YAML are:enabled,name,description,additional_instructions,prerequisites,tools,docs_url,icon_url,installation_instructions,config,url. This fixture format differs from that specification.Clarify whether this simplified format is intentional for test fixtures, or if it should align with the standard toolset YAML structure.
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_8fe8e24d-6b53-47a6-92a5-b389938fa823.json (1)
1-6: LGTM!Well-structured Kafka runbook fixture with clear data flow diagrams and practical debugging guidance for lag investigation.
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_content_b2ffd311-f339-46a2-b379-1e53eb0bc1ed.json (1)
1-6: LGTM!Comprehensive signup service runbook with well-organized debugging steps that appropriately emphasize the critical payments service dependency and Stripe API key validation.
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/app/secrets.yaml (2)
24-131: LGTM!Signup service implementation is well-structured with appropriate error handling, dependency checks via readiness endpoint, and proper logging. The service-to-service communication pattern (checking payments service health) is realistic and suitable for testing runbook-driven debugging.
156-277: LGTM!Payments service implementation correctly simulates the failure scenario (invalid Stripe API key) that the Signup Service Debugging Runbook is designed to address. Health checks validate API key configuration, and error responses are appropriately detailed. The logging strategy appropriately avoids exposing sensitive credentials while documenting key status.
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/runbook_catalog.json (1)
1-17: Verify runbook catalog entry structure against expected format.The learnings indicate that "Catalog entries must follow the specified JSON object structure with fields: name, path, description, tags (array)." This fixture uses:
id,symptom, andtitle. Confirm whether this simplified format is intentional for the test fixture or if additional fields (name, path, description, tags) should be included to match the production RunbookCatalogEntry structure.tests/llm/fixtures/test_ask_holmes/162_get_runbooks/issue_data.json (1)
1-1: LGTM!Placeholder fixture for issue data in the test suite.
holmes/plugins/toolsets/__init__.py (1)
105-105: LGTM! Correctly threads dal parameter to RunbookToolset.The change aligns with the new API where
RunbookToolsetaccepts a dal parameter, maintaining consistency with other toolsets likeRobustaToolset(dal)on line 84.holmes/core/prompt.py (1)
60-60: LGTM! Clean simplification using boolean flag.Replacing the runbooks dict with a
runbooks_enabledboolean flag is a good refactoring that reduces the data passed to the template context and provides cleaner separation between presence checking and content usage.holmes/config.py (1)
226-229: LGTM! Correctly refactored to use instance DAL.The method now properly uses
self.dalto enable Supabase-backed runbook loading, aligning with the broader DAL integration throughout the codebase.tests/llm/fixtures/test_ask_holmes/162_get_runbooks/global_instructions.json (1)
1-9: LGTM! Well-structured test fixture for global instructions.The fixture provides realistic instructions that align with the test scenario (Stripe API, service dependencies, team escalation paths).
tests/llm/fixtures/test_ask_holmes/162_get_runbooks/test_case.yaml (1)
1-31: LGTM! Well-structured test case with proper isolation.The test case follows best practices with:
- Dedicated namespace
app-162following theapp-<testid>pattern (as per coding guidelines)- Proper setup/teardown for test isolation
- Clear expected output criteria for verification
tests/llm/test_ask_holmes.py (1)
209-220: The review comment is incorrect.The
build_initial_ask_messagesfunction signature showsrunbooks: Union[RunbookCatalog, Dict, None], which explicitly accepts a dict. Thebuild_chat_messagesfunction signature showsrunbooks: Optional[RunbookCatalog], which accepts only a RunbookCatalog object or None.The test code is correct:
- CLI path (line 213-214): passes a dict to
build_initial_ask_messages— this matches the requiredUnion[RunbookCatalog, Dict, None]type- Non-CLI path (line 233): passes a RunbookCatalog object to
build_chat_messages— this matches the requiredOptional[RunbookCatalog]typeThe functions have different type requirements by design, and the test implementation correctly follows each function's signature. There is no type inconsistency to resolve.
Likely an incorrect or invalid review comment.
tests/test_runbook_prompt.py (1)
18-83: Unit tests cover key assembly scenarios; looks good.Parametrization validates catalog, issue/resource/global sections and combined case. Assertions via substrings are appropriate given template rendering.
Also applies to: 85-101
tests/llm/utils/mock_toolset.py (1)
682-697: Good: wire mock DAL into RunbookToolset.Passing dal=mock_dal enables test-case mock data and keeps additional_search_paths. Preserves config and status. LGTM.
holmes/core/investigation.py (1)
131-142: runbooks_enabled gate is correct.Tying the flag to catalog presence is a sane default and aligns the template’s conditional behavior. LGTM.
holmes/plugins/prompts/investigation_procedure.jinja2 (1)
10-12: Tool name reference is correct.The tool is correctly named "fetch_runbook" and the reference in the prompt at lines 10-12 is accurate and actionable. The instruction is properly instructing users to use the correct tool name.
holmes/core/tool_calling_llm.py (2)
1231-1237: User-prompt assembly LGTM.Switch to
add_runbooks_to_user_promptis correct and matches new API.
1252-1252: The review comment is incorrect.The code's type annotations are correct:
Runbook.instructionsisstr(a single string), notList[str]get_instructions_for_issue()correctly appends individualstrvalues to the list, returningList[str]LLMResult.instructionsisList[str], matching the type ofissue_runbooks- The assignment
res.instructions = issue_runbooksis type-safeNo type mismatch exists. The code will pass mypy validation.
Likely an incorrect or invalid review comment.
holmes/core/conversations.py (2)
124-129: User-prompt runbook injection LGTM.Switch to
add_runbooks_to_user_promptis correct and minimal.
388-392: General chat: runbook injection LGTM.Correctly composes the user ask with catalog/global instructions.
holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)
145-197: Instance fields verified; code changes approved.The verification confirms
self.available_runbooksandself.additional_search_pathsare properly declared as instance fields (lines 32–33) and initialized in__init__(lines 45–77). The_get_md_runbookmethod correctly depends on these fields being set, and they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
holmes/core/supabase_dal.py (1)
435-466: Fix potential None dereference and use specific exception types.Line 456 has a critical bug:
row.get("runbook").get("instructions")will raise AttributeError ifrunbookis None. Line 462's bareexcept Exceptionis too broad.This issue was already flagged in a previous review. Apply the suggested fix from the past review comment to safely handle None values and parse instructions robustly.
server.py (1)
206-206: Typo prevents stored resource instructions from loading.Line 206 checks
request.stored_instrucitons(misspelled), sostored_instructionsis never fetched from the DAL even when the user intends to load them.This issue was already flagged in a previous review. Fix the typo by changing
stored_instrucitonstostored_instructions.
🧹 Nitpick comments (1)
holmes/core/supabase_dal.py (1)
422-428: Consider validatingtitlealongsideidandsymptom.The code skips entries missing
idorsymptombut doesn't validatetitle, even thoughtitleis passed toRobustaRunbookInstruction. Iftitlecan be None and that's intentional, this is fine; otherwise, consider addingor not titleto the validation on line 425.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
holmes/core/supabase_dal.py(2 hunks)holmes/core/tool_calling_llm.py(5 hunks)server.py(9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
server.pyholmes/core/supabase_dal.pyholmes/core/tool_calling_llm.py
🧬 Code graph analysis (3)
server.py (5)
holmes/utils/global_instructions.py (1)
add_runbooks_to_user_prompt(37-85)holmes/core/supabase_dal.py (3)
get_runbook_catalog(405-433)get_resource_instructions(468-499)get_global_instructions_for_account(501-520)tests/llm/utils/mock_dal.py (3)
get_runbook_catalog(77-89)get_resource_instructions(57-75)get_global_instructions_for_account(108-119)holmes/config.py (2)
get_runbook_catalog(226-229)dal(121-124)holmes/core/investigation.py (1)
investigate_issues(24-78)
holmes/core/supabase_dal.py (2)
holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-54)tests/llm/utils/mock_dal.py (2)
get_runbook_catalog(77-89)get_runbook_content(91-103)
holmes/core/tool_calling_llm.py (3)
holmes/plugins/runbooks/__init__.py (1)
RunbookCatalog(114-141)holmes/utils/global_instructions.py (2)
Instructions(10-11)add_runbooks_to_user_prompt(37-85)holmes/core/runbooks.py (1)
get_instructions_for_issue(11-26)
🪛 Ruff (0.14.0)
holmes/core/supabase_dal.py
430-430: Consider moving this statement to an else block
(TRY300)
462-462: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
- GitHub Check: build
🔇 Additional comments (4)
server.py (3)
147-153: LGTM!The runbook catalog is loaded and correctly passed to the investigation flow.
290-299: LGTM!Runbook catalog is correctly loaded and threaded into the issue chat flow.
331-341: LGTM!Runbook catalog is correctly integrated into the chat endpoint.
holmes/core/tool_calling_llm.py (1)
1037-1121: LGTM!The investigate method correctly integrates the runbook catalog and issue-specific runbook instructions. The separation between
runbooks(catalog) andissue_runbooks(filtered instructions) is clear, and the type flow is consistent with the helper function signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
holmes/core/supabase_dal.py (1)
437-468: Critical: Fix None dereference and improve error handling.This method has multiple issues that need immediate attention:
Line 458: Potential
AttributeError(flagged in previous review but not resolved) —row.get("runbook")may returnNone, causing.get("instructions")to raise anAttributeError.Line 455: Variable name shadows built-in — Using
idshadows Python's built-inid()function.Lines 443-450: Missing try-except wrapper — The query execution is not wrapped in a try-except block, unlike
get_runbook_catalog. This inconsistency could cause unhandled exceptions.Line 464: Overly broad exception handler — Catching bare
Exceptionmasks potential bugs. Be more specific or at least document why this is necessary.Apply this diff to address these issues:
def get_runbook_content( self, runbook_id: str ) -> Optional[RobustaRunbookInstruction]: if not self.enabled: return None + try: - res = ( - self.client.table(RUNBOOKS_TABLE) - .select("*") - .eq("account_id", self.account_id) - .eq("subject_type", "RunbookCatalog") - .eq("runbook_id", runbook_id) - .execute() - ) - if not res.data or len(res.data) != 1: - return None - - row = res.data[0] - id = row.get("runbook_id") - symptom = row.get("symptoms") - title = row.get("subject_name") - raw_instruction = row.get("runbook").get("instructions") - try: - if isinstance(raw_instruction, list) and len(raw_instruction) == 1: - instruction = raw_instruction[0] - else: - instruction = json.loads(raw_instruction)[0] - except Exception: - instruction = str(raw_instruction) - return RobustaRunbookInstruction( - id=id, symptom=symptom, instruction=instruction, title=title - ) + res = ( + self.client.table(RUNBOOKS_TABLE) + .select("*") + .eq("account_id", self.account_id) + .eq("subject_type", "RunbookCatalog") + .eq("runbook_id", runbook_id) + .execute() + ) + if not res.data or len(res.data) != 1: + return None + + row = res.data[0] + rb_id = row.get("runbook_id") + symptom = row.get("symptoms") + title = row.get("subject_name") + runbook_data = row.get("runbook") or {} + raw_instruction = runbook_data.get("instructions") + + instruction_str: Optional[str] = None + if isinstance(raw_instruction, list): + instruction_str = raw_instruction[0] if raw_instruction else None + elif isinstance(raw_instruction, str): + try: + parsed = json.loads(raw_instruction) + instruction_str = parsed[0] if isinstance(parsed, list) and parsed else raw_instruction + except json.JSONDecodeError: + instruction_str = raw_instruction + + if not rb_id or not symptom or not title: + return None + + return RobustaRunbookInstruction( + id=rb_id, symptom=symptom, instruction=instruction_str, title=title + ) + except Exception: + logging.exception("Failed to fetch runbook content", exc_info=True) + return None
🧹 Nitpick comments (2)
holmes/core/supabase_dal.py (1)
407-435: Renameidto avoid shadowing built-in.The variable name
idshadows Python's built-inid()function. Userunbook_idfor clarity.Apply this diff:
instructions = [] for row in res.data: - id = row.get("runbook_id") + runbook_id = row.get("runbook_id") symptom = row.get("symptoms") title = row.get("subject_name") - if not id or not symptom: + if not runbook_id or not symptom: continue instructions.append( - RobustaRunbookInstruction(id=id, symptom=symptom, title=title) + RobustaRunbookInstruction(id=runbook_id, symptom=symptom, title=title) ) return instructionsholmes/core/tool_calling_llm.py (1)
1077-1084: Consider improving console message clarity for runbook matching.The current logic checks if the
runbookscatalog exists but displays information aboutissue_runbooks(matched instructions). While technically correct, the messaging could be clearer when the catalog exists but no runbooks match the issue.Consider refactoring for better clarity:
- if console and runbooks: + if console: + if runbooks and issue_runbooks: - console.print( - f"[bold]Analyzing with {len(issue_runbooks)} runbooks: {issue_runbooks}[/bold]" - ) - elif console: - console.print( - "[bold]No runbooks found for this issue. Using default behaviour. (Add runbooks to guide the investigation.)[/bold]" - ) + console.print( + f"[bold]Analyzing with {len(issue_runbooks)} runbooks: {issue_runbooks}[/bold]" + ) + elif runbooks: + console.print( + "[bold]Runbook catalog available but no runbooks matched this issue.[/bold]" + ) + else: + console.print( + "[bold]No runbook catalog configured. Using default behavior.[/bold]" + )This distinguishes between three scenarios: catalog with matches, catalog without matches, and no catalog.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
holmes/core/supabase_dal.py(2 hunks)holmes/core/tool_calling_llm.py(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/supabase_dal.pyholmes/core/tool_calling_llm.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: mainred
PR: robusta-dev/holmesgpt#829
File: holmes/plugins/runbooks/runbook-format.prompt.md:11-22
Timestamp: 2025-08-13T05:57:40.420Z
Learning: In holmes/plugins/runbooks/runbook-format.prompt.md, the user (mainred) prefers to keep the runbook step specifications simple without detailed orchestration metadata like step IDs, dependencies, retry policies, timeouts, and storage variables. The current format with Action, Function Description, Parameters, Expected Output, and Success/Failure Criteria is sufficient for their AI agent troubleshooting use case.
🧬 Code graph analysis (2)
holmes/core/supabase_dal.py (2)
holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-54)tests/llm/utils/mock_dal.py (2)
get_runbook_catalog(77-89)get_runbook_content(91-103)
holmes/core/tool_calling_llm.py (3)
holmes/plugins/runbooks/__init__.py (1)
RunbookCatalog(114-141)holmes/utils/global_instructions.py (2)
Instructions(10-11)add_runbooks_to_user_prompt(37-85)holmes/core/runbooks.py (1)
get_instructions_for_issue(11-26)
🪛 Ruff (0.14.1)
holmes/core/supabase_dal.py
432-432: Consider moving this statement to an else block
(TRY300)
464-464: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
- GitHub Check: build
🔇 Additional comments (6)
holmes/core/supabase_dal.py (1)
36-36: LGTM!The import is correctly placed at the top of the file and necessary for the new runbook methods.
holmes/core/tool_calling_llm.py (5)
51-51: LGTM! Imports are correctly placed and used.The new imports for
RunbookCatalogandadd_runbooks_to_user_promptare properly placed at the top of the file and are used appropriately throughout the investigate method.Also applies to: 55-55
1047-1047: LGTM! Optional parameter addition is backward compatible.The new
runbooksparameter is correctly typed and optional, ensuring backward compatibility with existing callers.
1049-1049: LGTM! Runbook retrieval is correctly implemented.The call to
get_instructions_for_issuecorrectly retrieves issue-specific runbooks and the return type matches expected usage.
1086-1107: LGTM! Prompt construction correctly integrates runbook functionality.The system prompt rendering includes the
runbooks_enabledflag, and the user prompt is properly constructed usingadd_runbooks_to_user_promptwith all required parameters. The issue context formatting is also correct.
1121-1121: LGTM! Result instructions correctly set from issue-specific runbooks.The assignment of
issue_runbookstores.instructionsis type-safe and semantically correct, ensuring the result contains the matched instructions for this specific issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (1)
52-67: Set instance attributes explicitly; don’t pass unknown kwargs to base Tool.Avoid relying on Tool.init to store arbitrary kwargs. Set self.available_runbooks/self.additional_search_paths here.
- super().__init__( + super().__init__( name="fetch_runbook", description="Get runbook content by runbook link. Use this to get troubleshooting steps for incidents", parameters={ "runbook_id": ToolParameter( description=f"The runbook_id: either a UUID or a .md filename. Must be one of: {runbook_list}", type="string", required=True, ), }, - toolset=toolset, # type: ignore[call-arg] - available_runbooks=available_runbooks, # type: ignore[call-arg] - additional_search_paths=additional_search_paths, # type: ignore[call-arg] + toolset=toolset, # type: ignore[call-arg] ) - self._dal = dal + # Instance fields + self.available_runbooks = available_runbooks + self.additional_search_paths = additional_search_paths + self._dal = dal
♻️ Duplicate comments (4)
holmes/plugins/prompts/investigation_procedure.jinja2 (1)
84-86: Deduplicate repeated “process violation” blocks via a shared include.The same runbook violation message appears twice. Extract once to _runbook_instructions.jinja2 (already referenced in PR) to reduce drift.
Example:
-{% if runbooks_enabled -%} -- Not fetching relevant runbooks at the beginning of the investigation = PROCESS VIOLATION -{%- endif %} +{% if runbooks_enabled -%} +{% include "holmes/plugins/prompts/_runbook_instructions.jinja2" with context %} +{%- endif %}Also applies to: 195-196
holmes/plugins/toolsets/runbook/runbook_fetcher.py (2)
89-123: Improve DAL errors and exceptions; use logging.exception and explicit conversions.Preserve stack traces and avoid assuming UUID semantics in the disabled‑DAL path.
- def _get_robusta_runbook(self, link: str, params: dict) -> StructuredToolResult: + def _get_robusta_runbook(self, runbook_id: str, params: dict) -> StructuredToolResult: if self._dal and self._dal.enabled: try: - runbook_content = self._dal.get_runbook_content(link) + runbook_content = self._dal.get_runbook_content(runbook_id) if runbook_content: return StructuredToolResult( status=StructuredToolResultStatus.SUCCESS, data=runbook_content.pretty(), params=params, ) else: - err_msg = f"Runbook with UUID '{link}' not found in remote storage." - logging.error(err_msg) + err_msg = f"Runbook '{runbook_id!s}' not found in remote storage." + logging.error(err_msg) return StructuredToolResult( status=StructuredToolResultStatus.ERROR, error=err_msg, params=params, ) except Exception as e: - err_msg = f"Failed to fetch runbook with UUID '{link}': {str(e)}" - logging.error(err_msg) + err_msg = f"Failed to fetch runbook '{runbook_id!s}': {e!s}" + logging.exception(err_msg) return StructuredToolResult( status=StructuredToolResultStatus.ERROR, error=err_msg, params=params, ) else: - err_msg = "Runbook link appears to be a UUID, but no remote data access layer (dal) is enabled." - logging.error(err_msg) + err_msg = "Robusta runbook requested, but Supabase DAL is disabled." + logging.error(err_msg) return StructuredToolResult( status=StructuredToolResultStatus.ERROR, error=err_msg, params=params, )
26-28: Fix shared mutable class attributes; use per‑instance fields.Class-scope lists can leak state across instances (RUF012). Initialize in init and annotate ClassVar only if truly constant.
-class RunbookFetcher(Tool): - toolset: "RunbookToolset" - available_runbooks: List[str] = [] - additional_search_paths: Optional[List[str]] = None - _dal: Optional[SupabaseDal] = None +class RunbookFetcher(Tool): + toolset: "RunbookToolset" + _dal: Optional[SupabaseDal] = None + # instance attributes set in __init__: + # self.available_runbooks: List[str] + # self.additional_search_paths: Optional[List[str]]holmes/core/supabase_dal.py (1)
444-468: Harden instruction parsing and avoid None deref on runbook column.Current code dereferences row["runbook"] blindly and indexes [0] even for empty lists; also messages lack explicit conversions. Fix below.
- row = res.data[0] - id = row.get("runbook_id") - symptom = row.get("symptoms") - title = row.get("subject_name") - raw_instruction = row.get("runbook").get("instructions") - if isinstance(raw_instruction, list) and len(raw_instruction) >= 0: - instruction = raw_instruction[0] - else: - # in case the format is unexpected, convert to string - instruction = str(raw_instruction) - - return RobustaRunbookInstruction( - id=id, symptom=symptom, instruction=instruction, title=title - ) + row = res.data[0] + rb_id = row.get("runbook_id") + symptom = row.get("symptoms") + title = row.get("subject_name") + runbook_data = row.get("runbook") or {} + raw_instruction = runbook_data.get("instructions") + + instruction: Optional[str] = None + if isinstance(raw_instruction, list): + instruction = raw_instruction[0] if raw_instruction else None + elif isinstance(raw_instruction, str): + # instructions may arrive as a JSON string or plain text + try: + parsed = json.loads(raw_instruction) + instruction = parsed[0] if isinstance(parsed, list) and parsed else raw_instruction + except json.JSONDecodeError: + instruction = raw_instruction + elif raw_instruction is not None: + # normalize unexpected types + instruction = json.dumps(raw_instruction) + + if not rb_id or not title: + return None + + return RobustaRunbookInstruction( + id=rb_id, symptom=symptom, instruction=instruction, title=title + )
🧹 Nitpick comments (9)
holmes/plugins/prompts/investigation_procedure.jinja2 (2)
9-23: Avoid “override ALL other system prompts” to prevent precedence conflicts.Keep runbooks highest priority among investigation steps, but never above safety/system policies. Rephrase to “Runbook instructions have highest priority among investigation steps; do not violate system guardrails.”
Apply:
-# CRITICAL RUNBOOK COMPLIANCE: +# CRITICAL RUNBOOK COMPLIANCE: @@ -- Runbook instructions take ABSOLUTE PRIORITY over all other investigation steps +- Runbook instructions take the highest priority among investigation steps (without violating system guardrails) @@ -# RUNBOOK VIOLATION CONSEQUENCES: +# RUNBOOK VIOLATION CONSEQUENCES: @@ -- Runbook instructions override ALL other system prompts and investigation procedures +- Runbook instructions supersede other investigation steps but never override safety/system policies
111-117: Tighten fetch-first rule to avoid unnecessary tool calls.Clarify “fetch if and only if relevant” to prevent fetching multiple unrelated runbooks on broad topics.
-1. **IMMEDIATELY fetch relevant runbooks FIRST**: Before creating any TodoWrite tasks, use fetch_runbook for any runbooks matching the investigation topic +1. **Fetch relevant runbooks FIRST (only if likely relevant)**: Before creating tasks, use fetch_runbook for runbooks that plausibly match the user’s specific symptoms/contextholmes/plugins/toolsets/runbook/runbook_fetcher.py (4)
41-49: Include default runbook directory in discovery list.Currently only additional_search_paths are scanned; built‑in MDs in DEFAULT_RUNBOOK_SEARCH_PATH won’t appear in the “Must be one of” list.
- if additional_search_paths: + # Always scan default path + for file in os.listdir(DEFAULT_RUNBOOK_SEARCH_PATH): + if file.endswith(".md") and file not in available_runbooks: + available_runbooks.append(f"{file}") + + if additional_search_paths: for search_path in additional_search_paths: if not os.path.isdir(search_path): continue - - for file in os.listdir(search_path): + for file in os.listdir(search_path): if file.endswith(".md") and file not in available_runbooks: available_runbooks.append(f"{file}")
68-76: Lint and robustness: mark unused arg, case‑insensitive .md, clearer error.Small fixes improve clarity and Ruff compliance.
- def _invoke(self, params: dict, context: ToolInvokeContext) -> StructuredToolResult: + def _invoke(self, params: dict, context: ToolInvokeContext) -> StructuredToolResult: + _ = context # required by interface; unused (ARG002) runbook_id: str = params.get("runbook_id", "") - is_md_file: bool = True if runbook_id.endswith(".md") else False + is_md_file: bool = runbook_id.lower().endswith(".md") @@ - # Validate link is not empty - if not runbook_id or not runbook_id.strip(): + # Validate id is not empty + if not runbook_id or not runbook_id.strip(): err_msg = ( - "Runbook link cannot be empty. Please provide a valid runbook path." + "runbook_id cannot be empty. Provide a UUID or .md filename." ) logging.error(err_msg)
124-155: Strengthen MD path validation and naming consistency.Use “runbook_id” naming throughout and keep validations; also improve error text construction.
- def _get_md_runbook(self, link: str, params: dict) -> StructuredToolResult: + def _get_md_runbook(self, runbook_id: str, params: dict) -> StructuredToolResult: search_paths = [DEFAULT_RUNBOOK_SEARCH_PATH] if self.additional_search_paths: search_paths.extend(self.additional_search_paths) - # Validate link is in the available runbooks list OR is a valid path within allowed directories - if link not in self.available_runbooks: + # Validate runbook_id is known or resolves within allowed directories + if runbook_id not in self.available_runbooks: @@ - for search_path in search_paths: + for search_path in search_paths: - candidate_path = os.path.join(search_path, link) + candidate_path = os.path.join(search_path, runbook_id) @@ - if not is_valid_path: - err_msg = f"Invalid runbook link '{link}'. Must be one of: {', '.join(self.available_runbooks) if self.available_runbooks else 'No runbooks available'}" + if not is_valid_path: + allowed = ", ".join(self.available_runbooks) if self.available_runbooks else "No runbooks available" + err_msg = f"Invalid runbook_id '{runbook_id!s}'. Must be one of: {allowed}" logging.error(err_msg) return StructuredToolResult( status=StructuredToolResultStatus.ERROR, error=err_msg, params=params, )
157-217: Read with UTF‑8 and small logging nit.Use explicit encoding and conversion flags in error message.
- runbook_path = get_runbook_by_path(link, search_paths) + runbook_path = get_runbook_by_path(runbook_id, search_paths) @@ - with open(runbook_path, "r") as file: + with open(runbook_path, "r", encoding="utf-8") as file: content = file.read() @@ - except Exception as e: - err_msg = f"Failed to read runbook {runbook_path}: {str(e)}" - logging.error(err_msg) + except Exception as e: + err_msg = f"Failed to read runbook {runbook_path!s}: {e!s}" + logging.exception(err_msg) return StructuredToolResult( status=StructuredToolResultStatus.ERROR, error=err_msg, params=params, )holmes/core/supabase_dal.py (1)
407-436: Use logging.exception for catalog retrieval failures.Preserve stack traces and remove redundant exc_info=True where using exception().
- except Exception: - logging.exception("Failed to fetch RunbookCatalog", exc_info=True) + except Exception: + logging.exception("Failed to fetch RunbookCatalog")holmes/plugins/runbooks/__init__.py (2)
164-173: Prefer logging.exception over logging.error for file load failures.Keeps stack traces; also minor naming cleanup for catalog_path.
- catalogPath = os.path.join(dir_path, CATALOG_FILE) + catalog_path = os.path.join(dir_path, CATALOG_FILE) try: - if os.path.isfile(catalogPath): - with open(catalogPath) as file: + if os.path.isfile(catalog_path): + with open(catalog_path) as file: catalog_dict = json.load(file) catalog = RunbookCatalog(**catalog_dict) - except json.JSONDecodeError as e: - logging.error(f"Error decoding JSON from {catalogPath}: {e}") - except Exception as e: - logging.error( - f"Unexpected error while loading runbook catalog from {catalogPath}: {e}" - ) + except json.JSONDecodeError: + logging.exception(f"Error decoding JSON from {catalog_path}") + except Exception: + logging.exception(f"Unexpected error while loading runbook catalog from {catalog_path}")
175-188: Use logging.exception when merging Supabase runbooks.Preserve tracebacks; avoid shadowing variable names in except.
- if dal: - try: + if dal: + try: supabase_entries = dal.get_runbook_catalog() if not supabase_entries: return catalog if catalog: catalog.catalog.extend(supabase_entries) else: # if failed to load from file, create new catalog from supabase catalog = RunbookCatalog(catalog=supabase_entries) # type: ignore - except Exception as e: - logging.error(f"Error loading runbooks from Supabase: {e}") + except Exception: + logging.exception("Error loading runbooks from Supabase")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
holmes/core/supabase_dal.py(2 hunks)holmes/plugins/prompts/investigation_procedure.jinja2(6 hunks)holmes/plugins/runbooks/__init__.py(2 hunks)holmes/plugins/runbooks/catalog.json(1 hunks)holmes/plugins/toolsets/runbook/runbook_fetcher.py(6 hunks)
✅ Files skipped from review due to trivial changes (1)
- holmes/plugins/runbooks/catalog.json
🧰 Additional context used
📓 Path-based instructions (3)
holmes/plugins/prompts/**/*.jinja2
📄 CodeRabbit inference engine (CLAUDE.md)
Prompts must be stored as Jinja2 templates under holmes/plugins/prompts/{name}.jinja2
Files:
holmes/plugins/prompts/investigation_procedure.jinja2
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/supabase_dal.pyholmes/plugins/toolsets/runbook/runbook_fetcher.pyholmes/plugins/runbooks/__init__.py
holmes/plugins/toolsets/**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Toolsets must be located as holmes/plugins/toolsets/{name}.yaml or holmes/plugins/toolsets/{name}/
Files:
holmes/plugins/toolsets/runbook/runbook_fetcher.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: mainred
PR: robusta-dev/holmesgpt#829
File: holmes/plugins/runbooks/runbook-format.prompt.md:11-22
Timestamp: 2025-08-13T05:57:40.420Z
Learning: In holmes/plugins/runbooks/runbook-format.prompt.md, the user (mainred) prefers to keep the runbook step specifications simple without detailed orchestration metadata like step IDs, dependencies, retry policies, timeouts, and storage variables. The current format with Action, Function Description, Parameters, Expected Output, and Success/Failure Criteria is sufficient for their AI agent troubleshooting use case.
📚 Learning: 2025-08-30T18:12:58.187Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: holmes/plugins/runbooks/CLAUDE.md:0-0
Timestamp: 2025-08-30T18:12:58.187Z
Learning: Applies to holmes/plugins/runbooks/**/*.md : Each runbook must include the four sections with the specified content: 1) Goal, 2) Workflow for [Issue Category] Diagnosis, 3) Synthesize Findings, 4) Recommended Remediation Steps, forming a strictly executable, machine-readable yet human-understandable runbook.
Applied to files:
holmes/plugins/prompts/investigation_procedure.jinja2
📚 Learning: 2025-08-30T18:12:58.187Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: holmes/plugins/runbooks/CLAUDE.md:0-0
Timestamp: 2025-08-30T18:12:58.187Z
Learning: Applies to holmes/plugins/runbooks/**/*.md : In Recommended Remediation Steps, include Immediate Actions, Permanent Solutions, Verification Steps, Documentation References, Escalation Criteria, and Post-Remediation Monitoring.
Applied to files:
holmes/plugins/prompts/investigation_procedure.jinja2
🧬 Code graph analysis (3)
holmes/core/supabase_dal.py (2)
holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-66)tests/llm/utils/mock_dal.py (2)
get_runbook_catalog(77-89)get_runbook_content(91-103)
holmes/plugins/toolsets/runbook/runbook_fetcher.py (3)
holmes/core/supabase_dal.py (2)
SupabaseDal(65-660)get_runbook_content(438-468)holmes/config.py (1)
dal(121-124)holmes/plugins/runbooks/__init__.py (3)
load_runbook_catalog(157-189)list_available_runbooks(130-131)pretty(59-66)
holmes/plugins/runbooks/__init__.py (3)
holmes/utils/pydantic_utils.py (2)
RobustaBaseConfig(14-15)load_model_from_file(39-54)holmes/core/supabase_dal.py (2)
SupabaseDal(65-660)get_runbook_catalog(407-436)holmes/config.py (2)
dal(121-124)get_runbook_catalog(226-229)
🪛 Ruff (0.14.1)
holmes/core/supabase_dal.py
433-433: Consider moving this statement to an else block
(TRY300)
holmes/plugins/toolsets/runbook/runbook_fetcher.py
68-68: Unused method argument: context
(ARG002)
107-107: Do not catch blind exception: Exception
(BLE001)
108-108: Use explicit conversion flag
Replace with conversion flag
(RUF010)
109-109: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
holmes/plugins/runbooks/__init__.py
169-169: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
170-170: Do not catch blind exception: Exception
(BLE001)
171-173: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
186-186: Do not catch blind exception: Exception
(BLE001)
187-187: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
- GitHub Check: build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
holmes/core/supabase_dal.py (1)
438-471: Critical: Fix None dereference and add robust instruction parsing.Line 459 will raise
AttributeErrorifrow.get("runbook")returns None. Additionally, the current instruction parsing logic doesn't handle empty lists or edge cases robustly, and there's no validation of required fields before constructingRobustaRunbookInstruction.The previous review comment provides a comprehensive fix that addresses all these issues. Please implement the suggested changes to:
- Safely handle None for the runbook field
- Robustly parse instructions (handle empty lists, multi-element lists, and string formats)
- Validate all required fields (id, symptom, title) before returning
- Add error handling similar to
get_runbook_catalogConsider wrapping the method body in a try-except block for consistency with
get_runbook_catalog:def get_runbook_content( self, runbook_id: str ) -> Optional[RobustaRunbookInstruction]: if not self.enabled: return None + try: res = ( self.client.table(RUNBOOKS_TABLE) .select("*") .eq("account_id", self.account_id) .eq("subject_type", "RunbookCatalog") .eq("runbook_id", runbook_id) .execute() ) if not res.data or len(res.data) != 1: return None row = res.data[0] - id = row.get("runbook_id") - symptom = row.get("symptoms") - title = row.get("subject_name") - raw_instruction = row.get("runbook").get("instructions") - if isinstance(raw_instruction, list) and len(raw_instruction) == 1: - instruction = raw_instruction[0] - else: - # in case the format is unexpected, convert to string - logging.warning( - f"Unexpected runbook instruction format for runbook_id={runbook_id}: {raw_instruction}" - ) - instruction = str(raw_instruction) + rb_id = row.get("runbook_id") + symptom = row.get("symptoms") + title = row.get("subject_name") + runbook_data = row.get("runbook") or {} + raw_instruction = runbook_data.get("instructions") + + instruction_str: Optional[str] = None + if isinstance(raw_instruction, list): + instruction_str = raw_instruction[0] if raw_instruction else None + elif isinstance(raw_instruction, str): + try: + parsed = json.loads(raw_instruction) + instruction_str = parsed[0] if isinstance(parsed, list) and parsed else raw_instruction + except json.JSONDecodeError: + instruction_str = raw_instruction + + if not rb_id or not symptom or not title: + logging.warning("Skipping runbook with missing required fields: id=%s, title=%s, symptom=%s", rb_id, title, symptom) + return None return RobustaRunbookInstruction( - id=id, symptom=symptom, instruction=instruction, title=title + id=rb_id, symptom=symptom, instruction=instruction_str, title=title ) + except Exception: + logging.exception(f"Failed to fetch runbook content for {runbook_id}", exc_info=True) + return None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/supabase_dal.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/supabase_dal.py
🧬 Code graph analysis (1)
holmes/core/supabase_dal.py (3)
holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-66)holmes/config.py (1)
get_runbook_catalog(226-229)tests/llm/utils/mock_dal.py (2)
get_runbook_catalog(77-89)get_runbook_content(91-103)
🪛 Ruff (0.14.1)
holmes/core/supabase_dal.py
433-433: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
- GitHub Check: build
🔇 Additional comments (1)
holmes/core/supabase_dal.py (1)
36-36: LGTM!The import is correctly placed at the top of the file and is required for the new runbook methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
holmes/core/supabase_dal.py (3)
429-439: Validateidandtitlebefore constructing RobustaRunbookInstruction.Currently only
symptomis validated (line 434), butidandtitlecan also be None or empty fromrow.get(). SinceRobustaRunbookInstructionrequires all three fields, missingidortitlewill cause a Pydantic validation error that propagates up and causes the entire catalog fetch to fail.Apply this diff to validate all required fields:
instructions = [] for row in res.data: id = row.get("runbook_id") symptom = row.get("symptoms") title = row.get("subject_name") - if not symptom: - logging.warning("Skipping runbook with empty symptom: %s", id) + if not id or not symptom or not title: + logging.warning("Skipping runbook with missing required fields: id=%s, title=%s, symptom=%s", id, title, symptom) continue instructions.append( RobustaRunbookInstruction(id=id, symptom=symptom, title=title) )
466-466: Fix potential None dereference on runbook field.
row.get("runbook")may return None, and chaining.get("instructions")will raiseAttributeError. This issue was flagged in previous review comments but remains unaddressed.Apply this diff to safely handle None:
row = res.data[0] id = row.get("runbook_id") symptom = row.get("symptoms") title = row.get("subject_name") - raw_instruction = row.get("runbook").get("instructions") + runbook_data = row.get("runbook") + if not runbook_data: + logging.error(f"Missing runbook data for runbook_id={runbook_id}") + return None + raw_instruction = runbook_data.get("instructions")
482-484: Add validation for required fields before constructing RobustaRunbookInstruction.
id,symptom, andtitleextracted fromrow.get()can be None or empty. SinceRobustaRunbookInstructionrequires all three fields (they're not Optional), passing None will raise a Pydantic validation error. Add validation and return None if any required field is missing.Apply this diff:
+ if not id or not symptom or not title: + logging.error(f"Missing required fields for runbook_id={runbook_id}: id={id}, symptom={symptom}, title={title}") + return None + return RobustaRunbookInstruction( id=id, symptom=symptom, instruction=instruction, title=title )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/supabase_dal.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/supabase_dal.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: mainred
PR: robusta-dev/holmesgpt#829
File: holmes/plugins/runbooks/runbook-format.prompt.md:11-22
Timestamp: 2025-08-13T05:57:40.420Z
Learning: In holmes/plugins/runbooks/runbook-format.prompt.md, the user (mainred) prefers to keep the runbook step specifications simple without detailed orchestration metadata like step IDs, dependencies, retry policies, timeouts, and storage variables. The current format with Action, Function Description, Parameters, Expected Output, and Success/Failure Criteria is sufficient for their AI agent troubleshooting use case.
🧬 Code graph analysis (1)
holmes/core/supabase_dal.py (2)
holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-66)tests/llm/utils/mock_dal.py (2)
get_runbook_catalog(77-89)get_runbook_content(91-103)
🪛 Ruff (0.14.1)
holmes/core/supabase_dal.py
440-440: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
- GitHub Check: build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
holmes/core/supabase_dal.py (2)
429-439: Validate id and title before constructing RobustaRunbookInstruction.This issue was previously flagged but remains unaddressed. Only
symptomis validated (line 434), butidandtitlecan still be None or empty fromrow.get(). SinceRobustaRunbookInstructionrequires all three fields as non-optional strings, missing validation will cause Pydantic validation errors downstream.Apply this diff to validate all required fields:
instructions = [] for row in res.data: id = row.get("runbook_id") symptom = row.get("symptoms") title = row.get("subject_name") - if not symptom: - logging.warning("Skipping runbook with empty symptom: %s", id) + if not id or not symptom or not title: + logging.warning("Skipping runbook with missing required fields: id=%s, title=%s, symptom=%s", id, title, symptom) continue instructions.append( RobustaRunbookInstruction(id=id, symptom=symptom, title=title) )
466-466: Fix potential AttributeError from None dereference.This critical issue was previously flagged but remains unaddressed. If
row.get("runbook")returns None, calling.get("instructions")will raiseAttributeError. You must safely check for None before accessing nested fields.Apply this diff to safely handle None:
row = res.data[0] id = row.get("runbook_id") symptom = row.get("symptoms") title = row.get("subject_name") - raw_instruction = row.get("runbook").get("instructions") + runbook_data = row.get("runbook") + if not runbook_data: + logging.error(f"Missing runbook data for runbook_id={runbook_id}") + return None + raw_instruction = runbook_data.get("instructions")
🧹 Nitpick comments (1)
holmes/core/supabase_dal.py (1)
440-443: Consider using else block for cleaner try-except pattern.Static analysis suggests moving the success-case
return instructionsinto anelseblock to follow the try-except-else pattern more explicitly.Apply this diff:
return instructions - except Exception: + except Exception: logging.exception("Failed to fetch RunbookCatalog", exc_info=True) return None + else: + return instructionsBased on static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/supabase_dal.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/supabase_dal.py
🧬 Code graph analysis (1)
holmes/core/supabase_dal.py (2)
holmes/plugins/runbooks/__init__.py (1)
RobustaRunbookInstruction(22-66)tests/llm/utils/mock_dal.py (2)
get_runbook_catalog(77-89)get_runbook_content(91-103)
🪛 Ruff (0.14.1)
holmes/core/supabase_dal.py
440-440: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
- GitHub Check: build
| id = row.get("runbook_id") | ||
| symptom = row.get("symptoms") | ||
| title = row.get("subject_name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate required fields before constructing RobustaRunbookInstruction.
id, symptom, and title are extracted from row.get() without validation. Since RobustaRunbookInstruction requires all three as non-optional fields, None or empty values will cause Pydantic validation errors. Add validation similar to get_runbook_catalog.
Apply this diff to add validation:
row = res.data[0]
id = row.get("runbook_id")
symptom = row.get("symptoms")
title = row.get("subject_name")
+ if not id or not symptom or not title:
+ logging.error(f"Missing required fields for runbook_id={runbook_id}: id={id}, symptom={symptom}, title={title}")
+ return None
runbook_data = row.get("runbook")Also applies to: 483-485
🤖 Prompt for AI Agents
In holmes/core/supabase_dal.py around lines 463-465 and similarly at 483-485,
id, symptom, and title are taken via row.get() without validation before
constructing RobustaRunbookInstruction; validate each required field (ensure not
None and not empty string), raise or skip the row with a clear error/log message
if validation fails, and only pass validated non-empty values into
RobustaRunbookInstruction (mirror the validation approach used in
get_runbook_catalog to centralize checks and avoid Pydantic validation errors).
This is an example of what is added to the beginning of the user prompt for the runbooks if there are global instructions, runbooks, or issue/workload instructions (the sections are created dynamically in the prompt in order based on what the user has configured)
This is an example of how it looks fetching a a robusta runbook