-
Couldn't load subscription status.
- Fork 182
Improvements to loki and datadog logging #1016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| ## Loki Logs Tools Usage Guide | ||
|
|
||
| Before running logs queries: | ||
|
|
||
| ** You are often (but not always) running in a kubernetes environment. So users might ask you questions about kubernetes workloads without explicitly stating their type. | ||
| ** When getting ambiguous questions, use kubectl_find_resource to find the resource you are being asked about! | ||
| ** Find the involved resource name and kind | ||
| ** If you can't figure out what is the type of the resource, ask the user for more information and don't guess | ||
|
|
||
| ### General guideline | ||
| - This toolset is used to read pod logs from Loki, a log aggregation system | ||
| - Assume the pod should have logs. If logs not found, try to adjust the query | ||
| - Loki stores historical logs, so you can query logs for pods that no longer exist | ||
|
|
||
| ### CRITICAL: Pod Name Resolution Workflow | ||
|
|
||
| **IMPORTANT WILDCARD USAGE:** | ||
| - **ALWAYS use wildcards** when searching for pods unless you have the COMPLETE pod name with all suffixes | ||
| - Kubernetes pod names include deployment hash + replica ID (e.g., `nginx-ingress-7b9899-x2km9`, `frontend-5f4d3b2a1-abc123`) | ||
| - When user says "nginx pod" or "frontend pod", search for `nginx-*` or `frontend-*` NOT just `nginx` or `frontend` | ||
| - Loki supports wildcards: `*` matches any characters (e.g., `nginx-*`, `*ingress*`, `*-x2km9`) | ||
| - For partial matches, use wildcards on both sides: `*keyword*` to find logs from any pod containing "keyword" | ||
|
|
||
| **When user provides what looks like a complete pod name** (e.g., `my-workload-5f9d8b7c4d-x2km9`): | ||
| - Query Loki directly with that exact pod name | ||
| - Do NOT try to verify if the pod exists in Kubernetes first | ||
| - This allows querying historical pods that have been deleted/replaced | ||
|
|
||
| **When user provides a simple/generic name** (e.g., "nginx", "redis", "payment-service", "auth"): | ||
| - **DEFAULT ACTION: Use wildcards** - Query with `pod-name-*` pattern | ||
| - For historical queries (yesterday, last week): ALWAYS use wildcards directly in Loki | ||
| - For current issues: Optionally use `kubectl_find_resource` to find exact pod names, but wildcards often work better | ||
| - Examples: | ||
| - User says "nginx pod" → Query Loki with `nginx-*` | ||
| - User says "redis instance" → Query Loki with `redis-*` | ||
| - User says "payment service" → Query Loki with `payment-*` | ||
|
|
||
| **Why wildcards are critical:** | ||
| - Pod names in Loki are the actual Kubernetes pod names (with random suffixes) | ||
| - Users typically refer to pods by their deployment/service name without suffixes | ||
| - Without wildcards, queries for "nginx" will find NOTHING when actual pods are named "nginx-7b9899-x2km9" | ||
| - Historical pods that no longer exist can only be found via Loki with proper wildcard usage | ||
|
|
||
| ### Time Parameters | ||
| - Use RFC3339 format: `2023-03-01T10:30:00Z` | ||
| - Or relative seconds: `-3600` for 1 hour ago | ||
| - Defaults to 7 days window if not specified | ||
|
|
||
| ### Filter Usage | ||
| - The `filter` parameter performs substring matching on log content | ||
| - Use it to search for specific error messages, keywords, or patterns | ||
| - Example: filter="error" will return only logs containing the word "error" | ||
|
|
||
| ### Common Investigation Patterns | ||
|
|
||
| **For Current Pod Issues:** | ||
| 1. User asks: "Show logs for payment service" | ||
| 2. Query Loki with pod_name="payment-*" to get all pods matching that pattern | ||
| 3. Apply time ranges and filters as needed | ||
|
|
||
| **For Historical Pod Issues:** | ||
| 1. User asks: "What happened to the payment-api pod yesterday at 2pm?" | ||
| 2. Query Loki directly with pod_name="payment-*" and appropriate time range | ||
| 3. No need to verify pod existence - Loki has the historical data | ||
|
|
||
| **For Debugging Deleted/Restarted Pods:** | ||
| 1. Pod may no longer exist in cluster | ||
| 2. Use wildcards to find all historical logs for that workload | ||
| 3. Loki retains logs even after pods are gone | ||
|
|
||
| ### Important Notes | ||
| - Always inform the user about the actual time period fetched | ||
| - If a limit was applied, tell the user how many logs were shown | ||
| - If filters were applied, mention them explicitly | ||
| - If no logs are found, suggest broader wildcards or time ranges |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,3 +1,4 @@ | ||||||||||||||||||||||||||
| import os | ||||||||||||||||||||||||||
| from typing import Any, cast, Set | ||||||||||||||||||||||||||
| from pydantic import BaseModel | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
@@ -23,7 +24,11 @@ | |||||||||||||||||||||||||
| from holmes.plugins.toolsets.grafana.loki_api import ( | ||||||||||||||||||||||||||
| query_loki_logs_by_label, | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| from holmes.core.tools import StructuredToolResult, StructuredToolResultStatus | ||||||||||||||||||||||||||
| from holmes.core.tools import ( | ||||||||||||||||||||||||||
| StructuredToolResult, | ||||||||||||||||||||||||||
| StructuredToolResultStatus, | ||||||||||||||||||||||||||
| ToolParameter, | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| class GrafanaLokiLabelsConfig(BaseModel): | ||||||||||||||||||||||||||
|
|
@@ -35,6 +40,24 @@ class GrafanaLokiConfig(GrafanaConfig): | |||||||||||||||||||||||||
| labels: GrafanaLokiLabelsConfig = GrafanaLokiLabelsConfig() | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| class LokiPodLoggingTool(PodLoggingTool): | ||||||||||||||||||||||||||
| """Custom pod logging tool for Loki with wildcard support""" | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| def _get_tool_parameters(self, toolset: BasePodLoggingToolset) -> dict: | ||||||||||||||||||||||||||
| """Override to add wildcard support to pod_name parameter""" | ||||||||||||||||||||||||||
| # Get base parameters from parent | ||||||||||||||||||||||||||
| params = super()._get_tool_parameters(toolset) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| # Override pod_name description to indicate wildcard support | ||||||||||||||||||||||||||
| params["pod_name"] = ToolParameter( | ||||||||||||||||||||||||||
| description="The kubernetes pod name. Use '*' to fetch logs from all pods in the namespace, or use wildcards like 'payment-*' to match multiple pods", | ||||||||||||||||||||||||||
| type="string", | ||||||||||||||||||||||||||
| required=True, | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| return params | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| class GrafanaLokiToolset(BasePodLoggingToolset): | ||||||||||||||||||||||||||
| @property | ||||||||||||||||||||||||||
| def supported_capabilities(self) -> Set[LoggingCapability]: | ||||||||||||||||||||||||||
|
|
@@ -51,7 +74,9 @@ def __init__(self): | |||||||||||||||||||||||||
| tools=[], # Initialize with empty tools first | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| # Now that parent is initialized and self.name exists, create the tool | ||||||||||||||||||||||||||
| self.tools = [PodLoggingTool(self)] | ||||||||||||||||||||||||||
| # Use our custom LokiPodLoggingTool with wildcard support | ||||||||||||||||||||||||||
| self.tools = [LokiPodLoggingTool(self)] | ||||||||||||||||||||||||||
| self._reload_instructions() | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| def prerequisites_callable(self, config: dict[str, Any]) -> tuple[bool, str]: | ||||||||||||||||||||||||||
| if not config: | ||||||||||||||||||||||||||
|
|
@@ -109,3 +134,10 @@ def fetch_pod_logs(self, params: FetchPodLogsParams) -> StructuredToolResult: | |||||||||||||||||||||||||
| status=StructuredToolResultStatus.NO_DATA, | ||||||||||||||||||||||||||
| params=params.model_dump(), | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| def _reload_instructions(self): | ||||||||||||||||||||||||||
| """Load Loki specific instructions.""" | ||||||||||||||||||||||||||
| template_file_path = os.path.abspath( | ||||||||||||||||||||||||||
| os.path.join(os.path.dirname(__file__), "loki_instructions.jinja2") | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| self._load_llm_instructions(jinja_template=f"file://{template_file_path}") | ||||||||||||||||||||||||||
|
Comment on lines
+138
to
+143
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chainAdd return type hint for mypy compliance. The method is missing a return type hint, which is required by the coding guidelines. Add Apply this diff: - def _reload_instructions(self):
+ def _reload_instructions(self) -> None:
"""Load Loki specific instructions."""Verify the loki_instructions.jinja2 template file exists. Ensure the template file is present in the same directory as this file, as the path construction expects it to be co-located. Run this script to verify the template file exists: 🏁 Script executed: #!/bin/bash
# Verify that loki_instructions.jinja2 exists in the same directory
fd -t f "loki_instructions.jinja2" holmes/plugins/toolsets/grafana/Length of output: 56 Add missing return type hint for mypy compliance. Apply this diff: - def _reload_instructions(self):
+ def _reload_instructions(self) -> None:
"""Load Loki specific instructions."""📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add exception chaining to preserve context.
The enhanced error message with query details is excellent for debugging. However, line 70 should use
raise ... from eto preserve the exception chain, which is important for debugging and traceback clarity.Apply this diff:
error_details = ( f"Failed to query Loki logs:\n" f" URL: {url}\n" f" Query: {query}\n" f" Time range: {start} to {end}\n" f" Limit: {limit}\n" - f" Error: {str(e)}" + f" Error: {e!s}" ) - raise Exception(error_details) + raise Exception(error_details) from eOptional improvements:
{e!s}instead of{str(e)}for cleaner f-string formatting.LokiQueryErrorexception class instead of genericExceptionfor better error handling downstream.📝 Committable suggestion
🧰 Tools
🪛 Ruff (0.13.1)
68-68: Use explicit conversion flag
Replace with conversion flag
(RUF010)
70-70: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
70-70: Create your own exception
(TRY002)
🤖 Prompt for AI Agents