Add macros to identify dashboards with health and performance issues

Copilot · devinslick · Copilot · commit f3297967af74 · 2025-11-16T13:48:45.000Z
Co-authored-by: devinslick <1762071+devinslick@users.noreply.github.com>
diff --git a/README.md b/README.md
@@ -210,36 +210,112 @@ Replace `YOUR_DASHBOARD_NAME` with your dashboard's pretty name from the registr
 
 ### Using Search Macros
 
-CACA provides several search macros for easy querying:
+CACA provides several search macros for easy querying. These macros help you quickly identify dashboards with issues, analyze performance, and understand usage patterns.
 
-#### Get dashboard stats:
+#### Finding Dashboards with Issues
+
+##### Identify dashboards with health issues (errors/warnings):
+```spl
+`get_dashboards_with_errors`
+```
+**Returns:** Dashboards with errors or warnings in the last 7 days, sorted by error count, then by warning count
+**Columns:** pretty_name, app, errors, warnings, total_issues, health_status
+**Use case:** Find dashboards that are generating errors and need attention
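+
+The severity tiers this macro assigns are capitalized (`Critical`, `High`, `Medium`, `Low`), unlike the lowercase `critical`/`warning` values that `get_problematic_dashboards` filters on, and `where` compares strings case-sensitively. A minimal sketch that keeps only the two most severe tiers:
+```spl
+`get_dashboards_with_errors`
+| where health_status="Critical" OR health_status="High"
+| table pretty_name app errors warnings health_status
+```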
+
+##### Identify slow-performing dashboards:
+```spl
+`get_slow_dashboards`
+```
+**Returns:** Dashboards with average load time > 3 seconds in the last 7 days
+**Columns:** pretty_name, app, avg_load_time_7d, performance_status
+**Use case:** Find dashboards that need performance optimization
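+
+To read load times in seconds, you can convert the field inline. This sketch assumes `avg_load_time_7d` is reported in milliseconds (an inference from the `5000` and `10000` thresholds used elsewhere in this README and in the macros) and adds a display-only `avg_load_time_s` field:
+```spl
+`get_slow_dashboards`
+| eval avg_load_time_s=round(avg_load_time_7d/1000, 1)
+| sort -avg_load_time_s
+| table pretty_name app avg_load_time_s performance_status
+```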
+
+##### Identify all problematic dashboards (health OR performance issues):
+```spl
+`get_problematic_dashboards`
+```
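+**Returns:** Dashboards with a critical or warning health status, or with an average load time above 5 seconds (`avg_load_time_7d > 5000`), a higher cutoff than the 3 seconds used by `get_slow_dashboards`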
+**Columns:** pretty_name, app, views_7d, errors_7d, avg_load_time_7d, health_status, issue_type
+**Use case:** Get a comprehensive list of all dashboards needing attention
+
+**Example - Keep only critical health issues or very slow dashboards (`avg_load_time_7d > 10000`):**
+```spl
+`get_problematic_dashboards`
+| where health_status="critical" OR avg_load_time_7d > 10000
+```
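+
+To see which kind of problem dominates, you can count the results by the `issue_type` column the macro adds:
+```spl
+`get_problematic_dashboards`
+| stats count as dashboards by issue_type
+| sort -dashboards
+```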
+
+#### Dashboard Analytics
+
+##### Get comprehensive stats for a specific dashboard:
 ```spl
 `get_dashboard_stats("My Dashboard Name")`
 ```
+**Returns:** All metrics (views, edits, errors, load time) for the specified dashboard
+**Use case:** Deep dive into a specific dashboard's activity
 
-#### Get all dashboards summary:
+##### Get all dashboards summary:
 ```spl
 `get_all_dashboards_summary`
 ```
+**Returns:** Summary of all dashboards with 7-day metrics
+**Columns:** pretty_name, app, views_7d, edits_7d, errors_7d, avg_load_time_7d, health_status
+**Use case:** Get an overview of all dashboard health and activity
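+
+For a quick distribution of dashboards across health states, you can aggregate the summary's `health_status` column:
+```spl
+`get_all_dashboards_summary`
+| stats count as dashboards by health_status
+| sort -dashboards
+```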
 
-#### Get top dashboards by views:
+##### Get top dashboards by metric type:
 ```spl
 `get_top_dashboards(views)`
+`get_top_dashboards(edits)`
 ```
+**Returns:** Top 10 dashboards by views or edits in the last 7 days
+**Use case:** Identify most-used or most-edited dashboards
 
-#### Get dashboard performance:
+#### Performance Analysis
+
+##### Get performance rating for a specific dashboard:
 ```spl
 `get_dashboard_performance("My Dashboard Name")`
 ```
+**Returns:** Average load time and performance rating (Excellent/Good/Fair/Poor)
+**Use case:** Check if a dashboard meets performance standards
+
+##### Get last viewed time for a dashboard:
+```spl
+`get_dashboard_last_viewed("My Dashboard Name")`
+```
+**Returns:** Last viewed timestamp and days since last view
+**Use case:** Identify stale or unused dashboards
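+
+`get_dashboard_last_viewed` checks one dashboard at a time. A rough sweep across all dashboards can use the summary's `views_7d` column instead. The `isnull` check covers rows where no views were recorded, depending on how the summary fills missing values, and dashboards that emitted no metrics at all in the window will not appear:
+```spl
+`get_all_dashboards_summary`
+| where views_7d=0 OR isnull(views_7d)
+| table pretty_name app edits_7d errors_7d health_status
+```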
+
+#### Common Use Cases
+
+**Find all dashboards needing immediate attention:**
+```spl
+`get_problematic_dashboards`
+| where health_status="critical" OR (errors_7d > 50) OR (avg_load_time_7d > 10000)
+```
 
-#### Get slow dashboards:
+**List dashboards with errors that are actively used:**
+```spl
+`get_dashboards_with_errors`
+| where errors > 0
+| join type=inner pretty_name [
+    | mstats sum(_value) as views WHERE index=caca_metrics AND metric_name="dashboard.views" BY pretty_name span=1d
+    | where _time >= relative_time(now(), "-7d")
+    | stats sum(views) as views_7d by pretty_name
+    | where views_7d > 10 ]
+| table pretty_name app errors warnings views_7d health_status
+```
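+
+If you don't need the separate error and warning counts, `get_all_dashboards_summary` already carries both `errors_7d` and `views_7d`, so a similar list can be built without a `join` subsearch, which is subject to Splunk's subsearch result and runtime limits. This sketch assumes `errors_7d` comes from the same `dashboard.errors` metric, so it may also count warnings:
+```spl
+`get_all_dashboards_summary`
+| where errors_7d > 0 AND views_7d > 10
+| table pretty_name app errors_7d views_7d health_status
+| sort -errors_7d
+```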
+
+**Find slow dashboards with high usage:**
 ```spl
 `get_slow_dashboards`
+| join type=inner pretty_name [
+    | mstats sum(_value) as views WHERE index=caca_metrics AND metric_name="dashboard.views" BY pretty_name span=1d
+    | where _time >= relative_time(now(), "-7d")
+    | stats sum(views) as views_7d by pretty_name ]
+| where views_7d > 50
+| table pretty_name app avg_load_time_7d performance_status views_7d
+| sort -views_7d
 ```
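+
+To check whether a slow dashboard's traffic is steady or bursty, you can chart its daily views directly from the metrics index with the same `mstats` pattern used in the subsearches above. Replace the placeholder name, and run it over a time range of at least the last 7 days, since the `where` clause only trims what `mstats` returns:
+```spl
+| mstats sum(_value) as views WHERE index=caca_metrics AND metric_name="dashboard.views" AND pretty_name="My Dashboard Name" span=1d
+| where _time >= relative_time(now(), "-7d")
+| table _time views
+```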
 
-#### Get last viewed time:
+**Dashboard health report for a specific app:**
 ```spl
-`get_dashboard_last_viewed("My Dashboard Name")`
+`get_all_dashboards_summary`
+| where app="search"
+| table pretty_name views_7d edits_7d errors_7d avg_load_time_7d health_status
+| sort -errors_7d
 ```
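+
+To compare apps rather than drill into one, you can roll the same summary up by `app`. Note that `avg(avg_load_time_7d)` here is an unweighted average across dashboards, not a per-view average:
+```spl
+`get_all_dashboards_summary`
+| stats count as dashboards sum(errors_7d) as errors_7d avg(avg_load_time_7d) as avg_load_time_7d by app
+| sort -errors_7d
+```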
 
 ## Configuration
diff --git a/default/macros.conf b/default/macros.conf
@@ -92,3 +92,34 @@ definition = | mstats avg(_value) as avg_load_time WHERE index=caca_metrics AND
     1=1, "Good") \
 | sort -avg_load_time_7d
 iseval = 0
+
+[get_dashboards_with_errors]
+definition = | mstats sum(_value) as error_count WHERE index=caca_metrics AND metric_name="dashboard.errors" BY pretty_name, severity, app span=1d \
+| where _time >= relative_time(now(), "-7d") \
+| stats sum(error_count) as total_errors by pretty_name, severity, app \
+| eval {severity}=total_errors \
+| stats sum(error) as errors sum(warn) as warnings sum(total_errors) as total_issues by pretty_name, app \
+| fillnull value=0 errors warnings \
+| where total_issues > 0 \
+| eval health_status=case(\
+    errors > 50, "Critical",\
+    errors > 10, "High",\
+    errors > 0, "Medium",\
+    warnings > 20, "Medium",\
+    1=1, "Low") \
+| sort -errors -warnings
+iseval = 0
+
+[get_problematic_dashboards]
+definition = `get_all_dashboards_summary` \
+| where health_status="critical" OR health_status="warning" OR avg_load_time_7d > 5000 \
+| eval issue_type=case(\
+    health_status="critical" AND avg_load_time_7d > 5000, "Health + Performance",\
+    health_status="critical", "Health Issues",\
+    health_status="warning" AND avg_load_time_7d > 5000, "Health + Performance",\
+    health_status="warning", "Health Issues",\
+    avg_load_time_7d > 5000, "Performance Issues",\
+    1=1, "Other") \
+| table pretty_name app views_7d errors_7d avg_load_time_7d health_status issue_type \
+| sort -errors_7d -avg_load_time_7d
+iseval = 0