Commit f329796

Copilot and devinslick committed
Add macros to identify dashboards with health and performance issues
Co-authored-by: devinslick <[email protected]>
1 parent d96e41c commit f329796

2 files changed: +115 -8 lines changed

README.md

Lines changed: 84 additions & 8 deletions
@@ -210,36 +210,112 @@ Replace `YOUR_DASHBOARD_NAME` with your dashboard's pretty name from the registr
 
 ### Using Search Macros
 
-CACA provides several search macros for easy querying:
+CACA provides several search macros for easy querying. These macros help you quickly identify dashboards with issues, analyze performance, and understand usage patterns.
 
-#### Get dashboard stats:
+#### Finding Dashboards with Issues
+
+##### Identify dashboards with health issues (errors/warnings):
+```spl
+`get_dashboards_with_errors`
+```
+**Returns:** Dashboards with errors or warnings in the last 7 days, sorted by severity
+**Columns:** pretty_name, app, errors, warnings, total_issues, health_status
+**Use case:** Find dashboards that are generating errors and need attention
+
+##### Identify slow-performing dashboards:
+```spl
+`get_slow_dashboards`
+```
+**Returns:** Dashboards with average load time > 3 seconds in the last 7 days
+**Columns:** pretty_name, app, avg_load_time_7d, performance_status
+**Use case:** Find dashboards that need performance optimization
+
+##### Identify all problematic dashboards (health OR performance issues):
+```spl
+`get_problematic_dashboards`
+```
+**Returns:** Dashboards with critical/warning health status OR slow performance
+**Columns:** pretty_name, app, views_7d, errors_7d, avg_load_time_7d, health_status, issue_type
+**Use case:** Get a comprehensive list of all dashboards needing attention
+
+**Example - Filter for critical issues only:**
+```spl
+`get_problematic_dashboards`
+| where health_status="critical" OR avg_load_time_7d > 10000
+```
+
+#### Dashboard Analytics
+
+##### Get comprehensive stats for a specific dashboard:
 ```spl
 `get_dashboard_stats("My Dashboard Name")`
 ```
+**Returns:** All metrics (views, edits, errors, load time) for the specified dashboard
+**Use case:** Deep dive into a specific dashboard's activity
 
-#### Get all dashboards summary:
+##### Get all dashboards summary:
 ```spl
 `get_all_dashboards_summary`
 ```
+**Returns:** Summary of all dashboards with 7-day metrics
+**Columns:** pretty_name, app, views_7d, edits_7d, errors_7d, avg_load_time_7d, health_status
+**Use case:** Get an overview of all dashboard health and activity
 
-#### Get top dashboards by views:
+##### Get top dashboards by metric type:
 ```spl
 `get_top_dashboards(views)`
+`get_top_dashboards(edits)`
 ```
+**Returns:** Top 10 dashboards by views or edits in the last 7 days
+**Use case:** Identify most-used or most-edited dashboards
 
-#### Get dashboard performance:
+#### Performance Analysis
+
+##### Get performance rating for a specific dashboard:
 ```spl
 `get_dashboard_performance("My Dashboard Name")`
 ```
+**Returns:** Average load time and performance rating (Excellent/Good/Fair/Poor)
+**Use case:** Check if a dashboard meets performance standards
+
+##### Get last viewed time for a dashboard:
+```spl
+`get_dashboard_last_viewed("My Dashboard Name")`
+```
+**Returns:** Last viewed timestamp and days since last view
+**Use case:** Identify stale or unused dashboards
+
+#### Common Use Cases
+
+**Find all dashboards needing immediate attention:**
+```spl
+`get_problematic_dashboards`
+| where health_status="critical" OR (errors_7d > 50) OR (avg_load_time_7d > 10000)
+```
 
-#### Get slow dashboards:
+**List dashboards with errors that are actively used:**
+```spl
+`get_dashboards_with_errors`
+| where errors > 0
+| join type=inner pretty_name [| mstats sum(_value) as views WHERE index=caca_metrics AND metric_name="dashboard.views" BY pretty_name span=1d | where _time >= relative_time(now(), "-7d") | stats sum(views) as views_7d by pretty_name | where views_7d > 10]
+| table pretty_name app errors warnings views_7d health_status
+```
+
+**Find slow dashboards with high usage:**
 ```spl
 `get_slow_dashboards`
+| join type=inner pretty_name [| mstats sum(_value) as views WHERE index=caca_metrics AND metric_name="dashboard.views" BY pretty_name span=1d | where _time >= relative_time(now(), "-7d") | stats sum(views) as views_7d by pretty_name]
+| where views_7d > 50
+| table pretty_name app avg_load_time_7d performance_status views_7d
+| sort -views_7d
 ```
 
-#### Get last viewed time:
+**Dashboard health report for a specific app:**
 ```spl
-`get_dashboard_last_viewed("My Dashboard Name")`
+`get_all_dashboards_summary`
+| where app="search"
+| table pretty_name views_7d edits_7d errors_7d avg_load_time_7d health_status
+| sort -errors_7d
 ```
 
 ## Configuration
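
Note on `issue_type`: the README lists this column for `get_problematic_dashboards` but does not describe its values. The sketch below replays the `case()` branches from that macro's definition in `default/macros.conf` on synthetic rows built with `makeresults`, so it should run without the `caca_metrics` index. The row values are invented for illustration, and the `"healthy"` status in particular is an assumption: the status vocabulary of `get_all_dashboards_summary` is not shown in this commit.

```spl
| makeresults count=4
| streamstats count as row
``` Synthetic inputs; "healthy" is a hypothetical status used only for this sketch ```
| eval health_status=case(row=1, "critical", row=2, "warning", row=3, "healthy", row=4, "healthy")
| eval avg_load_time_7d=case(row=1, 6200, row=2, 1200, row=3, 7400, row=4, 900)
``` Same filter and branches as the get_problematic_dashboards definition ```
| where health_status="critical" OR health_status="warning" OR avg_load_time_7d > 5000
| eval issue_type=case(
    health_status="critical" AND avg_load_time_7d > 5000, "Health + Performance",
    health_status="critical", "Health Issues",
    health_status="warning" AND avg_load_time_7d > 5000, "Health + Performance",
    health_status="warning", "Health Issues",
    avg_load_time_7d > 5000, "Performance Issues",
    1=1, "Other")
| table row health_status avg_load_time_7d issue_type
```

Under these assumptions, row 1 is labelled "Health + Performance", row 2 "Health Issues", and row 3 "Performance Issues"; row 4 is dropped by the `where` clause. String comparison in `where` is case-sensitive, so a status spelled `Critical` (the capitalisation that `get_dashboards_with_errors` emits) would not match `health_status="critical"`.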

default/macros.conf

Lines changed: 31 additions & 0 deletions
@@ -92,3 +92,34 @@ definition = | mstats avg(_value) as avg_load_time WHERE index=caca_metrics AND
 1=1, "Good") \
 | sort -avg_load_time_7d
 iseval = 0
+
+[get_dashboards_with_errors]
+definition = | mstats sum(_value) as error_count WHERE index=caca_metrics AND metric_name="dashboard.errors" BY pretty_name, severity, app span=1d \
+| where _time >= relative_time(now(), "-7d") \
+| stats sum(error_count) as total_errors by pretty_name, severity, app \
+| eval {severity}=total_errors \
+| stats sum(error) as errors sum(warn) as warnings values(error) as has_errors values(warn) as has_warns sum(total_errors) as total_issues by pretty_name, app \
+| fillnull value=0 errors warnings \
+| where total_issues > 0 \
+| eval health_status=case(\
+errors > 50, "Critical",\
+errors > 10, "High",\
+errors > 0, "Medium",\
+warnings > 20, "Medium",\
+1=1, "Low") \
+| sort -errors -warnings
+iseval = 0
+
+[get_problematic_dashboards]
+definition = `get_all_dashboards_summary` \
+| where health_status="critical" OR health_status="warning" OR avg_load_time_7d > 5000 \
+| eval issue_type=case(\
+health_status="critical" AND avg_load_time_7d > 5000, "Health + Performance",\
+health_status="critical", "Health Issues",\
+health_status="warning" AND avg_load_time_7d > 5000, "Health + Performance",\
+health_status="warning", "Health Issues",\
+avg_load_time_7d > 5000, "Performance Issues",\
+1=1, "Other") \
+| table pretty_name app views_7d errors_7d avg_load_time_7d health_status issue_type \
+| sort -errors_7d -avg_load_time_7d
+iseval = 0
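
Note on `health_status` in `get_dashboards_with_errors`: the `case()` branches are evaluated top to bottom and the first match wins, so a dashboard with no errors can still be rated "Medium" on warnings alone. A minimal sketch of those tiers, assuming synthetic counts from `makeresults` in place of the `caca_metrics` index (the numbers are invented for illustration):

```spl
| makeresults count=5
| streamstats count as row
``` Synthetic error and warning counts, one row per hypothetical dashboard ```
| eval errors=case(row=1, 75, row=2, 12, row=3, 1, row=4, 0, row=5, 0)
| eval warnings=case(row=1, 0, row=2, 3, row=3, 0, row=4, 25, row=5, 2)
``` Same thresholds as the macro definition ```
| eval health_status=case(
    errors > 50, "Critical",
    errors > 10, "High",
    errors > 0, "Medium",
    warnings > 20, "Medium",
    1=1, "Low")
| table row errors warnings health_status
```

With these inputs the rows should be rated Critical, High, Medium, Medium, and Low in order. In the macro itself, rows with `total_issues` of 0 are filtered out before this step, so "Low" only applies to dashboards with 1 to 20 warnings and no errors.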
