
Commit dd73650

Merge pull request #1 from devinslick/copilot/review-metrics-dashboard-functionality
Add dashboard performance tracking, search macros for issue identification, and app filtering
2 parents b2802f3 + 37b66ae commit dd73650

9 files changed: 538 additions & 24 deletions

README.md

Lines changed: 183 additions & 15 deletions
@@ -26,6 +26,7 @@ While the Splunk Monitoring Console (DMC) focuses on system-level health and per
 - **Usage Tracking**: Track how many times each dashboard is viewed, by which users, and when
 - **Edit History**: Monitor when dashboards are created or modified, and by whom
 - **Health Monitoring**: Detect and track dashboard errors and warnings from internal logs
+- **Performance Monitoring**: Track dashboard load times and identify slow-performing dashboards
 - **Stale Dashboard Detection**: Identify dashboards that haven't been accessed in 30+ days
 
 #### Metrics & Analytics
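The **Stale Dashboard Detection** rule above (no access in 30+ days) reduces to a date comparison. The Python sketch below models it under stated assumptions: `DashboardActivity`, `days_since_last_view` and `is_stale` are hypothetical names rather than CACA code, and treating a dashboard with no recorded views as stale is an assumption the README does not spell out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold taken from the README ("haven't been accessed in 30+ days").
STALE_AFTER_DAYS = 30


@dataclass
class DashboardActivity:
    """Hypothetical record; field names mirror the README's pretty_name/app columns."""
    pretty_name: str
    app: str
    last_viewed: Optional[datetime]  # None if no view was ever recorded


def days_since_last_view(activity: DashboardActivity, now: datetime) -> Optional[int]:
    """Whole days elapsed since the last view, or None if never viewed."""
    if activity.last_viewed is None:
        return None
    return (now - activity.last_viewed).days


def is_stale(activity: DashboardActivity, now: datetime) -> bool:
    """Assumption: a dashboard with no recorded views also counts as stale."""
    days = days_since_last_view(activity, now)
    return days is None or days >= STALE_AFTER_DAYS


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
dashboards = [
    DashboardActivity("Ops Overview", "search", now - timedelta(days=2)),
    DashboardActivity("Old Report", "my_app", now - timedelta(days=45)),
    DashboardActivity("Never Opened", "my_app", None),
]
stale = [d.pretty_name for d in dashboards if is_stale(d, now)]
print(stale)  # ['Old Report', 'Never Opened']
```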
@@ -56,13 +57,14 @@ While the Splunk Monitoring Console (DMC) focuses on system-level health and per
 
 The app uses a three-stage pipeline for efficiency:
 
-1. **Collect**: Scheduled searches analyze Splunk's internal logs (`_internal`, `_audit`) to track views, edits, and errors
+1. **Collect**: Scheduled searches analyze Splunk's internal logs (`_internal`, `_audit`) to track views, edits, errors, and performance
 2. **Store**: Metrics are written to a dedicated metrics index using `mcollect` for optimal performance
 3. **Query**: Fast retrieval via `mstats` command and reusable search macros
 
 **Note on Scheduled Searches**: CACA is powered by lightweight scheduled searches that run at the following intervals:
 - Dashboard views: Every 5 minutes
-- Dashboard edits: Every 10 minutes
+- Dashboard edits: Every 10 minutes
+- Dashboard performance: Every 10 minutes
 - Dashboard health: Every 15 minutes
 - Registry updates: Daily at 2 AM
 
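The Collect → Store → Query pipeline in this hunk can be pictured with a small in-memory model. This is a Python sketch of the data flow only, not how Splunk or CACA implements it: raw view events are rolled up into metric points (loosely analogous to what `mcollect` writes) and then summed per dashboard over a time window (loosely analogous to an `mstats sum(_value) ... BY pretty_name` query). The `MetricPoint` shape, the 5-minute bucket size and all function names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

BUCKET_SECONDS = 300  # assumption: one rollup per 5-minute collector run


@dataclass(frozen=True)
class MetricPoint:
    """Assumed metric shape: timestamp, metric name, one dimension, value."""
    time: int
    metric_name: str
    pretty_name: str
    value: float


def collect(view_events: Iterable[Tuple[int, str]]) -> List[MetricPoint]:
    """Stages 1 and 2: count (epoch_seconds, pretty_name) view events per time bucket."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, name in view_events:
        bucket = ts - ts % BUCKET_SECONDS
        counts[(bucket, name)] += 1
    return [
        MetricPoint(bucket, "dashboard.views", name, float(n))
        for (bucket, name), n in sorted(counts.items())
    ]


def query_views(points: Iterable[MetricPoint], earliest: int, latest: int) -> Dict[str, float]:
    """Stage 3: sum stored view counts per dashboard over [earliest, latest)."""
    totals: Dict[str, float] = defaultdict(float)
    for p in points:
        if p.metric_name == "dashboard.views" and earliest <= p.time < latest:
            totals[p.pretty_name] += p.value
    return dict(totals)


events = [(0, "Ops"), (10, "Ops"), (400, "Ops"), (420, "Sales")]
store = collect(events)
print(len(store))  # 3
print(query_views(store, 0, 3600))  # {'Ops': 3.0, 'Sales': 1.0}
```

Querying pre-aggregated points rather than raw events is the point of the Store stage: the query touches one row per bucket and dashboard instead of one per view.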
@@ -140,6 +142,7 @@ Navigate to **Settings → Searches, reports, and alerts** and enable these sear
 
 - **Dashboard Views - Metrics Collector** (runs every 5 minutes)
 - **Dashboard Edits - Metrics Collector** (runs every 10 minutes)
+- **Dashboard Performance - Metrics Collector** (runs every 10 minutes)
 - **Dashboard Health - Metrics Collector** (runs every 15 minutes)
 - **Dashboard Registry - Auto Update** (runs daily at 2 AM)
 
@@ -161,20 +164,21 @@ Allow 15-30 minutes for the initial data collection to populate the metrics inde
 
 Navigate to **CACA → Dashboard Leaderboard** to view:
 
-- **High-Level KPIs**: Total dashboards, views, errors, and stale dashboards
-- **Activity Leaderboard Table**: Sortable list of all dashboards with metrics
-- **Trending Charts**: Views and errors over time
-- **Top Dashboards**: Most viewed and most edited dashboards
+- **High-Level KPIs**: Total dashboards, views, errors, average load time, and stale dashboards
+- **Activity Leaderboard Table**: Sortable list of all dashboards with usage, health, and performance metrics
+- **Trending Charts**: Views, errors, and load time trends over time
+- **Top Dashboards**: Most viewed, most edited, and slowest dashboards
 
 ### Dashboard Details
 
 Click any dashboard in the leaderboard to view detailed metrics:
 
-- Total views, edits, and errors
-- Activity trends over time
+- Total views, edits, errors, and average load time
+- Activity and performance trends over time
 - Top users by views
 - Edit history
 - Error details with severity
+- Load time analysis (average and maximum)
 
 ### Adding Badges to Your Dashboards
 
@@ -206,27 +210,129 @@ Replace `YOUR_DASHBOARD_NAME` with your dashboard's pretty name from the registr
 
 ### Using Search Macros
 
-CACA provides several search macros for easy querying:
+CACA provides several search macros for easy querying. These macros help you quickly identify dashboards with issues, analyze performance, and understand usage patterns.
 
-#### Get dashboard stats:
+**Note:** All macros respect the app filter configuration (see "Filtering Apps for Monitoring" in the Configuration section). All results include the `app` field showing which app each dashboard belongs to, making it easy to filter or group results by application.
+
+#### Finding Dashboards with Issues
+
+##### Identify dashboards with health issues (errors/warnings):
+```spl
+`get_dashboards_with_errors`
+```
+**Returns:** Dashboards with errors or warnings in the last 7 days, sorted by severity
+**Columns:** pretty_name, app, errors, warnings, total_issues, health_status
+**Use case:** Find dashboards that are generating errors and need attention
+
+##### Identify slow-performing dashboards:
+```spl
+`get_slow_dashboards`
+```
+**Returns:** Dashboards with average load time > 3 seconds in the last 7 days
+**Columns:** pretty_name, app, avg_load_time_7d, performance_status
+**Use case:** Find dashboards that need performance optimization
+
+##### Identify all problematic dashboards (health OR performance issues):
+```spl
+`get_problematic_dashboards`
+```
+**Returns:** Dashboards with critical/warning health status OR slow performance
+**Columns:** pretty_name, app, views_7d, errors_7d, avg_load_time_7d, health_status, issue_type
+**Use case:** Get a comprehensive list of all dashboards needing attention
+
+**Example - Filter for critical issues only (load times are in milliseconds, so 10000 = 10 seconds):**
+```spl
+`get_problematic_dashboards`
+| where health_status="critical" OR avg_load_time_7d > 10000
+```
+
+#### Dashboard Analytics
+
+##### Get comprehensive stats for a specific dashboard:
 ```spl
 `get_dashboard_stats("My Dashboard Name")`
 ```
+**Returns:** All metrics (views, edits, errors, load time) for the specified dashboard
+**Use case:** Deep dive into a specific dashboard's activity
 
-#### Get all dashboards summary:
+##### Get all dashboards summary:
 ```spl
 `get_all_dashboards_summary`
 ```
+**Returns:** Summary of all dashboards with 7-day metrics
+**Columns:** pretty_name, app, views_7d, edits_7d, errors_7d, avg_load_time_7d, health_status
+**Use case:** Get an overview of all dashboard health and activity
 
-#### Get top dashboards by views:
+##### Get top dashboards by metric type:
 ```spl
 `get_top_dashboards(views)`
+`get_top_dashboards(edits)`
 ```
+**Returns:** Top 10 dashboards by views or edits in the last 7 days
+**Use case:** Identify most-used or most-edited dashboards
+
+#### Performance Analysis
 
-#### Get last viewed time:
+##### Get performance rating for a specific dashboard:
+```spl
+`get_dashboard_performance("My Dashboard Name")`
+```
+**Returns:** Average load time and performance rating (Excellent/Good/Fair/Poor)
+**Use case:** Check if a dashboard meets performance standards
+
+##### Get last viewed time for a dashboard:
 ```spl
 `get_dashboard_last_viewed("My Dashboard Name")`
 ```
+**Returns:** Last viewed timestamp and days since last view
+**Use case:** Identify stale or unused dashboards
+
+#### Common Use Cases
+
+**Find all dashboards needing immediate attention:**
+```spl
+`get_problematic_dashboards`
+| where health_status="critical" OR (errors_7d > 50) OR (avg_load_time_7d > 10000)
+```
+
+**Filter results by specific app:**
+```spl
+`get_dashboards_with_errors`
+| where app="search"
+| table pretty_name app errors warnings health_status
+```
+
+**List dashboards with errors across multiple apps:**
+```spl
+`get_dashboards_with_errors`
+| where app IN ("my_app1", "my_app2", "production_app")
+| sort -errors
+```
+
+**List dashboards with errors that are actively used:**
+```spl
+`get_dashboards_with_errors`
+| where errors > 0
+| join type=inner pretty_name [| mstats sum(_value) as views WHERE index=caca_metrics AND metric_name="dashboard.views" BY pretty_name span=1d | where _time >= relative_time(now(), "-7d") | stats sum(views) as views_7d by pretty_name | where views_7d > 10]
+| table pretty_name app errors warnings views_7d health_status
+```
+
+**Find slow dashboards with high usage:**
+```spl
+`get_slow_dashboards`
+| join type=inner pretty_name [| mstats sum(_value) as views WHERE index=caca_metrics AND metric_name="dashboard.views" BY pretty_name span=1d | where _time >= relative_time(now(), "-7d") | stats sum(views) as views_7d by pretty_name]
+| where views_7d > 50
+| table pretty_name app avg_load_time_7d performance_status views_7d
+| sort -views_7d
+```
+
+**Dashboard health report for a specific app:**
+```spl
+`get_all_dashboards_summary`
+| where app="search"
+| table pretty_name views_7d edits_7d errors_7d avg_load_time_7d health_status
+| sort -errors_7d
+```
 
 ## Configuration
 
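The macros above use two load-time cutoffs that are easy to mix up: `get_slow_dashboards` flags averages above 3 seconds, while `get_dashboard_performance` returns an Excellent/Good/Fair/Poor rating whose boundaries the README does not list. The Python sketch below assumes the rating boundaries match the 1000/3000/5000 ms `rangeValues` of the Avg Load Time panel added to `dashboard_details.xml` in this same commit, with inclusive upper bounds; the real macro may differ. It also restates the "immediate attention" filter from Common Use Cases. The function names, the row dicts and the `healthy` status value are hypothetical.

```python
from typing import Dict, List

# Assumed rating boundaries in ms (inclusive upper bounds), borrowed from the
# Avg Load Time panel's rangeValues [1000,3000,5000]; not documented for the macro.
RATING_BOUNDS = [(1000, "Excellent"), (3000, "Good"), (5000, "Fair")]
SLOW_THRESHOLD_MS = 3000  # README: "average load time > 3 seconds"


def rate_load_time(avg_ms: float) -> str:
    """Map an average load time in milliseconds to a rating label."""
    for upper, label in RATING_BOUNDS:
        if avg_ms <= upper:
            return label
    return "Poor"


def is_slow(avg_ms: float) -> bool:
    """True when the average exceeds the README's 3-second cutoff."""
    return avg_ms > SLOW_THRESHOLD_MS


def needs_immediate_attention(row: Dict) -> bool:
    """Mirrors the README filter: critical OR errors_7d > 50 OR avg load > 10 s."""
    return (
        row["health_status"] == "critical"
        or row["errors_7d"] > 50
        or row["avg_load_time_7d"] > 10000
    )


rows: List[Dict] = [
    {"pretty_name": "Ops Overview", "health_status": "healthy", "errors_7d": 0, "avg_load_time_7d": 850},
    {"pretty_name": "Capacity", "health_status": "warning", "errors_7d": 3, "avg_load_time_7d": 4200},
    {"pretty_name": "Billing", "health_status": "critical", "errors_7d": 12, "avg_load_time_7d": 2100},
]
for r in rows:
    print(r["pretty_name"], rate_load_time(r["avg_load_time_7d"]), is_slow(r["avg_load_time_7d"]))
# Ops Overview Excellent False
# Capacity Fair True
# Billing Good False
print([r["pretty_name"] for r in rows if needs_immediate_attention(r)])  # ['Billing']
```

With these assumed boundaries, "slow" coincides exactly with a Fair or Poor rating, which keeps the two macros consistent with each other.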
@@ -236,6 +342,7 @@ Edit `default/savedsearches.conf` or use Splunk Web to modify:
 
 - **View tracking frequency**: Default every 5 minutes
 - **Edit tracking frequency**: Default every 10 minutes
+- **Performance tracking frequency**: Default every 10 minutes
 - **Health tracking frequency**: Default every 15 minutes
 - **Registry update frequency**: Default daily at 2 AM
 
@@ -248,9 +355,70 @@ Edit `default/indexes.conf` to adjust retention:
 frozenTimePeriodInSecs = 31536000 # 1 year (default)
 ```
 
-### Excluding Dashboards from Monitoring
+### Filtering Apps for Monitoring
+
+CACA can be configured to only monitor dashboards from specific apps, or exclude certain apps from monitoring. This is useful when you only want to track dashboards in production apps, or exclude system/admin apps.
+
+#### Configuration Method
+
+Edit `lookups/app_filter.csv` to control which apps are monitored:
+
+**Include specific apps only:**
+```csv
+app,include
+search,true
+my_production_app,true
+another_app,true
+```
+
+**Exclude specific apps:**
+```csv
+app,include
+splunk_monitoring_console,false
+learned,false
+introspection_generator_addon,false
+```
+
+**How it works:**
+- If an app is **not listed** in app_filter.csv, it **will be monitored** (default behavior)
+- If an app is listed with `include=true` (or `1` or `yes`), it **will be monitored**
+- If an app is listed with `include=false` (or `0` or `no`), it **will NOT be monitored**
+- The filter applies to:
+  - Dashboard registry updates (which dashboards are discovered)
+  - All metrics collection (views, edits, errors, performance)
+  - All search macros and dashboard queries
+
+#### Examples
+
+**Monitor only specific production apps:**
+```csv
+app,include
+production_app1,true
+production_app2,true
+production_app3,true
+```
+Optionally, add a wildcard `*,false` entry to exclude every app that is not explicitly listed:
+```csv
+app,include
+production_app1,true
+production_app2,true
+*,false
+```
+
+**Exclude system and admin apps:**
+```csv
+app,include
+splunk_monitoring_console,false
+learned,false
+introspection_generator_addon,false
+splunk_instrumentation,false
+```
+
+**Note:** After updating `app_filter.csv`, run the "Dashboard Registry - Auto Update" search to rebuild the dashboard registry with the new filter applied.
+
+### Excluding Individual Dashboards from Monitoring
 
-Edit `lookups/dashboard_registry.csv` and set `status=inactive` for dashboards you want to exclude from collection.
+Edit `lookups/dashboard_registry.csv` and set `status=inactive` for specific dashboards you want to exclude from collection (this is independent of app filtering).
 
 ## Troubleshooting
 
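The **How it works** rules for `app_filter.csv` amount to a small decision function, sketched below in Python. Unlisted apps are monitored, `true`/`1`/`yes` include an app, and `false`/`0`/`no` exclude it. Three behaviors are assumptions, since the README does not specify them: an exact app row takes precedence over the optional `*,false` wildcard row, `include` values are compared case-insensitively, and rows with unrecognized values are ignored. `load_app_filter` and `is_monitored` are hypothetical helpers, not part of CACA.

```python
import csv
import io
from typing import Dict

TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def load_app_filter(text: str) -> Dict[str, bool]:
    """Parse app_filter.csv content (header: app,include) into {app: include}."""
    rules: Dict[str, bool] = {}
    for row in csv.DictReader(io.StringIO(text)):
        app = row["app"].strip()
        value = row["include"].strip().lower()  # assumption: case-insensitive
        if value in TRUE_VALUES:
            rules[app] = True
        elif value in FALSE_VALUES:
            rules[app] = False
        # Assumption: unrecognized values are ignored, so the app keeps the default.
    return rules


def is_monitored(app: str, rules: Dict[str, bool]) -> bool:
    """Exact match first, then the '*' wildcard (assumed precedence), else monitored."""
    if app in rules:
        return rules[app]
    if "*" in rules:
        return rules["*"]
    return True  # README: apps not listed are monitored by default


filter_csv = """app,include
production_app1,true
production_app2,true
*,false
"""
rules = load_app_filter(filter_csv)
for app in ["production_app1", "search", "learned"]:
    print(app, is_monitored(app, rules))
# production_app1 True
# search False
# learned False
```

Under the precedence assumed here, `*,false` only affects apps without their own row, which matches what the wildcard example intends.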
default/data/ui/views/dashboard_details.xml

Lines changed: 37 additions & 0 deletions
@@ -85,6 +85,25 @@
 </single>
 </panel>
 
+<panel>
+<title>Avg Load Time</title>
+<single>
+<search>
+<query>| mstats avg(_value) as avg_load WHERE index=caca_metrics AND pretty_name="$dashboard_name$" AND metric_name="dashboard.load_time" span=1d
+| stats avg(avg_load) as avg_load_time
+| eval display=round(avg_load_time, 0)
+| table display</query>
+<earliest>$time_range.earliest$</earliest>
+<latest>$time_range.latest$</latest>
+</search>
+<option name="drilldown">none</option>
+<option name="colorMode">block</option>
+<option name="rangeColors">["0x53a051","0x0877a6","0xf8be34","0xdc4e41"]</option>
+<option name="rangeValues">[1000,3000,5000]</option>
+<option name="underLabel">Performance (ms)</option>
+</single>
+</panel>
+
 <panel>
 <title>Last Viewed</title>
 <single>
@@ -119,6 +138,24 @@
 <option name="charting.drilldown">none</option>
 </chart>
 </panel>
+
+<panel>
+<title>Load Time Trend Over Time</title>
+<chart>
+<search>
+<query>| mstats avg(_value) as avg_load WHERE index=caca_metrics AND pretty_name="$dashboard_name$" AND metric_name="dashboard.load_time" span=1h
+| timechart avg(avg_load) as "Avg Load Time (ms)" span=1h</query>
+<earliest>$time_range.earliest$</earliest>
+<latest>$time_range.latest$</latest>
+</search>
+<option name="charting.chart">line</option>
+<option name="charting.axisTitleX.text">Date</option>
+<option name="charting.axisTitleY.text">Load Time (ms)</option>
+<option name="charting.legend.placement">bottom</option>
+<option name="charting.seriesColors">[0xF8BE34]</option>
+<option name="charting.drilldown">none</option>
+</chart>
+</panel>
 </row>
 
 <!-- User Activity Breakdown -->
