Commit d05fd12

monitoring: add shared monitoring services framework

Add a comprehensive monitoring framework that can be used across all kdevops workflows to collect system metrics during test execution.

Key features:
- Top-level "Monitors" menu in kconfig below "Target workflows"
- Shared monitoring role that any workflow can integrate
- Folio migration statistics collection for developmental kernels
- Automatic data collection and visualization
- Zero impact when disabled
- Allow workflows to set monitoring_results_base_path variable
- Default to workflows/fstests/results/monitoring for backward compatibility
- Update documentation with example of custom path configuration
- Make monitoring framework more flexible for different workflows
- Check if matplotlib is available before attempting plot generation
- Skip plot generation with informative message if matplotlib missing
- Still collect raw monitoring data even without plotting capability
- Prevent task failures when visualization dependencies are missing

Configuration options:
- ENABLE_MONITORING: Main toggle for monitoring services
- MONITOR_DEVELOPMENTAL_STATS: Enable developmental statistics
- MONITOR_FOLIO_MIGRATION: Monitor folio migration stats
- MONITOR_FOLIO_MIGRATION_INTERVAL: Collection interval (default 60s)

Integration example with fstests:
- Added monitoring role calls before/after oscheck execution
- Added fstests-tests target for running both baseline and dev groups
- Results saved to workflows/fstests/results/monitoring/

The monitoring framework collects data asynchronously in the background during workflow execution and automatically processes results afterward, including generating visualization plots when possible.

This ensures monitoring data collection works even on minimal systems that don't have matplotlib installed, while still generating plots when the dependency is available.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <[email protected]>

1 parent 4c36065 commit d05fd12

11 files changed: 931 additions, 0 deletions

Kconfig

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,10 @@ menu "Target workflows"
 source "kconfigs/workflows/Kconfig"
 endmenu
 
+menu "Monitors"
+source "kconfigs/monitors/Kconfig"
+endmenu
+
 menu "Kdevops configuration"
 source "kconfigs/Kconfig.kdevops"
 endmenu

README.md

Lines changed: 1 addition & 0 deletions
@@ -289,6 +289,7 @@ Below is kdevops' recommended documentation reading.
 * [kdevops' evolving make help](docs/evolving-make-help.md)
 * [kdevops configuration](docs/kdevops-configuration.md)
 * [kdevops mirror support](docs/kdevops-mirror.md)
+* [kdevops monitoring services](docs/monitoring.md)
 * [kdevops first run](docs/kdevops-first-run.md)
 * [kdevops running make](docs/running-make.md)
 * [kdevops libvirt storage pool considerations](docs/libvirt-storage-pool.md)

docs/monitoring.md

Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
# Monitoring Services in kdevops

## Overview

kdevops provides a flexible monitoring framework that allows you to collect system metrics and statistics during workflow execution. This is particularly useful for:

- Performance analysis during testing
- Debugging kernel behavior
- Understanding system resource usage patterns
- Validating new kernel features with custom metrics

The monitoring framework runs services in the background during workflow execution and automatically collects results afterward.

## Configuration

### Enabling Monitoring

Monitoring services are configured through the kdevops menuconfig system:

```bash
make menuconfig
# Navigate to: Monitors
# Enable: "Enable monitoring services during workflow execution"
```

### Available Monitors

#### Folio Migration Statistics (Developmental)

This monitor tracks page/folio migration statistics in the Linux kernel. It's marked as "developmental" because it requires kernel patches that are not yet upstream.

**Requirements:**
- Kernel with folio migration debugfs stats patch applied
- Debugfs mounted at `/sys/kernel/debug`
- File exists: `/sys/kernel/debug/mm/migrate/stats`

**Configuration:**
```bash
make menuconfig
# Navigate to: Monitors
# Enable: "Enable monitoring services during workflow execution"
# Enable: "Enable developmental statistics (not yet upstream)"
# Enable: "Monitor folio migration statistics"
# Set: "Folio migration monitoring interval" (default: 60 seconds)
```
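
As a rough illustration of what periodic collection involves, the sketch below reads the stats file once per interval and appends a timestamped snapshot to an output file. It is written in Python for readability and is not the monitoring role's actual implementation, which this page does not show; `take_snapshot`, `collect`, and the output layout are assumptions made for this example.

```python
import time
from datetime import datetime
from pathlib import Path
from typing import Optional

# Stats file named in the requirements above.
STATS_PATH = Path("/sys/kernel/debug/mm/migrate/stats")


def take_snapshot(stats_path: Path, out_path: Path,
                  now: Optional[datetime] = None) -> bool:
    """Append one timestamped copy of the stats file to out_path.

    Returns False without writing anything if the stats file is missing,
    for example on a kernel without the debugfs stats patch.
    """
    if not stats_path.is_file():
        return False
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    with out_path.open("a") as out:
        out.write(f"{stamp}\n{stats_path.read_text().rstrip()}\n\n")
    return True


def collect(stats_path: Path, out_path: Path,
            interval: float, samples: int) -> int:
    """Take `samples` snapshots, sleeping `interval` seconds between them.

    Returns the number of snapshots that were actually written.
    """
    taken = 0
    for i in range(samples):
        if take_snapshot(stats_path, out_path):
            taken += 1
        if i < samples - 1:
            time.sleep(interval)
    return taken
```

Calling `collect(STATS_PATH, Path("stats.txt"), interval=60, samples=10)` would take ten snapshots one minute apart. A real collector would more likely run until it is told to stop rather than for a fixed number of samples.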

## Integration with Workflows

### Currently Supported Workflows

- **fstests**: Filesystem testing framework

### How Workflows Integrate Monitoring

Workflows integrate monitoring by including the monitoring role at appropriate points. Here's the pattern used in fstests:

```yaml
# Start monitoring before tests
- name: Start monitoring services
  include_role:
    name: monitoring
    tasks_from: monitor_run
  when:
    - kdevops_run_fstests|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_run' ]

# ... workflow tasks run here ...

# Stop monitoring and collect data after tests
- name: Stop monitoring services and collect data
  include_role:
    name: monitoring
    tasks_from: monitor_collect
  when:
    - kdevops_run_fstests|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_collect' ]
```

### Adding Monitoring to Your Workflow

To add monitoring support to a new workflow:

1. **Identify the execution boundaries**: Determine where your workflow starts and completes its main work.

2. **Include the monitoring role**: Add the monitoring role calls before and after your main tasks:

   ```yaml
   # In your workflow's main task file (e.g., playbooks/roles/YOUR_WORKFLOW/tasks/main.yml)

   # Set custom monitoring results path (optional)
   - name: Set monitoring results path for this workflow
     set_fact:
       monitoring_results_base_path: "{{ topdir_path }}/workflows/YOUR_WORKFLOW/results/monitoring"
     when:
       - enable_monitoring|default(false)|bool

   # Start monitoring
   - name: Start monitoring services
     include_role:
       name: monitoring
       tasks_from: monitor_run
     when:
       - your_workflow_condition|bool
       - enable_monitoring|default(false)|bool
     tags: [ 'your_workflow', 'monitoring', 'monitor_run' ]

   # Your workflow tasks here...

   # Stop monitoring
   - name: Stop monitoring services and collect data
     include_role:
       name: monitoring
       tasks_from: monitor_collect
     when:
       - your_workflow_condition|bool
       - enable_monitoring|default(false)|bool
     tags: [ 'your_workflow', 'monitoring', 'monitor_collect' ]
   ```

3. **Test the integration**: Run your workflow with monitoring enabled to verify data collection.

## Output and Results

### Result Location

Monitoring results are stored in workflow-specific directories:

- **fstests**: `workflows/fstests/results/monitoring/`
- **Other workflows**: `workflows/YOUR_WORKFLOW/results/monitoring/`

Workflows can customize the results path by setting the `monitoring_results_base_path` variable in their playbook.

### Result Files

For folio migration monitoring:
- `<hostname>_folio_migration_stats.txt`: Raw statistics with timestamps
- `<hostname>_folio_migration_plot.png`: Visualization plot (if generation succeeds)
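
The plot is only produced when matplotlib is available, while the raw statistics file is kept either way. A minimal sketch of such an availability check follows, assuming a Python helper; the names `plotting_available` and `should_plot` are hypothetical and not taken from the monitoring role.

```python
import importlib.util
import sys


def plotting_available(module: str = "matplotlib") -> bool:
    """Return True if `module` could be imported, without importing it."""
    return importlib.util.find_spec(module) is not None


def should_plot(stats_file: str, module: str = "matplotlib") -> bool:
    """Decide whether to generate a plot.

    When the plotting module is missing, report the skip on stderr and
    return False instead of raising, so the caller can carry on with the
    raw data alone.
    """
    if plotting_available(module):
        return True
    print(f"{module} not installed; skipping plot, raw data kept in {stats_file}",
          file=sys.stderr)
    return False
```

Returning a boolean instead of raising keeps a missing visualization dependency from turning into a task failure.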

### Example Output

Raw statistics file format:
```
2024-01-15 10:30:00
success: 12345
fail: 67
total: 12412

2024-01-15 10:31:00
success: 12456
fail: 68
total: 12524
```
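
Assuming the raw file follows the layout above (a timestamp line, then `key: value` lines, with a blank line between snapshots), it can be post-processed with a few lines of Python. `parse_stats` and `deltas` are illustrative helpers, not part of kdevops.

```python
import re
from datetime import datetime


def parse_stats(text: str) -> list:
    """Parse snapshots into dicts such as {"time": ..., "success": 12345}."""
    samples = []
    # Snapshots are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        sample = {"time": datetime.strptime(lines[0], "%Y-%m-%d %H:%M:%S")}
        for line in lines[1:]:
            key, _, value = line.partition(":")
            sample[key.strip()] = int(value)
        samples.append(sample)
    return samples


def deltas(samples: list, key: str) -> list:
    """Change in one counter between each pair of consecutive snapshots."""
    return [b[key] - a[key] for a, b in zip(samples, samples[1:])]
```

For the two snapshots shown above, `deltas(samples, "success")` gives `[111]` and `deltas(samples, "fail")` gives `[1]`. If the counters are cumulative, as the example values suggest, these are the migrations that happened during that one-minute interval.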

## Running Workflows with Monitoring

### Example: fstests with Folio Migration Monitoring

1. **Configure monitoring**:
   ```bash
   make menuconfig
   # Enable monitoring options as described above
   make
   ```

2. **Provision systems**:
   ```bash
   make bringup
   ```

3. **Run tests with monitoring**:
   ```bash
   # Run on both baseline and dev groups
   make fstests-tests TESTS=generic/003

   # Or run on specific group
   make fstests-baseline TESTS=generic/003
   ```

4. **Check results**:
   ```bash
   ls -la workflows/fstests/results/monitoring/
   ```

## Advanced Usage

### Custom Monitoring Intervals

You can override the monitoring interval at runtime:

```bash
make fstests-tests EXTRA_VARS="monitor_folio_migration_interval=30"
```

### Selective Monitoring

You can enable/disable specific monitors at runtime:

```bash
# Enable only folio migration monitoring
make fstests-tests EXTRA_VARS="enable_monitoring=true monitor_folio_migration=true"
```

## Troubleshooting

### Monitor Not Starting

1. **Check kernel support**:
   ```bash
   ansible all -m shell -a "ls -la /sys/kernel/debug/mm/migrate/stats"
   ```

2. **Verify debugfs is mounted**:
   ```bash
   ansible all -m shell -a "mount | grep debugfs"
   ```

3. **Check monitoring process**:
   ```bash
   ansible all -m shell -a "ps aux | grep monitoring"
   ```

### No Data Collected

1. **Verify monitoring was enabled**:
   ```bash
   # .config stores symbols in upper case (CONFIG_ENABLE_MONITORING), so match case-insensitively
   grep -iE "enable_monitoring|monitor_" .config
   ```

2. **Check ansible output for monitoring tasks**:
   ```bash
   make fstests-tests AV=2 | grep -A5 -B5 monitoring
   ```

3. **Look for error messages**:
   ```bash
   ansible all -m shell -a "cat /root/monitoring/folio_migration.log"
   ```

## Adding New Monitors

To add a new monitor to the framework:

1. **Add Kconfig option** in `kconfigs/monitors/Kconfig`:
   ```kconfig
   config MONITOR_YOUR_METRIC
   	bool "Monitor your metric description"
   	output yaml
   	default n
   	help
   	  Detailed description of what this monitors...
   ```

2. **Extend monitoring role**:
   - Add collection logic in `playbooks/roles/monitoring/tasks/monitor_run.yml`
   - Add termination and data collection in `playbooks/roles/monitoring/tasks/monitor_collect.yml`

3. **Add visualization** (optional):
   - Place scripts in `playbooks/roles/monitoring/files/`
   - Call them from `monitor_collect.yml`

4. **Update documentation**: Add your monitor to this documentation file.

## Performance Considerations

- **Monitoring overhead**: Each monitor adds some system overhead. Consider the trade-off between data granularity and performance impact.
- **Storage requirements**: Long-running tests with frequent monitoring can generate large data files.
- **Concurrent monitors**: Running multiple monitors simultaneously increases overhead.
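
To put a rough number on the granularity versus storage trade-off, the arithmetic can be sketched as below. It assumes one snapshot at the start of the run and one per elapsed interval, and a fixed snapshot size; both are simplifications for illustration, and `sample_count` and `estimated_bytes` are hypothetical helpers.

```python
def sample_count(duration_s: int, interval_s: int) -> int:
    """Snapshots taken when sampling at t=0 and then every interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return duration_s // interval_s + 1


def estimated_bytes(duration_s: int, interval_s: int, bytes_per_sample: int) -> int:
    """Approximate size of the raw statistics file for one host."""
    return sample_count(duration_s, interval_s) * bytes_per_sample
```

Under these assumptions a one-hour run produces 61 snapshots at the default 60-second interval and 121 at 30 seconds, so halving the interval roughly doubles the size of the raw file.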

## Future Enhancements

Planned monitoring additions:
- Memory pressure statistics
- CPU utilization tracking
- I/O statistics collection
- Network traffic monitoring
- Custom perf event monitoring
- Integration with Grafana/Prometheus for real-time visualization

kconfigs/monitors/Kconfig

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# SPDX-License-Identifier: copyleft-next-0.3.1

config ENABLE_MONITORING
	bool "Enable monitoring services during workflow execution"
	output yaml
	default n
	help
	  Enable monitoring services to collect statistics during workflow
	  execution. This allows collection of various system metrics while
	  workflows are running.

	  Monitoring services run in the background during test execution and
	  automatically collect results afterward. The collected data can be
	  used for performance analysis, debugging, and understanding system
	  behavior during tests.

	  Individual workflows must add support for monitoring integration.
	  Currently supported workflows:
	  - fstests

if ENABLE_MONITORING

config MONITOR_DEVELOPMENTAL_STATS
	bool "Enable developmental statistics (not yet upstream)"
	output yaml
	default n
	help
	  Enable collection of statistics that are still in development
	  and not yet merged upstream in the Linux kernel.

	  This is useful for testing and validating new kernel features
	  that provide additional debugging or performance metrics.

if MONITOR_DEVELOPMENTAL_STATS

config MONITOR_FOLIO_MIGRATION
	bool "Monitor folio migration statistics"
	output yaml
	default n
	help
	  Enable monitoring of folio migration statistics if available.
	  This requires the kernel to have the folio migration debugfs
	  stats patch applied.

	  The statistics are collected from:
	  /sys/kernel/debug/mm/migrate/stats

	  This feature collects migration statistics periodically during
	  workflow execution and can generate plots for visualization.

config MONITOR_FOLIO_MIGRATION_INTERVAL
	int "Folio migration monitoring interval (seconds)"
	output yaml
	default 60
	depends on MONITOR_FOLIO_MIGRATION
	help
	  How often to collect folio migration statistics in seconds.
	  Default is 60 seconds.

	  Lower values provide more granular data but may impact system
	  performance. Higher values reduce overhead but may miss
	  short-lived migration events.

endif # MONITOR_DEVELOPMENTAL_STATS

# Future monitoring options can be added here
# Examples:
# - Memory pressure monitoring
# - CPU utilization tracking
# - I/O statistics collection
# - Network traffic monitoring
# - Custom perf event monitoring

endif # ENABLE_MONITORING
