Commit 7cd4b58

monitoring: add monitoring framework for workflow execution
Add a flexible monitoring framework that collects system metrics during workflow execution. The framework supports background monitoring services that automatically start before workflows and collect results afterward.

Initial implementation includes:
- Core monitoring infrastructure with Kconfig integration
- Folio migration statistics monitor (for developmental kernel features)
- Integration with fstests workflow
- Result collection and visualization support
- Documentation for adding new monitors and integrating with workflows

The monitoring system is designed to be modular, allowing workflows to opt-in and new monitors to be easily added. Results are stored in workflow-specific directories and can include both raw data and visualizations.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <[email protected]>
1 parent abc034c commit 7cd4b58

10 files changed, +921 -0 lines changed

Kconfig

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,10 @@ menu "Target workflows"
 source "kconfigs/workflows/Kconfig"
 endmenu
 
+menu "Monitors"
+source "kconfigs/monitors/Kconfig"
+endmenu
+
 menu "Kdevops configuration"
 source "kconfigs/Kconfig.kdevops"
 endmenu

README.md

Lines changed: 1 addition & 0 deletions
@@ -306,6 +306,7 @@ Below is kdevops' recommended documentation reading.
 * [kdevops' evolving make help](docs/evolving-make-help.md)
 * [kdevops configuration](docs/kdevops-configuration.md)
 * [kdevops mirror support](docs/kdevops-mirror.md)
+* [kdevops monitoring services](docs/monitoring.md)
 * [kdevops first run](docs/kdevops-first-run.md)
 * [kdevops running make](docs/running-make.md)
 * [kdevops libvirt storage pool considerations](docs/libvirt-storage-pool.md)

docs/monitoring.md

Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
# Monitoring Services in kdevops

## Overview

kdevops provides a flexible monitoring framework that allows you to collect system metrics and statistics during workflow execution. This is particularly useful for:

- Performance analysis during testing
- Debugging kernel behavior
- Understanding system resource usage patterns
- Validating new kernel features with custom metrics

The monitoring framework runs services in the background during workflow execution and automatically collects results afterward.
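
The collector that the monitoring role runs on each target is not shown in this documentation. As a rough illustration only, the background-sampling pattern described above could look like the following Python sketch. `PeriodicSampler`, its parameters, and the output layout (timestamp, snapshot, blank line) are assumptions made for the example, not the role's actual implementation.

```python
import threading
from datetime import datetime
from pathlib import Path


class PeriodicSampler:
    """Append timestamped snapshots of a source file to an output file.

    Hypothetical helper for illustration; not part of kdevops.
    """

    def __init__(self, source: Path, output: Path, interval: float) -> None:
        self.source = source
        self.output = output
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _sample_once(self) -> None:
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        snapshot = self.source.read_text().strip()
        with self.output.open("a") as out:
            out.write(f"{stamp}\n{snapshot}\n\n")

    def _run(self) -> None:
        # Sample immediately, then once per interval until asked to stop.
        while True:
            self._sample_once()
            if self._stop.wait(self.interval):
                break

    def start(self) -> None:
        """Begin sampling in a background thread."""
        self._thread.start()

    def stop(self) -> None:
        """Signal the thread to finish and wait for it to exit."""
        self._stop.set()
        self._thread.join()
```

Calling `start()` before the workload and `stop()` afterward brackets the run in the same way the framework starts services before a workflow and collects results once it finishes.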

## Configuration

### Enabling Monitoring

Monitoring services are configured through the kdevops menuconfig system:

```bash
make menuconfig
# Navigate to: Monitors
# Enable: "Enable monitoring services during workflow execution"
```

### Available Monitors

#### Folio Migration Statistics (Developmental)

This monitor tracks page/folio migration statistics in the Linux kernel. It's marked as "developmental" because it requires kernel patches that are not yet upstream.

**Requirements:**
- Kernel with folio migration debugfs stats patch applied
- Debugfs mounted at `/sys/kernel/debug`
- File exists: `/sys/kernel/debug/mm/migrate/stats`
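
The last two requirements can be checked programmatically. The sketch below is a hypothetical helper, not something shipped in this commit: it takes the text of `/proc/mounts` and the stats path as arguments and reports what is missing. The function names are invented for the example.

```python
from pathlib import Path
from typing import List


def debugfs_mounted(mounts_text: str, mount_point: str = "/sys/kernel/debug") -> bool:
    """Return True if a debugfs filesystem is mounted at mount_point.

    mounts_text is the content of /proc/mounts, whose lines read
    "<device> <mount point> <fs type> <options> <dump> <pass>".
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "debugfs":
            return True
    return False


def missing_requirements(mounts_text: str, stats_file: Path) -> List[str]:
    """List the unmet requirements; an empty list means the monitor can run."""
    problems = []
    if not debugfs_mounted(mounts_text):
        problems.append("debugfs is not mounted at /sys/kernel/debug")
    if not stats_file.is_file():
        problems.append(f"{stats_file} does not exist")
    return problems
```

On a target node this would be called as `missing_requirements(Path("/proc/mounts").read_text(), Path("/sys/kernel/debug/mm/migrate/stats"))`, run as a user that can read debugfs (typically root).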

**Configuration:**
```bash
make menuconfig
# Navigate to: Monitors
# Enable: "Enable monitoring services during workflow execution"
# Enable: "Enable developmental statistics (not yet upstream)"
# Enable: "Monitor folio migration statistics"
# Set: "Folio migration monitoring interval" (default: 60 seconds)
```

## Integration with Workflows

### Currently Supported Workflows

- **fstests**: Filesystem testing framework

### How Workflows Integrate Monitoring

Workflows integrate monitoring by including the monitoring role at appropriate points. Here's the pattern used in fstests:

```yaml
# Start monitoring before tests
- name: Start monitoring services
  include_role:
    name: monitoring
    tasks_from: monitor_run
  when:
    - kdevops_run_fstests|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_run' ]

# ... workflow tasks run here ...

# Stop monitoring and collect data after tests
- name: Stop monitoring services and collect data
  include_role:
    name: monitoring
    tasks_from: monitor_collect
  when:
    - kdevops_run_fstests|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_collect' ]
```

### Adding Monitoring to Your Workflow

To add monitoring support to a new workflow:

1. **Identify the execution boundaries**: Determine where your workflow starts and completes its main work.

2. **Include the monitoring role**: Add the monitoring role calls before and after your main tasks:

   ```yaml
   # In your workflow's main task file (e.g., playbooks/roles/YOUR_WORKFLOW/tasks/main.yml)

   # Set custom monitoring results path (optional)
   - name: Set monitoring results path for this workflow
     set_fact:
       monitoring_results_base_path: "{{ topdir_path }}/workflows/YOUR_WORKFLOW/results/monitoring"
     when:
       - enable_monitoring|default(false)|bool

   # Start monitoring
   - name: Start monitoring services
     include_role:
       name: monitoring
       tasks_from: monitor_run
     when:
       - your_workflow_condition|bool
       - enable_monitoring|default(false)|bool
     tags: [ 'your_workflow', 'monitoring', 'monitor_run' ]

   # Your workflow tasks here...

   # Stop monitoring
   - name: Stop monitoring services and collect data
     include_role:
       name: monitoring
       tasks_from: monitor_collect
     when:
       - your_workflow_condition|bool
       - enable_monitoring|default(false)|bool
     tags: [ 'your_workflow', 'monitoring', 'monitor_collect' ]
   ```

3. **Test the integration**: Run your workflow with monitoring enabled to verify data collection.

## Output and Results

### Result Location

Monitoring results are stored in workflow-specific directories:

- **fstests**: `workflows/fstests/results/monitoring/`
- **Other workflows**: `workflows/YOUR_WORKFLOW/results/monitoring/`

Workflows can customize the results path by setting the `monitoring_results_base_path` variable in their playbook.

### Result Files

For folio migration monitoring:
- `<hostname>_folio_migration_stats.txt`: Raw statistics with timestamps
- `<hostname>_folio_migration_plot.png`: Visualization plot (if generation succeeds)

### Example Output

Raw statistics file format:
```
2024-01-15 10:30:00
success: 12345
fail: 67
total: 12412

2024-01-15 10:31:00
success: 12456
fail: 68
total: 12524
```
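
For post-processing, the raw format above is simple enough to parse by hand. The following sketch assumes exactly the layout shown (a timestamp line, `name: value` counter lines, blank-line separators) and assumes the counters are cumulative, as the increasing example values suggest. `parse_stats` and `deltas` are illustrative names, not functions provided by kdevops.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

Sample = Tuple[datetime, Dict[str, int]]


def parse_stats(text: str) -> List[Sample]:
    """Parse blank-line-separated blocks into (timestamp, counters) pairs."""
    samples: List[Sample] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        stamp = datetime.strptime(lines[0], "%Y-%m-%d %H:%M:%S")
        counters: Dict[str, int] = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            counters[name.strip()] = int(value)
        samples.append((stamp, counters))
    return samples


def deltas(samples: List[Sample], key: str) -> List[int]:
    """Change in one counter between each pair of consecutive samples."""
    return [
        later[1][key] - earlier[1][key]
        for earlier, later in zip(samples, samples[1:])
    ]
```

Applied to the two samples above, `deltas(samples, "success")` gives `[111]` and `deltas(samples, "fail")` gives `[1]`, i.e. 111 successful and 1 failed migration during that minute.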

## Running Workflows with Monitoring

### Example: fstests with Folio Migration Monitoring

1. **Configure monitoring**:
   ```bash
   make menuconfig
   # Enable monitoring options as described above
   make
   ```

2. **Provision systems**:
   ```bash
   make bringup
   ```

3. **Run tests with monitoring**:
   ```bash
   # Run on both baseline and dev groups
   make fstests-tests TESTS=generic/003

   # Or run on specific group
   make fstests-baseline TESTS=generic/003
   ```

4. **Check results**:
   ```bash
   ls -la workflows/fstests/results/monitoring/
   ```

## Advanced Usage

### Custom Monitoring Intervals

You can override the monitoring interval at runtime:

```bash
make fstests-tests EXTRA_VARS="monitor_folio_migration_interval=30"
```

### Selective Monitoring

You can enable/disable specific monitors at runtime:

```bash
# Enable only folio migration monitoring
make fstests-tests EXTRA_VARS="enable_monitoring=true monitor_folio_migration=true"
```

## Troubleshooting

### Monitor Not Starting

1. **Check kernel support**:
   ```bash
   ansible all -m shell -a "ls -la /sys/kernel/debug/mm/migrate/stats"
   ```

2. **Verify debugfs is mounted**:
   ```bash
   ansible all -m shell -a "mount | grep debugfs"
   ```

3. **Check monitoring process**:
   ```bash
   ansible all -m shell -a "ps aux | grep monitoring"
   ```

### No Data Collected

1. **Verify monitoring was enabled**:
   ```bash
   # .config stores symbols in upper case (CONFIG_ENABLE_MONITORING=y), so match case-insensitively
   grep -iE "enable_monitoring|monitor_" .config
   ```

2. **Check ansible output for monitoring tasks**:
   ```bash
   make fstests-tests AV=2 | grep -A5 -B5 monitoring
   ```

3. **Look for error messages**:
   ```bash
   ansible all -m shell -a "cat /root/monitoring/folio_migration.log"
   ```

## Adding New Monitors

To add a new monitor to the framework:

1. **Add Kconfig option** in `kconfigs/monitors/Kconfig`:
   ```kconfig
   config MONITOR_YOUR_METRIC
   	bool "Monitor your metric description"
   	output yaml
   	default n
   	help
   	  Detailed description of what this monitors...
   ```

2. **Extend monitoring role**:
   - Add collection logic in `playbooks/roles/monitoring/tasks/monitor_run.yml`
   - Add termination and data collection in `playbooks/roles/monitoring/tasks/monitor_collect.yml`

3. **Add visualization** (optional):
   - Place scripts in `playbooks/roles/monitoring/files/`
   - Call them from `monitor_collect.yml`

4. **Update documentation**: Add your monitor to this documentation file.

## Performance Considerations

- **Monitoring overhead**: Each monitor adds some system overhead. Consider the trade-off between data granularity and performance impact.
- **Storage requirements**: Long-running tests with frequent monitoring can generate large data files.
- **Concurrent monitors**: Running multiple monitors simultaneously increases overhead.

## Future Enhancements

Planned monitoring additions:
- Memory pressure statistics
- CPU utilization tracking
- I/O statistics collection
- Network traffic monitoring
- Custom perf event monitoring
- Integration with Grafana/Prometheus for real-time visualization

kconfigs/monitors/Kconfig

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# SPDX-License-Identifier: copyleft-next-0.3.1

config ENABLE_MONITORING
	bool "Enable monitoring services during workflow execution"
	output yaml
	default n
	help
	  Enable monitoring services to collect statistics during workflow
	  execution. This allows collection of various system metrics while
	  workflows are running.

	  Monitoring services run in the background during test execution and
	  automatically collect results afterward. The collected data can be
	  used for performance analysis, debugging, and understanding system
	  behavior during tests.

	  Individual workflows must add support for monitoring integration.
	  Currently supported workflows:
	  - fstests

if ENABLE_MONITORING

config MONITOR_DEVELOPMENTAL_STATS
	bool "Enable developmental statistics (not yet upstream)"
	output yaml
	default n
	help
	  Enable collection of statistics that are still in development
	  and not yet merged upstream in the Linux kernel.

	  This is useful for testing and validating new kernel features
	  that provide additional debugging or performance metrics.

if MONITOR_DEVELOPMENTAL_STATS

config MONITOR_FOLIO_MIGRATION
	bool "Monitor folio migration statistics"
	output yaml
	default n
	help
	  Enable monitoring of folio migration statistics if available.
	  This requires the kernel to have the folio migration debugfs
	  stats patch applied.

	  The statistics are collected from:
	  /sys/kernel/debug/mm/migrate/stats

	  This feature collects migration statistics periodically during
	  workflow execution and can generate plots for visualization.

config MONITOR_FOLIO_MIGRATION_INTERVAL
	int "Folio migration monitoring interval (seconds)"
	output yaml
	default 60
	depends on MONITOR_FOLIO_MIGRATION
	help
	  How often to collect folio migration statistics in seconds.
	  Default is 60 seconds.

	  Lower values provide more granular data but may impact system
	  performance. Higher values reduce overhead but may miss
	  short-lived migration events.

endif # MONITOR_DEVELOPMENTAL_STATS

# Future monitoring options can be added here
# Examples:
# - Memory pressure monitoring
# - CPU utilization tracking
# - I/O statistics collection
# - Network traffic monitoring
# - Custom perf event monitoring

endif # ENABLE_MONITORING
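
The symbols above are marked `output yaml`, and the playbooks in this commit refer to variables such as `enable_monitoring` and `monitor_folio_migration_interval`. That pairing suggests each symbol surfaces as a lower-cased Ansible variable. The Python sketch below illustrates that inferred mapping only; it is an assumption drawn from the names in this commit, and the real kdevops generator may apply additional rules. `config_to_yaml_vars` is a hypothetical name.

```python
from typing import Dict, Union

Value = Union[bool, int, str]


def config_to_yaml_vars(config_text: str) -> Dict[str, Value]:
    """Map CONFIG_FOO=value lines to lower-cased variable names.

    Assumed convention: CONFIG_ENABLE_MONITORING=y -> enable_monitoring: True.
    """
    result: Dict[str, Value] = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skips blanks and comments such as "# CONFIG_FOO is not set".
        if not line.startswith("CONFIG_") or "=" not in line:
            continue
        symbol, _, raw = line.partition("=")
        name = symbol[len("CONFIG_"):].lower()
        if raw == "y":
            result[name] = True
        elif len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            result[name] = raw[1:-1]
        else:
            try:
                result[name] = int(raw, 0)  # base 0 also accepts 0x... hex values
            except ValueError:
                result[name] = raw
    return result
```

Under this assumption, a `.config` containing `CONFIG_MONITOR_FOLIO_MIGRATION_INTERVAL=60` yields `monitor_folio_migration_interval: 60`, the variable name used by the documented runtime override.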
