# Monitoring Services in kdevops

## Overview

kdevops provides a flexible monitoring framework that allows you to collect system metrics and statistics during workflow execution. This is particularly useful for:

- Performance analysis during testing
- Debugging kernel behavior
- Understanding system resource usage patterns
- Validating new kernel features with custom metrics

The monitoring framework runs services in the background during workflow execution and automatically collects results afterward.

## Configuration

### Enabling Monitoring

Monitoring services are configured through the kdevops menuconfig system:

```bash
make menuconfig
# Navigate to: Monitors
# Enable: "Enable monitoring services during workflow execution"
```

### Available Monitors

#### Folio Migration Statistics (Developmental)

This monitor tracks page/folio migration statistics in the Linux kernel. It is marked as "developmental" because it requires kernel patches that are not yet upstream.

**Requirements:**
- Kernel with the folio migration debugfs stats patch applied
- debugfs mounted at `/sys/kernel/debug`
- The file `/sys/kernel/debug/mm/migrate/stats` exists
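
You can confirm these requirements on a target node before enabling the monitor. The following is a minimal sketch and is not part of kdevops. The function names are hypothetical. It assumes a Linux host and root privileges, because debugfs is normally readable only by root.

```bash
# Hypothetical pre-flight check for the folio migration monitor.
# Not part of kdevops; run as root on a target node.

# Succeeds if a debugfs filesystem is mounted at the given mount point.
debugfs_mounted() {
    local mnt="${1:-/sys/kernel/debug}"
    # /proc/mounts fields: device, mount point, fs type, options, ...
    awk -v mp="$mnt" '$2 == mp && $3 == "debugfs" { found = 1 }
        END { exit !found }' /proc/mounts
}

# Succeeds if the stats file exists and can actually be read.
# debugfs files report a size of 0, so test by reading, not by size.
stats_readable() {
    local stats="${1:-/sys/kernel/debug/mm/migrate/stats}"
    [ -e "$stats" ] && cat "$stats" > /dev/null 2>&1
}

# Runs both checks and reports the first requirement that is not met.
check_folio_migration_support() {
    if ! debugfs_mounted /sys/kernel/debug; then
        echo "debugfs is not mounted at /sys/kernel/debug" >&2
        return 1
    fi
    if ! stats_readable /sys/kernel/debug/mm/migrate/stats; then
        echo "stats file missing or unreadable; is the patch applied?" >&2
        return 1
    fi
    echo "folio migration stats are available"
}
```

When both checks pass, `check_folio_migration_support` prints a confirmation and returns 0. Otherwise it prints the first failing requirement to stderr and returns 1.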

**Configuration:**
```bash
make menuconfig
# Navigate to: Monitors
# Enable: "Enable monitoring services during workflow execution"
# Enable: "Enable developmental statistics (not yet upstream)"
# Enable: "Monitor folio migration statistics"
# Set: "Folio migration monitoring interval" (default: 60 seconds)
```

## Integration with Workflows

### Currently Supported Workflows

- **fstests**: Filesystem testing framework

### How Workflows Integrate Monitoring

Workflows integrate monitoring by including the monitoring role immediately before and after their main tasks. Here is the pattern used in fstests:

```yaml
# Start monitoring before tests
- name: Start monitoring services
  include_role:
    name: monitoring
    tasks_from: monitor_run
  when:
    - kdevops_run_fstests|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_run' ]

# ... workflow tasks run here ...

# Stop monitoring and collect data after tests
- name: Stop monitoring services and collect data
  include_role:
    name: monitoring
    tasks_from: monitor_collect
  when:
    - kdevops_run_fstests|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'oscheck', 'fstests', 'run_tests', 'monitoring', 'monitor_collect' ]
```
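
This document does not show the task files behind `monitor_run` and `monitor_collect`. As a rough mental model, the start/stop pairing resembles the plain-shell sketch below: a background loop appends timestamped snapshots until it is stopped. The helper names, the PID file, and the subshell loop are assumptions for illustration. They are not the `monitoring` role's actual implementation. The record layout mirrors the example under Output and Results.

```bash
# Illustrative start/stop pattern for a background sampler.
# Hypothetical helpers; the real monitoring role may work differently.

# start_sampler STATS_FILE OUTPUT_FILE INTERVAL PID_FILE
# In the background, appends a timestamped snapshot of STATS_FILE to
# OUTPUT_FILE every INTERVAL seconds, and records the sampler's PID.
start_sampler() {
    local stats="$1" out="$2" interval="$3" pidfile="$4"
    (
        while true; do
            date '+%Y-%m-%d %H:%M:%S'   # timestamp line
            cat "$stats"                # raw counters
            echo                        # blank line between samples
            sleep "$interval"
        done >> "$out"
    ) &
    echo "$!" > "$pidfile"
}

# stop_sampler PID_FILE
# Stops the sampler started by start_sampler and removes the PID file.
stop_sampler() {
    local pidfile="$1"
    [ -f "$pidfile" ] || return 1
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
}
```

The sketch leaves out one part of the collection step: copying the output file back to the controller.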

### Adding Monitoring to Your Workflow

To add monitoring support to a new workflow:

1. **Identify the execution boundaries**: Determine where your workflow starts and completes its main work.

2. **Include the monitoring role**: Add the monitoring role calls before and after your main tasks:

```yaml
# In your workflow's main task file (e.g., playbooks/roles/YOUR_WORKFLOW/tasks/main.yml)

# Set custom monitoring results path (optional)
- name: Set monitoring results path for this workflow
  set_fact:
    monitoring_results_base_path: "{{ topdir_path }}/workflows/YOUR_WORKFLOW/results/monitoring"
  when:
    - enable_monitoring|default(false)|bool

# Start monitoring
- name: Start monitoring services
  include_role:
    name: monitoring
    tasks_from: monitor_run
  when:
    - your_workflow_condition|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'your_workflow', 'monitoring', 'monitor_run' ]

# Your workflow tasks here...

# Stop monitoring
- name: Stop monitoring services and collect data
  include_role:
    name: monitoring
    tasks_from: monitor_collect
  when:
    - your_workflow_condition|bool
    - enable_monitoring|default(false)|bool
  tags: [ 'your_workflow', 'monitoring', 'monitor_collect' ]
```

3. **Test the integration**: Run your workflow with monitoring enabled to verify data collection.
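
A quick way to verify data collection is to count the samples in each host's raw statistics file. With the default 60-second interval, a run lasting a few minutes should produce several. This is a minimal sketch that assumes the `<hostname>_folio_migration_stats.txt` naming and the timestamped record layout described under Output and Results. `count_samples` and `report_samples` are hypothetical helpers.

```bash
# count_samples FILE
# Prints the number of samples in a raw stats file by counting the
# timestamp lines (YYYY-MM-DD HH:MM:SS) that start each record.
count_samples() {
    grep -cE '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$' "$1"
}

# report_samples [RESULTS_DIR]
# Prints the sample count for every collected folio migration stats file.
report_samples() {
    local dir="${1:-workflows/fstests/results/monitoring}" f
    for f in "$dir"/*_folio_migration_stats.txt; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s: %s samples\n' "$(basename "$f")" "$(count_samples "$f")"
    done
}
```

Run `report_samples` from the kdevops top-level directory, or pass a custom results directory as its first argument.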

## Output and Results

### Result Location

Monitoring results are stored in workflow-specific directories:

- **fstests**: `workflows/fstests/results/monitoring/`
- **Other workflows**: `workflows/YOUR_WORKFLOW/results/monitoring/`

Workflows can customize the results path by setting the `monitoring_results_base_path` variable in their playbook.

### Result Files

For folio migration monitoring:
- `<hostname>_folio_migration_stats.txt`: Raw statistics with timestamps
- `<hostname>_folio_migration_plot.png`: Visualization plot (if generation succeeds)

### Example Output

Raw statistics file format:
```
2024-01-15 10:30:00
success: 12345
fail: 67
total: 12412

2024-01-15 10:31:00
success: 12456
fail: 68
total: 12524
```
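
The counters appear to be cumulative: every value in the second record is larger than in the first. The change between consecutive samples is therefore usually the more useful number. This is a minimal awk sketch that assumes exactly the layout shown above. `folio_deltas` is a hypothetical helper.

```bash
# folio_deltas FILE
# Prints, for each sample after the first, how much each counter grew
# since the previous sample.
folio_deltas() {
    awk '
        # A timestamp line starts a new record.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { ts = $1 " " $2; next }
        # Counter lines look like "success: 12345".
        /^(success|fail|total):/ {
            key = substr($1, 1, length($1) - 1)   # drop the trailing ":"
            if (key in prev)
                printf "%s %s_delta=%d\n", ts, key, $2 - prev[key]
            prev[key] = $2
        }
    ' "$1"
}
```

For the two records above, `folio_deltas` prints:

```
2024-01-15 10:31:00 success_delta=111
2024-01-15 10:31:00 fail_delta=1
2024-01-15 10:31:00 total_delta=112
```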

## Running Workflows with Monitoring

### Example: fstests with Folio Migration Monitoring

1. **Configure monitoring**:
```bash
make menuconfig
# Enable monitoring options as described above
make
```

2. **Provision systems**:
```bash
make bringup
```

3. **Run tests with monitoring**:
```bash
# Run on both baseline and dev groups
make fstests-tests TESTS=generic/003

# Or run on a specific group
make fstests-baseline TESTS=generic/003
```

4. **Check results**:
```bash
ls -la workflows/fstests/results/monitoring/
```
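
To load the results into a spreadsheet or plotting tool, you may want to flatten the raw records into CSV first. This is a minimal sketch that assumes the record layout shown under Example Output. `stats_to_csv` is a hypothetical helper. It is not the script that produces `<hostname>_folio_migration_plot.png`.

```bash
# stats_to_csv FILE
# Converts raw folio migration records into CSV, one row per sample:
# timestamp,success,fail,total
stats_to_csv() {
    awk '
        BEGIN { print "timestamp,success,fail,total" }
        # A new timestamp line: emit the previous record, then start over.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            if (ts != "") print ts "," s "," f "," t
            ts = $1 " " $2; s = f = t = ""
            next
        }
        $1 == "success:" { s = $2 }
        $1 == "fail:"    { f = $2 }
        $1 == "total:"   { t = $2 }
        # Emit the last record, which has no following timestamp line.
        END { if (ts != "") print ts "," s "," f "," t }
    ' "$1"
}
```

Example invocation: `stats_to_csv workflows/fstests/results/monitoring/HOSTNAME_folio_migration_stats.txt > migration.csv`. Here `HOSTNAME` is a placeholder for one of your nodes.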

## Advanced Usage

### Custom Monitoring Intervals

You can override the monitoring interval (in seconds) at runtime:

```bash
make fstests-tests EXTRA_VARS="monitor_folio_migration_interval=30"
```

### Selective Monitoring

You can enable or disable specific monitors at runtime:

```bash
# Enable monitoring together with the folio migration monitor
make fstests-tests EXTRA_VARS="enable_monitoring=true monitor_folio_migration=true"
```

## Troubleshooting

### Monitor Not Starting

1. **Check kernel support**:
```bash
# debugfs is normally readable only by root, hence --become
ansible all --become -m shell -a "ls -la /sys/kernel/debug/mm/migrate/stats"
```

2. **Verify debugfs is mounted**:
```bash
ansible all -m shell -a "mount | grep debugfs"
```

3. **Check monitoring process**:
```bash
# The [m] bracket keeps grep from matching its own command line
ansible all -m shell -a "ps aux | grep '[m]onitoring'"
```

### No Data Collected

1. **Verify monitoring was enabled**:
```bash
# Kconfig symbols appear in upper case in .config (CONFIG_...), so match case-insensitively
grep -iE "enable_monitoring|monitor_" .config
```

2. **Check Ansible output for monitoring tasks**:
```bash
make fstests-tests AV=2 | grep -A5 -B5 monitoring
```

3. **Look for error messages**:
```bash
# The log lives under /root, so root access is needed
ansible all --become -m shell -a "cat /root/monitoring/folio_migration.log"
```
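
When only some nodes are affected, it helps to list which hosts produced no usable data. This is a minimal sketch that assumes the `<hostname>_folio_migration_stats.txt` naming described under Result Files. `hosts_without_data` is a hypothetical helper. You supply the host names yourself, for example from your Ansible inventory.

```bash
# hosts_without_data RESULTS_DIR HOST...
# Prints each HOST whose folio migration stats file is missing or empty.
hosts_without_data() {
    local dir="$1" host f
    shift
    for host in "$@"; do
        f="$dir/${host}_folio_migration_stats.txt"
        # -s is true only if the file exists and has a non-zero size.
        [ -s "$f" ] || echo "$host"
    done
}
```

Example: `hosts_without_data workflows/fstests/results/monitoring host1 host2`. It prints whichever of the placeholder hosts `host1` and `host2` has a missing or empty stats file.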

## Adding New Monitors

To add a new monitor to the framework:

1. **Add a Kconfig option** in `kconfigs/monitors/Kconfig`:
```kconfig
config MONITOR_YOUR_METRIC
	bool "Monitor your metric description"
	output yaml
	default n
	help
	  Detailed description of what this monitors...
```

2. **Extend the monitoring role**:
   - Add collection logic in `playbooks/roles/monitoring/tasks/monitor_run.yml`
   - Add termination and data collection in `playbooks/roles/monitoring/tasks/monitor_collect.yml`

3. **Add visualization** (optional):
   - Place scripts in `playbooks/roles/monitoring/files/`
   - Call them from `monitor_collect.yml`

4. **Update documentation**: Add your monitor to this documentation file.
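
For the collection logic in step 2, one option is to reuse the record layout of the folio migration monitor: a timestamp line followed by `key: value` lines. The same post-processing approach then applies to the new data. As an illustration only, the hypothetical helper below takes one such sample from a `name value` counter file such as `/proc/vmstat`. It is not an existing kdevops monitor. How the role schedules and stops it is up to your implementation.

```bash
# snapshot_counters SOURCE KEY...
# Prints one sample: a timestamp line, then "key: value" for each KEY
# found in SOURCE (a file of "name value" lines, like /proc/vmstat),
# in the order the keys appear in SOURCE, then a blank separator line.
snapshot_counters() {
    local src="$1"
    shift
    date '+%Y-%m-%d %H:%M:%S'
    awk -v keys="$*" '
        BEGIN { n = split(keys, k, " "); for (i = 1; i <= n; i++) want[k[i]] = 1 }
        ($1 in want) { printf "%s: %s\n", $1, $2 }
    ' "$src"
    echo   # blank line separates samples
}
```

For example, `snapshot_counters /proc/vmstat pgmigrate_success pgmigrate_fail` prints the kernel's own page migration counters. These exist on kernels built with `CONFIG_MIGRATION`.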

## Performance Considerations

- **Monitoring overhead**: Each monitor adds some system overhead, and shorter sampling intervals add more. Balance data granularity against the performance impact on the workload under test.
- **Storage requirements**: Long-running tests with short monitoring intervals can generate large data files.
- **Concurrent monitors**: Running multiple monitors simultaneously increases overhead.
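
To size storage ahead of a long run, multiply the number of samples by the size of one record. This is a rough sketch. The per-sample size is an input you measure yourself: each record in the Example Output above is 58 bytes including its trailing blank line, but real records may be larger. `estimate_stats_size` is a hypothetical helper.

```bash
# estimate_stats_size DURATION_S INTERVAL_S BYTES_PER_SAMPLE
# Prints the expected sample count (duration / interval, rounded down)
# and the resulting file size for one monitor on one host.
estimate_stats_size() {
    local duration="$1" interval="$2" bytes="$3" samples
    if [ "$interval" -le 0 ]; then
        echo "interval must be greater than 0" >&2
        return 1
    fi
    samples=$(( duration / interval ))
    echo "samples=$samples bytes=$(( samples * bytes ))"
}
```

For example, a 12-hour run (43200 s) at the default 60-second interval gives `samples=720 bytes=41760`. A 1-second interval gives `samples=43200 bytes=2505600`, roughly 2.4 MiB. Multiply by the number of hosts and enabled monitors for the total.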

## Future Enhancements

Planned monitoring additions:
- Memory pressure statistics
- CPU utilization tracking
- I/O statistics collection
- Network traffic monitoring
- Custom perf event monitoring
- Integration with Grafana/Prometheus for real-time visualization