
Commit bcae6d0

feat: add native RAPL support for real-time energy measurements in energy monitor
1 parent fddf6b9 commit bcae6d0


5 files changed (+373 -66 lines)


src/32-http2/.config

Lines changed: 0 additions & 2 deletions
This file was deleted.

src/48-energy/README.md

Lines changed: 49 additions & 17 deletions
@@ -410,6 +410,46 @@ Performance Analysis:
 - Traditional samples at fixed intervals (100ms)
 ```
 
+### Hardware RAPL Energy Measurements
+
+The energy monitor now includes **native support for RAPL (Running Average Power Limit) hardware energy counters**, reading energy directly from CPU hardware instead of relying on software estimation. RAPL is available on Intel processors since Sandy Bridge (2011) and on AMD processors since Zen (2017).
+
+When RAPL is available, the tool detects it automatically and reads the hardware energy counters through the Linux `perf_event` interface. Unlike simple time-based estimation, these counters reflect actual power draw, including the effects of dynamic voltage and frequency scaling (DVFS) and CPU idle states (C-states). Depending on the processor, RAPL exposes several energy domains: the whole CPU package (`energy-pkg`), the cores (`energy-cores`), the integrated graphics (`energy-gpu`, named `uncore` in the powercap framework), and DRAM (`energy-ram`, typically on server processors). These domains cover the processor and, where available, memory; they do not include storage, network devices, or power-supply losses.
+
+To use RAPL hardware measurements:
+
+```bash
+# Automatic RAPL detection (default behavior)
+sudo ./energy_monitor -d 10
+
+# Disable RAPL and use software estimation
+sudo ./energy_monitor -d 10 --no-rapl -p 15.0
+
+# Check RAPL availability on your system
+ls /sys/bus/event_source/devices/power/events/
+```
+
+Example output with RAPL enabled:
+
+```
+RAPL initialized with 2 domains
+Using hardware RAPL energy counters
+Energy monitor started... Hit Ctrl-C to end.
+
+=== Energy Usage Summary ===
+PID COMM Runtime (ms) Energy (mJ)
+...
+
+=== RAPL Hardware Energy Measurements ===
+pkg : 231.716422 J (231716.42 mJ)
+cores : 159.937200 J (159937.20 mJ)
+
+Total RAPL energy: 391.653622 J (391653.62 mJ)
+Measurement method: Hardware RAPL counters
+```
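+
+The RAPL figures are domain-wide totals for everything running on the processor (and DRAM, where that domain exists), not whole-system energy and not a per-process attribution; the per-process runtime breakdown shows which applications consumed the most CPU time. Because the `cores` domain is contained in `pkg`, the `Total RAPL energy` line in the example above, which adds the two (231.716422 J + 159.937200 J), counts core energy twice; the `pkg` value alone is the package-level total. For more information about RAPL and power capping, see the [Linux Powercap Framework documentation](https://docs.kernel.org/power/powercap/powercap.html).
+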
 ## Understanding Energy Monitoring Trade-offs
 
 While our energy monitor provides valuable insights, it's important to understand its limitations and trade-offs:
@@ -470,25 +510,17 @@ As a **teaching tool**, energy monitoring makes abstract concepts tangible by sh
 
 ## Extending the Energy Monitor
 
-The current implementation provides a solid foundation for building more sophisticated energy monitoring capabilities. Several enhancement directions offer significant value for different deployment scenarios.
-
-| Extension Area | Implementation Approach | Value Proposition |
-|---------------|------------------------|-------------------|
-| **Hardware Counter Integration** | Integrate RAPL counters via `PERF_TYPE_POWER` events | Replace estimation with actual hardware measurements |
-| **Per-Core Power Modeling** | Track core assignment and model P-core vs E-core differences | Accurate attribution on heterogeneous processors |
-| **Workload Classification** | Classify CPU-intensive, memory-bound, I/O-bound, and idle patterns | Enable workload-specific power optimization |
-| **Container Runtime Integration** | Aggregate energy by container/pod for Kubernetes environments | Cloud-native energy attribution and billing |
-| **Real-time Visualization** | Web dashboard with live energy consumption graphs | Immediate feedback for energy optimization |
+The current implementation includes **native RAPL hardware counter support** for real-time energy measurements and provides a solid foundation for building more sophisticated energy monitoring capabilities. Several enhancement directions offer significant value for different deployment scenarios.
 
-**Hardware counter integration** represents the most impactful enhancement, replacing our simplified estimation model with actual hardware measurements through RAPL (Running Average Power Limit) interfaces. Modern processors provide detailed energy counters that can be read via performance events, offering precise energy measurements down to individual CPU packages.
+| Extension Area | Implementation Status | Value Proposition |
+|---------------|----------------------|-------------------|
+| **Hardware Counter Integration** | **Implemented**: RAPL counters via `perf_event_open()` | Actual hardware measurements with automatic fallback |
+| **Per-Core Power Modeling** | Future enhancement | Accurate attribution on heterogeneous processors (P-cores vs. E-cores) |
+| **Workload Classification** | Future enhancement | Enable workload-specific power optimization |
+| **Container Runtime Integration** | Future enhancement | Cloud-native energy attribution and billing |
+| **Real-time Visualization** | Future enhancement | Immediate feedback for energy optimization |
 
-```c
-// Read RAPL counters for actual energy measurements
-struct perf_event_attr attr = {
-    .type = PERF_TYPE_POWER,
-    .config = PERF_COUNT_HW_POWER_PKG,
-};
-```
+**Hardware counter integration (✅ Implemented)**: The energy monitor now reads RAPL hardware energy counters through the Linux `perf_event` interface. This gives hardware-measured energy across multiple domains (package, cores, and DRAM where the processor exposes it) and automatically falls back to software estimation when RAPL is unavailable. See the [Hardware RAPL Energy Measurements](#hardware-rapl-energy-measurements) section above for usage details.
 
 **Per-core power modeling** becomes essential on heterogeneous processors where performance cores and efficiency cores have dramatically different power characteristics. Tracking which core each process runs on enables accurate energy attribution:
 

src/48-energy/README.zh.md

Lines changed: 49 additions & 9 deletions
@@ -361,6 +361,46 @@ sudo ./compare_monitors.sh -d 10 -w "stress --cpu 2 --timeout 10"
 - Traditional sampling at fixed intervals (100ms)
 ```
 
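+### Hardware RAPL Energy Measurements
+
+The energy monitor now has built-in **native support for RAPL (Running Average Power Limit) hardware counters**, obtaining energy readings directly from CPU hardware instead of relying on software estimation. RAPL is available on Intel processors since Sandy Bridge (2011) and on AMD processors since Zen (2017).
+
+When RAPL is available, the tool detects it automatically and reads the hardware energy counters through the Linux `perf_event` interface. Unlike simple time-based estimation, these counters reflect actual power draw, including the effects of dynamic voltage and frequency scaling (DVFS) and CPU idle states (C-states). Depending on the processor, RAPL exposes several energy domains: the whole CPU package (`energy-pkg`), the cores (`energy-cores`), the integrated graphics (`energy-gpu`, named `uncore` in the powercap framework), and DRAM (`energy-ram`, typically on server processors). These domains cover the processor and, where available, memory; they do not include storage, network devices, or power-supply losses.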
+
+To use RAPL hardware measurements:
+
+```bash
+# Automatic RAPL detection (default behavior)
+sudo ./energy_monitor -d 10
+
+# Disable RAPL and use software estimation
+sudo ./energy_monitor -d 10 --no-rapl -p 15.0
+
+# Check RAPL availability on the system
+ls /sys/bus/event_source/devices/power/events/
+```
+
+Example output with RAPL enabled:
+
+```
+RAPL initialized with 2 domains
+Using hardware RAPL energy counters
+Energy monitor started... Hit Ctrl-C to end.
+
+=== Energy Usage Summary ===
+PID COMM Runtime (ms) Energy (mJ)
+...
+
+=== RAPL Hardware Energy Measurements ===
+pkg : 231.716422 J (231716.42 mJ)
+cores : 159.937200 J (159937.20 mJ)
+
+Total RAPL energy: 391.653622 J (391653.62 mJ)
+Measurement method: Hardware RAPL counters
+```
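+
+The RAPL figures are domain-wide totals for everything running on the processor (and DRAM, where that domain exists), not whole-system energy and not a per-process attribution; the per-process runtime breakdown shows which applications consumed the most CPU time. Because the `cores` domain is contained in `pkg`, the `Total RAPL energy` line in the example above, which adds the two, counts core energy twice; the `pkg` value alone is the package-level total. For more information about RAPL and power capping, see the [Linux Powercap framework documentation](https://docs.kernel.org/power/powercap/powercap.html).
+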
 ## Understanding Energy Monitoring Trade-offs
 
 While our energy monitor provides valuable insights, it is important to understand its limitations and trade-offs:
@@ -421,17 +461,17 @@ eBPF energy monitoring, as a powerful research and educational tool, bridges theoretical
 
 ## Extending the Energy Monitor
 
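-The current implementation provides a solid foundation for building more sophisticated energy monitoring capabilities, and several extension directions are worth exploring. **Hardware counter integration** is the most impactful enhancement: integrating RAPL counters via `PERF_TYPE_POWER` events would replace our estimation model with actual hardware measurements and greatly improve accuracy. **Per-core power modeling** is especially important on heterogeneous processors: tracking each process's core assignment and modeling the power differences between performance cores (P-cores) and efficiency cores (E-cores) enables more accurate energy attribution. **Workload classification** identifies workload types such as CPU-intensive, memory-bound, I/O-bound, and idle patterns, enabling workload-specific power optimization strategies. **Container runtime integration** lets the system aggregate energy consumption by container or pod in Kubernetes environments, supporting cloud-native energy attribution and billing. **Real-time visualization** provides a web dashboard with energy consumption graphs, giving immediate visual feedback for energy optimization.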
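+The current implementation includes **native RAPL hardware counter support** for real-time energy measurements and provides a solid foundation for building more sophisticated energy monitoring capabilities. Several extension directions offer significant value for different deployment scenarios.
 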
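-**Hardware counter integration** represents the most impactful enhancement, replacing our simplified estimation model with actual hardware measurements through RAPL (Running Average Power Limit) interfaces. Modern processors provide detailed energy counters that can be read via performance events, offering energy measurements down to individual CPU packages.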
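+| Extension Area | Implementation Status | Value Proposition |
+|---------|---------|---------|
+| **Hardware Counter Integration** | **Implemented**: RAPL counters via `perf_event_open()` | Actual hardware measurements with automatic fallback |
+| **Per-Core Power Modeling** | Future enhancement | Accurate attribution on heterogeneous processors (P-cores vs. E-cores) |
+| **Workload Classification** | Future enhancement | Workload-specific power optimization |
+| **Container Runtime Integration** | Future enhancement | Cloud-native energy attribution and billing |
+| **Real-time Visualization** | Future enhancement | Immediate feedback for energy optimization |
 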
-```c
-// Read RAPL counters to obtain actual energy measurements
-struct perf_event_attr attr = {
-    .type = PERF_TYPE_POWER,
-    .config = PERF_COUNT_HW_POWER_PKG,
-};
-```
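+**Hardware counter integration (✅ Implemented)**: The energy monitor now reads RAPL hardware energy counters through the Linux `perf_event` interface. This gives hardware-measured energy across multiple domains (package, cores, and DRAM where the processor exposes it) and automatically falls back to software estimation when RAPL is unavailable. For details, see the [Hardware RAPL Energy Measurements](#硬件-rapl-能源测量) section above.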
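 
 **Per-core power modeling** becomes essential on heterogeneous processors, where performance cores and efficiency cores have very different power characteristics. Tracking which core each process runs on enables accurate energy attribution:
 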
