
Conversation

@jiannanWang (Contributor) commented Dec 15, 2025

This PR adds power-aware benchmarking for GPU kernels in BackendBench.
It introduces:

  • PowerManager: Collects GPU power, temperature, and frequency data during benchmarks.
  • do_bench_power: Benchmarks kernel execution while measuring energy consumption.
  • PerformanceTestResult: Now records total energy used per test.
  • Power plots: Generates CSV and visualizations for power, temperature, and frequency.
  • Dependencies: Adds nvidia-ml-py and matplotlib for power monitoring and plotting.

Together, these enable precise measurement of energy usage for GPU kernels, supporting energy-efficient optimization.
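
A minimal sketch of the sampling-and-integration idea behind such a power manager (the class and method names here are hypothetical, not the PR's actual API; on real hardware `read_power_w` would wrap `pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0` from nvidia-ml-py):

```python
import time
import threading

class PowerSampler:
    """Polls a power-reading callable in a background thread and
    integrates the samples into total energy in joules.

    `read_power_w` is any callable returning watts; on a GPU it would
    wrap pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0.
    """
    def __init__(self, read_power_w, interval_s=0.01):
        self.read_power_w = read_power_w
        self.interval_s = interval_s
        self.samples = []          # list of (timestamp_s, watts)
        self._stop = threading.Event()
        self._thread = None

    def __enter__(self):
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

    def _loop(self):
        while not self._stop.is_set():
            self.samples.append((time.perf_counter(), self.read_power_w()))
            time.sleep(self.interval_s)

    def total_energy_j(self):
        # Trapezoidal integration of power over time -> joules
        energy = 0.0
        for (t0, p0), (t1, p1) in zip(self.samples, self.samples[1:]):
            energy += 0.5 * (p0 + p1) * (t1 - t0)
        return energy

# Try the harness with a fake constant 100 W reading (no GPU needed)
with PowerSampler(lambda: 100.0, interval_s=0.001) as s:
    time.sleep(0.05)   # stands in for the kernel under test
print(f"{s.total_energy_j():.3f} J")   # roughly 100 W * 0.05 s ~ 5 J
```

The context-manager shape makes it easy to wrap an existing benchmark loop without restructuring it; the real implementation would also record temperature and clock frequency per sample.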

Example output (note the "total_energy" entries):

  {
    "op_name": "add.Scalar",
    "args": "((T([100, 1], i64), 1,), {})",
    "speedup": 1.0037302273210138,
    "total_energy": 0.0011545763355,
    "benchmark_time_ms": 0.006022225360552894,
    "reference_time_ms": 0.006044689630126131,
    "error_msg": "",
    "successfully_ran": true,
    "test_type": "performance"
  },
  {
    "op_name": "add.Tensor",
    "args": "((T([128100, 1536], f16), T([128100, 1536], f16),), {})",
    "speedup": 0.999044778094025,
    "total_energy": 0.13822618472830014,
    "benchmark_time_ms": 0.5277284108675443,
    "reference_time_ms": 0.5272243131290782,
    "error_msg": "",
    "successfully_ran": true,
    "test_type": "performance"
  },
  {
    "op_name": "add.Tensor",
    "args": "((T([256, 1024, 1024], f16), T([256, 1024, 1024], f16),), {})",
    "speedup": 1.0008341558163996,
    "total_energy": 0.1871731542646,
    "benchmark_time_ms": 0.7144741316636404,
    "reference_time_ms": 0.7150701144162346,
    "error_msg": "",
    "successfully_ran": true,
    "test_type": "performance"
  },
  {
    "op_name": "add_.Tensor",
    "args": "((T([128, 512, 28, 28], f16), T([128, 512, 28, 28], f16),), {})",
    "speedup": 1.0001593311388208,
    "total_energy": 0.03350459256860003,
    "benchmark_time_ms": 0.1444975220835301,
    "reference_time_ms": 0.14452054503828043,
    "error_msg": "",
    "successfully_ran": true,
    "test_type": "performance"
  },

@meta-cla bot added the CLA Signed label on Dec 15, 2025
@jiannanWang jiannanWang marked this pull request as ready for review December 15, 2025 23:13
@msaroufim (Member)

So the way I'd go about this before adding a utility is to take a couple of pytorch operators on H100 and B200, plot their runtimes in a loop and track the temperature and power draw respectively and see what we can learn

@jiannanWang (Contributor, Author)

> So the way I'd go about this before adding a utility is to take a couple of pytorch operators on H100 and B200, plot their runtimes in a loop and track the temperature and power draw respectively and see what we can learn

Sounds great! I'll do the experiment.
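The suggested experiment could be sketched as a harness like the one below (all names here are hypothetical, not part of BackendBench). The stats reader is injected so the loop structure can be tried without a GPU; on real hardware it would wrap pynvml calls such as `nvmlDeviceGetPowerUsage` and `nvmlDeviceGetTemperature`:

```python
import time

def run_thermal_sweep(op, iters, read_stats):
    """Run `op` repeatedly, recording runtime plus GPU stats per iteration.

    `read_stats` is any callable returning a dict. On real hardware it
    could wrap nvidia-ml-py, e.g.:
        h = pynvml.nvmlDeviceGetHandleByIndex(0)
        {"power_w": pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0,
         "temp_c": pynvml.nvmlDeviceGetTemperature(
             h, pynvml.NVML_TEMPERATURE_GPU)}
    """
    rows = []
    for _ in range(iters):
        t0 = time.perf_counter()
        op()                       # the operator under test
        rows.append({"runtime_s": time.perf_counter() - t0, **read_stats()})
    return rows

# Stand-in workload and fake stats reader, just to exercise the loop
rows = run_thermal_sweep(lambda: sum(range(10_000)),
                         iters=5,
                         read_stats=lambda: {"power_w": 250.0, "temp_c": 60})
print(len(rows), rows[0]["power_w"])
```

Plotting `runtime_s` against `temp_c` over a long sweep (e.g. with matplotlib) would reveal whether thermal throttling inflates later iterations, which is exactly the effect the experiment is meant to surface.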
