Where to find the code:
- ThreadSystem: include/core/ThreadSystem.hpp
- WorkerBudget: include/core/WorkerBudget.hpp, src/core/WorkerBudget.cpp
- Power profiling tools: tests/power_profiling/
HammerEngine uses a race-to-idle strategy optimized for battery-powered devices. The engine completes frame work as quickly as possible, then sleeps until the next vsync, keeping CPU cores in low-power C-states even during active gameplay.
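The race-to-idle pattern can be illustrated with a minimal C++17 sketch. It is not HammerEngine's actual loop: `runRaceToIdle` is a hypothetical helper, and `std::this_thread::sleep_until` stands in for the vsync wait that the engine relies on.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of HammerEngine): runs `frames` iterations,
// doing the frame work first and then sleeping until the next frame deadline.
// Returns the fraction of wall-clock time spent sleeping.
double runRaceToIdle(int frames, std::chrono::microseconds framePeriod,
                     const std::function<void()>& frameWork) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    auto deadline = start;
    Clock::duration slept{0};

    for (int i = 0; i < frames; ++i) {
        deadline += framePeriod;
        frameWork();  // race: finish the frame's work as quickly as possible
        const auto workDone = Clock::now();
        if (workDone < deadline) {
            // idle: a blocking sleep lets the core drop into a low-power state
            std::this_thread::sleep_until(deadline);
            slept += Clock::now() - workDone;
        } else {
            deadline = workDone;  // missed the deadline: resynchronize
        }
    }

    const auto total = Clock::now() - start;
    if (total.count() <= 0) {
        return 0.0;
    }
    return std::chrono::duration<double>(slept) /
           std::chrono::duration<double>(total);
}
```

Calling `runRaceToIdle(60, std::chrono::microseconds(16667), work)` approximates one second at 60 FPS. The returned value is only a rough, single-thread proxy for idle time and is not the C-state residency that the profiling tools report.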
- CPU: Apple M3 Pro (11-core)
- Battery: 70Wh
- OS: macOS
- Build: Debug (a Release build is expected to be more efficient)
Note: These benchmarks are updated with each major branch update to track performance over time.
| Scenario | CPU Active | Idle Residency | Power Avg | Battery Life |
|---|---|---|---|---|
| Idle gameplay (GPU path) | 13.4% | 86.7% | 0.69W | 101 hours |
| Idle gameplay (SDL_Renderer) | 14.3% | 85.8% | 0.87W | 80 hours |
| Typical gameplay | 17-19% | 80%+ | 2.1-2.6W | 27-33 hours |
| Sustained combat/action | ~20% | 80%+ | 11-13W | 5-6 hours |
| Stress test (max entities) | All systems | 60-80% | 27-28W | 2.5 hours |
Key takeaway: The GPU rendering path (-DUSE_SDL3_GPU=ON) achieves 21% lower power and 27% better battery life than SDL_Renderer for the same workload. During typical gameplay, the engine draws only 2-3W with 80%+ idle residency. GPU rendering is more efficient because draw calls complete faster, giving the CPU more time to idle.
| Entities | Power Avg | Battery Drain/Test | FPS | Throughput |
|---|---|---|---|---|
| 0 (Idle) | 0.10W | 0.001% | 49.2 | N/A |
| 10,000 | 0.06W | 0.001% | 48.7 | 487K ops/sec |
| 50,000 | 0.13W | 0.003% | 49.0 | 2.45M ops/sec |
Key: All headless tests combined = <0.01% battery drain. The AI/collision systems are extremely efficient.
| Metric | Target | Actual | Status |
|---|---|---|---|
| C-State Residency (headless) | >80% | 81%+ | ✅ EXCELLENT |
| C-State Residency (gameplay) | >70% | 86.7% (GPU) | ✅ EXCEPTIONAL |
| Power Draw (idle, 0 entities) | <1W | 0.69W (GPU) | ✅ EXCELLENT |
| Power Draw (typical gameplay) | <5W | 2.1-2.6W | ✅ EXCELLENT |
| Battery (typical gameplay) | >20 hours | 27-33 hours (101 hours at GPU idle) | ✅ EXCEPTIONAL |
| Power Draw (sustained action) | <15W | 11-13W | ✅ GOOD |
| Battery drain (50K entity test) | <1% | 0.003% | ✅ EXCEPTIONAL |
Frame Timeline (16.67ms @ 60 FPS):
Headless (AI only):
├─ Work: 2-5ms ─────┤
│ ├─ Idle: 11-14ms (C-states) ────────────┤
└───────────────────────────────────────────────────────────┘
Result: 95% idle residency
Real App (with rendering):
├─ Work: 4-6ms ─────────┤
│ ├─ Idle: 10-12ms (vsync wait) ──────┤
└───────────────────────────────────────────────────────────┘
Result: 80%+ idle residency (still excellent!)
Both modes maintain high C-state residency because:
- Sequential manager execution - Each manager gets ALL workers, completes quickly
- Adaptive batch sizing - WorkerBudget hill-climbing finds optimal throughput
- No busy-waiting - Threads sleep between frames
- VSync alignment - Rendering waits for display refresh
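To make the hill-climbing idea concrete, here is a minimal C++17 sketch of an adaptive batch-size tuner. It is an illustration only: `BatchSizeTuner`, its fixed step size, and its `report` method are assumptions, and WorkerBudget's real interface and constants may differ.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical tuner illustrating hill-climbing on batch size.
// Not the WorkerBudget API; names and step policy are assumptions.
class BatchSizeTuner {
public:
    BatchSizeTuner(std::size_t initial, std::size_t minSize,
                   std::size_t maxSize, std::size_t step)
        : m_min(minSize), m_max(maxSize), m_step(step),
          m_size(std::clamp(initial, minSize, maxSize)) {}

    std::size_t batchSize() const { return m_size; }

    // Report the throughput measured at the current batch size, then move
    // one step: keep going if throughput did not drop, otherwise turn around.
    void report(double throughput) {
        if (m_hasSample && throughput < m_lastThroughput) {
            m_increasing = !m_increasing;
        }
        m_lastThroughput = throughput;
        m_hasSample = true;

        if (m_increasing) {
            m_size = std::min(m_max, m_size + m_step);
        } else {
            m_size = (m_size >= m_min + m_step) ? m_size - m_step : m_min;
        }
    }

private:
    std::size_t m_min;
    std::size_t m_max;
    std::size_t m_step;
    std::size_t m_size;
    double m_lastThroughput = 0.0;
    bool m_hasSample = false;
    bool m_increasing = true;
};
```

With a fixed step the tuner settles into a small oscillation around the best batch size rather than stopping exactly on it. A production version would likely shrink the step or add hysteresis to damp that oscillation.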
| Residency | Rating | Meaning |
|---|---|---|
| >80% | Excellent | Cores sleeping most of the time |
| 60-80% | Good | Significant idle periods remain |
| 40-60% | Moderate | More sustained work |
| <40% | Poor | CPU rarely sleeps, high battery drain |
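For automated checks, the rating table can be encoded as a small C++17 function. The name `residencyRating` is hypothetical, and the handling of the exact boundary values (60% counts as Good, 40% as Moderate) is an assumption, since the table does not say which side they fall on.

```cpp
#include <string_view>

// Hypothetical helper mapping idle residency (percent) to the ratings above.
// Boundary handling at exactly 80, 60, and 40 is an assumption.
constexpr std::string_view residencyRating(double idlePercent) {
    if (idlePercent > 80.0) {
        return "Excellent";
    }
    if (idlePercent >= 60.0) {
        return "Good";
    }
    if (idlePercent >= 40.0) {
        return "Moderate";
    }
    return "Poor";
}
```

For example, the 86.7% measured on the GPU path rates as Excellent, while anything under 40% rates as Poor.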
| Power | Scenario |
|---|---|
| <1W | Idle baseline (menu screens) |
| 1-5W | Light gameplay |
| 5-10W | Normal gameplay |
| 10-15W | Heavy workload |
| >15W | High load (watch battery) |
# Measure actual game for 30 seconds
sudo tests/power_profiling/run_power_test.sh --real-app
# Measure for 60 seconds
sudo tests/power_profiling/run_power_test.sh --real-app --duration 60

# Run complete headless benchmark (~30 minutes)
sudo tests/power_profiling/run_power_test.sh
# Parse results
python3 tests/power_profiling/parse_powermetrics.py \
    tests/test_results/power_profiling/power_*.plist

Battery Hours = Battery Capacity (Wh) / Average Power (W)
Example (M3 Pro, 70Wh battery):
Light play: 70Wh / 2.5W = 28 hours
Gameplay: 70Wh / 12W = 5.8 hours
Peak load: 70Wh / 28W = 2.5 hours
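The same estimate is easy to script when comparing runs. The sketch below is a hypothetical helper, not part of the profiling tools, and it assumes average power stays constant over the whole discharge.

```cpp
#include <stdexcept>

// Hypothetical helper: estimated runtime in hours from battery capacity
// (watt-hours) and average power draw (watts), assuming constant draw.
double batteryHours(double capacityWh, double averageWatts) {
    if (capacityWh < 0.0 || averageWatts <= 0.0) {
        throw std::invalid_argument(
            "capacity must be non-negative and power must be positive");
    }
    return capacityWh / averageWatts;
}
```

`batteryHours(70.0, 2.5)` reproduces the 28-hour light-play figure above, and `batteryHours(70.0, 28.0)` gives the 2.5-hour peak-load figure.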
- Avoid per-frame allocations - Reuse buffers, pre-allocate
- Use WorkerBudget - Let the system find optimal batch sizes
- Don't busy-wait - Use proper synchronization primitives
- Batch operations - Process multiple items per task
- SIMD where applicable - 4-wide processing with SIMDMath.hpp
- GPU rendering - Use -DUSE_SDL3_GPU=ON for a 21% power reduction
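The first guideline, avoiding per-frame allocations, might look like the following sketch. `ScratchBuffer` is a hypothetical class, not an engine type, and the `int` element type is only a placeholder for whatever a system collects each frame.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-system scratch buffer: storage is reserved once and
// reused every frame instead of being reallocated.
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t expectedItems) {
        m_items.reserve(expectedItems);  // one up-front allocation
    }

    // Call at the start of each frame. clear() destroys the elements but
    // keeps the capacity, so refilling does not hit the allocator unless
    // the frame needs more items than any previous frame.
    std::vector<int>& beginFrame() {
        m_items.clear();
        return m_items;
    }

private:
    std::vector<int> m_items;
};
```

A system would hold one `ScratchBuffer` as a member and call `beginFrame()` at the top of its update, rather than constructing a fresh `std::vector` on every frame.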
| Problem | Cause | Solution |
|---|---|---|
| Low idle residency (<40%) | Rendering loop too fast | Check vsync settings |
| High idle power (>3W) | Background work | Audit update loops |
| Spiky power draw | Uneven batching | Use WorkerBudget |
| Metric | Before | After | Improvement |
|---|---|---|---|
| Idle residency | 60-70% | 80%+ | +15-20% |
| Light play power | 4-5W | 2-3W | ~50% reduction |
| Frame work time | 8-10ms | 4-6ms | ~50% faster |
Key changes:
- Removed GameLoop class (eliminated thread contention)
- Sequential manager execution (no concurrent manager overhead)
- WorkerBudget hill-climbing (optimal batch sizing)
- Dual-path collision threading (efficient scaling)
| Metric | SDL_Renderer | SDL3 GPU | Improvement |
|---|---|---|---|
| Avg Power | 0.87W | 0.69W | -21% |
| Idle Residency | 85.8% | 86.7% | +0.9% |
| Battery Life | 80 hours | 101 hours | +27% |
Key changes:
- GPU-accelerated rendering via SDL3 GPU API
- Draw calls complete faster, more CPU idle time
- Best idle residency achieved (86.7%)
- ThreadSystem - Thread pool and task scheduling
- WorkerBudget - Adaptive batch optimization
- Power Profiling Tools - Full profiling documentation
- GPU Rendering - GPU rendering system documentation
- Power Profile Analysis - Detailed power analysis with GPU comparison