The Power Prediction Model is a machine learning-based system that predicts power consumption for FFmpeg transcoding workloads. This document covers two predictors:
- PowerPredictor (v0.1): Simple univariate predictor based on stream count
- MultivariatePredictor (v0.2): Advanced multivariate predictor with ensemble models and confidence intervals
Key Features:
- Automatic model selection (linear vs polynomial regression)
- Ensemble models (Random Forest, Gradient Boosting)
- Robust scenario name parsing
- Handles missing data gracefully
- Provides prediction confidence through R² scores and confidence intervals
- Hardware-aware model storage and versioning
- Exports predictions to CSV and Prometheus metrics
The MultivariatePredictor extends the basic PowerPredictor with:
Multiple Input Features:
- `stream_count`: Number of concurrent transcoding streams
- `bitrate_mbps`: Bitrate in megabits per second
- `total_pixels`: Sum of width × height × fps across all outputs
- `cpu_usage_pct`: Mean CPU usage percentage during scenario
- `encoder_type`: One-hot encoded (x264, NVENC, etc.)
- `hardware_cpu_model`: Hashed or one-hot encoded CPU model
- `container_cpu_pct`: Docker container CPU overhead percentage
Ensemble of Regression Models:
- Linear Regression: Baseline model
- Polynomial Regression (degree=2,3): Non-linear relationships
- RandomForestRegressor: Handles complex interactions
- GradientBoostingRegressor: State-of-the-art performance (with XGBoost fallback)
Prediction Targets:
- `mean_power_watts`: Mean power consumption
- `total_energy_joules`: Total energy consumed
- `efficiency_score`: Direct efficiency prediction
Confidence Intervals:
- Bootstrapped prediction intervals
- Configurable confidence level (default: 95%)
- Shows prediction uncertainty
Hardware Awareness:
- Per-hardware model storage
- Automatic hardware fingerprinting
- Fallback to universal model if hardware unknown
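Per-hardware storage needs a stable identifier for the current machine. The sketch below shows one way such a fingerprint could be derived; `fingerprint_hardware` and its sanitization rules are illustrative assumptions, not the project's actual API.

```python
# Hypothetical hardware-fingerprint sketch (not the project's actual code):
# derive a filesystem-safe identifier for per-hardware model directories.
import hashlib
import platform
import re

def fingerprint_hardware() -> str:
    cpu = platform.processor() or platform.machine() or "unknown"
    # Keep only characters that are safe in directory names
    safe = re.sub(r'[^A-Za-z0-9]+', '_', cpu).strip('_')
    # A short hash disambiguates similar model strings
    digest = hashlib.sha256(cpu.encode()).hexdigest()[:8]
    return f"{safe}_{digest}"

print(fingerprint_hardware())  # e.g. "Intel_i7_9700K_1a2b3c4d"
```

A model path such as `advisor/models/<fingerprint>/power_model_v1.pkl` could then be derived from this identifier, falling back to the universal model when no per-hardware directory exists.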
```python
from advisor import MultivariatePredictor

# Create predictor with ensemble models
predictor = MultivariatePredictor(
    models=['linear', 'poly2', 'rf', 'gbm'],
    confidence_level=0.95,
    n_bootstrap=100,
    cv_folds=5
)

# Train on scenario results
success = predictor.fit(scenarios, target='mean_power_watts')

if success:
    # Make prediction with confidence intervals
    prediction = predictor.predict({
        'stream_count': 6,
        'bitrate_mbps': 3.0,
        'total_pixels': 1920*1080*30*60,
        'cpu_usage_pct': 80.0,
        'encoder_type': 'x264',
        'hardware_cpu_model': 'Intel_i7_9700K',
        'container_cpu_pct': 7.0
    }, return_confidence=True)

    print(f"Predicted power: {prediction['mean']:.2f} W")
    print(f"Confidence interval: [{prediction['ci_low']:.2f}, {prediction['ci_high']:.2f}] W")
    print(f"Confidence width: {prediction['ci_width']:.2f} W")
    print(f"Model used: {prediction['model']}")

# Inspect the trained model
info = predictor.get_model_info()
print(f"Trained: {info['trained']}")
print(f"Best model: {info['best_model']}")
print(f"R² score: {info['best_score']['r2']:.4f}")
print(f"RMSE: {info['best_score']['rmse']:.2f} W")
print(f"Training samples: {info['n_samples']}")
print(f"Features: {', '.join(info['feature_names'])}")

# Save and load trained models
from pathlib import Path

model_path = Path('advisor/models/Intel_i7_9700K/power_model_v1.pkl')
predictor.save(model_path)
loaded_predictor = MultivariatePredictor.load(model_path)

# Predict for multiple configurations efficiently
features_list = [
    {'stream_count': 2, 'bitrate_mbps': 2.5, ...},
    {'stream_count': 4, 'bitrate_mbps': 2.5, ...},
    {'stream_count': 8, 'bitrate_mbps': 5.0, ...},
]
predictions = predictor.predict_batch(features_list, return_confidence=True)
for i, pred in enumerate(predictions):
    print(f"Config {i+1}: {pred['mean']:.2f} ± {pred['ci_width']/2:.2f} W")
```

The multivariate predictor is integrated into `analyze_results.py`:
```bash
# Use multivariate predictor for analysis
python3 scripts/analyze_results.py --multivariate

# Generate predictions for specific stream counts
python3 scripts/analyze_results.py --multivariate --predict-future 1,2,4,8,12,16

# Use simple predictor (backward compatible)
python3 scripts/analyze_results.py --predict-future 1,2,4,8,12
```

The `results_exporter` automatically trains the multivariate predictor and exposes metrics:
```
# Predicted power consumption
results_scenario_predicted_power_watts{run_id="test_results_20231215_143022"}

# Predicted energy consumption
results_scenario_predicted_energy_joules{run_id="test_results_20231215_143022"}

# Confidence interval bounds
results_scenario_prediction_confidence_low{run_id="test_results_20231215_143022"}
results_scenario_prediction_confidence_high{run_id="test_results_20231215_143022"}
```
Two new dashboards are available:
- Future Load Predictions (`future-load-predictions.json`)
  - Measured vs Predicted power comparison
  - Prediction confidence intervals
  - Prediction accuracy gauge
  - Confidence interval width
  - Detailed prediction results table
- Efficiency Forecasting (`efficiency-forecasting.json`)
  - Energy efficiency scores by scenario
  - Efficiency rankings table
  - Top 5 most efficient configurations
  - Efficiency score distribution
The predictor automatically selects the best model based on cross-validation R² scores:
```
Training 5 models on 12 samples...
  linear: R²=0.9234, RMSE=15.23
  poly2:  R²=0.9567, RMSE=11.45
  poly3:  R²=0.9601, RMSE=10.89
  rf:     R²=0.9823, RMSE=7.34
  gbm:    R²=0.9891, RMSE=5.67

Best model: gbm (R²=0.9891, RMSE=5.67)
```
Prediction uncertainty is quantified using bootstrapped confidence intervals:
- Training Phase: Store training data (X, y)
- Bootstrap Resampling: Create N bootstrap samples (default: 100)
- Model Training: Train model on each bootstrap sample
- Prediction: Generate N predictions for the same input
- Confidence Bounds: Calculate percentiles (e.g., 2.5% and 97.5% for 95% CI)
Example output:
```
Predicted Power: 213.7 W
95% Confidence Interval: [202.3, 225.1] W
Confidence Width: 22.8 W
```
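The five steps above can be sketched with plain NumPy and scikit-learn. This is a simplified stand-in for the predictor's internal logic with toy data, not the project's actual implementation.

```python
# Bootstrapped confidence interval sketch (toy data, simplified logic)
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# 1. Training phase: store training data (streams vs watts)
X = np.array([[1], [2], [4], [8], [12]], dtype=float)
y = np.array([45.0, 80.0, 150.0, 280.0, 410.0])

def bootstrap_ci(X, y, x_new, n_bootstrap=100, confidence=0.95):
    preds = []
    n = len(X)
    for _ in range(n_bootstrap):
        # 2. Resample with replacement, 3. refit, 4. predict
        idx = rng.integers(0, n, size=n)
        model = LinearRegression().fit(X[idx], y[idx])
        preds.append(model.predict([[x_new]])[0])
    # 5. Percentiles give the confidence bounds (2.5% / 97.5% for a 95% CI)
    lo, hi = np.percentile(preds, [(1 - confidence) * 50,
                                   (1 + confidence) * 50])
    return float(np.mean(preds)), float(lo), float(hi)

mean, lo, hi = bootstrap_ci(X, y, x_new=6)
print(f"Predicted: {mean:.1f} W, 95% CI: [{lo:.1f}, {hi:.1f}] W")
```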
Interpretation:
- Narrow CI (< 10 W): High confidence, model is certain
- Medium CI (10-30 W): Moderate confidence, some uncertainty
- Wide CI (> 30 W): Low confidence, model is uncertain
The original PowerPredictor remains available for backward compatibility and simple use cases.
Linear Regression: Used when the training data contains fewer than 6 unique stream count values. Provides a simple, stable model suitable for small datasets.
Formula:
Power(streams) = β₀ + β₁ × streams
Parameters:
- β₀ (intercept): Baseline power consumption representing idle/overhead power
- β₁ (coefficient): Incremental power per additional stream (watts per stream)
- `streams`: Number of concurrent transcoding streams
Example Interpretation: If β₀ = 40W and β₁ = 15W/stream, then:
- 0 streams: 40W (baseline/idle)
- 4 streams: 40 + (15 × 4) = 100W
- 8 streams: 40 + (15 × 8) = 160W
Assumptions:
- Linear scaling: Each additional stream adds constant power
- No thermal throttling or frequency scaling effects
- Consistent hardware behavior across workload range
Polynomial Regression (degree=2): Used when the training data contains 6 or more unique stream count values. Captures non-linear effects in power consumption.
Formula:
Power(streams) = β₀ + β₁ × streams + β₂ × streams²
Parameters:
- β₀ (intercept): Baseline power consumption
- β₁ (linear coefficient): Linear component of power scaling
- β₂ (quadratic coefficient): Non-linear scaling effects
- `streams`: Number of concurrent transcoding streams
What the Quadratic Term Captures:
- Thermal Throttling: At high loads, CPUs may reduce frequency to manage heat
- Cache Contention: More streams compete for L3 cache, reducing efficiency
- Memory Bandwidth Saturation: DRAM bandwidth becomes bottleneck
- CPU Frequency Scaling: Turbo boost behavior changes with core utilization
- Power Delivery Limits: VRM (Voltage Regulator Module) constraints
Example Interpretation: If β₀ = 35W, β₁ = 18W/stream, β₂ = -0.5W/stream²:
- 2 streams: 35 + (18 × 2) + (-0.5 × 4) = 69W
- 4 streams: 35 + (18 × 4) + (-0.5 × 16) = 99W
- 8 streams: 35 + (18 × 8) + (-0.5 × 64) = 147W (reduced efficiency)
The negative β₂ indicates diminishing returns: each additional stream adds less power than a purely linear model would predict.
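Plugging the example coefficients into the quadratic formula reproduces the figures above and shows the gap to a purely linear estimate:

```python
# Quadratic power model with the illustrative coefficients from the text
# (β₀=35, β₁=18, β₂=-0.5); these are example values, not measurements.
def quad_power(streams, b0=35.0, b1=18.0, b2=-0.5):
    return b0 + b1 * streams + b2 * streams ** 2

for s in (2, 4, 8):
    linear_only = 35.0 + 18.0 * s  # same model without the quadratic term
    print(f"{s} streams: {quad_power(s):.0f} W "
          f"(linear estimate: {linear_only:.0f} W)")
# At 8 streams the quadratic term trims 32 W off the linear estimate.
```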
The model expects scenario data from ResultsAnalyzer with the following structure:
```python
scenarios = [
    {
        'name': '2 Streams @ 2500k',       # String with stream count
        'power': {
            'mean_watts': 80.0,            # Mean power during test
            'median_watts': 79.5,          # Not used by predictor
            'min_watts': 75.0,             # Not used by predictor
            'max_watts': 85.0,             # Not used by predictor
            'total_energy_joules': 4800.0  # Not used by predictor
        },
        'bitrate': '2500k',                # Not used by predictor
        'resolution': '1280x720',          # Not used by predictor
        'fps': 30,                         # Not used by predictor
        'duration': 60.0                   # Not used by predictor
    },
    # ... more scenarios
]
```

Required Fields:
- `name`: String containing stream count information
- `power.mean_watts`: Float representing average power consumption
Optional Fields: All other fields are ignored by the predictor but used by other analysis components.
The model automatically extracts stream counts from scenario names using pattern matching.
Supported Patterns:
| Pattern | Example | Extracted Count |
|---|---|---|
| `N stream(s)` | "4 Streams @ 2500k" | 4 |
| `N-stream` | "8-stream test" | 8 |
| `single stream` | "Single Stream @ 1080p" | 1 |
| Leading number | "6 concurrent streams" | 6 |
| Case insensitive | "12 STREAMS Test" | 12 |
Non-Matchable Patterns:
- "Baseline (Idle)" → `None` (no stream count)
- "Multi Stream Test" → `None` (ambiguous count)
- "High Quality Encode" → `None` (no stream count)
Scenarios without extractable stream counts are automatically filtered out during training.
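A regex-based parser consistent with the patterns in the table might look like the following. This is a sketch; the project's actual `_infer_stream_count` may differ in detail.

```python
# Sketch of stream-count parsing for the documented name patterns
import re
from typing import Optional

def infer_stream_count(name: str) -> Optional[int]:
    lowered = name.lower()
    if re.search(r'\bsingle\s+stream\b', lowered):
        return 1                    # "Single Stream @ 1080p" -> 1
    match = re.search(r'(\d+)[\s-]*stream', lowered)
    if match:
        return int(match.group(1))  # "4 Streams", "8-stream" -> 4, 8
    match = re.match(r'^(\d+)\b', lowered)
    if match and 'stream' in lowered:
        return int(match.group(1))  # "6 concurrent streams" -> 6
    return None                     # "Baseline (Idle)" -> None

assert infer_stream_count("12 STREAMS Test") == 12
assert infer_stream_count("Multi Stream Test") is None
```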
Minimum Requirements:
- 1 valid data point (the model will train, but predictions will be poor)
- At least a `mean_watts` power measurement for each scenario
Recommended for Linear Model:
- 4+ data points with different stream counts
- Even distribution across stream count range
- Example: [1, 2, 4, 8] streams
Recommended for Polynomial Model:
- 7+ data points with different stream counts
- Wide range of stream counts
- Example: [1, 2, 3, 4, 6, 8, 12] streams
Data Collection Best Practices:
- Run each test for 60+ seconds to get stable power readings
- Allow 10-15 second stabilization before measurement
- Maintain consistent hardware configuration across tests
- Keep ambient temperature stable
- Use same codec, preset, and quality settings
- Measure RAPL (Running Average Power Limit) counters for accuracy
```
1. Data Extraction
   ├─ Parse scenario names to infer stream counts
   ├─ Extract mean_watts from power measurements
   └─ Filter scenarios missing either value

2. Feature Engineering
   ├─ X (features): Stream counts as numpy array [n_samples, 1]
   ├─ y (target): Power measurements as numpy array [n_samples]
   └─ Count unique stream values

3. Model Selection
   ├─ If unique_streams < 6:
   │   └─ Use Linear Regression
   └─ If unique_streams ≥ 6:
       └─ Use Polynomial Regression (degree=2)

4. Feature Transformation (Polynomial only)
   ├─ Input: [streams]
   ├─ Transform: [1, streams, streams²]
   └─ Example: [4] → [1, 4, 16]

5. Model Fitting
   ├─ Algorithm: Ordinary Least Squares (OLS)
   ├─ Objective: Minimize Σ(y_true - y_pred)²
   └─ Solver: sklearn LinearRegression

6. Model Validation
   ├─ Calculate R² score
   ├─ R² = 1 - (SS_res / SS_tot)
   │    Where:
   │    SS_res = Σ(y_true - y_pred)²   # Residual sum of squares
   │    SS_tot = Σ(y_true - y_mean)²   # Total sum of squares
   └─ Log R² for quality assessment
```
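The six steps above can be condensed into a short scikit-learn sketch; `fit_power_model` is a simplified stand-in for the predictor's training logic, not the project's actual code.

```python
# Condensed training pipeline sketch: selection, transformation, OLS fit, R²
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def fit_power_model(streams, watts):
    X = np.asarray(streams, dtype=float).reshape(-1, 1)
    y = np.asarray(watts, dtype=float)
    if len(np.unique(X)) >= 6:
        # Polynomial path: [s] -> [1, s, s²] before OLS
        X_fit = PolynomialFeatures(degree=2).fit_transform(X)
        model_type = 'polynomial'
    else:
        X_fit, model_type = X, 'linear'
    model = LinearRegression().fit(X_fit, y)  # ordinary least squares
    r2 = model.score(X_fit, y)                # R² = 1 - SS_res / SS_tot
    return model, model_type, r2

model, kind, r2 = fit_power_model([1, 2, 4, 8], [45, 80, 150, 280])
print(f"{kind} model, R² = {r2:.4f}")
```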
| R² Value | Interpretation | Action |
|---|---|---|
| 0.95 - 1.00 | Excellent fit | High confidence predictions |
| 0.85 - 0.95 | Good fit | Reliable predictions |
| 0.70 - 0.85 | Moderate fit | Use with caution |
| 0.50 - 0.70 | Poor fit | Consider more data |
| < 0.50 | Very poor fit | Model not reliable |
| Negative | Model worse than mean | Do not use predictions |
```
1. Input Validation
   └─ Check if model is trained (return None if not)

2. Feature Preparation
   ├─ Create feature array: X = [[streams]]
   └─ For polynomial: Transform to [1, streams, streams²]

3. Model Prediction
   ├─ Linear: Power = β₀ + β₁ × streams
   └─ Polynomial: Power = β₀ + β₁ × streams + β₂ × streams²

4. Post-Processing
   ├─ Clamp to non-negative: max(0, prediction)
   └─ Return as float (watts)
```
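The prediction path, including the polynomial transform and the non-negative clamp, can be sketched as follows; the coefficients are the earlier illustrative values, not fitted parameters.

```python
# Prediction sketch: expand features, apply coefficients, clamp to >= 0 W
def predict_power(streams, coeffs=(35.0, 18.0, -0.5)):
    b0, b1, b2 = coeffs
    features = [1.0, float(streams), float(streams) ** 2]  # [1, s, s²]
    raw = b0 * features[0] + b1 * features[1] + b2 * features[2]
    return max(0.0, raw)  # clamp: power can never be negative

print(predict_power(4))   # 99.0
print(predict_power(50))  # raw value is negative here, clamped to 0.0
```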
Interpolation (within training range): Generally reliable

```
Training data: [2, 4, 8] streams
Prediction: 6 streams   ✓ (between 4 and 8)
Confidence: High
```

Extrapolation (outside training range): Use with caution

```
Training data: [2, 4, 8] streams
Prediction: 16 streams  ⚠ (beyond 8)
Confidence: Moderate (within 2x range)

Prediction: 64 streams  ✗ (far beyond 8)
Confidence: Low (> 2x range, avoid)
```
Extrapolation Risks:
- Linear model assumes constant scaling (may diverge from reality)
- Polynomial model can diverge rapidly outside training range
- Real systems may have thermal limits not captured in model
- CPU throttling behavior may change at extreme loads
- Power supply limits may cap maximum power
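A guard implementing the "2x max training stream count" rule of thumb could look like this; `extrapolation_risk` is a hypothetical helper, not part of the predictor's API.

```python
# Hypothetical extrapolation guard based on the documented rule of thumb
def extrapolation_risk(streams, train_min, train_max):
    if train_min <= streams <= train_max:
        return 'interpolation: high confidence'
    if streams <= 2 * train_max:
        return 'moderate extrapolation: use with caution'
    return 'far extrapolation: avoid'

print(extrapolation_risk(6, 2, 8))   # interpolation: high confidence
print(extrapolation_risk(16, 2, 8))  # moderate extrapolation: use with caution
print(extrapolation_risk(64, 2, 8))  # far extrapolation: avoid
```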
```python
from advisor import PowerPredictor

# Create predictor
predictor = PowerPredictor()

# Load scenarios (from ResultsAnalyzer)
scenarios = [
    {'name': '2 Streams @ 2500k', 'power': {'mean_watts': 80.0}},
    {'name': '4 Streams @ 2500k', 'power': {'mean_watts': 150.0}},
    {'name': '8 Streams @ 1080p', 'power': {'mean_watts': 280.0}},
]

# Train model
success = predictor.fit(scenarios)
if success:
    print("Model trained successfully!")
else:
    print("Failed to train (no valid data)")

# Make predictions
power_6 = predictor.predict(6)
print(f"Predicted power for 6 streams: {power_6:.2f} W")

power_12 = predictor.predict(12)
print(f"Predicted power for 12 streams: {power_12:.2f} W")

# Get model metadata
info = predictor.get_model_info()
print(f"Trained: {info['trained']}")
print(f"Model Type: {info['model_type']}")
print(f"Training Samples: {info['n_samples']}")
print(f"Stream Range: {info['stream_range']}")

# Example output:
# Trained: True
# Model Type: linear
# Training Samples: 3
# Stream Range: (2, 8)
```

The model is automatically integrated when running analysis:
```bash
# Run analysis (includes power predictions)
python3 scripts/analyze_results.py test_results/test_results_20231215_143022.json

# Output includes:
# 1. Standard analysis report
# 2. Power scalability predictions section
# 3. Measured vs predicted comparison table
# 4. CSV export with predicted_mean_power_w column
```

```
==================================================================================================
POWER SCALABILITY PREDICTIONS
==================================================================================================
Model Type: LINEAR
Training Samples: 4
Stream Range: 1 - 8 streams

Predicted Power Consumption:
──────────────────────────────────────────────────────────────────────────────────────────────────
   1 streams:  45.23 W
   2 streams:  78.45 W
   4 streams: 145.12 W
   8 streams: 278.67 W
  12 streams: 412.23 W
──────────────────────────────────────────────────────────────────────────────────────────────────

MEASURED vs PREDICTED COMPARISON
──────────────────────────────────────────────────────────────────────────────────────────────────
(Shows model fit quality on training data)

Streams    Measured (W)    Predicted (W)    Diff (W)
──────────────────────────────────────────────────────────────────────────────────────────────────
      1           45.00            45.23       +0.23
      2           80.00            78.45       -1.55
      4          150.00           145.12       -4.88
      8          280.00           278.67       -1.33
──────────────────────────────────────────────────────────────────────────────────────────────────
```
The `predicted_mean_power_w` column is added to the analysis CSV:

```csv
name,bitrate,resolution,fps,duration,mean_power_w,predicted_mean_power_w,...
"2 Streams @ 2500k",2500k,1280x720,30,60.0,80.0,78.45,...
"4 Streams @ 2500k",2500k,1280x720,30,60.0,150.0,145.12,...
"8 Streams @ 1080p",5000k,1920x1080,30,60.0,280.0,278.67,...
```

Column Meaning:
- `mean_power_w`: Actual measured power from Prometheus/RAPL
- `predicted_mean_power_w`: Model prediction based on stream count
- Consistent Hardware: Same CPU, RAM, cooling across all measurements
- Consistent Configuration: Same FFmpeg preset, codec, quality settings
- Stream Count Primary Factor: Assumes power scales mainly with stream count
- Stable Environment: Constant ambient temperature, no thermal throttling
Factors Not Captured:
- Different Codecs: H.264 vs H.265 vs AV1 have different power profiles
- Different Resolutions: 720p vs 1080p vs 4K per stream
- Different Bitrates: 2500k vs 5000k per stream
- Different Presets: ultrafast vs medium vs slow
- Ambient Temperature: Heat affects CPU frequency and power
- Power Management: Governor settings (performance vs powersave)
- Background Load: Other processes competing for CPU
- Turbo Boost State: Enabled vs disabled
- NUMA Effects: Multi-socket systems with non-uniform memory access
When Not to Trust Predictions:
- Small Datasets: < 3 training points
- Extrapolation: Predicting > 2x max training stream count
- Heterogeneous Data: Mixed codecs, resolutions, or settings
- Thermal Throttling: Training data includes throttled measurements
- Inconsistent Measurements: Wide variance in power readings
- Low R² Score: < 0.70 indicates poor model fit
Scenario: Determine how many concurrent streams a server can handle within power budget.
```python
predictor = PowerPredictor()
predictor.fit(benchmark_scenarios)

# Power budget: 300W
max_power = 300.0
for streams in range(1, 20):
    predicted = predictor.predict(streams)
    if predicted > max_power:
        print(f"Max streams within {max_power}W: {streams - 1}")
        break
```

Scenario: Estimate monthly energy costs for different workload sizes.
```python
# Predict power for target workload
streams = 10
power_watts = predictor.predict(streams)

# Calculate monthly energy
hours_per_month = 730
kwh_per_month = (power_watts * hours_per_month) / 1000

# Calculate cost (assuming $0.12/kWh)
cost_per_month = kwh_per_month * 0.12
print(f"{streams} streams: {kwh_per_month:.2f} kWh/month = ${cost_per_month:.2f}")
```

Scenario: Identify safe operating limits before load testing.
```python
# Check predicted power at different scales
for streams in [4, 8, 12, 16]:
    power = predictor.predict(streams)
    print(f"{streams} streams → {power:.0f}W")
    if power > 250:  # Server cooling limit
        print("  Exceeds thermal capacity")
```

Scenario: Determine PDU (Power Distribution Unit) requirements for a data center.
```python
# Calculate total rack power for 10 servers
servers = 10
streams_per_server = 8

power_per_server = predictor.predict(streams_per_server)
total_rack_power = power_per_server * servers

print(f"Total rack power: {total_rack_power:.0f}W")
print(f"Required PDU capacity: {total_rack_power * 1.2:.0f}W (20% headroom)")
```

- Check R² Score: Should be > 0.70 for reliable predictions

  ```
  # Logged automatically during training
  # INFO:root:PowerPredictor trained on 5 data points, R² = 0.9234
  ```

- Review Comparison Table: Differences should be small

  ```
  Streams    Measured (W)    Predicted (W)    Diff (W)
        2           80.00            78.45      -1.55   ✓ Good
        4          150.00           145.12      -4.88   ✓ Good
        8          280.00           320.50     +40.50   ✗ Poor
  ```

- Cross-Validation: Hold out some data points

  ```python
  # Train on subset
  train_scenarios = scenarios[:-2]
  predictor.fit(train_scenarios)

  # Test on held-out data
  test_scenarios = scenarios[-2:]
  for scenario in test_scenarios:
      streams = predictor._infer_stream_count(scenario['name'])
      predicted = predictor.predict(streams)
      actual = scenario['power']['mean_watts']
      error = abs(predicted - actual) / actual * 100
      print(f"{scenario['name']}: {error:.1f}% error")
  ```
Collect More Data:
- Add scenarios with different stream counts
- Fill gaps in stream count range
- Add replicate measurements for averaging
Ensure Data Quality:
- Verify stable power readings (low stddev)
- Check for thermal throttling during tests
- Confirm consistent test duration (60+ seconds)
- Validate RAPL measurements are accurate
Consider Polynomial Model:
- Collect ≥ 6 unique stream counts
- Model will automatically switch to polynomial
- Better captures non-linear scaling effects
Standardize Test Conditions:
- Same FFmpeg preset across all tests
- Same resolution and bitrate per stream
- Same ambient temperature
- Same power management settings
Issue: `predictor.fit()` returns `False`
Causes:
- No scenarios with valid power data
- No scenarios with parseable stream counts
- All scenarios filtered out
Solutions:
```python
# Debug: Check what data was extracted
predictor = PowerPredictor()
for scenario in scenarios:
    streams = predictor._infer_stream_count(scenario['name'])
    power = scenario.get('power', {}).get('mean_watts')
    print(f"{scenario['name']}: streams={streams}, power={power}")
```

Issue: Large differences between measured and predicted
Causes:
- Low R² score (< 0.70)
- Non-linear effects in data but using linear model
- Inconsistent measurements in training data
- Extrapolating far beyond training range
Solutions:
- Collect more training data (aim for 6+ unique stream counts)
- Check for outliers in training data
- Review test conditions for consistency
- Avoid predictions > 2x max training stream count
Issue: Model predicts negative power
This should not happen: predictions are clamped to non-negative values, so negative output indicates a bug.
```
numpy>=1.20.0        # Array operations, linear algebra
scikit-learn>=1.3.0  # Machine learning (LinearRegression, PolynomialFeatures)
```

PowerPredictor:
- `__init__()`: Initialize empty model
- `fit(scenarios)`: Train on scenario data
- `predict(streams)`: Predict power for N streams
- `get_model_info()`: Get model metadata
- `_infer_stream_count(name)`: Parse stream count from name
sklearn Components:
- `LinearRegression`: OLS regression model
- `PolynomialFeatures(degree=2)`: Feature transformation for quadratic terms
- Model implementation: `advisor/modeling.py`
- Integration: `analyze_results.py`
- Tests: `tests/test_modeling.py`
- Documentation: `docs/power-prediction-model.md` (this file)
- Multi-Variable Models: Incorporate resolution, bitrate, codec
- Time-Series Predictions: Account for thermal buildup over time
- Confidence Intervals: Provide prediction uncertainty ranges
- GPU Power Modeling: Extend to NVIDIA/AMD GPU transcoding
- Ensemble Models: Combine multiple models for robustness
- Automated Hyperparameter Tuning: Optimize polynomial degree
- Feature Selection: Identify most predictive variables
- Cross-Platform Validation: Test on different CPU architectures
To improve the model:
- Collect diverse training data (different workloads, hardware)
- Document any prediction errors or limitations discovered
- Suggest additional features or variables to incorporate
- Share R² scores and model performance metrics
- RAPL (Running Average Power Limit): Intel's power measurement interface
- Ordinary Least Squares (OLS): Statistical method for linear regression
- scikit-learn Documentation: https://scikit-learn.org/stable/modules/linear_model.html
- Polynomial Regression: https://en.wikipedia.org/wiki/Polynomial_regression
- R² Score: https://en.wikipedia.org/wiki/Coefficient_of_determination
This component follows the same license as the main ffmpeg-rtmp project.
For questions or issues related to the power prediction model:
- Open an issue on GitHub repository
- Include your R² score and training data characteristics
- Provide example predictions showing unexpected behavior