Where to find the code:
- Header:
include/core/WorkerBudget.hpp - Implementation:
src/core/WorkerBudget.cpp
The WorkerBudget system provides adaptive batch optimization for parallel workloads in the Hammer Engine. It uses a sequential execution model where each manager gets ALL workers during its update window, combined with throughput-based hill-climbing to find optimal batch counts.
- Sequential Execution: Managers don't run concurrently - each gets full worker pool access
- Batch Optimization: WorkerBudget tunes batch COUNT, not worker allocation
- Adaptive Learning: Throughput-based hill-climbing converges to optimal for your hardware
- Queue Pressure: Automatic scaling when ThreadSystem is overloaded
The Hammer Engine uses a sequential manager update pattern:
GameEngine::update()
├── EventManager.update() ← ALL workers available
├── GameStateManager.update() ← ALL workers available
├── AIManager.update() ← ALL workers available
├── ParticleManager.update() ← ALL workers available
├── PathfinderManager.update() ← ALL workers available
└── CollisionManager.update() ← ALL workers available
Benefits:
- No contention: Managers don't compete for workers
- Full utilization: Each manager uses 100% of available workers
- Simpler design: No complex allocation percentages to tune
- Better cache locality: Workers process related work together
| Aspect | Old Model (Deprecated) | Current Model |
|---|---|---|
| Allocation | AI: 54%, Particles: 31%, Events: 15% | Each manager: 100% |
| Execution | Concurrent managers | Sequential managers |
| Tuning | Fixed percentages | Adaptive batch sizing |
| Complexity | Per-manager weights, buffer reserve | Just batch count tuning |
#include "core/WorkerBudget.hpp"
auto& budgetMgr = HammerEngine::WorkerBudgetManager::Instance();Returns the cached worker budget (total available workers).
const WorkerBudget& getBudget();
// Usage
auto budget = budgetMgr.getBudget();
size_t totalWorkers = budget.totalWorkers; // e.g., 7 on 8-core systemReturns optimal worker count for a system. With sequential execution, this returns ALL workers for any active workload.
size_t getOptimalWorkers(SystemType system, size_t workloadSize);
// Usage
size_t workers = budgetMgr.getOptimalWorkers(
HammerEngine::SystemType::AI,
entityCount); // Returns totalWorkers if entityCount > 0Behavior:
- Returns 0 if workloadSize is 0
- Returns all workers for any positive workload
- Scales back under critical queue pressure (>90%)
Returns optimal batch count and size based on workload and learned throughput.
std::pair<size_t, size_t> getBatchStrategy(
SystemType system,
size_t workloadSize,
size_t optimalWorkers);
// Usage
auto [batchCount, batchSize] = budgetMgr.getBatchStrategy(
HammerEngine::SystemType::AI,
10000, // 10K entities
7); // 7 workers
// Result might be: batchCount=8, batchSize=1250Algorithm:
- Base calculation:
batchCount = optimalWorkers - Apply learned multiplier:
batchCount *= multiplier(0.25x to 2.5x range) - Ensure minimum items per batch (8 items)
- Clamp to valid range
Reports execution metrics for adaptive tuning. Critical for learning!
void reportBatchCompletion(
SystemType system,
size_t workloadSize,
size_t batchCount,
double totalTimeMs);
// Usage
auto startTime = std::chrono::steady_clock::now();
// ... execute batches ...
auto elapsed = std::chrono::steady_clock::now() - startTime;
budgetMgr.reportBatchCompletion(
HammerEngine::SystemType::AI,
10000, // Processed 10K items
8, // In 8 batches
std::chrono::duration<double, std::milli>(elapsed).count());enum class SystemType : uint8_t {
AI = 0, // AIManager
Particle = 1, // ParticleManager
Pathfinding = 2,// PathfinderManager
Event = 3, // EventManager
Collision = 4, // CollisionManager
COUNT = 5
};Each system maintains its own learning state (multiplier, throughput history).
WorkerBudgetManager uses throughput-based hill-climbing to find optimal batch counts:
┌─────────────────────────────────────────────────────────────────┐
│ Hill-Climbing Algorithm │
├─────────────────────────────────────────────────────────────────┤
│ 1. Start: multiplier = 1.0 (batchCount = workerCount) │
│ │
│ 2. Measure: throughput = workloadSize / totalTimeMs │
│ │
│ 3. Smooth: smoothedThroughput = 0.88 * prev + 0.12 * current │
│ │
│ 4. Compare to previous smoothed throughput: │
│ - Better (>6%): continue in same direction │
│ - Worse (<-6%): reverse direction │
│ - Same (±6%): maintain position (dead band) │
│ │
│ 5. Adjust: multiplier += direction * 0.02 (2% step) │
│ │
│ 6. Clamp: multiplier ∈ [0.25, 2.5] │
└─────────────────────────────────────────────────────────────────┘
| Multiplier | Meaning | When Used |
|---|---|---|
| 0.25x | 4x consolidation (fewer, larger batches) | High per-batch overhead |
| 1.0x | Default (batchCount = workerCount) | Starting point |
| 2.5x | 2.5x expansion (more, smaller batches) | Very fast workers |
8-core system, 10,000 AI entities:
Frame 1: multiplier=1.0, batchCount=8, throughput=1200/ms
Frame 2: multiplier=1.02 (+), batchCount=8, throughput=1280/ms (better!)
Frame 3: multiplier=1.04 (+), batchCount=8, throughput=1310/ms (better!)
Frame 4: multiplier=1.06 (+), batchCount=9, throughput=1290/ms (worse)
Frame 5: multiplier=1.04 (-), batchCount=8, throughput=1305/ms (converged)
Frame 6+: stable around multiplier=1.04, batchCount=8
struct BatchTuningState {
static constexpr float MIN_MULTIPLIER = 0.25f; // Allow 4x consolidation
static constexpr float MAX_MULTIPLIER = 2.5f; // Allow 2.5x expansion
static constexpr float ADJUST_RATE = 0.02f; // 2% adjustment per frame
static constexpr double THROUGHPUT_TOLERANCE = 0.06; // 6% dead band
static constexpr double THROUGHPUT_SMOOTHING = 0.12; // 12% smoothing weight
static constexpr size_t MIN_ITEMS_PER_BATCH = 8; // Minimum batch size
};WorkerBudgetManager monitors ThreadSystem queue utilization to prevent overflow:
static constexpr float QUEUE_PRESSURE_CRITICAL = 0.90f; // 90% threshold| Queue Utilization | Response |
|---|---|
| < 90% | Full workers, normal batching |
| ≥ 90% | Reduced workers, larger batches |
double queuePressure = getQueuePressure();
if (queuePressure >= QUEUE_PRESSURE_CRITICAL) {
// Scale back workers proportionally
size_t scaled = static_cast<size_t>(
totalWorkers * (1.0 - queuePressure)
);
return std::max(scaled, size_t(1));
}All managers follow this pattern:
void Manager::update(float deltaTime) {
size_t workloadSize = getActiveItemCount();
if (workloadSize == 0) return;
// 1. Get optimal configuration
auto& budgetMgr = HammerEngine::WorkerBudgetManager::Instance();
size_t optimalWorkers = budgetMgr.getOptimalWorkers(
HammerEngine::SystemType::MyType, workloadSize);
auto [batchCount, batchSize] = budgetMgr.getBatchStrategy(
HammerEngine::SystemType::MyType,
workloadSize,
optimalWorkers);
// 2. Execute batches
auto startTime = std::chrono::steady_clock::now();
for (size_t i = 0; i < batchCount; ++i) {
size_t start = i * batchSize;
size_t end = std::min(start + batchSize, workloadSize);
ThreadSystem::Instance().enqueueTask([this, start, end]() {
processBatch(start, end);
}, TaskPriority::High, "Manager_Batch");
}
// 3. Wait for completion
// ... (implementation-specific) ...
// 4. Report metrics for adaptive tuning
auto elapsed = std::chrono::steady_clock::now() - startTime;
budgetMgr.reportBatchCompletion(
HammerEngine::SystemType::MyType,
workloadSize,
batchCount,
std::chrono::duration<double, std::milli>(elapsed).count());
}- Add to enum in
WorkerBudget.hpp:
enum class SystemType : uint8_t {
AI = 0,
Particle = 1,
Pathfinding = 2,
Event = 3,
Collision = 4,
MyNewSystem = 5, // Add here
COUNT = 6 // Update count
};- Use in your manager:
size_t workers = budgetMgr.getOptimalWorkers(
HammerEngine::SystemType::MyNewSystem, workloadSize);| Component | Size |
|---|---|
| WorkerBudget struct | 8 bytes |
| BatchTuningState (per system) | ~64 bytes |
| Total (5 systems) | ~350 bytes |
| Operation | Time |
|---|---|
getBudget() (cached) |
< 10ns |
getOptimalWorkers() |
< 50ns |
getBatchStrategy() |
< 100ns |
reportBatchCompletion() |
< 200ns |
| Scenario | Frames to Converge |
|---|---|
| Stable workload | 10-20 frames |
| Changing workload | 20-50 frames |
| After restart | 10-20 frames |
// Queue pressure threshold
static constexpr float QUEUE_PRESSURE_CRITICAL = 0.90f;
// BatchTuningState constants
static constexpr float MIN_MULTIPLIER = 0.25f; // Allow 4x consolidation
static constexpr float MAX_MULTIPLIER = 2.5f; // Allow 2.5x expansion
static constexpr float ADJUST_RATE = 0.02f; // 2% adjustment per frame
static constexpr double THROUGHPUT_TOLERANCE = 0.06; // 6% dead band
static constexpr double THROUGHPUT_SMOOTHING = 0.12; // 12% smoothing weight
static constexpr size_t MIN_ITEMS_PER_BATCH = 8; // Minimum batch size| Constant | Increase When | Decrease When |
|---|---|---|
ADJUST_RATE |
Slow convergence | Oscillation |
THROUGHPUT_TOLERANCE |
Too much jitter | Slow adaptation |
THROUGHPUT_SMOOTHING |
Too much noise | Slow response |
MIN_ITEMS_PER_BATCH |
Tiny batches inefficient | Large batches underutilize |
WorkerBudgetManager is fully thread-safe:
- Atomics: All mutable state uses
std::atomic - Double-Checked Locking: Cache invalidation uses proper synchronization
- No Locks in Hot Path:
getBatchStrategy()andgetOptimalWorkers()are lock-free
// Safe to call from any thread
auto& budgetMgr = HammerEngine::WorkerBudgetManager::Instance();
size_t workers = budgetMgr.getOptimalWorkers(SystemType::AI, 1000);Symptom: Low items/ms reported
Cause: Not calling reportBatchCompletion()
Fix: Ensure every batch execution reports metrics
Symptom: Batch count changes every frame
Cause: Noisy workload or timing
Fix: Increase THROUGHPUT_TOLERANCE or THROUGHPUT_SMOOTHING
Symptom: Tasks queuing up, high latency Cause: Submitting too many small batches Fix: System automatically handles via queue pressure detection
Symptom: Takes too long to find optimal
Cause: ADJUST_RATE too small
Fix: Increase ADJUST_RATE (with caution - may cause oscillation)
- ThreadSystem - Thread pool and task submission
- AIManager - AI system WorkerBudget integration
- CollisionManager - Collision threading
- ParticleManager - Particle threading