CommandBuffer System #73

@csprance

Feature Proposal: CommandBuffer System

Target Version: v6.8.0


Problem

Systems currently need awkward workarounds to safely modify entities during iteration:

  1. Backwards iteration: for i in range(entities.size() - 1, -1, -1)
  2. Defensive snapshots: var snapshot = entities.duplicate() (O(N) memory overhead)
  3. Limited batching: Existing add_components() only batches per-entity, not across entities

These patterns are error-prone, harder to read, and miss optimization opportunities for bulk operations.


Proposed Solution

Add a CommandBuffer system that queues structural changes for deferred execution:

# Current pattern
for i in range(entities.size() - 1, -1, -1):
    if should_delete(entities[i]):
        ECS.world.remove_entity(entities[i])

# With CommandBuffer
for entity in entities:
    if should_delete(entity):
        cmd.remove_entity(entity)  # Queued for later
# Auto-executes after system completes

Architecture

Core Components

  1. CommandBuffer class (addons/gecs/ecs/command_buffer.gd)

    • Queues operations: add_component(), remove_component(), add_entity(), remove_entity()
    • Batches by entity: Groups all operations per-entity before execution
    • Validates entities: Skips freed entities with is_instance_valid()
    • Leverages existing batch APIs: Uses entity.add_components(), world.add_entities(), etc.
  2. System integration (addons/gecs/ecs/system.gd)

    • Add cmd: CommandBuffer property to all systems
    • Add @export_enum("PER_SYSTEM", "END_OF_FRAME") var command_buffer_flush_mode
    • Auto-flush after system processing based on mode
  3. World integration (addons/gecs/ecs/world.gd)

    • Flush END_OF_FRAME buffers after all systems in group complete
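The queue, validate, and execute flow listed above might look roughly like the following. This is a minimal sketch under stated assumptions, not the GECS implementation: `CommandBufferSketch`, the `Op` enum, the `size()` helper, and the dictionary layout of queued commands are hypothetical. It also assumes entities expose `add_component()` / `remove_component()` and the world exposes `add_entity()` / `remove_entity()`. Per-entity grouping is left out to keep it short.

```gdscript
# command_buffer_sketch.gd
# Hypothetical sketch of a deferred command queue; NOT the GECS implementation.
class_name CommandBufferSketch
extends RefCounted

enum Op { ADD_COMPONENT, REMOVE_COMPONENT, ADD_ENTITY, REMOVE_ENTITY }

# Assumption: any object exposing add_entity(entity) and remove_entity(entity).
# Left untyped so the sketch stays duck-typed.
var _world
# Queued commands in submission order, each {"op", "entity", "payload"}.
var _queue: Array[Dictionary] = []


func _init(world) -> void:
    _world = world


func add_component(entity: Object, component: Variant) -> void:
    _push(Op.ADD_COMPONENT, entity, component)


func remove_component(entity: Object, component_type: Variant) -> void:
    _push(Op.REMOVE_COMPONENT, entity, component_type)


func add_entity(entity: Object) -> void:
    _push(Op.ADD_ENTITY, entity, null)


func remove_entity(entity: Object) -> void:
    _push(Op.REMOVE_ENTITY, entity, null)


func size() -> int:
    return _queue.size()


func clear() -> void:
    _queue.clear()


func execute() -> void:
    # Copy and clear first, so commands queued while executing are kept
    # for the next flush instead of mutating the array being iterated.
    var pending: Array = _queue.duplicate()
    _queue.clear()
    for command in pending:
        # Untyped on purpose: the entity may have been freed since it was queued.
        var entity = command["entity"]
        if not is_instance_valid(entity):
            continue
        match command["op"]:
            Op.ADD_COMPONENT:
                entity.add_component(command["payload"])
            Op.REMOVE_COMPONENT:
                entity.remove_component(command["payload"])
            Op.ADD_ENTITY:
                _world.add_entity(entity)
            Op.REMOVE_ENTITY:
                _world.remove_entity(entity)


func _push(op: Op, entity: Object, payload: Variant) -> void:
    _queue.append({"op": op, "entity": entity, "payload": payload})
```

Storing plain dictionaries rather than closures keeps the queue inspectable, which is what a later grouping pass would need.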

Key Design Decisions

1. Flush Timing Modes

PER_SYSTEM (default):

  • Execute commands immediately after each system completes
  • Maintains same-frame visibility for dependent systems
  • Lower performance gain but safer default

END_OF_FRAME:

  • Execute commands after all systems in group complete
  • Maximum batching performance (estimated 10-50x faster for bulk ops; to be validated by the Phase 3 benchmarks)
  • Requires careful ordering to avoid same-frame dependency issues
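To make the difference between the two modes concrete, here is a small sketch of how a group of systems could be driven. Everything in it is illustrative: `FlushModesSketch`, `run_group()`, and the dictionary shape with `process` and `flush` callables are assumptions made for this example, not the proposed System or World API.

```gdscript
# flush_modes_sketch.gd
# Hypothetical sketch of flush ordering; names are illustrative, not GECS API.
class_name FlushModesSketch
extends RefCounted

enum FlushMode { PER_SYSTEM, END_OF_FRAME }


# Runs one group of systems. Each system is assumed to be a dictionary:
#   {"flush_mode": FlushMode, "process": Callable, "flush": Callable}
# PER_SYSTEM buffers flush right after their own system has run;
# END_OF_FRAME buffers flush once every system in the group has run.
static func run_group(systems: Array[Dictionary]) -> void:
    var deferred: Array[Callable] = []
    for system in systems:
        system["process"].call()
        if system["flush_mode"] == FlushMode.PER_SYSTEM:
            system["flush"].call()
        else:
            deferred.append(system["flush"])
    for flush in deferred:
        flush.call()
```

With system A set to END_OF_FRAME and system B set to PER_SYSTEM, this ordering runs A, runs B, flushes B, then flushes A. So B never sees A's queued changes within the same pass, which is the same-frame dependency issue noted above.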

2. Integration with Existing Optimizations

Preserve current optimizations:

  • Use existing _should_invalidate_cache flag during batch execution
  • Call existing entity.add_components() / entity.remove_components()
  • Call existing world.add_entities() / world.remove_entities()
  • Leverage archetype edge caching

New cross-entity batching:

  • Current: Per-entity batching (add multiple components to one entity)
  • CommandBuffer: Cross-entity batching (add components to multiple entities in one operation)
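As an illustration of cross-entity batching, queued removals could be collapsed into one list and handed to a single bulk call. The sketch below is an assumption about how that might be done: `RemovalBatchSketch` and `build_removal_batch()` are hypothetical names, and the only existing API it relies on is `world.remove_entities()` as referenced in this proposal.

```gdscript
# removal_batch_sketch.gd
# Hypothetical sketch of cross-entity batching for removals; not GECS code.
class_name RemovalBatchSketch
extends RefCounted


# Builds one removal batch from every entity queued this flush.
# Freed entities are skipped, and an entity queued twice is kept only once.
static func build_removal_batch(queued_entities: Array) -> Array:
    var batch: Array = []
    var seen: Dictionary = {}
    for entity in queued_entities:
        if not is_instance_valid(entity):
            continue
        var id: int = entity.get_instance_id()
        if seen.has(id):
            continue
        seen[id] = true
        batch.append(entity)
    return batch
```

A flush would then issue a single `world.remove_entities(batch)` call when the batch is non-empty, instead of one `remove_entity()` call per queued command.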

3. Command Grouping Strategy

Group commands by entity before execution:

Commands: [add(E1, C_A), add(E2, C_B), add(E1, C_C), remove(E1, C_D)]
Grouped:  E1: {add: [C_A, C_C], remove: [C_D]}, E2: {add: [C_B]}
Execute:  E1.add_components([C_A, C_C]) + E1.remove_components([C_D])  # 1 archetype move, if both calls are applied as a single transition
          E2.add_components([C_B])  # 1 archetype move

This reduces archetype transitions from O(commands) to O(entities).
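The grouping step above can be sketched as a pure function over the queued commands. The command shape (`op`, `entity`, `component` keys) and the names `CommandGroupingSketch` and `group_by_entity()` are assumptions for illustration; the proposal only names a private `_group_by_entity()` without specifying it.

```gdscript
# command_grouping_sketch.gd
# Hypothetical sketch of per-entity command grouping; not GECS code.
class_name CommandGroupingSketch
extends RefCounted


# Groups queued commands by entity, keeping entities in first-seen order.
# Each command is assumed to look like:
#   {"op": "add" or "remove", "entity": <entity>, "component": <component or type>}
# Returns {entity: {"add": [...], "remove": [...]}}.
static func group_by_entity(commands: Array) -> Dictionary:
    var grouped: Dictionary = {}
    for command in commands:
        var entity = command["entity"]
        if not grouped.has(entity):
            grouped[entity] = {"add": [], "remove": []}
        # Dictionaries and arrays are reference types, so this appends in place.
        grouped[entity][command["op"]].append(command["component"])
    return grouped
```

Fed the four commands from the example, this yields `E1` with `add: [C_A, C_C]` and `remove: [C_D]`, and `E2` with `add: [C_B]`. Note that grouping discards the relative order of adds and removes on the same entity, so a real implementation would need to decide what an add followed by a remove of the same component means.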

4. Observer Compatibility

Current behavior: Observers trigger immediately when components change
With CommandBuffer: Observers trigger when commands execute (end of system or end of frame)

This is acceptable because:

  • Observers already work with batched operations (add_components)
  • PER_SYSTEM mode keeps timing close to current behavior
  • Documentation should clarify timing expectations

Use Cases & Performance Impact

High-Impact Scenarios

| Use Case | Current | With CommandBuffer | Expected Speedup |
|---|---|---|---|
| Huge explosions (100+ entity deletions) | 100 operations | 1 batch | 10-50x |
| State transitions (remove + add) | 2 archetype moves/entity | 1 move/entity | 2x |
| Collision resolution (bulk removals) | N operations + snapshot copy | 1 batch, no copy | 5-10x |
| Wave spawning (100+ entities) | 100 operations | 1 batch | 10-20x |

Low-Impact Scenarios

Small operations (1-5 entities): expected change of -5% to 0%, i.e. a slight slowdown from buffering overhead, which is considered acceptable.


API Design

Basic Operations

cmd.add_component(entity, component)
cmd.remove_component(entity, component_type)
cmd.add_components(entity, [comp1, comp2])  # Batch per-entity
cmd.remove_components(entity, [type1, type2])
cmd.add_entity(entity)
cmd.remove_entity(entity)
cmd.add_relationship(entity, relationship)
cmd.remove_relationship(entity, relationship, limit)

Advanced Operations

cmd.add_custom(callable)  # For complex multi-step operations
cmd.execute()  # Manual execution (normally automatic)
cmd.clear()    # Discard queued commands
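The advanced operations can be illustrated with a buffer that only stores callables. This is a sketch of the intended semantics as read from the list above (queue with `add_custom()`, run in submission order on `execute()`, drop with `clear()`); the class name `CustomCommandSketch` is hypothetical and the real buffer may behave differently.

```gdscript
# custom_command_sketch.gd
# Hypothetical sketch of add_custom() / execute() / clear(); not GECS code.
class_name CustomCommandSketch
extends RefCounted

# Queued callables, run in submission order on execute().
var _commands: Array[Callable] = []


func add_custom(callable: Callable) -> void:
    _commands.append(callable)


func clear() -> void:
    # Discards everything queued so far without running it.
    _commands.clear()


func execute() -> void:
    # Copy and clear first, so callables that queue more work do not
    # mutate the array being iterated.
    var pending: Array = _commands.duplicate()
    _commands.clear()
    for command in pending:
        # Skip callables whose target object no longer exists.
        if command.is_valid():
            command.call()
```

Nothing runs when `add_custom()` is called; the callable only runs on the next `execute()`, and `clear()` before that point drops it.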

Implementation Phases

Phase 1: Core

  • Create CommandBuffer class with all command types
  • Implement execute() with entity grouping
  • Implement _group_by_entity() optimization
  • Write unit tests for basic operations

Phase 2: Integration

  • Add cmd property to System class
  • Add flush mode configuration
  • Add auto-flush logic to System._handle() and World.process()
  • Write integration tests for flush modes

Phase 3: Optimization

  • Add command object pooling (optional)
  • Add archetype-sorted execution (optional)
  • Write performance benchmarks
  • Validate 10x+ speedup targets
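For the benchmark step, a very small timing harness might be enough to compare a direct-removal baseline against a buffered variant. The helper below is a sketch built only on Godot's `Time.get_ticks_usec()`; `BenchmarkSketch`, `average_usec()`, and `speedup()` are hypothetical names, and the workloads themselves would have to be supplied by the project's own test scenes.

```gdscript
# benchmark_sketch.gd
# Hypothetical timing helper for comparing two workloads; not GECS code.
class_name BenchmarkSketch
extends RefCounted


# Average wall-clock microseconds per call of `work` over `runs` calls.
static func average_usec(work: Callable, runs: int = 10) -> float:
    assert(runs > 0)
    var start := Time.get_ticks_usec()
    for i in runs:
        work.call()
    return float(Time.get_ticks_usec() - start) / float(runs)


# Ratio of baseline time to candidate time; values above 1.0 mean the
# candidate is faster. Returns INF if the candidate is too fast to measure.
static func speedup(baseline: Callable, candidate: Callable, runs: int = 10) -> float:
    var candidate_usec := average_usec(candidate, runs)
    if candidate_usec <= 0.0:
        return INF
    return average_usec(baseline, runs) / candidate_usec
```

Each callable should rebuild its own entities before mutating them, otherwise the second run measures an already-emptied world.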

Phase 4: Examples & Docs

  • Update example systems to use CommandBuffer
  • Document when to use each flush mode
  • Add migration guide for existing patterns
  • Update CLAUDE.md

Trade-offs & Limitations

Advantages

✅ Cleaner API (forward iteration)
✅ No defensive snapshot overhead
✅ Cross-entity batching opportunities
✅ Estimated 10-50x speedup for bulk operations (pending Phase 3 benchmarks)
✅ Reduced cache invalidations

Disadvantages

⚠️ Frame delay with END_OF_FRAME mode
⚠️ Observer timing changes (acceptable)
⚠️ Memory overhead for command objects (mitigated with pooling)
⚠️ Learning curve (two flush modes to understand)

Non-Goals

❌ Thread safety (single-threaded design)
❌ Transaction rollback (future enhancement)
❌ Replacing existing direct APIs (opt-in addition)


Open Questions

  1. Should we eventually deprecate backwards iteration patterns in favor of CommandBuffer?
  2. Should we add debug statistics (commands queued, execution time)?
  3. Should we warn when END_OF_FRAME mode is used with systems that have dependencies?
  4. Should command object pooling be implemented in Phase 3 or deferred to v7.1?

Success Metrics

  1. Performance: Bulk operations (100+ entities) show 10x+ speedup in benchmarks
  2. Code Quality: Example systems use forward iteration instead of backwards
  3. Memory: Reduced allocations from eliminating defensive snapshots
  4. Adoption: Community feedback on API clarity and usefulness

References

Existing code to study:

  • addons/gecs/ecs/entity.gd:166-216 - Existing batch operations
  • addons/gecs/ecs/world.gd:1047-1050 - Cache invalidation control
  • addons/gecs/ecs/system.gd:267,342 - Defensive snapshot patterns
  • addons/gecs/ecs/archetype.gd:211-228 - Edge caching system

Related features:

  • Archetype edge caching (preserves this optimization)
  • Batch cache invalidation flag (leverages this mechanism)
  • Observer system (timing changes, but compatible)
  • Relationship system (fully compatible)

End of Proposal
