feat: OpenCDA-MARL v1.0.0 - Multi-Agent RL framework #3

Merged
lgcyaxi merged 65 commits into main from docs.API on Jan 30, 2026

@lgcyaxi lgcyaxi commented Jan 30, 2026

Complete MARL extension for OpenCDA with:

  • 5 RL algorithms (TD3, DQN, Q-Learning, MAPPO, SAC)
  • Custom MARLEnv with CARLA integration
  • SUMO pre-training pipeline for 10-80x speedup
  • PySide6 Qt dashboard for real-time visualization
  • Comprehensive documentation site (MkDocs Material)

…tion

- Add SUMO_HOME environment variable to pixi.toml for SUMO integration
- Add missing carla_version parameter to all CoScenarioManager instances
- Fix UnboundLocalError in cosim scenario cleanup by initializing variables
- Update platoon_joining_2lanefree_cosim, single_2lanefree_cosim, single_town05_cosim, and single_town06_cosim
- Add null checks in finally blocks to prevent errors during cleanup
…ipeline

Implementation Summary:
- SumoMARLEnv: SUMO-only MARL environment using TraCI API
- MARLCoordinator: Added SUMO mode support with dual-mode initialization
- SUMO network assets: Converted intersection XODR to SUMO format
- Configuration files: SUMO training and CARLA fine-tuning configs
- Complete documentation and usage guide

Key Features:
- 10-80x faster training in SUMO (CPU-only, no GPU needed)
- Identical 7D observation space for seamless transfer learning
- Checkpoint-based policy transfer from SUMO to CARLA
- Scalable to 50+ agents in SUMO phase (vs 10 max in CARLA)

Performance Benefits:
- Training time: 5-7 days -> 1.5 days (900 SUMO + 100 CARLA episodes)
- Agent scalability: 10 -> 50+ agents
- Memory usage: Reduced (2GB RAM vs 8-12GB VRAM)

Files Added:
- opencda_marl/envs/sumo_marl_env.py (550 lines)
- configs/marl/intersection_sumo.yaml
- configs/marl/intersection_finetune.yaml
- scripts/convert_xodr_to_sumo.py
- opencda/assets/intersection_sumo/* (network files)
- docs/sumo_marl_training.md
- SUMO_IMPLEMENTATION_SUMMARY.md

Files Modified:
- opencda_marl/coordinator.py (SUMO mode detection)
- opencda_marl/envs/__init__.py (export SumoMARLEnv)

Usage:
  pixi run python opencda.py -t intersection_sumo --marl
  pixi run python opencda.py -t intersection_finetune --marl
- Reduce noise by using debug-level logging for SUMO default vtypes
- Fix FutureWarning in XML parsing by using './/net-file' instead of '//net-file'
- Fix sumolib import to use direct import instead of traci.sumolib
- Add --start flag to SUMO GUI for non-blocking auto-start
- Add DEFAULT_VEHTYPE mapping to vtypes.json for better compatibility
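
The `'.//net-file'` fix listed above can be illustrated with the standard library's ElementTree. This is a minimal sketch, not the PR's code: it assumes the usual `.sumocfg` layout with a `<net-file value="..."/>` element under `<input>`, and the file names and the `find_net_file` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical .sumocfg content; the file names are placeholders.
SUMOCFG = """<configuration>
    <input>
        <net-file value="intersection.net.xml"/>
        <route-files value="intersection.rou.xml"/>
    </input>
</configuration>"""


def find_net_file(cfg_text: str) -> str:
    """Return the net-file path declared in a SUMO configuration string."""
    tree = ET.ElementTree(ET.fromstring(cfg_text))
    # './/net-file' searches all descendants relative to the root.
    # The absolute form '//net-file' makes ElementTree.find emit a FutureWarning.
    node = tree.find(".//net-file")
    if node is None:
        raise ValueError("configuration has no <net-file> element")
    return node.get("value")


print(find_net_file(SUMOCFG))
```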
…gration

- Move SUMO intersection assets from opencda/ to opencda_marl/ for better organization
- Update config path to point to new MARL-specific SUMO asset location
- Enable SUMO GUI by default with auto-start for better debugging visibility
- Initialize coordinator states from config in SUMO mode
- Add episode metric methods to SumoMARLEnv for evaluation compatibility
- Fix route edges in conversion script to use valid edge IDs
- Fix agent ID handling to support both string (SUMO) and int (CARLA) types
- Add pretrained flag to TD3Algorithm for checkpoint loading
- Fix MARL manager update call with correct parameter names
- Add collision and success tracking fields to TrainingMetrics
- Fix StepEvent structure to use vehicle_id and event_id consistently
- Add SUMO GUI auto-start flag in environment initialization
- Fix metrics update to use update_step instead of accumulate_reward
- Add traffic flows to SUMO route file for automated vehicle spawning
- Fix coordinator to skip scenario manager reset in SUMO-only mode
- Create SUMO adapter layer (SumoWorld, SumoMARLPlanner) to bridge CARLA-based TrafficManager
- Implement SumoVehicleSpawner for dynamic vehicle spawning via TraCI
- Add traffic manager initialization in SumoMARLEnv with mode selection (live/replay/static)
- Update with_z utility to support both CARLA and SUMO mock transforms
- Add debug logging to traffic manager event generation
- Generate comprehensive route file with 12 routes (4 directions × 3 turns) matching CARLA patterns
- Configure network conversion to disable traffic lights matching CARLA setup
- Update intersection config to support replay mode and active junction selection
- Disable all U-turns in network generation to prevent routing loops
- Keep internal junction links for realistic vehicle routing
- Fix intersection center detection to use junction 4 (main intersection)
- Update network file with dead_end junctions for edge boundaries
- Remove unnecessary turnaround connections at junction edges
- Add coordinate transformation layer to convert between CARLA and SUMO coordinate systems
- Apply network offset (99.80, 100.00) in SumoWaypoint, SumoVehicleSpawner, and SumoMARLEnv
- Fix waypoint position consistency across lanes for proper spawn point distribution
- Improve get_previous method to interpolate along lane shapes
- Update intersection center calculation to use CARLA coordinates
- Fix checkpoint saving to use MARLManager's built-in method
- Add SUMO-specific replay recording file with correct coordinate mapping
- Remove custom vehicle type creation logic
- Use SUMO's built-in DEFAULT_VEHTYPE for all spawned vehicles
- Remove _ensure_vehicle_types and _vtype_exists methods
- Add _carla_blueprint_to_sumo_type converter that maps to DEFAULT_VEHTYPE
- Improves compatibility and reduces complexity
- Add planner configuration section to SUMO config with distance and offset parameters
- Implement _find_compatible_edges to find edge pairs that can form valid routes
- Add _get_candidate_edges to get sorted list of edges near coordinates
- Filter out opposite-direction edge pairs to prevent routing failures
- Reduce logging noise by using debug level for route computation failures
- Improve spawn success rate by checking edge compatibility before spawning
- Add traffic manager reset call in episode reset flow
- Ensures traffic manager state is properly cleared between episodes
- Prevents event queue buildup across episodes
- Change collision action from 'warn' to 'teleport' to remove colliding vehicles
- Add collision.mingap-factor parameter for gap-based collision detection
- Improves training realism by removing crashed vehicles from simulation
- Better prepares agents for CARLA transfer learning where collisions are terminal
- Fix transition storage timing: use current observations instead of
  previous_observations to match rewards and last_actions keys
- Add CUDA memory cleanup in reset_episode() and update() methods
- Add garbage collection every few episodes to prevent memory leaks
- Clear last_actions on episode reset to prevent stale data
- Add debug logging for transition storage diagnostics
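
The coordinate transformation layer and the network offset (99.80, 100.00) listed among the changes above could look roughly like the sketch below. Only the offset values come from the commit message. The sign convention (flip the y-axis because CARLA is left-handed, then translate) and both function names are assumptions, so the PR's actual mapping may differ.

```python
# Offset taken from the commit message; everything else is an assumption.
NET_OFFSET = (99.80, 100.00)


def carla_to_sumo(x, y, offset=NET_OFFSET):
    """Map a CARLA (x, y) position to SUMO network coordinates.

    Assumed convention: negate y, then add the network offset.
    """
    return x + offset[0], -y + offset[1]


def sumo_to_carla(x, y, offset=NET_OFFSET):
    """Inverse of carla_to_sumo under the same assumed convention."""
    return x - offset[0], -(y - offset[1])


# Round trip: a CARLA point should map back to itself.
sx, sy = carla_to_sumo(10.0, -5.0)
print(sx, sy, sumo_to_carla(sx, sy))
```

Keeping both directions in one place makes it easier to apply the same offset consistently in every component that touches positions.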
…ismatch

- Move TensorBoard logging infrastructure to BaseAlgorithm for shared use
  across all RL algorithms (TD3, DQN, Q-Learning)
- Add configurable tensorboard section to configs/marl/default.yaml
- Add `tensorboard` and `tb` pixi tasks for quick visualization start
- Fix key type mismatch between CARLA (int keys) and extractor (string keys)
- Add _merge_shared_config to MARLManager for config inheritance
- DQN: Add gradient clipping, TensorBoard logging, CUDA cache cleanup
- DQN: Add explicit tensor cleanup and periodic garbage collection
- Q-Learning: Add TensorBoard logging for Q-values and epsilon
- Add gradient norm tracking (pre/post clipping) to TD3 and DQN
- Add reward moving average, variance, and std to BaseAlgorithm
- Add convergence detection based on coefficient of variation
- Add episode length trend tracking
- Add get_learning_statistics() API for paper reporting
- New TensorBoard metrics under Learning/ and Gradients/ categories
- Add speed tracking per step and episode in TrainingMetrics
- Track avg_speed, speed_variance, speed_smoothness metrics
- Add per-agent speed consistency tracking
- Log traffic metrics to TensorBoard (Traffic/ category)
- Add get_traffic_statistics() API for paper reporting with trend analysis
- Pass observations through env to metrics for speed extraction
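
For the convergence detection based on coefficient of variation mentioned above, a minimal sketch follows. The class name, window size, and threshold are hypothetical and not taken from the PR; the idea is simply that the ratio of standard deviation to mean reward over a recent window falls below a threshold once training has plateaued.

```python
from collections import deque
import statistics


class ConvergenceDetector:
    """Hypothetical helper: flags convergence when the reward CV is small."""

    def __init__(self, window=100, cv_threshold=0.05):
        self.rewards = deque(maxlen=window)  # bounded history of episode rewards
        self.cv_threshold = cv_threshold

    def coefficient_of_variation(self):
        # Wait until the window is full so early episodes do not trigger it.
        if len(self.rewards) < self.rewards.maxlen:
            return None
        mean = statistics.fmean(self.rewards)
        if abs(mean) < 1e-8:
            return None  # CV is undefined for a (near) zero mean
        return statistics.pstdev(self.rewards) / abs(mean)

    def update(self, episode_reward):
        """Record one episode reward and report whether training looks converged."""
        self.rewards.append(episode_reward)
        cv = self.coefficient_of_variation()
        return cv is not None and cv < self.cv_threshold
```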
…optimizations

- Rewrite SmartReplayBuffer and PrioritizedReplayBuffer with numpy arrays for O(1) sampling
- Replace O(N) deque operations with circular buffer indexing
- Fix TrainingMetrics unbounded list growth with deque(maxlen=100) and file export
- Optimize gradient norm computation to single GPU-CPU sync
- Increase CUDA cleanup frequency (every 50 steps instead of every 500)
- Reduce TensorBoard flush frequency (every 10 episodes)
- Use torch.from_numpy() for efficient tensor construction
- Fix deque slice notation bug in metrics
- Add traffic_metrics parameter to TD3.log_episode_metrics
- Remove redundant episode_length metrics (fixed simulation length)
- Fix actor gradient logging to only log when actor updates
- Fix max_speed=0 bug by using finish_episode() before reset
- Improve MARL convergence detection with success/collision tracking
- Add tensorboard dependency to pixi.toml
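
The circular-buffer indexing described in the replay buffer rewrite above can be sketched as follows. This is not the PR's SmartReplayBuffer: the class name, field layout, and constructor arguments are assumptions, and prioritization is omitted. It only shows preallocated numpy storage, a wrapping write index, and index-based sampling.

```python
import numpy as np


class RingReplayBuffer:
    """Hypothetical fixed-capacity replay buffer backed by numpy arrays."""

    def __init__(self, capacity, obs_dim, act_dim, seed=None):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(capacity, dtype=np.float32)
        self.pos = 0    # next slot to write
        self.size = 0   # number of valid transitions stored
        self.rng = np.random.default_rng(seed)

    def add(self, obs, act, rew, next_obs, done):
        i = self.pos
        self.obs[i], self.act[i], self.rew[i] = obs, act, rew
        self.next_obs[i], self.done[i] = next_obs, float(done)
        # Wrap around: the oldest transition is overwritten once the buffer is full.
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        if self.size == 0:
            raise ValueError("cannot sample from an empty buffer")
        # Uniform random indices over the filled region (with replacement).
        idx = self.rng.integers(0, self.size, size=batch_size)
        return (self.obs[idx], self.act[idx], self.rew[idx],
                self.next_obs[idx], self.done[idx])
```

Writes cost the same regardless of how full the buffer is, and sampling is a single fancy-indexing call per field rather than a walk over a deque.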
Introduces a WorldResetManager to periodically reload the CARLA world and prevent server-side memory accumulation that causes simulation slowdown. Adds configuration options for reset frequency and automatic performance-based resets in default.yaml, integrates the manager into the coordinator, and ensures all training state is preserved across resets. Updates __init__.py to expose the new manager.
Introduces per-episode exploration noise decay for the TD3 algorithm, configurable via new parameters in the YAML configs (noise_decay, min_exploration_noise). Updates logging to track noise decay in TensorBoard and logs periodic decay info. Cleans up and consolidates intersection scenario configs by removing redundant files and updating config structure for MARL training.
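
One plausible reading of the per-episode noise decay above is a multiplicative schedule with a floor. The sketch below assumes that form; `noise_decay` and `min_exploration_noise` are the parameter names from the commit message, while the function name, the `initial_noise` argument, and the example values are hypothetical.

```python
def decayed_noise(initial_noise, noise_decay, min_exploration_noise, episode):
    """Exploration noise after `episode` completed episodes.

    Assumes multiplicative decay per episode, clamped at a minimum.
    """
    return max(min_exploration_noise, initial_noise * noise_decay ** episode)


# Example schedule with placeholder values.
for ep in (0, 10, 100, 1000):
    print(ep, decayed_noise(0.2, 0.99, 0.05, ep))
```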
Expanded the MARL state space to 44 dimensions by adding normalized features for up to 5 nearby vehicles (relative position, velocity, heading difference, distance, and TTC). Updated the reward function to include proactive TTC-based safety penalties and dense progress-toward-goal rewards. Adjusted configuration, vehicle adapter, extractor, and environment logic to support these Phase 3 enhancements, and ensured compatibility by requiring fresh training runs.
Introduces a stop penalty mechanism in the reward function to discourage hard stops and encourage gradual deceleration, configurable via YAML and environment defaults. Refactors nearby vehicle detection parameters to be configurable from the MARL config. Adds tracking and logging of target (commanded) speeds in metrics and TensorBoard, enabling comparison between RL output and actual vehicle speeds. Also fixes a bug where episode success and collision counters were not updated from episode states.
Updated MARLAgent to override the run_step method, ensuring that when a target_speed is provided by RL, it is used directly for speed control, bypassing the default BasicAgent behavior. Falls back to VanillaAgent's behavior when target_speed is None. Path and steering remain controlled by the local planner.
Added LayerNorm before tanh in the TD3 Actor to prevent vanishing gradients and improve learning stability. Increased default learning rates for actor and critic to accelerate training with the new normalization. Enhanced vehicle adapter to track current speed as target during warmup, and updated metrics to avoid double-counting agents and filter out uninitialized target speeds.
Introduces a fixed MARL PID controller and control manager to ensure correct longitudinal control gains are used, addressing a bug in the original OpenCDA controller. Updates the vehicle adapter to use the new controller for MARL scenarios, modifies the VanillaAgent to respect RL-provided target speeds, and enhances traffic metrics logging and tracking to better reflect RL commands and actual vehicle performance. Also reduces max_episodes in the intersection config for faster testing.
Deleted the MARL-specific actuation module, including marl_pid_controller, marl_control_manager, and their __init__.py. Updated vehicle_adapter.py to remove references to the deleted actuation code. Improved agent_manager.py to handle both int and string actor IDs for target speed mapping. Enhanced TD3Algorithm to skip warmup if a pretrained model is loaded. Added RL-commanded target speed metrics to episode logging in marl_manager.py.
Corrects speed observation units to km/h in vehicle_adapter.py for consistency with target speeds. Fixes PID controller to use longitudinal error buffer and gains for longitudinal control in pid_controller.py. Updates min_action in intersection.yaml and max_action in TD3 algorithm to match configuration.
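
The unit correction above amounts to the conversion below. This is a sketch under the assumption that the simulator reports velocity components in m/s (as CARLA does); the function name is hypothetical.

```python
import math


def speed_kmh(vx, vy, vz=0.0):
    """Scalar speed in km/h from velocity components given in m/s."""
    return 3.6 * math.sqrt(vx * vx + vy * vy + vz * vz)


print(speed_kmh(10.0, 0.0))  # 10 m/s expressed in km/h
```

Applying the factor in exactly one place avoids comparing an observation in m/s against a target speed in km/h, or converting twice.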
Adds pending spawn tracking for accurate throughput metrics and updates evaluation logic to use the step when all vehicles have completed. Ensures local planner's run_step is called in both MARLAgent and VanillaAgent when RL provides target_speed, preventing vehicles from targeting stale waypoints. Changes debug logging in TD3 to use logger.debug instead of logger.info. VehicleAdapter now assumes agent speed is already in km/h.
JiahaoTang and others added 28 commits December 2, 2025 08:49
Improves TD3 exploration by scaling noise and speed bias based on distance to intersection, and updates config parameters for more robust exploration and experience diversity. Refactors TTC-based safety reward to use a smooth exponential penalty for better gradient learning, and adds a yielding bonus to encourage cooperative behavior. Updates YAML configs to reflect new reward and exploration strategies.
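
A smooth exponential TTC penalty of the kind described above might look like the sketch below. The functional form, the `weight` and `tau` parameters, and their defaults are assumptions for illustration; the PR's actual reward terms are defined in its YAML configs and environment code.

```python
import math


def ttc_penalty(ttc, weight=1.0, tau=2.0):
    """Negative reward that grows smoothly as time-to-collision shrinks.

    ttc: time-to-collision in seconds, or None / inf when no conflict exists.
    tau: hypothetical time constant controlling how fast the penalty fades.
    """
    if ttc is None or math.isinf(ttc):
        return 0.0
    ttc = max(ttc, 0.0)  # treat negative values as an imminent collision
    return -weight * math.exp(-ttc / tau)


for t in (0.0, 1.0, 4.0, float("inf")):
    print(t, ttc_penalty(t))
```

Unlike a hard threshold, this penalty has a non-zero gradient at every finite TTC, which is the property the commit message cites as helping gradient-based learning.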
Introduces logging and visualization of relative improvement metrics (success rate, collision rate, throughput) for long training runs, including moving averages, deltas, and time-to-target milestones. Episode length is now tracked as the step when the last vehicle completes, improving throughput calculation accuracy. Evaluation and plotting modules generate and save comprehensive improvement reports and paper-ready figures.
Switched traffic mode to 'live', updated replay file and base speed, and reduced flow rates in default.yaml. Added new traffic recordings for 300 and 400 vph, renamed 500 vph recording, and deleted TD3 checkpoint files.
Improved sensor and vehicle destruction logic to avoid garbage collection warnings and ensure proper cleanup. Disabled TTC-based safety reward in MARL to prevent overly conservative driving, retaining TTC tracking for metrics only. Added clearance speed bonus to encourage faster driving when safe, adjusted yielding reward parameters, and updated exploration noise scaling in TD3. Increased max_episodes in default config and updated reward parameters in td3.yaml for better training dynamics.
Introduces the MAPPO (Multi-Agent Proximal Policy Optimization) algorithm for cooperative multi-agent learning, including a detailed configuration file, implementation of the MAPPO algorithm, and a rollout buffer supporting GAE. Updates the algorithm package and manager to support MAPPO selection and execution. Configuration and reward structure are aligned for fair comparison with TD3.
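
The GAE computation that a rollout buffer like the one above typically performs can be sketched in plain Python. This follows the standard Generalized Advantage Estimation recursion and is not the PR's rollout buffer; the function name and default `gamma` and `lam` values are placeholders.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one trajectory.

    rewards, values, dones: per-step sequences of equal length.
    last_value: value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated advantage.
    for t in reversed(range(len(rewards))):
        non_terminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        gae = delta + gamma * lam * non_terminal * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Multiplying by `non_terminal` stops both the bootstrap value and the accumulated advantage from leaking across an episode boundary.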
Introduces a feature configuration section for MAPPO in the YAML config, mirroring TD3 for fair comparison. Updates the ObservationExtractor to support MAPPO by using the new feature config and ensuring multi-agent observation extraction for both TD3 and MAPPO.
Refactors vehicle and sensor destruction logic across agent, safety, and world reset managers to use a phased cleanup sequence, reducing risk of CARLA Signal 11 crashes. Adds explicit stopping of sensor callbacks, world ticks between phases, and robust error handling. Introduces a custom MARLCollisionSensor with active flag and safe destroy method, and updates MARLSafetyManager to use it and expose direct references for cleanup.
Introduces a stop() method to MARLCollisionSensor for safe, idempotent sensor shutdown. Updates MARLAgentManager to use this method when stopping sensors, and generalizes sensor shutdown logic to handle all sensors in the safety manager's list, improving robustness and code clarity.
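
An idempotent stop-then-destroy pattern like the one described above can be sketched with a small wrapper. This is not the PR's MARLCollisionSensor: `SafeSensorWrapper` and `FakeSensor` are hypothetical, and the only assumptions about the wrapped object are that it exposes `stop()` and `destroy()`, as CARLA sensors do, and may raise `RuntimeError` if the actor is already gone.

```python
class SafeSensorWrapper:
    """Hypothetical wrapper that makes sensor shutdown safe to call repeatedly."""

    def __init__(self, sensor):
        self.sensor = sensor
        self.active = True

    def stop(self):
        """Stop the sensor callback once; later calls are no-ops."""
        if not self.active:
            return False
        self.active = False
        try:
            self.sensor.stop()
        except RuntimeError:
            pass  # assumed failure mode when the actor no longer exists
        return True

    def destroy(self):
        """Stop first, then destroy the underlying actor at most once."""
        self.stop()
        if self.sensor is not None:
            try:
                self.sensor.destroy()
            except RuntimeError:
                pass
            self.sensor = None


class FakeSensor:
    """Stand-in for a simulator sensor, used only to exercise the wrapper."""

    def __init__(self):
        self.stop_calls = 0
        self.destroy_calls = 0

    def stop(self):
        self.stop_calls += 1

    def destroy(self):
        self.destroy_calls += 1
```

Stopping the callback before destroying the actor matches the ordering the commit messages describe for avoiding crashes during cleanup.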
Refactors destruction logic to safely stop and destroy sensors before vehicle actors, preventing CARLA Signal 11 crashes. Adds explicit stop methods to sensors, ensures proper callback handling, and updates agent and vehicle manager cleanup routines for robustness.
Replaces the SafetyManager with a standalone MARLSafetyManager to prevent duplicate sensor creation and orphaned sensors. Ensures the old safety manager's sensors are destroyed before instantiating the new manager, improving lifecycle management and eliminating 'sensor went out of scope' warnings. Adds direct sensor management and updates sensor initialization and destruction logic.
Introduced Soft Actor-Critic (SAC) algorithm implementation and configuration for multi-agent intersection management, including auto-entropy tuning and LSTM context encoding. Added a simplified TD3 configuration (td3_simple.yaml) and updated DQN and MAPPO configs to use an 8D state space for fair comparison. Reduced traffic flow rates in default.yaml for all directions. Updated algorithm registry and MAPPO to support warmup steps for stable learning.
Remove the redundant safety manager replacement that caused orphaned sensors.
The _use_marl_safety_manager() method created duplicate sensors, which led to
"streaming client: connection failed" errors.

Changes:
- Remove MARLSafetyManager replacement in vehicle_adapter
- Use default SafetyManager from VehicleManager directly
- Simplify sensor destroy methods across all managers
- Remove over-engineered cleanup phases from agent_manager
- Restore simple destroy() patterns in VehicleManager
Switched traffic mode to 'replay' and updated replay file in default.yaml; reduced traffic flow rates from 300 to 200 vph. Set DQN training_mode to false and added checkpoint loading. Improved CARLA resource cleanup by adding world.tick() after cleanup phases in agent_manager, world_reset_manager, and scenario_manager to ensure GPU memory is released. Added new traffic recording files and model checkpoints.
Adjusted traffic flow rates and mode in default.yaml, updated checkpoint paths and training modes in DQN, SAC, and TD3 configs, and tuned network architecture and hyperparameters in MAPPO and TD3_simple for improved stability and exploration. These changes align training settings and model parameters for more consistent multi-agent reinforcement learning experiments.
Introduced tracking and reporting of timeout vehicles in both training metrics and evaluation statistics. Updated success, collision, and total vehicle calculations to include timeouts, and extended logging and statistical summaries to reflect timeout rates and counts.
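
Including timeouts in the totals, as described above, changes the denominator of every rate. A minimal sketch, assuming each vehicle ends in exactly one of the three outcomes; the function and key names are hypothetical.

```python
def outcome_rates(successes, collisions, timeouts):
    """Per-outcome rates where the total counts all three outcomes."""
    total = successes + collisions + timeouts
    if total == 0:
        return {"total": 0, "success_rate": 0.0,
                "collision_rate": 0.0, "timeout_rate": 0.0}
    return {
        "total": total,
        "success_rate": successes / total,
        "collision_rate": collisions / total,
        "timeout_rate": timeouts / total,
    }


print(outcome_rates(6, 3, 1))
```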
Added latest_checkpoint.pth files for multiple models in the checkpoints directory. Updated .gitignore to better handle checkpoint files, ensuring latest_checkpoint.pth files are tracked. Modified mappo.yaml and sac.yaml configs to use a unified checkpoints directory and set appropriate load_checkpoint paths.
Added timeout statistics to evaluation plots, including timeouts in the final results text box and pie chart. Updated td3_simple.yaml to disable training mode and load a specific checkpoint. Added new checkpoint file for TD3 model.
Introduces a new TD3 configuration with a corrected 43D state space (td3_43d.yaml), adds an improved lightweight TD3 simple v3 config (td3_simple_v3.yaml), and renames the previous td3.yaml to td3_44d.yaml for clarity. These changes improve configuration clarity and provide enhanced options for MARL experiments.
Introduces td3_simple_v4.yaml with rebalanced rewards to address slow speed in v3. Updates td3_simple_v3.yaml to use the new v3 checkpoint and disables training mode. Refactors import statements in rollout_buffer.py and smart_replay_buffer.py to remove unused types, and comments out unused episode length computation in base_algorithm.py. Also renames and adds new checkpoint files for TD3 experiments.
- Bump version from v0.1.0-alpha to v1.0.0 stable across all docs
- Remove map_adapter.md (no implementation in codebase)
- Move SUMO training guide to docs/marl/ and fix inaccuracies
  (config paths, 9D observation space, CLI commands, asset paths)
- Redesign about page with MkDocs Material grid cards
- Update mkdocs.yml nav for all changes
@lgcyaxi lgcyaxi merged commit 2d6f8d2 into main Jan 30, 2026
2 checks passed
@lgcyaxi lgcyaxi deleted the docs.API branch January 30, 2026 23:56