Author Contact: [email protected]
This directory contains an implementation of Diffusion Policy for robotic manipulation tasks. Diffusion Policy is a powerful behavior cloning method that models action sequences with diffusion models. This implementation includes receding horizon optimization and supports both MLP and U-Net architectures. The basic PyBullet robot arm environments (trained with reinforcement learning) are adapted from https://github.com/Shimly-2/DRL-on-robot-arm.git.
- Receding Horizon Optimization: Predicts action sequences and executes them step by step
- Flexible Network Architecture: Supports both MLP and U-Net architectures
- Multi-modal Inputs: Supports both vector states and image observations
- HER Integration: Compatible with Hindsight Experience Replay
- Detailed Logging: Comprehensive denoising process visualization
```bash
python main.py train_reach_with_DiffusionPolicy \
    --num_diffusion_steps=100 \
    --num_inference_steps=50 \
    --diffusion_lr=0.0003 \
    --horizon_steps=16 \
    --action_horizon=8
```
```bash
python main.py train_reach_with_DiffusionPolicy_CNN \
    --network_type=unet \
    --prediction_type=epsilon \
    --beta_schedule=squaredcos_cap_v2 \
    --horizon_steps=16 \
    --action_horizon=8
```
| Parameter | Default | Description |
|---|---|---|
| `num_diffusion_steps` | 100 | Number of forward diffusion steps (noise addition) |
| `num_inference_steps` | 50 | Number of reverse inference steps (denoising) |
| `diffusion_lr` | 0.0003 | Learning rate for the diffusion model |
| `beta_schedule` | "squaredcos_cap_v2" | Noise schedule: "linear" or "squaredcos_cap_v2" |
| `prediction_type` | "epsilon" | Prediction target: "epsilon" (noise) or "sample" (clean action) |
| `ema_decay` | 0.995 | Exponential moving average decay for model parameters |
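The option names above (`beta_schedule`, `prediction_type`, `clip_sample`) match the options of diffusers' `DDPMScheduler`. Assuming the implementation wraps that scheduler, a minimal configuration sketch with the defaults from the table:

```python
# Sketch only: assumes the repo wraps diffusers' DDPMScheduler,
# whose option names match the parameters in the table above.
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler(
    num_train_timesteps=100,            # num_diffusion_steps: forward noising steps
    beta_schedule="squaredcos_cap_v2",  # cosine noise schedule
    prediction_type="epsilon",          # the network predicts the injected noise
    clip_sample=True,                   # clip predicted samples during denoising
)

# Inference can use fewer reverse steps than training used forward steps:
noise_scheduler.set_timesteps(50)       # num_inference_steps
```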
| Parameter | Default | Description |
|---|---|---|
| `horizon_steps` | 16 | Prediction horizon length: how many steps to predict at once |
| `action_horizon` | 8 | Number of actions actually executed, typically half of `horizon_steps` |
| Parameter | Default | Description |
|---|---|---|
| `network_type` | "mlp" | Network type: "mlp" or "unet" |
| `clip_sample` | True | Whether to clip predicted samples |
```bash
# Reduce inference steps for faster execution
python main.py train_reach_with_DiffusionPolicy \
    --num_diffusion_steps=50 \
    --num_inference_steps=20 \
    --horizon_steps=8 \
    --action_horizon=4
```
```bash
# Increase steps for better quality
python main.py train_reach_with_DiffusionPolicy \
    --num_diffusion_steps=200 \
    --num_inference_steps=100 \
    --horizon_steps=32 \
    --action_horizon=16
```
```bash
# CNN version for camera observations
python main.py train_reach_with_DiffusionPolicy_CNN \
    --network_type=unet \
    --num_diffusion_steps=100 \
    --num_inference_steps=50 \
    --diffusion_lr=0.0001
```
- Principle: Predict action sequences of length `horizon_steps`, but only execute the first `action_horizon` actions (see the sketch below)
- Advantage: Provides long-term planning while maintaining real-time execution
- Configuration: Typically set `action_horizon = horizon_steps // 2`
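To make the loop concrete, here is a minimal sketch of receding horizon execution; `policy.sample_actions` and the `env` API are placeholders, not this repo's exact interfaces:

```python
# Hypothetical receding horizon loop; `policy` and `env` are placeholders.
horizon_steps = 16
action_horizon = 8  # typically horizon_steps // 2

obs = env.reset()
done = False
while not done:
    # Denoise a full action sequence of length horizon_steps...
    action_seq = policy.sample_actions(obs, horizon=horizon_steps)
    # ...but execute only the first action_horizon actions, then replan
    # from the new observation.
    for action in action_seq[:action_horizon]:
        obs, reward, done, info = env.step(action)
        if done:
            break
```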
- MLP: Suitable for low-dimensional vector inputs, fast training
- U-Net: Suitable for high-dimensional image inputs, better feature extraction
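For intuition, a minimal state-conditioned MLP noise predictor; the class name, layer sizes, and timestep encoding are illustrative, not the repo's code:

```python
import torch
import torch.nn as nn

class MLPDenoiser(nn.Module):
    """Illustrative MLP that predicts the noise in a flattened action
    sequence, conditioned on the state and diffusion timestep."""
    def __init__(self, state_dim, action_dim, horizon_steps, hidden_dim=256):
        super().__init__()
        in_dim = state_dim + action_dim * horizon_steps + 1  # +1 for timestep
        out_dim = action_dim * horizon_steps
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Mish(),
            nn.Linear(hidden_dim, hidden_dim), nn.Mish(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, noisy_actions, timesteps, state):
        # noisy_actions: (B, horizon_steps * action_dim); timesteps: (B,)
        # A crude scalar timestep encoding; real implementations typically
        # use sinusoidal embeddings.
        t = timesteps.float().unsqueeze(-1) / 100.0
        return self.net(torch.cat([noisy_actions, t, state], dim=-1))
```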
- Linear: Simple linear noise increase, suitable for quick experiments
- Cosine: More gradual noise increase, typically yields better results
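The difference is easy to inspect numerically. A sketch (again assuming diffusers' `DDPMScheduler`) that prints how much signal survives at each timestep under the two schedules:

```python
from diffusers import DDPMScheduler

for schedule in ("linear", "squaredcos_cap_v2"):
    s = DDPMScheduler(num_train_timesteps=100, beta_schedule=schedule)
    # alphas_cumprod is the fraction of signal remaining after t steps;
    # the cosine schedule decays it more gradually early on.
    print(schedule, s.alphas_cumprod[::25])
```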
- Epsilon: Predict noise, more stable training
- Sample: Predict clean actions directly, more intuitive
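A sketch of one epsilon-prediction training step; `model`, `noise_scheduler`, and the `batch` layout are placeholders consistent with the tables above, not the repo's exact code:

```python
import torch
import torch.nn.functional as F

# Hypothetical single training step with prediction_type="epsilon".
actions = batch["actions"]  # (B, horizon_steps, action_dim), placeholder batch
noise = torch.randn_like(actions)
timesteps = torch.randint(
    0, noise_scheduler.config.num_train_timesteps,
    (actions.shape[0],), device=actions.device,
)

# Forward diffusion: corrupt the clean action sequence with scheduled noise.
noisy_actions = noise_scheduler.add_noise(actions, noise, timesteps)

# With "epsilon" the network regresses the injected noise; with "sample"
# the target would be the clean `actions` instead.
pred = model(noisy_actions, timesteps, batch["obs"])
loss = F.mse_loss(pred, noise)
loss.backward()  # followed by optimizer.step() / optimizer.zero_grad()
```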
The system automatically logs the following metrics:
- `diffusion_loss`: Training loss
- `avg_return`: Average episode return
- `success_rate`: Task success rate
- Detailed denoising process logs
- Memory Issues
  - Reduce `horizon_steps` and `action_horizon`
  - Use smaller batch sizes
- Training Instability
  - Try `prediction_type="epsilon"`
  - Reduce the learning rate
  - Increase `ema_decay`
- Slow Inference
  - Reduce `num_inference_steps`
  - Use `network_type="mlp"` for simple tasks
For fast development:
- Use MLP for vector inputs
- Reduce diffusion steps during development
- Use smaller horizons for simple tasks

For best quality:
- Use U-Net for image inputs
- Increase diffusion and inference steps
- Use the cosine noise schedule
- Enable EMA for stable inference (see the sketch below)
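EMA keeps a slowly moving copy of the weights and uses that copy for inference. A minimal sketch of the update with the `ema_decay=0.995` default from the table above (illustrative, not the repo's code):

```python
import copy

import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.995):
    """Move each EMA parameter a small step toward the live parameter."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

# ema_model = copy.deepcopy(model)  # initialize once before training, then
# call ema_update(ema_model, model) after every optimizer step and sample
# actions from ema_model at evaluation time.
```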
- Parameter Selection
  - Start with default parameters
  - Gradually adjust based on task complexity
  - Monitor success rate and convergence
- Network Choice
  - MLP: For vector states (positions, velocities)
  - U-Net: For image observations
- Horizon Setting
  - Longer horizons: Better for planning tasks
  - Shorter horizons: Better for reactive tasks
- Training Strategy
  - Use HER for sparse reward tasks
  - Monitor both loss and success rate
  - Save models with high success rates
Run the demo program to see all features:
```bash
# Full demo
python demo_diffusion.py

# Only demo feature capabilities
python demo_diffusion.py --demo features

# Only demo training process
python demo_diffusion.py --demo training
```
Q: What if the generated actions are unstable?
A: Try increasing `num_inference_steps` or using a larger `ema_decay`.

Q: What if training does not converge?
A: Adjust the learning rate `diffusion_lr`, or try the "sample" prediction type.

Q: What if training runs out of memory?
A: Reduce `batch_size`, `horizon_steps`, or `hidden_dim`.

Q: What if image inputs cause shape errors?
A: Check the input image dimensions and ensure (C, H, W) format.
- Start with the quick configuration and validate that training runs
- Gradually increase complexity
- Compare different noise schedules
- Adjust horizon parameters to observe effects
- Monitor training loss and success rate