
Diffusion Policy doesn't support n_obs_steps=1? #2277

@Hukongtao

Description


System Info

- `lerobot` version: 0.3.4
- Platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.35.1
- Dataset version: 3.6.0
- Numpy version: 2.2.6
- PyTorch version (GPU?): 2.7.1+cu126 (True)
- Cuda version: 12060
- Using GPU in script?: <fill in>

Information

  • One of the scripts in the examples/ folder of LeRobot
  • My own task or dataset (give details below)

Reproduction

python src/lerobot/scripts/train.py \
    --output_dir=/mnt/hukongtao/models/lerobot/diffusion_agibot_20251021_90episodes_n_action_steps30/ \
    --policy.type=diffusion \
    --policy.push_to_hub=false \
    --policy.crop_shape=[224,224] \
    --policy.noise_scheduler_type="DDPM" \
    --policy.n_obs_steps=1 \
    --policy.horizon=32 \
    --policy.n_action_steps=30 \
    --policy.drop_n_last_frames=2 \
    --dataset.repo_id=lerobot/pusht \
    --dataset.root=/mnt/hukongtao/datasets/1009_pickup_lerobot_right_hand_1019_split/train/ \
    --seed=100000 \
    --batch_size=64 \
    --steps=20000 \
    --log_freq=200 \
    --save_freq=5000 \
    --wandb.enable=true \
    --wandb.project=diffusion_agibot_20251021_90episodes_n_action_steps30

I tried to train the diffusion policy on my own dataset with horizon=32, n_action_steps=30, and n_obs_steps=1, but I got the following error:

Traceback (most recent call last):
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/scripts/train.py", line 295, in <module>
    main()
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/scripts/train.py", line 291, in main
    train()
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/configs/parser.py", line 225, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/scripts/train.py", line 212, in train
    train_tracker, output_dict = update_policy(
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/scripts/train.py", line 71, in update_policy
    loss, output_dict = policy.forward(batch)
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/policies/diffusion/modeling_diffusion.py", line 161, in forward
    loss = self.diffusion.compute_loss(batch)
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/policies/diffusion/modeling_diffusion.py", line 333, in compute_loss
    global_cond = self._prepare_global_conditioning(batch)  # (B, global_cond_dim)
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/policies/diffusion/modeling_diffusion.py", line 267, in _prepare_global_conditioning
    img_features = self.rgb_encoder(
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/hukongtao/codebase/lerobot_0909_d602e81/src/lerobot/policies/diffusion/modeling_diffusion.py", line 517, in forward
    x = torch.flatten(self.pool(self.backbone(x)), start_dim=1)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/container.py", line 240, in forward
    input = module(input)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/hukongtao/environments/lerobot/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[1, 6, 224, 224] to have 3 channels, but got 6 channels instead

For now I have worked around the problem as follows, but I am not sure it is the best solution. I changed

item[vid_key] = frames.squeeze(0)

to

item[vid_key] = frames
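
For reference, here is a minimal standalone sketch (not the actual LeRobot dataset code; the tensor shapes are assumptions for illustration only) showing why an unconditional squeeze(0) is problematic when n_obs_steps=1: the temporal dimension has size 1 and gets silently dropped, so downstream code that expects a (T, C, H, W) layout ends up folding the wrong dimensions together.

import torch

C, H, W = 3, 224, 224

# Frames stacked over observation steps: (n_obs_steps, C, H, W)
frames_two_steps = torch.zeros(2, C, H, W)
frames_one_step = torch.zeros(1, C, H, W)

# squeeze(0) is a no-op when n_obs_steps > 1 ...
print(frames_two_steps.squeeze(0).shape)  # torch.Size([2, 3, 224, 224])

# ... but it silently drops the temporal dimension when n_obs_steps == 1,
# so later reshapes that expect (T, C, H, W) mix up the dimensions.
print(frames_one_step.squeeze(0).shape)   # torch.Size([3, 224, 224])

# Keeping the temporal dimension (as in the workaround above) preserves
# a consistent (T, C, H, W) layout regardless of n_obs_steps.
print(frames_one_step.shape)              # torch.Size([1, 3, 224, 224])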

Expected behavior

The model trains normally with n_obs_steps=1.


Labels

bug (Something isn't working correctly), policies (Items related to robot policies)
