Ticket Type
🐛 Bug Report (Something isn't working)
Environment & System Info
Ubuntu 22.04
CUDA 12.8, NVIDIA driver 570
GPU NVIDIA RTX 5880 Ada (48GB)
LeRobot 0.4.3
transformers 4.57.6

Description
I'm hitting a segmentation fault while training the GR00T policy with the parameters below. There is no issue with the ACT, Diffusion, or SmolVLA policies. Training runs for the first few steps and then crashes.
CUDA_VISIBLE_DEVICES=0 python $DIR/lerobot/src/lerobot/scripts/lerobot_train.py \
    --dataset.root=<data_path> \
    --dataset.repo_id=<data_id> \
    --output_dir=$DIR/outputs/train/2026-02-03/09-12-00_groot \
    --batch_size=24 \
    --num_workers=0 \
    --steps=60000 \
    --eval_freq=10000 \
    --log_freq=1000 \
    --save_freq=10000 \
    --policy.push_to_hub=false \
    --policy.device=cuda \
    --wandb.enable=false \
    --wandb.disable_artifact=true \
    --wandb.project='VM_IL' \
    --wandb.notes='Train Gr00t policy' \
    --policy.type=groot
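To capture where the crash originates on the Python side, one option (a sketch, not yet tried in this run) is to re-run the same command with CPython's built-in faulthandler enabled, which prints the Python traceback when the process receives SIGSEGV. Everything below except the -X faulthandler option is copied from the command above; the remaining flags are unchanged.

    # Sketch: same trainer invocation with faulthandler enabled so the Python
    # traceback is dumped at the moment of the segmentation fault.
    CUDA_VISIBLE_DEVICES=0 python -X faulthandler $DIR/lerobot/src/lerobot/scripts/lerobot_train.py \
        --dataset.root=<data_path> \
        --dataset.repo_id=<data_id> \
        --policy.type=groot \
        --batch_size=24 \
        --num_workers=0 \
        --steps=60000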
Context & Reproduction
INFO 2026-02-03 11:52:45 ot_train.py:327 Start offline training on a fixed dataset
use_fast is set to True but the image processor class does not have a fast version. Falling back to the slow version.
INFO 2026-02-03 12:06:14 ot_train.py:354 step:1K smpl:24K ep:136 epch:0.44 loss:0.121 grdn:0.736 lr:7.4e-05 updt_s:0.588 data_s:0.220
INFO 2026-02-03 12:19:22 ot_train.py:354 step:2K smpl:48K ep:273 epch:0.88 loss:0.056 grdn:0.413 lr:9.5e-05 updt_s:0.598 data_s:0.190
./scripts/train_val/train_gr00t.sh: line 29: 11866 Segmentation fault (core dumped) CUDA_VISIBLE_DEVICES=0 ...
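Since the fault is in native code, a gdb run could also capture the C/C++ backtrace for the section below. This is only a sketch under the assumption that gdb is installed; the flags shown are a subset of the full command above.

    # Sketch: run the trainer under gdb in batch mode; on the segfault, gdb
    # stops and the bt command prints the native backtrace.
    CUDA_VISIBLE_DEVICES=0 gdb -batch -ex run -ex bt --args \
        python $DIR/lerobot/src/lerobot/scripts/lerobot_train.py \
        --dataset.root=<data_path> \
        --dataset.repo_id=<data_id> \
        --policy.type=groot \
        --batch_size=24 \
        --num_workers=0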
Relevant logs or stack trace
Checklist
- I have searched existing tickets to ensure this isn't a duplicate.
- I am using the latest version of the main branch.
- I have verified this is not an environment-specific problem.
Additional Info / Workarounds
No response