Core dumped (Segmentation Fault) while training Gr00t #2896

@hoangminhtoan

Ticket Type

🐛 Bug Report (Something isn't working)

Environment & System Info

OS: Ubuntu 22.04
CUDA: 12.08; NVIDIA driver: 570
GPU: NVIDIA RTX 5880 Ada (48 GB)
LeRobot: 0.4.3
transformers: 4.57.6
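The versions above can also be collected programmatically when re-checking the environment. A minimal sketch using only the standard library's `importlib.metadata` and `platform`, assuming the packages are installed under the PyPI distribution names `lerobot`, `transformers`, and `torch`; the helpers `package_versions` and `environment_report` are hypothetical, not part of LeRobot:

```python
import platform
from importlib import metadata


def package_versions(names=("lerobot", "transformers", "torch")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this interpreter
    return versions


def environment_report():
    """Collect OS, Python, package, and (if torch imports) CUDA/GPU details."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        **package_versions(),
    }
    try:
        import torch  # optional: only needed for the CUDA/GPU fields

        report["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
    except (ImportError, OSError):
        pass  # torch missing or its native libraries failed to load
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which can differ from the system toolkit version.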

Description

I'm getting a segmentation fault while training the Gr00t policy with the parameters below; ACT, Diffusion, and SmolVLA train without issues. Training runs normally for the first few thousand steps (the last logged step is 2K) and then crashes.

CUDA_VISIBLE_DEVICES=0 python $DIR/lerobot/src/lerobot/scripts/lerobot_train.py \
  --dataset.root=<data_path> \
  --dataset.repo_id=<data_id> \
  --output_dir=$DIR/outputs/train/2026-02-03/09-12-00_groot \
  --batch_size=24 \
  --num_workers=0 \
  --steps=60000 \
  --eval_freq=10000 \
  --log_freq=1000 \
  --save_freq=10000 \
  --policy.push_to_hub=false \
  --policy.device=cuda \
  --wandb.enable=false \
  --wandb.disable_artifact=true \
  --wandb.project='VM_IL' \
  --wandb.notes='Train Gr00t policy ' \
  --policy.type=groot
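A segmentation fault kills the Python process without printing a traceback, which is why the log ends at the shell's `Segmentation fault (core dumped)` line. The standard-library `faulthandler` module can dump the Python stack of every thread when the signal arrives. A minimal sketch of a wrapper that enables it and then runs the training script as `__main__`, assuming the script reads its flags from `sys.argv`; the wrapper file and `run_with_faulthandler` are hypothetical, not part of LeRobot:

```python
import faulthandler
import runpy
import sys


def run_with_faulthandler(script_path, args):
    """Run a script as __main__ with faulthandler enabled; return its globals.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and
    SIGILL that write the Python traceback of every thread before the
    process dies, so a crash inside native code still shows which Python
    call was running.
    """
    # Write to the process's original stderr (a real file descriptor).
    faulthandler.enable(file=sys.__stderr__, all_threads=True)
    saved_argv = sys.argv
    sys.argv = [script_path, *args]  # the script parses flags from sys.argv
    try:
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        sys.argv = saved_argv


if __name__ == "__main__":
    # Usage: python run_with_faulthandler.py <script.py> [flags...]
    if len(sys.argv) < 2:
        sys.exit("usage: run_with_faulthandler.py <script.py> [flags...]")
    run_with_faulthandler(sys.argv[1], sys.argv[2:])
```

The same handler can be enabled without a wrapper by running `python -X faulthandler` or by setting `PYTHONFAULTHANDLER=1` in front of the command above.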

Context & Reproduction

INFO 2026-02-03 11:52:45 ot_train.py:327 Start offline training on a fixed dataset
use_fast is set to True but the image processor class does not have a fast version. Falling back to the slow version.
INFO 2026-02-03 12:06:14 ot_train.py:354 step:1K smpl:24K ep:136 epch:0.44 loss:0.121 grdn:0.736 lr:7.4e-05 updt_s:0.588 data_s:0.220
INFO 2026-02-03 12:19:22 ot_train.py:354 step:2K smpl:48K ep:273 epch:0.88 loss:0.056 grdn:0.413 lr:9.5e-05 updt_s:0.598 data_s:0.190
./scripts/train_val/train_gr00t.sh: line 29: 11866 Segmentation fault (core dumped) CUDA_VISIBLE_DEVICES=0 ...
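The logged numbers allow a few derived quantities. A minimal sketch that parses the three INFO lines above (copied verbatim) and computes the wall-clock time per logging interval, the logged time per interval (`updt_s + data_s` per step), the batch size implied by `smpl/step`, and the dataset size implied by `smpl/epch`. It assumes `K` means thousands and that `updt_s`/`data_s` are average seconds per step; the helper names are hypothetical:

```python
from datetime import datetime

# The three INFO lines from the log above, copied verbatim.
LOG_LINES = [
    "INFO 2026-02-03 11:52:45 ot_train.py:327 Start offline training on a fixed dataset",
    "INFO 2026-02-03 12:06:14 ot_train.py:354 step:1K smpl:24K ep:136 epch:0.44 "
    "loss:0.121 grdn:0.736 lr:7.4e-05 updt_s:0.588 data_s:0.220",
    "INFO 2026-02-03 12:19:22 ot_train.py:354 step:2K smpl:48K ep:273 epch:0.88 "
    "loss:0.056 grdn:0.413 lr:9.5e-05 updt_s:0.598 data_s:0.190",
]

# Assumption: a trailing K/M on a logged count means thousands/millions.
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text):
    """Turn '24K' into 24000.0 and '136' into 136.0."""
    if text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def parse_line(line):
    """Split one log line into (timestamp, {key: raw value})."""
    parts = line.split()
    stamp = datetime.strptime(f"{parts[1]} {parts[2]}", "%Y-%m-%d %H:%M:%S")
    fields = {}
    for token in parts[4:]:  # skip level, date, time, and source location
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return stamp, fields


def summarize(lines):
    """Derive per-interval timing and dataset size from consecutive log lines."""
    prev_stamp, _ = parse_line(lines[0])
    prev_step = 0.0
    rows = []
    for line in lines[1:]:
        stamp, f = parse_line(line)
        step, samples = parse_count(f["step"]), parse_count(f["smpl"])
        # Assumption: updt_s and data_s are average seconds per step.
        per_step = float(f["updt_s"]) + float(f["data_s"])
        frames_per_epoch = samples / float(f["epch"])
        batch = samples / step
        rows.append({
            "step": step,
            "wall_s": (stamp - prev_stamp).total_seconds(),
            "logged_s": per_step * (step - prev_step),
            "batch": batch,
            "frames_per_epoch": frames_per_epoch,
            "steps_per_epoch": frames_per_epoch / batch,
        })
        prev_stamp, prev_step = stamp, step
    return rows


if __name__ == "__main__":
    for row in summarize(LOG_LINES):
        print({k: round(v, 1) for k, v in row.items()})
```

Both intervals match the logged per-step times to within a second (809 s vs. 808 s, then 788 s vs. 788 s), so throughput was steady up to step 2K. The implied dataset is roughly 54.5K frames, or about 2,270 steps per epoch at batch size 24, so the first epoch boundary falls between the last logged step (2K) and the next expected log line (3K); these logs alone do not show whether that is related to the crash. Because `epch` is rounded to two decimals, these figures are approximate.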

Relevant logs or stack trace

Checklist

  • I have searched existing tickets to ensure this isn't a duplicate.
  • I am using the latest version of the main branch.
  • I have verified this is not an environment-specific problem.

Additional Info / Workarounds

No response

Metadata

Labels

bug, dataset, performance, policies, processor, training