If you want to reproduce the DreamZero DROID dataset conversion yourself (or modify the filtering), follow the steps below. This requires the raw DROID 1.0.1 dataset in RLDS format and the idle filter ranges JSON.
Most users should skip this and simply download the preprocessed dataset:
huggingface-cli download GEAR-Dreams/DreamZero-DROID-Data --repo-type dataset --local-dir ./data/droid_lerobot
Install the conversion dependencies:
pip install tensorflow tensorflow-datasets polars av
Downloading the raw dataset requires gsutil (Google Cloud CLI). The full dataset is ~1.7TB.
gsutil -m cp -r gs://gresearch/robotics/droid/1.0.1 ./data/droid/1.0.1
Important: Use version 1.0.1, not 1.0.0. Version 1.0.1 contains the complete set of language annotations (~75k episodes).
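Given the size of the raw dataset, it may be worth confirming that the target disk has enough free space before starting the download. The following is a minimal sketch using only the Python standard library. The helper name `has_enough_space` is hypothetical, and the 1.7e12-byte threshold assumes the ~1.7TB figure is in decimal terabytes; the converted output needs additional space that this check does not account for.

```python
import shutil


def has_enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least
    `required_bytes` of free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Assumption: ~1.7TB interpreted as 1.7 * 10**12 bytes.
    required = int(1.7e12)
    # Check the current directory; point this at the download target instead.
    if has_enough_space(".", required):
        print("Enough free space for the raw DROID download.")
    else:
        print("Not enough free space for the raw DROID download.")
```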
This JSON file maps each episode to the frame ranges that should be kept (non-idle frames). It was originally computed by Physical Intelligence for training pi0-DROID models.
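To illustrate how such keep ranges might be applied to an episode, here is a small sketch. It assumes the JSON maps an episode key to a list of `[start, end]` pairs and that each pair is a half-open interval (`start` inclusive, `end` exclusive); both the schema and the function names `keep_mask` and `filter_frames` are assumptions for illustration and are not taken from `convert_droid.py`.

```python
def keep_mask(num_frames, ranges):
    """Return a list of booleans, True for frames inside any [start, end) range.

    Ranges are clipped to the valid frame indices 0..num_frames-1.
    """
    mask = [False] * num_frames
    for start, end in ranges:
        for t in range(max(start, 0), min(end, num_frames)):
            mask[t] = True
    return mask


def filter_frames(frames, ranges):
    """Keep only the frames whose index falls inside one of the ranges."""
    mask = keep_mask(len(frames), ranges)
    return [frame for frame, keep in zip(frames, mask) if keep]


if __name__ == "__main__":
    # Hypothetical entry: one episode with two non-idle segments.
    example_ranges = {"episode_a": [[2, 5], [8, 10]]}
    frames = list(range(12))  # stand-in for 12 frames of episode data
    print(filter_frames(frames, example_ranges["episode_a"]))
    # prints [2, 3, 4, 8, 9]
```

Frames outside every range (the idle ones) are dropped, so the episode shrinks to its non-idle segments.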
gsutil cp gs://openpi-assets/droid/droid_sample_ranges_v1_0_1.json ./data/keep_ranges.json
Run the full conversion:
python scripts/data/convert_droid.py \
./data/droid/1.0.1 \
./data/droid_lerobot \
--keep-ranges-path ./data/keep_ranges.json \
--filter-failed \
-n 16
For a quick test with a small subset:
python scripts/data/convert_droid.py \
./data/droid/1.0.1 \
./data/droid_lerobot_test \
--keep-ranges-path ./data/keep_ranges.json \
--filter-failed \
--first-n 5 \
-n 4
See scripts/data/convert_droid.py for full usage:
python scripts/data/convert_droid.py --help