Releases: ARISE-Initiative/robomimic
v0.5
robomimic v0.5 Release Notes
Highlights
- 🧠 New Algorithm (Diffusion Policy)
- 🕹️ Action dicts and normalization support
- 🗄️ Multi-dataset training
- 📝 Language-conditioned policy learning
- ⏯️ Resume functionality
- Other quality of life improvements
🧠 New Algorithm - Diffusion Policy
Diffusion Policy (see paper) has become increasingly prevalent in the research community. This release provides an implementation of UNet-based Diffusion Policy, along with supporting features such as action normalization and per-batch learning rate updates. It is a strong baseline for the robomimic datasets, often outperforming BC-RNN. We hope this provides an easy way for our users to start using diffusion-based policies.
We provide a template configuration file for Diffusion Policy that contains suggested parameters for the model. To learn how to train a Diffusion Policy, either with our default configuration file or with custom parameters, please refer to our tutorial.
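If you prefer to build the config programmatically instead of editing the template JSON, a minimal sketch is shown below; the algorithm name "diffusion_policy" is an assumption, so see the tutorial for the authoritative workflow.
# Hypothetical sketch (algorithm name assumed): create a config pre-populated with Diffusion Policy defaults
from robomimic.config import config_factory
config = config_factory(algo_name="diffusion_policy")
# adjust training parameters as needed before launching training
config.train.batch_size = 256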
🕹️ Action dictionaries and normalization
You can now configure action dictionaries (multiple action components) and per-component normalization in robomimic. This is particularly useful for:
- Robot manipulation tasks with different action components (e.g., end-effector position and rotation)
- Actions that require different normalization schemes
The action configuration consists of two main components:
- action_keys: List of action components to use
- action_config: Dictionary specifying how each action component should be processed
For more details on how to use action configs, please refer to the documentation.
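For a rough sense of what this looks like, here is a hypothetical sketch (the action key names and the normalization options shown are assumptions; consult the documentation for the exact schema):
# Hypothetical sketch (key names and options assumed): predict three action components,
# normalizing position and rotation while leaving the gripper action untouched
config.train.action_keys = ["action_dict/abs_pos", "action_dict/abs_rot_6d", "action_dict/gripper"]
config.train.action_config = {
    "action_dict/abs_pos": {"normalization": "min_max"},
    "action_dict/abs_rot_6d": {"normalization": "min_max"},
    "action_dict/gripper": {"normalization": None},
}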
🗄️ Multi-dataset Training
Robomimic supports training on multiple datasets simultaneously. This is useful when you want to:
- Train a single model on multiple tasks
- Combine datasets with different qualities (e.g., expert and suboptimal demonstrations)
- Balance data from different sources
Each dataset can have its own sampling weight, and you can control whether these weights are normalized by dataset size. For more details on how to set up multi-dataset training, please refer to the documentation.
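As a rough illustration, a multi-dataset setup might look like the sketch below (the list-of-dicts format and the weight field are assumptions; see the documentation for the supported options):
# Hypothetical sketch (keys assumed): co-train on two datasets with different sampling weights
config.train.data = [
    {"path": "/path/to/task_a.hdf5", "weight": 1.0},
    {"path": "/path/to/task_b.hdf5", "weight": 0.5},
]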
📝 Language-conditioned policy learning
You can now train language-conditioned policies in robomimic. We support CLIP embeddings for encoding language, and two different ways of conditioning policies on embeddings:
- As feature input to action head
- FiLM over vision encoder
For more details on how to train language-conditioned policies, please refer to the documentation.
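For intuition on the second option, FiLM (feature-wise linear modulation) scales and shifts the vision encoder's feature maps using parameters predicted from the language embedding. Below is a minimal, generic PyTorch sketch of the idea, not robomimic's implementation:
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Modulate visual feature maps with a scale and shift predicted from a language embedding."""
    def __init__(self, lang_dim, num_channels):
        super().__init__()
        self.proj = nn.Linear(lang_dim, 2 * num_channels)

    def forward(self, visual_feats, lang_emb):
        # visual_feats: (B, C, H, W); lang_emb: (B, lang_dim)
        gamma, beta = self.proj(lang_emb).chunk(2, dim=-1)
        return gamma[..., None, None] * visual_feats + beta[..., None, None]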
⏯️ Resume functionality
If your training job fails for any reason, you can re-launch it with the additional --resume flag to resume training from the last saved epoch. Training will resume from the last.pth checkpoint in your output directory. For more details, please refer to the documentation.
Other Improvements
We outline other improvements here.
- Better data augmentation support (chaining multiple observation randomizers)
- Cosine learning rate scheduler with per-batch stepping
- Updates to BC-Transformer to predict future actions
- In addition to using env metadata extracted from a dataset for policy rollout (see docs), added the ability to update env metadata via env_meta_update_dict in a training config (e.g. for evaluating with absolute actions); see the sketch after this list
- Observation extraction with multi-processing
- Deprecated API: EnvGibsonMOMART is no longer supported, and postprocess_visual_obs has moved from env wrappers to RolloutPolicy
- Updated docs and tutorials
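A hypothetical sketch of the env metadata override (the config section and nested keys shown, e.g. env_kwargs and controller_configs, are assumptions based on robosuite-style metadata):
# Hypothetical sketch (nested keys assumed): override env metadata used for rollouts,
# e.g. to evaluate a policy that was trained on absolute actions
config.experiment.env_meta_update_dict = {
    "env_kwargs": {
        "controller_configs": {"control_delta": False},
    },
}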
Breaking Changes
This release includes breaking changes that may affect some users:
- EnvGibsonMOMART is no longer supported
- postprocess_visual_obs has been moved from env wrappers to RolloutPolicy
- Observation normalization stats from old checkpoints cannot be loaded
Contributor Spotlight
This release was a major team effort. Here is a breakdown of contributed features.
- Diffusion Policy (@cheng-chi)
- Cosine learning rate scheduler with per-batch stepping (@vaibhavsaxena11)
- Support for multi-dataset training (e.g. co-training) (@snasiriany)
- Action dictionaries and action normalization (@snasiriany)
- Language-conditioned policy training (CLIP language encoder and FiLM language conditioning support for ResNet18 and VisualCore) (@snasiriany, @amandlek, @NKrypt26, @vaibhavsaxena11)
- Resume training functionality (@amandlek)
- Support for chaining multiple observation randomizers (@vaibhavsaxena11)
- Support postprocess_visual_obs in RolloutPolicy (previously in env wrappers) (@vaibhavsaxena11)
- Support for MimicLabs environments in EnvRobosuite (@vaibhavsaxena11)
- Update BC-Transformer with functionality to predict future action chunks (@snasiriany)
- Dockerfile for robomimic (@SurafelAnshebo)
v0.4
robomimic v0.4 Release Notes
Highlights
- Full compatibility with robosuite v1.5. Existing robomimic datasets can now be used with this latest version of robosuite as well.
- Migrated robomimic datasets to be hosted on HuggingFace at this repository
v0.3
robomimic v0.3 Release Notes
Highlights
- 🧠 New Algorithms (BC-Transformer, IQL)
- 🤖 Full compatibility with robosuite v1.4 and DeepMind's MuJoCo bindings
- 👁️ The ability to use pre-trained image representations for policy learning
- 📈 Support for wandb logging
- Better data augmentation support
- Other quality of life improvements
New Algorithms
BC-Transformer
Transformer-based policies are becoming increasingly prevalent in the research community. We provide an implementation of BC-Transformer (Behavioral Cloning with a Transformer). It is a strong baseline for the robomimic datasets, often outperforming BC-RNN. We hope that this provides an easy way for our users to start using transformer-based policies.
We provide a template configuration file for BC-Transformer that contains suggested parameters for the model. To learn how to train a transformer, either with our default configuration file or with custom parameters, please refer to our tutorial.
IQL
Implicit Q-Learning (IQL) is a popular offline RL method that has seen widespread use in the research community. We provide an implementation along with benchmarking results on D4RL. Reproducing these results is easy -- see the instructions here.
Pre-trained Image Representations
We make it easy to use pre-trained image representations such as R3M and MVP for image-based agents in robomimic. See instructions here.
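As a rough sketch of how this plugs into the encoder config (the backbone class name and kwargs shown are assumptions; follow the linked instructions for the exact settings):
# Hypothetical sketch (class name and kwargs assumed): use a frozen, pre-trained R3M backbone for RGB observations
config.observation.encoder.rgb.core_kwargs.backbone_class = "R3MConv"
config.observation.encoder.rgb.core_kwargs.backbone_kwargs.r3m_model_class = "resnet18"
config.observation.encoder.rgb.core_kwargs.backbone_kwargs.freeze = True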
Wandb Logging
We now support logging with wandb (in addition to tensorboard). Using it is simple. Follow this guide for more details.
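A minimal sketch of enabling it in a config (the exact key names are assumptions; see the linked guide):
# Hypothetical sketch (key names assumed): enable wandb logging in addition to tensorboard
config.experiment.logging.log_wandb = True
config.experiment.logging.wandb_proj_name = "my_project"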
Better Data Augmentation Support
We now support observation randomizers, including a crop randomizer, a color randomizer, and a Gaussian randomizer. They correctly detect train/eval mode and change their behavior accordingly.
- CropRandomizer: takes random crops of image-like observations.
- ColorRandomizer: samples random color jitter in brightness, contrast, saturation, and hue. Applies to RGB image observations.
- GaussianRandomizer: adds Gaussian noise to the input. Applies to any observation.
For more details on how they work, please refer to the documentation.
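As an illustration, a randomizer is attached to a modality through the encoder config; the sketch below follows the config structure shown in the v0.2.0 notes further down (the exact kwargs depend on the randomizer class):
# Sketch: enable random cropping for RGB observations (kwargs mirror the CropRandomizer example in the v0.2.0 notes)
config.observation.encoder.rgb.obs_randomizer_class = "CropRandomizer"
config.observation.encoder.rgb.obs_randomizer_kwargs.crop_height = 76
config.observation.encoder.rgb.obs_randomizer_kwargs.crop_width = 76
config.observation.encoder.rgb.obs_randomizer_kwargs.num_crops = 1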
Full compatibility with robosuite v1.4 and DeepMind's MuJoCo bindings
Our original robomimic datasets are now compatible with robosuite v1.4 (the latest robosuite release) and the DeepMind MuJoCo bindings. The best part of this is that MuJoCo installation is much less painful now!
We highly recommend updating robosuite and switching from mujoco-py to DeepMind's MuJoCo bindings. All our tests now assume robosuite v1.4 is installed, and our dataset download script downloads the versions compatible with robosuite v1.4 by default. You can still use the old versions by using the download script in robomimic v0.2, or the download links in the v0.2 documentation.
Other Improvements
We outline other improvements here.
- Image datasets can use significantly less storage now -- we provide an option to use image compression and also exclude the next_obs key (which is unneeded for imitation learning algorithms) in our image extraction script. More details here.
- A new colab notebook to get started easily
- Users must now specify train and validation hdf5 keys explicitly in the config when setting config.experiment.validate = True. This avoids any ambiguity in what data is being used for training and for validation, and offers more flexibility for using different training and validation splits. See more notes here.
- Moved some core observation processing models into models/obs_core.py
- Updated docs and tutorials
Contributor Spotlight
We would like to introduce the newest member of our robomimic core team Matthew Bronars, who helped with this release!
This release was a major team effort. Here is a breakdown of contributed features by team member.
- BC-Transformer (@snasiriany, @MBronars, @amandlek)
- IQL (@snasiriany)
- Pre-trained image representations (@j96w, @danfeiX, @MBronars)
- Wandb logging (@snasiriany)
- Better Data Augmentation Support (@danfeiX, @cremebrule)
- Compatibility with robosuite v1.4 (@snasiriany)
- Colab notebook (@danfeiX)
- Improvements to dataset storage and train-valid splits (@amandlek)
v0.2.0
robomimic 0.2.0 Release Notes
Highlights
This release of robomimic brings integrated support for mobile manipulation datasets from the recent MOMART paper, and adds modular features for easily modifying and adding custom observation modalities and corresponding encoding networks.
MOMART Datasets
We have added integrated support for MOMART datasets, a large-scale set of multi-stage, long-horizon mobile manipulation task demonstrations in a simulated kitchen environment collected in iGibson.
Using MOMART Datasets
Datasets can be easily downloaded using download_momart_datasets.py.
For step-by-step instructions for setting up your machine environment to visualize and train with the MOMART datasets, please visit the Getting Started page.
Modular Observation Modalities
We also introduce modular features for easily modifying and adding custom observation modalities and corresponding encoding networks. A modality corresponds to a group of specific observations that should be encoded the same way.
Default Modalities
robomimic natively supports the following modalities (expected size from a raw dataset shown, excluding the optional leading batch dimension):
- rgb (H, W, 3): Standard 3-channel color frames with values in range [0, 255]
- depth (H, W, 1): 1-channel frame with normalized values in range [0, 1]
- low_dim (N): low-dimensional observations, e.g. proprioception or object states
- scan (1, N): 1-channel, single-dimension data from a laser range scanner
We have default encoder networks which can be configured / modified by setting relevant parameters in your config, e.g.:
# These keys should exist in your dataset
config.observation.modalities.obs.rgb = ["cam1", "cam2", "cam3"] # Add camera observations to the RGB modality
config.observation.modalities.obs.low_dim = ["proprio", "object"] # Add proprioception and object states to low dim modality
...
# Now let's modify the default RGB encoder network and set the feature dimension size
config.observation.encoder.rgb.core_kwargs.feature_dimension = 128
...
To see the structure of the observation modalities and encoder parameters, please see the base config module.
Custom Modalities
You can also easily add your own modality and corresponding custom encoding network! Please see our example add_new_modality.py.
Refactored Config Structure
With the introduction of modular modalities, our Config class structure has been modified slightly, and will likely cause breaking changes to any configs you have created using version 0.1.0. Below, we describe the exact changes in the config that need to be updated to match the current structure:
Observation Modalities
The image modality has been renamed to rgb. Thus, you will need to update your config anywhere it references the image modality, e.g.:
# Old format
config.observation.modalities.image.<etc>
# New format
config.observation.modalities.rgb.<etc>
The low_dim modality remains unchanged. Note, however, that we have additionally added integrated support for both depth and scan modalities, which can be referenced in the same way, e.g.:
config.observation.modalities.depth.<etc>
config.observation.modalities.scan.<etc>
Observation Encoders / Randomizer Networks
We have modularized the encoder / randomizer arguments so that they are general, and are unique to each type of observation modality. All of the original arguments in v0.1.0 have been preserved, but are now re-formatted as follows:
############# OLD ##############
# Previously, a single set of arguments was specified, and it was hardcoded to process image (rgb) observations
# Assumes that you're using the VisualCore class, not general!
config.observation.encoder.visual_feature_dimension = 64
config.observation.encoder.visual_core = 'ResNet18Conv'
config.observation.encoder.visual_core_kwargs.pretrained = False
config.observation.encoder.visual_core_kwargs.input_coord_conv = False
# For pooling, is hardcoded to use spatial softmax or not, not general!
config.observation.encoder.use_spatial_softmax = True
# kwargs for spatial softmax layer
config.observation.encoder.spatial_softmax_kwargs.num_kp = 32
config.observation.encoder.spatial_softmax_kwargs.learnable_temperature = False
config.observation.encoder.spatial_softmax_kwargs.temperature = 1.0
config.observation.encoder.spatial_softmax_kwargs.noise_std = 0.0
############# NEW ##############
# Now, argument names are general (network-agnostic), and are specified per modality!
# Example for RGB, to reproduce the above configuration
# The core encoder network can be arbitrarily specified!
config.observation.encoder.rgb.core_class = "VisualCore"
# Corresponding kwargs that should be passed to the core class are specified below
config.observation.encoder.rgb.core_kwargs.feature_dimension = 64
config.observation.encoder.rgb.core_kwargs.backbone_class = "ResNet18Conv"
config.observation.encoder.rgb.core_kwargs.backbone_kwargs.pretrained = False
config.observation.encoder.rgb.core_kwargs.backbone_kwargs.input_coord_conv = False
# The pooling class can also arbitrarily be specified!
config.observation.encoder.rgb.core_kwargs.pool_class = "SpatialSoftmax"
# Corresponding kwargs that should be passed to the pooling class are specified below
config.observation.encoder.rgb.core_kwargs.pool_kwargs.num_kp = 32
config.observation.encoder.rgb.core_kwargs.pool_kwargs.learnable_temperature = False
config.observation.encoder.rgb.core_kwargs.pool_kwargs.temperature = 1.0
config.observation.encoder.rgb.core_kwargs.pool_kwargs.noise_std = 0.0
Thankfully, the observation randomization network specifications were already modularized, but were hardcoded to process the image (rgb) modality only. Thus, the only change we made is to allow the randomization kwargs to be specified per modality:
############# OLD ##############
# Previously, observation randomization was hardcoded for image / rgb modality
config.observation.encoder.obs_randomizer_class = None
config.observation.encoder.obs_randomizer_kwargs.crop_height = 76
config.observation.encoder.obs_randomizer_kwargs.crop_width = 76
config.observation.encoder.obs_randomizer_kwargs.num_crops = 1
config.observation.encoder.obs_randomizer_kwargs.pos_enc = False
############# NEW ##############
# Now, the randomization arguments are specified per modality. An example for RGB is shown below
config.observation.encoder.rgb.obs_randomizer_class = None
config.observation.encoder.rgb.obs_randomizer_kwargs.crop_height = 76
config.observation.encoder.rgb.obs_randomizer_kwargs.crop_width = 76
config.observation.encoder.rgb.obs_randomizer_kwargs.num_crops = 1
config.observation.encoder.rgb.obs_randomizer_kwargs.pos_enc = False
You can also view the default configs and compare your config to these templates to view exact diffs in structure.
v0.1.0 - Initial Release
This is the first release of robomimic. This version should be used when trying to reproduce results from the study.