
WoMAP: World Models For Embodied
Open-Vocabulary Object Localization

Tenny Yin*, Zhiting Mei, Tao Sun, Lihan Zha, Emily Zhou+, Jeremy Bao+, Miyu Yamane+, Ola Shorinwa*, Anirudha Majumdar

*Equal Contribution. +Authors contributed equally.

(Figure: WoMAP overview)

We introduce World Models for Active Perception (WoMAP): a recipe for training open-vocabulary object localization policies that: (i) uses a Gaussian Splatting-based real-to-sim-to-real pipeline for scalable data generation without the need for expert demonstrations, (ii) distills dense reward signals from open-vocabulary object detectors, and (iii) leverages a latent world model for dynamics and rewards prediction to ground high-level action proposals at inference time.
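To make point (iii) concrete, here is an illustrative sketch (the function and variable names are assumptions, not the repository's API) of how a latent world model can ground action proposals at inference time: each candidate action sequence is rolled out in latent space, the predicted rewards are accumulated, and the best-scoring proposal is selected.

```python
# Illustrative sketch of inference-time grounding with a latent world model.
# `dynamics` predicts the next latent state; `reward` scores a latent state.
# These are stand-ins, not the repository's actual modules.

def rollout_reward(dynamics, reward, z0, actions):
    """Accumulate predicted reward along an imagined latent rollout."""
    z, total = z0, 0.0
    for a in actions:
        z = dynamics(z, a)   # imagined next latent state
        total += reward(z)   # predicted localization reward
    return total

def ground_proposals(dynamics, reward, z0, proposals):
    """Select the action proposal with the highest imagined return."""
    return max(
        proposals,
        key=lambda acts: rollout_reward(dynamics, reward, z0, acts),
    )
```

In this toy form, high-level action proposals (e.g., from a planner or VLM) are never executed blindly; the world model's reward predictions arbitrate between them first.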

Getting Started

  1. Installation
  2. Training the World Model
  3. Running Experiments

Installation

  1. Clone this repo:
git clone https://github.com/irom-princeton/womap.git
  2. Install womap as a Python package:
python -m pip install -e .

Training the World Model

Train (with GSplat):

DINO, CLIP, or ViT Encoder with Frozen Weights (with Dynamics and Rewards Predictors)
python main.py --fname configs/gsplat/cfg_target_seq2.yaml \
--projname <project name, e.g., 0211-test-encoder> \
--expname <experiment name, e.g., test> \
--encoder <dino, clip, vit> --frozen
DINO, CLIP, or ViT Encoder with Frozen Weights (without Rewards Predictor)
python main.py --fname configs/gsplat/cfg_target_seq2.yaml \
--projname <project name, e.g., 0211-test-encoder> \
--expname <experiment name, e.g., test> \
--encoder <dino, clip, vit> --frozen \
--ablate_rewards
DINO, CLIP, or ViT Encoder with Frozen Weights (without Dynamics Predictor)
python main.py --fname configs/gsplat/cfg_target_seq2.yaml \
--projname <project name, e.g., 0211-test-encoder> \
--expname <experiment name, e.g., test> \
--encoder <dino, clip, vit> --frozen \
--ablate_dynamics
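The three invocations above differ only in their flags. As a hedged sketch (the repository's actual argument handling in main.py may differ), the flag set could be parsed with argparse like this:

```python
# Hypothetical sketch of how main.py's command-line flags could be parsed.
# The option names mirror the commands above; everything else is an assumption.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Train the WoMAP world model")
    p.add_argument("--fname", required=True, help="path to the YAML config")
    p.add_argument("--projname", required=True, help="project name")
    p.add_argument("--expname", required=True, help="experiment name")
    p.add_argument("--encoder", choices=["dino", "clip", "vit"], default="dino",
                   help="visual encoder backbone")
    p.add_argument("--frozen", action="store_true",
                   help="freeze the encoder's weights")
    p.add_argument("--ablate_rewards", action="store_true",
                   help="train without the rewards predictor")
    p.add_argument("--ablate_dynamics", action="store_true",
                   help="train without the dynamics predictor")
    return p
```

The boolean `store_true` flags mean that omitting `--ablate_rewards` and `--ablate_dynamics` trains the full model with both predictors.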

You can find bash scripts for ablating the dynamics and rewards predictors and for training the entire model in:

bash_scripts/ablation_dynamics.bash
bash_scripts/ablation_rewards.bash
bash_scripts/train_model.bash

respectively, which you can run in the terminal, e.g., via:

bash bash_scripts/ablation_dynamics.bash

To finetune the encoder's weights, remove the --frozen flag.
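In PyTorch terms, freezing typically means disabling gradients on the encoder so only the dynamics and rewards heads are updated. The sketch below is illustrative (the module names are stand-ins, not the repository's code):

```python
# Illustrative PyTorch-style sketch of what --frozen implies: the encoder's
# parameters stop receiving gradients, while other modules remain trainable.
import torch.nn as nn

def freeze(module: nn.Module) -> nn.Module:
    """Disable gradient updates for every parameter in `module`."""
    for p in module.parameters():
        p.requires_grad = False
    return module

# Tiny stand-in for a DINO/CLIP/ViT encoder.
encoder = freeze(nn.Linear(8, 4))
trainable = [p for p in encoder.parameters() if p.requires_grad]
```

Only parameters with `requires_grad=True` would be handed to the optimizer, so a frozen encoder contributes features but no gradient updates.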

Templates for running jobs via SLURM are available in the slurm_scripts directory. Please update the following fields in the shell scripts:

#SBATCH --mail-user=<princeton Net ID>@princeton.edu
#

# load modules or conda environments here
source ~/.bashrc

# activate virtual environment
micromamba activate <path to micromamba environment>
# or, for conda:
# conda activate <path to conda environment>

# run
cd <path to the project folder>
bash bash_scripts/ablation_rewards.bash

For example, to run an ablation experiment on the rewards predictor, run the following command:

sbatch slurm_scripts/slurm_ablation_rewards.sh

To run an ablation experiment on the dynamics predictor, run the following command:

sbatch slurm_scripts/slurm_ablation_dynamics.sh

To train the dynamics and rewards predictors, run the following command:

sbatch slurm_scripts/slurm_train_model.sh

The output files can be found in slurm_outputs.

Experiments

Generate experiment configurations

Generate experiment config files under experiment_configs/, specifying the model and the experiments to run.
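The actual config schema is not documented here, but a generator for such files might look like the following sketch, which writes one YAML file per (model, experiment) pair; all field names are assumptions:

```python
# Hypothetical generator for experiment config files. The real schema used
# under experiment_configs/ may differ; the fields below are placeholders.
import itertools
import pathlib

def generate_configs(models, experiments, out_dir):
    """Write one minimal YAML config per (model, experiment) combination."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for model, exp in itertools.product(models, experiments):
        path = out / f"{model}_{exp}.yaml"
        path.write_text(f"model: {model}\nexperiment: {exp}\n")
        paths.append(path)
    return paths
```

Enumerating the full model-by-experiment grid up front keeps the later submission step trivially parallel: each file corresponds to exactly one job.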

Run experiments

Specify relevant arguments in the bash file. The script will submit one job per experiment configuration (model + experiment):

bash bash_scripts/submit_experiments.bash 
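The "one job per configuration" pattern can be sketched as a dry-run shell loop (this mirrors the described behavior of submit_experiments.bash but is not its actual contents; `echo sbatch` stands in for a real submission):

```shell
# Hypothetical dry-run sketch: submit one job per experiment config file.
# Stand-in config files are created in a temp directory so this is self-contained.
set -eu
cfg_dir=$(mktemp -d)
touch "$cfg_dir/dino_target.yaml" "$cfg_dir/clip_target.yaml"   # stand-in configs
for cfg in "$cfg_dir"/*.yaml; do
    echo sbatch --export=CONFIG="$cfg" run_experiment.sh         # dry run only
done
```

Dropping the `echo` would turn the dry run into real submissions, one SLURM job per model + experiment pair.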

Visualize Results

Use the interactive script extract_result.py to visualize and summarize the results.
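The interface of extract_result.py is not documented here, but in spirit such a script aggregates per-episode result files into summary metrics. A minimal sketch, assuming JSON files with a boolean `success` field (a made-up format for illustration):

```python
# Hypothetical result aggregation in the spirit of extract_result.py.
# The real script's input format and options may differ.
import json
import pathlib

def success_rate(results_dir):
    """Average the boolean `success` field across per-episode JSON files."""
    files = sorted(pathlib.Path(results_dir).glob("*.json"))
    if not files:
        return 0.0
    outcomes = [json.loads(f.read_text())["success"] for f in files]
    return sum(outcomes) / len(outcomes)
```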
