If you find our code or paper helpful, please consider starring this repository and citing the following:
@misc{snapmogen2025,
title={SnapMoGen: Human Motion Generation from Expressive Texts},
author={Chuan Guo and Inwoo Hwang and Jian Wang and Bing Zhou},
year={2025},
eprint={2507.09122},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.09122},
}
📢 2023-11-29 --- Initialized the webpage and git project.
conda env create -f environment.yml
conda activate momask-plus
If you encounter issues with Conda, you can install the dependencies using pip:
pip install -r requirements.txt
✅ Tested on Python 3.8.20.
bash prepare/download_models.sh
(For evaluation only.)
bash prepare/download_evaluators.sh
bash prepare/download_glove.sh
If the gdown download fails with "Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses", upgrading gdown usually resolves it, as suggested in wkentaro/gdown#43:
pip install --upgrade --no-cache-dir gdown
Visit [Google Drive] to download the models and evaluators manually.
HumanML3D - Follow the instructions in HumanML3D, then copy the dataset to your data folder:
cp -r ./HumanML3D/ your_data_folder/HumanML3D
SnapMoGen - Download the data from Hugging Face, then copy it to your data folder:
cp -r ./SnapMoGen your_data_folder/SnapMoGen
Remember to update data.root_dir in all the config/*.yaml files with your own data directory path.
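For example, the relevant entry might look like the following (a hedged sketch: the exact nesting in each YAML file may differ, and the path is a placeholder):

data:
  root_dir: ./your_data_folder  # placeholder: folder containing HumanML3D/ and SnapMoGen/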
To generate motion from your own text prompts, use:
python gen_momask_plus.py
You can modify the inference configuration (e.g., number of diffusion steps, guidance scale) in config/eval_momaskplus.yaml.
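As a purely illustrative sketch, the kind of options you would adjust look like the following; the key names here are hypothetical placeholders, so consult config/eval_momaskplus.yaml for the actual option names and defaults:

num_diffusion_steps: 50  # hypothetical key: number of diffusion steps
guidance_scale: 4.0      # hypothetical key: classifier-free guidance scale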
Run the following scripts for quantitative evaluation:
python eval_momask_plus_hml.py # Evaluate on HumanML3D dataset
python eval_momask_plus.py # Evaluate on SnapMoGen dataset
There are two main components in MoMask++: a multi-scale residual motion VQVAE and a generative masked Transformer.
All checkpoints will be stored under /checkpoint_dir.
python train_rvq_hml.py # Train RVQVAE on HumanML3D
python train_rvq.py # Train RVQVAE on SnapMoGen
Configuration files:
config/residual_vqvae_hml.yaml (for HumanML3D)
config/residual_vqvae.yaml (for SnapMoGen)
python train_momask_plus_hml.py # Train on HumanML3D
python train_momask_plus.py # Train on SnapMoGen
Configuration files:
config/train_momaskplus_hml.yaml (for HumanML3D)
config/train_momaskplus.yaml (for SnapMoGen)
Remember to change vq_name and vq_ckpt to your own VQ name and VQ checkpoint in these two configuration files. A training accuracy of around 0.25 is normal.
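For reference, a hedged sketch of the two entries to edit (both values are placeholders for your own trained VQ model):

vq_name: your_rvq_experiment_name    # placeholder: name of your RVQVAE training run
vq_ckpt: path/to/your_vq_checkpoint  # placeholder: checkpoint saved under /checkpoint_dir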
We use a separate lightweight root motion regressor to refine the root trajectory. This regressor is trained to predict root linear velocities from local motion features. During motion generation, we use it to re-predict the root trajectory of the generated motion, which effectively reduces foot sliding.
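The idea can be summarized with a minimal sketch in PyTorch (illustrative only: the network architecture, feature dimensions, and planar two-dimensional velocity are assumptions, not the actual implementation):

import torch
import torch.nn as nn

class RootVelocityRegressor(nn.Module):
    # Toy sketch: map local (root-relative) motion features to per-frame root linear velocities.
    def __init__(self, local_feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.gru = nn.GRU(local_feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # planar (x, z) velocity per frame (assumption)

    def forward(self, local_feats: torch.Tensor) -> torch.Tensor:
        # local_feats: (batch, num_frames, local_feat_dim)
        hidden, _ = self.gru(local_feats)
        return self.head(hidden)  # (batch, num_frames, 2)

# Usage sketch: after generating a motion, discard its original root translation,
# re-predict per-frame root velocities from the local features, and integrate them
# (cumulative sum) into a refined root trajectory.
regressor = RootVelocityRegressor(local_feat_dim=259)  # feature dimension is a placeholder
local_feats = torch.randn(1, 120, 259)                 # e.g., 120 generated frames
root_trajectory = torch.cumsum(regressor(local_feats), dim=1)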
All animations were manually rendered in Blender using Bitmoji characters.
An example character is available here, and we use this Blender scene for animation rendering.
We recommend using the Rokoko Blender add-on (v1.4.1) for seamless motion retargeting.
⚠️ Note: All motions in SnapMoGen use T-Pose as the rest pose.
If your character rig is in A-Pose, use the rest_pose_retarget.py script to convert between T-Pose and A-Pose rest poses.
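A minimal invocation might look like this (hedged; check the script itself for the arguments it actually expects, such as the input and output file paths):

python rest_pose_retarget.py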
We sincerely thank the authors of the following open-source works, on which our code is based:
MoMask, VAR, deep-motion-editing, Muse, vector-quantize-pytorch, T2M-GPT, MDM and MLD
Contact [email protected] for further questions.