Shuang Liang1,4*, Jing He3, Chuanmeizhi Wang1, Lejun Liao2, Guo Zhang1, Ying-Cong Chen3,5, Yuan Yuan2†
1Rama Alpaca Technology Company, 2Boston College, 3HKUST(GZ), 4The University of Hong Kong, 5HKUST
*Work done during an internship at Rama Alpaca Technology. †Corresponding author.
- [2026-Mar-20] 📄 Revised version of our paper released on arXiv.
- [2026-Mar-20] 🖼️ COCO-OOD Ukiyo-e and corruption subsets released! Please refer to the dataset section of this repo.
- [2025-Dec-03] 📄 Revised version of our paper released on arXiv.
- [2025-Oct-28] 🧩 ComfyUI node of SDPose-OOD is now available! We sincerely thank @judian17 and @Piscesbody for their excellent contributions in developing this ComfyUI node, which enables downstream ComfyUI workflows and helps more people explore and apply our work. 👉 Check it out here: ComfyUI-SDPose-OOD
- [2025-Oct-14] 🚀 Wholebody model and HuggingFace Space demo released! You can now run SDPose demos in our HuggingFace Space! Check out our 🤗 SDPose HuggingFace Space and 🤗 SDPose-Wholebody Model Repository.
- [2025-Oct-13] 🚀 Gradio local deployment script released! You can now run SDPose demos locally on your machine.
- [2025-Oct-12] 🚀 Body model, COCO-OOD validation benchmark, and inference code released! Check out our 🤗 SDPose-Body Model Repository.
- [2025-Sep-29] 📄 Paper released on arXiv.
- We plan to release the training scripts upon acceptance of the paper.
- We plan to release new models soon! Stay tuned!
- HuggingFace space demo release
- WholeBody model release
- Gradio local deployment script release
- Body model and Inference code release
- COCO-OOD Validation Benchmark release
SDPose leverages the powerful visual priors of Stable Diffusion to achieve state-of-the-art performance in:
- ✅ Out-of-Domain (OOD) Generalization: superior performance on unseen domains without fine-tuning
- ✅ Robust Pose Estimation: handles challenging scenarios including occlusions, rare poses, and artistic styles
- ✅ Body & Wholebody Support: supports both body keypoints (17) and wholebody keypoints (133)
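For orientation, the 133 wholebody keypoints follow the standard COCO-WholeBody layout (17 body + 6 feet + 68 face + 42 hand keypoints). A small sketch of the index ranges, for illustration only (not taken from this codebase):

```python
# Standard COCO-WholeBody keypoint layout: index ranges per part group.
WHOLEBODY_GROUPS = {
    "body": range(0, 17),     # the 17 COCO body keypoints
    "feet": range(17, 23),    # 3 keypoints per foot
    "face": range(23, 91),    # 68 facial landmarks
    "hands": range(91, 133),  # 21 keypoints per hand
}

# The group sizes add up to the full 133-keypoint wholebody scheme.
assert sum(len(r) for r in WHOLEBODY_GROUPS.values()) == 133
```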
SDPose demonstrates robust performance on animation videos.
💡 Tip: For more interactive demos and real-time inference, check out our 🤗 HuggingFace Spaces!
- Clone the repository

```bash
git clone https://github.com/t-s-liang/SDPose-OOD.git
cd SDPose-OOD
```

- Create a conda environment

```bash
conda create -n SDPose python=3.10
conda activate SDPose
```

- Install dependencies

```bash
pip install -r requirements.txt
```

Download the pre-trained model checkpoints from our HuggingFace model repositories: 🤗 SDPose-Body Model and 🤗 SDPose-Wholebody Model. The model repositories contain the checkpoint files and detailed usage instructions.
We provide interactive Gradio demos on HuggingFace Spaces:
You can now run the Gradio demo on your local machine!
Since SDPose is a top-down pose estimation method, it requires an object detection model to detect humans in the image first. We recommend YOLO11-x for robust human detection.

Download the YOLO11-x model:

```bash
# Download the YOLO11-x pretrained model
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/
```

Launch the local demo:

```bash
cd gradio_app
bash launch_gradio.sh
```

The Gradio interface will be available at http://localhost:7860 (or the port specified in the launch script).
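As a rough illustration of the top-down protocol, the sketch below (plain Python, not project code; the input tuple format, class id, and score threshold are our assumptions) filters raw detector output down to confident person boxes and converts them to the xywh format that top-down pose pipelines typically consume:

```python
def person_boxes(detections, score_thr=0.3, person_class=0):
    """Filter raw detector output down to person boxes.

    `detections` is assumed to be a list of (class_id, score, x1, y1, x2, y2)
    tuples in pixel coordinates. Returns (x, y, w, h, score) tuples, the
    format top-down pose pipelines commonly expect.
    """
    boxes = []
    for cls, score, x1, y1, x2, y2 in detections:
        if cls != person_class or score < score_thr:
            continue  # keep only confident person detections
        boxes.append((x1, y1, x2 - x1, y2 - y1, score))
    return boxes
```

Each surviving box is then cropped and fed to the pose model individually, which is what makes the method "top-down".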
Use the provided evaluation script to run inference on standard pose estimation benchmarks.
Edit scripts/eval.sh to configure the evaluation parameters:
```bash
# Dataset settings
dataset_name='COCO'                  # Dataset name: COCO, HumanArt, etc.
keypoint_scheme='body'               # 'body' (17 keypoints) or 'wholebody' (133 keypoints)
dataset_root='/path/to/datasets'     # Root directory of datasets
ann_file='/path/to/annotation.json'  # Annotation file path

# Model settings
checkpoint_path='/path/to/checkpoint'  # Path to SDPose checkpoint

# Inference settings
eval_batch_size=16         # Batch size per GPU
dataloader_num_workers=16  # Number of data loading workers
```

For COCO evaluation, please download the precomputed person detection bounding boxes from https://huggingface.co/noahcao/sapiens-pose-coco/tree/main/sapiens_host/pose/person_detection_results. These detection results are required for evaluation under the top-down protocol on COCO, COCO-OOD, and COCO-WholeBody.
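Detection files like this conventionally follow the COCO detection-results format: a JSON list of `{"image_id", "category_id", "bbox", "score"}` entries. A minimal sketch, assuming that format (the helper name is ours), of grouping the boxes by image for top-down inference:

```python
import json
from collections import defaultdict

def load_detections(path, score_thr=0.0):
    """Group COCO-format detection results by image_id.

    Each entry is expected to look like
    {"image_id": 397133, "category_id": 1, "bbox": [x, y, w, h], "score": 0.98}.
    Entries below score_thr are dropped.
    """
    with open(path) as f:
        entries = json.load(f)
    per_image = defaultdict(list)
    for det in entries:
        if det["score"] >= score_thr:
            per_image[det["image_id"]].append((det["bbox"], det["score"]))
    return per_image
```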
The expected directory structure is:
```
${DATASET_ROOT}/
│
├── COCO/
│   ├── annotations/
│   │   ├── person_keypoints_train2017.json
│   │   ├── person_keypoints_val2017.json
│   │   ├── coco_wholebody_train_v1.0.json
│   │   └── coco_wholebody_val_v1.0.json
│   │
│   ├── train2017/
│   ├── val2017/
│   ├── val2017oil/
│   └── person_detection_results/
│       └── COCO_val2017_detections_AP_H_70_person.json
│
└── HumanArt/
    ├── annotations/
    │   └── validation_humanart.json
    └── images/
```
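A small sanity check, assuming the layout above (the helper name and the exact required list are ours, not part of the codebase), to confirm the key COCO files are in place before launching evaluation:

```python
import os

# Files/directories that COCO body evaluation expects, relative to ${DATASET_ROOT}.
REQUIRED = [
    "COCO/annotations/person_keypoints_val2017.json",
    "COCO/val2017",
    "COCO/person_detection_results/COCO_val2017_detections_AP_H_70_person.json",
]

def missing_paths(dataset_root, required=REQUIRED):
    """Return the subset of required paths that do not exist under dataset_root."""
    return [p for p in required if not os.path.exists(os.path.join(dataset_root, p))]
```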
When running evaluation, the dataloader will automatically locate the correct annotation and bounding box files based on the specified dataset name:
- COCO → standard COCO validation
- COCO_OOD → COCO stylized (val2017oil)
- COCOWholebody → COCO-WholeBody validation
- COCO-OOD_Wholebody → COCO-WholeBody OOD validation
- HumanArt → HumanArt validation set

Run the evaluation:

```bash
cd scripts
bash eval.sh
```

This will:
- Load the SDPose model from the checkpoint
- Run inference on the specified dataset
- Compute evaluation metrics (AP, AR, etc.)
- Print results to console
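For reference, COCO-style AP/AR are built on Object Keypoint Similarity (OKS). A minimal sketch of the standard formula, OKS = Σᵢ exp(−dᵢ² / (2·s²·kᵢ²)) · [vᵢ > 0] / Σᵢ [vᵢ > 0], where s² is the object area and kᵢ are per-keypoint constants (this is the textbook definition, not the project's evaluation code; the constants below are illustrative):

```python
import math

def oks(pred, gt, vis, area, k):
    """Object Keypoint Similarity between predicted and ground-truth keypoints.

    pred, gt: lists of (x, y) coordinates; vis: visibility flags (>0 means
    labeled); area: ground-truth object area s^2; k: per-keypoint constants.
    """
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, vis, k):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2 * area * ki ** 2))
        den += 1
    return num / den if den else 0.0
```

AP is then computed by thresholding OKS (e.g. at 0.50:0.95) exactly as IoU is thresholded in object detection.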
To complement the HumanArt dataset and enable OOD evaluation under matched content and labels, we constructed COCO-OOD by applying artistic style transfer to the original COCO images.
We adopt the official CycleGAN and StyTR2 frameworks to perform image-to-image translation from the COCO domain (natural photographs) to the target domains of Ukiyo-e and Monet-style painting. During conversion, all validation images in COCO are processed to produce style-transferred counterparts, while preserving their original human annotations (bounding boxes, keypoints). This yields an OOD variant of COCO in which the underlying scene structure is unchanged, but the texture, color palette, and brushstroke patterns are consistent with the oil/Ukiyo-e artistic styles. We also utilize Nano-banana as a style transfer tool to produce color-sketch versions of COCO-OOD.
Importantly, for fair comparison and to avoid introducing priors from large-scale pretrained diffusion models, we intentionally adopt the earlier StyTR2 and CycleGAN frameworks rather than more recent style transfer methods. Such stylization introduces a significant appearance shift while keeping pose-related geometric information intact, making it suitable for evaluating robust pose estimation.
📥 Download COCO-OOD Monet Dataset from Google Drive
📥 Download COCO-OOD Corruption Dataset from Google Drive
📥 Download COCO-OOD Ukiyo-e Dataset from Google Drive
If you find SDPose useful in your research, please consider citing:
```bibtex
@misc{liang2025sdposeexploitingdiffusionpriors,
  title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
  author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
  year={2025},
  eprint={2509.24980},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.24980},
}
```

This project is released under the MIT License.
This project is built upon the following excellent open-source projects:
- MMPose: OpenMMLab pose estimation toolbox
- Diffusers: HuggingFace diffusion models library
- Marigold: Diffusion-based depth estimation
- Lotus: Diffusion-based dense prediction
- Stable Diffusion: Latent diffusion models
- CycleGAN: style transfer for the COCO-OOD Ukiyo-e variant
- StyTR2: style transfer for the COCO-OOD Monet-oil variant
For questions, suggestions, or collaboration inquiries:
- Shuang Liang: tsliang2001@gmail.com
- Project Page: https://t-s-liang.github.io/SDPose
⭐ Star us on GitHub - it motivates us a lot!
🌐 Website | 📄 Paper | 🤗 Model-Body | 🤗 Model-Wholebody | 🤗 Demo