Shuang Liang1,4*, Jing He3, Chuanmeizhi Wang1, Lejun Liao2, Guo Zhang1, Ying-Cong Chen3,5, Yuan Yuan2†
1Rama Alpaca Technology Company, 2Boston College, 3HKUST(GZ), 4The University of Hong Kong, 5HKUST
*Work done during an internship at Rama Alpaca Technology. †Corresponding author.
- [2026-Mar-20] 📄 Revised version of our paper released on arXiv.
- [2026-Mar-20] 🖼️ COCO-OOD Ukiyo-e and corruption subsets released! Please refer to the dataset section of this repo.
- [2025-Dec-03] 📄 Revised version of our paper released on arXiv.
- [2025-Oct-28] 🧩 ComfyUI node of SDPose-OOD is now available! We sincerely thank @judian17 and @Piscesbody for their excellent contributions in developing this ComfyUI node, which enables downstream ComfyUI workflows and helps more people explore and apply our work. 👉 Check it out here: ComfyUI-SDPose-OOD
- [2025-Oct-14] 🚀 Wholebody model and HuggingFace Space demo released! You can now run SDPose demos in our HuggingFace Space! Check out our 🤗 SDPose HuggingFace Space and 🤗 SDPose-Wholebody Model Repository.
- [2025-Oct-13] 🚀 Gradio local deployment script released! You can now run SDPose demos locally on your machine.
- [2025-Oct-12] 🚀 Body model, COCO-OOD validation benchmark, and inference code released! Check out our 🤗 SDPose-Body Model Repository.
- [2025-Sep-29] 📄 Paper released on arXiv.
- We plan to release the training scripts upon acceptance of the paper.
- We plan to release new models soon! Stay tuned!
- HuggingFace space demo release
- WholeBody model release
- Gradio local deployment script release
- Body model and Inference code release
- COCO-OOD Validation Benchmark release
SDPose leverages the powerful visual priors of Stable Diffusion to achieve state-of-the-art performance in:
- ✅ Out-of-Domain (OOD) Generalization: superior performance on unseen domains without fine-tuning
- ✅ Robust Pose Estimation: handles challenging scenarios including occlusions, rare poses, and artistic styles
- ✅ Body & Wholebody Support: supports both body keypoints (17) and wholebody keypoints (133)
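For orientation, the 133 wholebody keypoints follow the standard COCO-WholeBody layout (17 body + 6 feet + 68 face + 42 hand keypoints). A small sketch of the index ranges, for illustration only (not taken from this codebase):

```python
# Standard COCO-WholeBody keypoint layout: index ranges per part group.
WHOLEBODY_GROUPS = {
    "body": range(0, 17),     # the 17 COCO body keypoints
    "feet": range(17, 23),    # 3 keypoints per foot
    "face": range(23, 91),    # 68 facial landmarks
    "hands": range(91, 133),  # 21 keypoints per hand
}

# The group sizes add up to the full 133-keypoint wholebody scheme.
assert sum(len(r) for r in WHOLEBODY_GROUPS.values()) == 133
```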
SDPose demonstrates robust performance on animation videos.
💡 Tip: For more interactive demos and real-time inference, check out our 🤗 HuggingFace Spaces!
- Clone the repository

```bash
git clone https://github.com/t-s-liang/SDPose-OOD.git
cd SDPose-OOD
```

- Create a conda environment

```bash
conda create -n SDPose python=3.10
conda activate SDPose
```

- Install dependencies

```bash
pip install -r requirements.txt
```

Download the pre-trained model checkpoints from our HuggingFace model repositories: 🤗 SDPose-Body Model and 🤗 SDPose-Wholebody Model. The model repositories contain the checkpoint files and detailed usage instructions.
We provide interactive Gradio demos on HuggingFace Spaces:
You can now run the Gradio demo on your local machine!
Since SDPose is a top-down pose estimation method, it requires an object detection model to detect humans in the image first. We recommend YOLO11-x for robust human detection.

Download the YOLO11-x model:

```bash
# Download the YOLO11-x pretrained model
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/
```

Launch the local demo:

```bash
cd gradio_app
bash launch_gradio.sh
```

The Gradio interface will be available at http://localhost:7860 (or the port specified in the launch script).
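As a rough illustration of the top-down protocol, the sketch below (plain Python, not project code; the input tuple format, class id, and score threshold are our assumptions) filters raw detector output down to confident person boxes and converts them to the xywh format that top-down pose pipelines typically consume:

```python
def person_boxes(detections, score_thr=0.3, person_class=0):
    """Filter raw detector output down to person boxes.

    `detections` is assumed to be a list of (class_id, score, x1, y1, x2, y2)
    tuples in pixel coordinates. Returns (x, y, w, h, score) tuples, the
    format top-down pose pipelines commonly expect.
    """
    boxes = []
    for cls, score, x1, y1, x2, y2 in detections:
        if cls != person_class or score < score_thr:
            continue  # keep only confident person detections
        boxes.append((x1, y1, x2 - x1, y2 - y1, score))
    return boxes
```

Each surviving box is then cropped and fed to the pose model individually, which is what makes the method "top-down".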
Use the provided evaluation script to run inference on standard pose estimation benchmarks.
Edit scripts/eval.sh to configure the evaluation parameters:
```bash
# Dataset settings
dataset_name='COCO'                  # Dataset name: COCO, HumanArt, etc.
keypoint_scheme='body'               # 'body' (17 keypoints) or 'wholebody' (133 keypoints)
dataset_root='/path/to/datasets'     # Root directory of datasets
ann_file='/path/to/annotation.json'  # Annotation file path

# Model settings
checkpoint_path='/path/to/checkpoint'  # Path to SDPose checkpoint

# Inference settings
eval_batch_size=16         # Batch size per GPU
dataloader_num_workers=16  # Number of data loading workers
```

For COCO evaluation, please download the precomputed person detection bounding boxes from https://huggingface.co/noahcao/sapiens-pose-coco/tree/main/sapiens_host/pose/person_detection_results. These detection results are required for evaluation under the top-down protocol on COCO, COCO-OOD, and COCO-WholeBody.
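Detection files like this conventionally follow the COCO detection-results format: a JSON list of `{"image_id", "category_id", "bbox", "score"}` entries. A minimal sketch, assuming that format (the helper name is ours), of grouping the boxes by image for top-down inference:

```python
import json
from collections import defaultdict

def load_detections(path, score_thr=0.0):
    """Group COCO-format detection results by image_id.

    Each entry is expected to look like
    {"image_id": 397133, "category_id": 1, "bbox": [x, y, w, h], "score": 0.98}.
    Entries below score_thr are dropped.
    """
    with open(path) as f:
        entries = json.load(f)
    per_image = defaultdict(list)
    for det in entries:
        if det["score"] >= score_thr:
            per_image[det["image_id"]].append((det["bbox"], det["score"]))
    return per_image
```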
The expected directory structure is:
```
${DATASET_ROOT}/
│
├── COCO/
│   ├── annotations/
│   │   ├── person_keypoints_train2017.json
│   │   ├── person_keypoints_val2017.json
│   │   ├── coco_wholebody_train_v1.0.json
│   │   └── coco_wholebody_val_v1.0.json
│   │
│   ├── train2017/
│   ├── val2017/
│   ├── val2017oil/
│   └── person_detection_results/
│       └── COCO_val2017_detections_AP_H_70_person.json
│
└── HumanArt/
    ├── annotations/
    │   └── validation_humanart.json
    └── images/
```
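A small sanity check, assuming the layout above (the helper name and the exact required list are ours, not part of the codebase), to confirm the key COCO files are in place before launching evaluation:

```python
import os

# Files/directories that COCO body evaluation expects, relative to ${DATASET_ROOT}.
REQUIRED = [
    "COCO/annotations/person_keypoints_val2017.json",
    "COCO/val2017",
    "COCO/person_detection_results/COCO_val2017_detections_AP_H_70_person.json",
]

def missing_paths(dataset_root, required=REQUIRED):
    """Return the subset of required paths that do not exist under dataset_root."""
    return [p for p in required if not os.path.exists(os.path.join(dataset_root, p))]
```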
When running evaluation, the dataloader will automatically locate the correct annotation and bounding box files based on the specified dataset name:
- COCO → standard COCO validation
- COCO_OOD → COCO stylized (val2017oil)
- COCOWholebody → COCO-WholeBody validation
- COCO-OOD_Wholebody → COCO-WholeBody OOD validation
- HumanArt → HumanArt validation set

Run the evaluation:

```bash
cd scripts
bash eval.sh
```

This will:
- Load the SDPose model from the checkpoint
- Run inference on the specified dataset
- Compute evaluation metrics (AP, AR, etc.)
- Print results to console
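For reference, COCO-style AP/AR are built on Object Keypoint Similarity (OKS). A minimal sketch of the standard formula, OKS = Σᵢ exp(−dᵢ² / (2·s²·kᵢ²)) · [vᵢ > 0] / Σᵢ [vᵢ > 0], where s² is the object area and kᵢ are per-keypoint constants (this is the textbook definition, not the project's evaluation code; the constants below are illustrative):

```python
import math

def oks(pred, gt, vis, area, k):
    """Object Keypoint Similarity between predicted and ground-truth keypoints.

    pred, gt: lists of (x, y) coordinates; vis: visibility flags (>0 means
    labeled); area: ground-truth object area s^2; k: per-keypoint constants.
    """
    num, den = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, vis, k):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2 * area * ki ** 2))
        den += 1
    return num / den if den else 0.0
```

AP is then computed by thresholding OKS (e.g. at 0.50:0.95) exactly as IoU is thresholded in object detection.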
To complement the HumanArt dataset and enable OOD evaluation under matched content and labels, we constructed COCO-OOD by applying artistic style transfer to the original COCO images.
We adopt the official CycleGAN and StyTR2 frameworks to perform image-to-image translation from the COCO domain (natural photographs) to the target domains of Ukiyo-e and Monet-style painting. During conversion, all validation images in COCO are processed to produce style-transferred counterparts, while preserving their original human annotations (bounding boxes, keypoints). This yields an OOD variant of COCO in which the underlying scene structure is unchanged, but the texture, color palette, and brushstroke patterns are consistent with the oil/Ukiyo-e artistic styles. We also utilize Nano-banana as a style transfer tool to produce color-sketch versions of COCO-OOD.
Importantly, for fair comparison and to avoid introducing priors from large-scale pretrained diffusion models, we intentionally adopt the earlier StyTR2 and CycleGAN frameworks rather than more recent style transfer methods. Such stylization introduces a significant appearance shift while keeping pose-related geometric information intact, making it suitable for evaluating robust pose estimation.
📥 Download COCO-OOD Monet Dataset from Google Drive
📥 Download COCO-OOD Corruption Dataset from Google Drive
📥 Download COCO-OOD Ukiyo-e Dataset from Google Drive
If you find SDPose useful in your research, please consider citing:
```bibtex
@misc{liang2025sdposeexploitingdiffusionpriors,
  title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
  author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
  year={2025},
  eprint={2509.24980},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.24980},
}
```

This project is released under the MIT License.
This project is built upon the following excellent open-source projects:
- MMPose: OpenMMLab pose estimation toolbox
- Diffusers: HuggingFace diffusion models library
- Marigold: Diffusion-based depth estimation
- Lotus: Diffusion-based dense prediction
- Stable Diffusion: Latent diffusion models
- CycleGAN: style transfer for the COCO-OOD Ukiyo-e variant
- StyTR2: style transfer for the COCO-OOD Monet-oil variant
For questions, suggestions, or collaboration inquiries:
- Shuang Liang: tsliang2001@gmail.com
- Project Page: https://t-s-liang.github.io/SDPose
⭐ Star us on GitHub - it motivates us a lot!
🌐 Website | 📄 Paper | 🤗 Model-Body | 🤗 Model-Wholebody | 🤗 Demo