SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation

Page | Paper | 🤗 HuggingFace Model (Body) | 🤗 HuggingFace Model (Wholebody) | 🤗 HuggingFace Space | ComfyUI Node | License: MIT

Shuang Liang1,4*, Jing He3, Chuanmeizhi Wang1, Lejun Liao2, Guo Zhang1, Ying-Cong Chen3,5, Yuan Yuan2†

1Rama Alpaca Technology Company, 2Boston College, 3HKUST(GZ), 4The University of Hong Kong, 5HKUST

*Work done during an internship at Rama Alpaca Technology. †Corresponding author.


📢 News

  • [2026-Mar-20] 📄 Revised version of our paper released on arXiv.
  • [2026-Mar-20] 🖼️ COCO-OOD Ukiyo-e and corruption subsets released! Please refer to the dataset section of this repo.
  • [2025-Dec-03] 📄 Revised version of our paper released on arXiv.
  • [2025-Oct-28] 🧩 ComfyUI node of SDPose-OOD is now available! We sincerely thank @judian17 and @Piscesbody for their excellent work developing this ComfyUI node, which enables downstream ComfyUI workflows and helps more people explore and apply our work. 🔗 Check it out here: ComfyUI-SDPose-OOD
  • [2025-Oct-14] 🚀 Wholebody model and HuggingFace Space demo released! You can now run SDPose demos in our HuggingFace Space! Check out our 🤗 SDPose HuggingFace Space and 🤗 SDPose-Wholebody Model Repository.
  • [2025-Oct-13] 🚀 Gradio local deployment script released! You can now run SDPose demos locally on your machine.
  • [2025-Oct-12] 🎉 Body model, COCO-OOD validation benchmark, and inference code released! Check out our 🤗 SDPose-Body Model Repository.
  • [2025-Sep-29] 📄 Paper released on arXiv.

🚀 Coming Soon

  • We plan to release the training scripts upon acceptance of the paper.
  • We plan to release new models soon. Stay tuned!
  • ✅ HuggingFace Space demo release
  • ✅ WholeBody model release
  • ✅ Gradio local deployment script release
  • ✅ Body model and inference code release
  • ✅ COCO-OOD validation benchmark release

🔥 Highlights

SDPose leverages the powerful visual priors of Stable Diffusion to achieve state-of-the-art performance in:

  • ✅ Out-of-Domain (OOD) Generalization: Superior performance on unseen domains without fine-tuning
  • ✅ Robust Pose Estimation: Handles challenging scenarios including occlusions, rare poses, and artistic styles
  • ✅ Body & Wholebody Support: Supports both body keypoints (17) and wholebody keypoints (133)
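The 17-keypoint body scheme follows the standard COCO ordering, which the wholebody scheme extends with face, hand, and foot points. For quick reference, a sketch of the COCO body keypoint order (we assume SDPose's output indexes match this convention, as is standard for COCO-trained models):

```python
# Standard COCO body keypoint order (17 points); predictions and
# annotations index into this list in the same order.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]
```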

🎬 Demo: Animation Video Pose Estimation in the Wild

SDPose demonstrates robust performance on animation videos.

💡 Tip: For more interactive demos and real-time inference, check out our 🤗 HuggingFace Spaces!


🎨 Visualization

Body Pose Estimation (17 Keypoints)

Wholebody Pose Estimation (133 Keypoints)


πŸ› οΈ Setup

Installation

1. Clone the repository

   git clone https://github.com/t-s-liang/SDPose-OOD.git
   cd SDPose-OOD

2. Create a conda environment

   conda create -n SDPose python=3.10
   conda activate SDPose

3. Install dependencies

   pip install -r requirements.txt

Download Pre-trained Models

Download the pre-trained model checkpoints from our HuggingFace repositories:

🤗 SDPose-Body Model | 🤗 SDPose-Wholebody Model

The model repositories contain the checkpoint files and detailed usage instructions.

🤗 Gradio Demo

We provide interactive Gradio demos on HuggingFace Spaces:

Run Gradio Demo Locally

You can now run the Gradio demo on your local machine!

Prerequisites

Since SDPose is a top-down pose estimation method, it first requires an object detector to locate humans in the image. We recommend YOLO11-x for robust human detection:

Download YOLO11-x model:

# Download the YOLO11-x pretrained model
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/
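After detection, a top-down pipeline crops each person box and resizes it to the pose model's input before estimating keypoints. A minimal sketch of the usual box-expansion step; the function name, the 3:4 aspect ratio, and the 1.25 padding factor are illustrative defaults, not SDPose's actual preprocessing values:

```python
def expand_bbox(x, y, w, h, aspect=3 / 4, padding=1.25):
    """Expand a detector box (x, y, w, h) to a fixed width:height ratio
    with extra padding, keeping the box center fixed -- a common
    top-down preprocessing step before cropping the person."""
    cx, cy = x + w / 2, y + h / 2
    # Grow the shorter side so the box matches the target aspect ratio.
    if w / h > aspect:
        h = w / aspect
    else:
        w = h * aspect
    w, h = w * padding, h * padding
    return cx - w / 2, cy - h / 2, w, h

# A tall 100x300 detection becomes a padded 3:4 crop around the same center.
x, y, w, h = expand_bbox(50, 20, 100, 300)
```

The crop is then resized to the network input resolution, and predicted keypoints are mapped back through the inverse transform.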

Launch Gradio App

cd gradio_app
bash launch_gradio.sh

The Gradio interface will be available at http://localhost:7860 (or the port specified in the launch script).


πŸ•ΉοΈ Inference

Evaluation

Use the provided evaluation script to run inference on standard pose estimation benchmarks.

Configuration

Edit scripts/eval.sh to configure the evaluation parameters:

# Dataset settings
dataset_name='COCO'              # Dataset name: COCO, HumanArt, etc.
keypoint_scheme='body'           # 'body' (17 keypoints) or 'wholebody' (133 keypoints)
dataset_root='/path/to/datasets' # Root directory of datasets
ann_file='/path/to/annotation.json' # Annotation file path

# Model settings
checkpoint_path='/path/to/checkpoint' # Path to SDPose checkpoint

# Inference settings
eval_batch_size=16               # Batch size per GPU
dataloader_num_workers=16        # Number of data loading workers

Dataset Preparation and Evaluation

For COCO evaluation, please download the precomputed person detection bounding boxes from: https://huggingface.co/noahcao/sapiens-pose-coco/tree/main/sapiens_host/pose/person_detection_results

These detection results are required for evaluation under the top-down protocol on COCO, COCO-OOD, and COCO-WholeBody.

The expected directory structure is:

${DATASET_ROOT}/
│
├── COCO/
│   ├── annotations/
│   │   ├── person_keypoints_train2017.json
│   │   ├── person_keypoints_val2017.json
│   │   ├── coco_wholebody_train_v1.0.json
│   │   └── coco_wholebody_val_v1.0.json
│   │
│   ├── train2017/
│   ├── val2017/
│   ├── val2017oil/
│   └── person_detection_results/
│       └── COCO_val2017_detections_AP_H_70_person.json
│
└── HumanArt/
    ├── annotations/
    │   └── validation_humanart.json
    └── images/
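Before launching evaluation, it can help to verify that this layout is in place. A small stdlib-only sanity check; the REQUIRED list mirrors the COCO portion of the tree above and is illustrative rather than exhaustive:

```python
import os

# Sanity-check the dataset layout before running eval.sh.
# REQUIRED mirrors the COCO part of the tree above; extend it with the
# wholebody/HumanArt paths as needed for your evaluation target.
REQUIRED = [
    "COCO/annotations/person_keypoints_val2017.json",
    "COCO/val2017",
    "COCO/person_detection_results/COCO_val2017_detections_AP_H_70_person.json",
]

def check_layout(dataset_root):
    """Return the relative paths from REQUIRED missing under dataset_root."""
    return [p for p in REQUIRED
            if not os.path.exists(os.path.join(dataset_root, p))]
```

Calling check_layout('/path/to/datasets') returns an empty list when the COCO files above are in place.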

When running evaluation, the dataloader will automatically locate the correct annotation and bounding box files based on the specified dataset name:
- COCO → standard COCO validation
- COCO_OOD → COCO stylized (val2017oil)
- COCOWholebody → COCO-WholeBody validation
- COCO-OOD_Wholebody → COCO-WholeBody OOD validation
- HumanArt → HumanArt validation set
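Conceptually, that lookup is a table from dataset name to annotation, image, and detection-box paths. A hedged sketch of the idea (the dict keys and structure here are ours for illustration and do not mirror the actual dataloader code):

```python
import os

# Hypothetical name -> paths table; the real dataloader's keys and file
# resolution logic may differ -- this only illustrates the mapping above.
DATASETS = {
    "COCO": {
        "ann": "COCO/annotations/person_keypoints_val2017.json",
        "img_dir": "COCO/val2017",
        "bbox": "COCO/person_detection_results/"
                "COCO_val2017_detections_AP_H_70_person.json",
    },
    "COCO_OOD": {  # same annotations as COCO, stylized images
        "ann": "COCO/annotations/person_keypoints_val2017.json",
        "img_dir": "COCO/val2017oil",
        "bbox": "COCO/person_detection_results/"
                "COCO_val2017_detections_AP_H_70_person.json",
    },
    "HumanArt": {
        "ann": "HumanArt/annotations/validation_humanart.json",
        "img_dir": "HumanArt/images",
        "bbox": None,  # detection boxes are only noted for the COCO protocols
    },
}

def resolve(dataset_root, name):
    """Join dataset_root with the per-dataset relative paths."""
    return {k: (os.path.join(dataset_root, v) if v else None)
            for k, v in DATASETS[name].items()}
```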

Run Evaluation

cd scripts
bash eval.sh

This will:

  1. Load the SDPose model from the checkpoint
  2. Run inference on the specified dataset
  3. Compute evaluation metrics (AP, AR, etc.)
  4. Print results to console
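The AP/AR metrics in step 3 are COCO-style metrics built on Object Keypoint Similarity (OKS), the keypoint analogue of box IoU: each labeled keypoint contributes exp(-d² / (2·s²·k²)), where d is the prediction-to-ground-truth distance, s² the object scale (area), and k a per-keypoint constant (twice the COCO sigma). A minimal sketch of the per-instance computation; the function name and toy inputs are ours, and COCO defines one sigma per keypoint type:

```python
import math

def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity for one instance: the mean over labeled
    keypoints of exp(-d^2 / (2 * area * k^2)), with k = 2 * sigma."""
    total, n = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v > 0:  # only keypoints labeled in the ground truth count
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            k = 2 * sigma
            total += math.exp(-d2 / (2 * area * k ** 2))
            n += 1
    return total / n if n else 0.0

# A perfect prediction scores 1.0 regardless of area or sigma.
score = oks(pred=[(10, 10)], gt=[(10, 10)], visible=[2],
            area=5000.0, sigmas=[0.079])
```

AP then averages precision over OKS thresholds 0.50:0.05:0.95, just as box AP does over IoU thresholds.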

📊 COCO-OOD Dataset

To complement the HumanArt dataset and enable OOD evaluation under matched content and labels, we constructed COCO-OOD by applying artistic style transfer to the original COCO images.

Dataset Construction

We adopt the official CycleGAN and StyTR2 frameworks to perform image-to-image translation from the COCO domain (natural photographs) to the target domains of Ukiyo-e and Monet-style paintings. During conversion, all COCO validation images are processed to produce style-transferred counterparts while preserving their original human annotations (bounding boxes, keypoints). This yields an OOD variant of COCO in which the underlying scene structure is unchanged, but the texture, color palette, and brushstroke patterns follow the oil/Ukiyo-e artistic styles. We also use Nano-banana as a style transfer tool to produce color-sketch versions of COCO-OOD.

Importantly, for fair comparison and to avoid introducing priors from large-scale pretrained diffusion models, we intentionally adopt the earlier StyTR2 and CycleGAN frameworks rather than more recent style transfer methods. Such stylization introduces a significant appearance shift while keeping pose-related geometric information intact, making it well suited for robust pose estimation evaluation.

Download

📥 Download COCO-OOD Monet Dataset from Google Drive

📥 Download COCO-OOD Corruption Dataset from Google Drive

📥 Download COCO-OOD Ukiyo-e Dataset from Google Drive


🎓 Citation

If you find SDPose useful in your research, please consider citing:

@misc{liang2025sdposeexploitingdiffusionpriors,
      title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation}, 
      author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
      year={2025},
      eprint={2509.24980},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.24980}, 
}

📄 License

This project is released under the MIT License.


πŸ™ Acknowledgements

This project is built upon the following excellent open-source projects:

  • MMPose: OpenMMLab pose estimation toolbox
  • Diffusers: HuggingFace diffusion models library
  • Marigold: Diffusion-based depth estimation
  • Lotus: Diffusion-based dense prediction
  • Stable Diffusion: Latent diffusion models
  • CycleGAN: Style transfer for the COCO-OOD Ukiyo-e variant
  • StyTR2: Style transfer for the COCO-OOD Monet-oil variant

📧 Contact

For questions, suggestions, or collaboration inquiries:


⭐ Star us on GitHub - it motivates us a lot!

🌐 Website | 📄 Paper | 🤗 Model-Body | 🤗 Model-Wholebody | 🤗 Demo

About

The official implementation of SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation
