[ICCV 2025] Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
This repository is the official PyTorch implementation of "Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition." Our approach, Triplet Diffusion for Skeleton-Text Matching (TDSM), outperforms recent state-of-the-art zero-shot skeleton-based action recognition methods by large margins, demonstrating superior accuracy and scalability in zero-shot settings through effective skeleton-text matching.
- Jul 24, 2025: A YouTube video about TDSM is uploaded ✨
- Jul 21, 2025: The code for TDSM is released 🔥
- Jun 26, 2025: TDSM accepted to ICCV 2025 🎉
- Nov 16, 2024: This repository is created
@inproceedings{do2025bridging,
title={Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition},
author={Do, Jeonghyeok and Kim, Munchurl},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025}
}
- Python >= 3.9.19
- PyTorch >= 2.4.0
- Platforms: Ubuntu 22.04, CUDA 11.8
- We have included a dependency file for our experimental environment. To install all dependencies, create a new Anaconda virtual environment from the provided file by running:
conda env create -f requirements.yaml
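Before training, it may help to confirm that the active environment satisfies the minimum versions listed above. The helper below is an illustrative sketch (not part of the official codebase) for comparing dotted version strings against a minimum:

```python
def meets_minimum(version: str, minimum: tuple) -> bool:
    """True if a dotted version string satisfies a (major, minor) minimum."""
    parts = version.split("+")[0].split(".")  # drop local tags like "2.4.0+cu118"
    return (int(parts[0]), int(parts[1])) >= minimum

# Check the stated requirements against example version strings,
# e.g. sys.version for Python or torch.__version__ for PyTorch.
print(meets_minimum("3.9.19", (3, 9)))       # Python >= 3.9
print(meets_minimum("2.4.0+cu118", (2, 4)))  # PyTorch >= 2.4
```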
We follow the evaluation setup from SynSE, PURLS, and SMIE.
Download the pre-extracted skeleton features (for SynSE and SMIE settings) and class descriptions from SA-DVAE. Then, arrange them as follows:
data
├── sk_feats
│ ├── shift_ntu60_5_r
│ ├── shift_ntu60_12_r
│ ├── shift_ntu60_20_r
│ ├── shift_ntu60_30_r
│ ├── shift_ntu120_10_r
│ ├── shift_ntu120_24_r
│ ├── shift_ntu120_40_r
│ └── shift_ntu120_60_r
│
├── label_splits
└── class_lists
├── ntu60.csv
├── ntu60_llm.txt
├── ntu120.csv
└── ntu120_llm.txt
Note: Pre-extracted skeleton features for the PURLS settings are not provided. Therefore, we extracted the skeleton features ourselves using the official Shift-GCN code.
# Download code
git clone https://github.com/KAIST-VICLab/TDSM
cd TDSM
# Train TDSM on SynSE benchmarks for the NTU-60 dataset (55/5 split)
python main.py --config ./config/tdsm_ntu60_unseen5.yaml
# Train TDSM on SynSE benchmarks for the NTU-60 dataset (48/12 split)
python main.py --config ./config/tdsm_ntu60_unseen12.yaml
# Train TDSM on SynSE benchmarks for the NTU-120 dataset (110/10 split)
python main.py --config ./config/tdsm_ntu120_unseen10.yaml
# Train TDSM on SynSE benchmarks for the NTU-120 dataset (96/24 split)
python main.py --config ./config/tdsm_ntu120_unseen24.yaml
# Train TDSM on PURLS benchmarks for the NTU-60 dataset (40/20 split)
python main.py --config ./config/tdsm_ntu60_unseen20.yaml
# Train TDSM on PURLS benchmarks for the NTU-60 dataset (30/30 split)
python main.py --config ./config/tdsm_ntu60_unseen30.yaml
# Train TDSM on PURLS benchmarks for the NTU-120 dataset (80/40 split)
python main.py --config ./config/tdsm_ntu120_unseen40.yaml
# Train TDSM on PURLS benchmarks for the NTU-120 dataset (60/60 split)
python main.py --config ./config/tdsm_ntu120_unseen60.yaml
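The config files above all follow one naming pattern, so the eight benchmark runs can be enumerated programmatically, e.g. to print (or launch) every command in sequence. A minimal sketch, assuming only the file names listed above:

```python
# Each entry is (NTU dataset size, number of unseen classes); the seen/unseen
# split is (dataset - unseen)/unseen, matching the commands above.
BENCHMARKS = {
    "SynSE": [(60, 5), (60, 12), (120, 10), (120, 24)],
    "PURLS": [(60, 20), (60, 30), (120, 40), (120, 60)],
}

def config_path(dataset: int, unseen: int) -> str:
    """Config file for an NTU-{dataset} run with `unseen` unseen classes."""
    return f"./config/tdsm_ntu{dataset}_unseen{unseen}.yaml"

for setting, splits in BENCHMARKS.items():
    for dataset, unseen in splits:
        print(f"# {setting}: NTU-{dataset}, {dataset - unseen}/{unseen} split")
        print(f"python main.py --config {config_path(dataset, unseen)}")
```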
Please visit our project page for more experimental results.
The source code can be freely used for research and education purposes only. Any commercial use requires formal permission from the principal investigator (Prof. Munchurl Kim, [email protected]).
This repository is built upon SkateFormer.