[ICCV 2025] Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
This repository is the official PyTorch implementation of "Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition." Our approach, Triplet Diffusion for Skeleton-Text Matching (TDSM), outperforms recent state-of-the-art zero-shot skeleton-based action recognition methods by large margins, demonstrating superior accuracy and scalability in zero-shot settings through effective skeleton-text matching.
- Jul 24, 2025: A YouTube video about TDSM is uploaded ✨
- Jul 21, 2025: The code for TDSM is released 🔥
- Jun 26, 2025: TDSM accepted to ICCV 2025 🎉
- Nov 16, 2024: This repository is created
@inproceedings{do2025bridging,
title={Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition},
author={Do, Jeonghyeok and Kim, Munchurl},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025}
}
- Python >= 3.9.19
- PyTorch >= 2.4.0
- Platforms: Ubuntu 22.04, CUDA 11.8
- We have included a dependency file for our experimental environment. To install all dependencies, create a new Anaconda virtual environment from the provided file by running:
conda env create -f requirements.yaml
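Before training, it may help to confirm that the active environment satisfies the minimum versions listed above. The helper below is an illustrative sketch (not part of the official codebase) for comparing dotted version strings against a minimum:

```python
def meets_minimum(version: str, minimum: tuple) -> bool:
    """True if a dotted version string satisfies a (major, minor) minimum."""
    parts = version.split("+")[0].split(".")  # drop local tags like "2.4.0+cu118"
    return (int(parts[0]), int(parts[1])) >= minimum

# Check the stated requirements against example version strings,
# e.g. sys.version for Python or torch.__version__ for PyTorch.
print(meets_minimum("3.9.19", (3, 9)))       # Python >= 3.9
print(meets_minimum("2.4.0+cu118", (2, 4)))  # PyTorch >= 2.4
```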
We follow the evaluation setup from SynSE, PURLS, and SMIE.
Download the pre-extracted skeleton features (for SynSE and SMIE settings) and class descriptions from SA-DVAE. Then, arrange them as follows:
data
├── sk_feats
│ ├── shift_ntu60_5_r
│ ├── shift_ntu60_12_r
│ ├── shift_ntu60_20_r
│ ├── shift_ntu60_30_r
│ ├── shift_ntu120_10_r
│ ├── shift_ntu120_24_r
│ ├── shift_ntu120_40_r
│ └── shift_ntu120_60_r
│
├── label_splits
└── class_lists
├── ntu60.csv
├── ntu60_llm.txt
├── ntu120.csv
└── ntu120_llm.txt
Note: Pre-extracted skeleton features for the PURLS settings are not provided. Therefore, we extracted the skeleton features ourselves using the official Shift-GCN code.
# Download code
git clone https://github.com/KAIST-VICLab/TDSM
cd TDSM
# Train TDSM on SynSE benchmarks for the NTU-60 dataset (55/5 split)
python main.py --config ./config/tdsm_ntu60_unseen5.yaml
# Train TDSM on SynSE benchmarks for the NTU-60 dataset (48/12 split)
python main.py --config ./config/tdsm_ntu60_unseen12.yaml
# Train TDSM on SynSE benchmarks for the NTU-120 dataset (110/10 split)
python main.py --config ./config/tdsm_ntu120_unseen10.yaml
# Train TDSM on SynSE benchmarks for the NTU-120 dataset (96/24 split)
python main.py --config ./config/tdsm_ntu120_unseen24.yaml
# Train TDSM on PURLS benchmarks for the NTU-60 dataset (40/20 split)
python main.py --config ./config/tdsm_ntu60_unseen20.yaml
# Train TDSM on PURLS benchmarks for the NTU-60 dataset (30/30 split)
python main.py --config ./config/tdsm_ntu60_unseen30.yaml
# Train TDSM on PURLS benchmarks for the NTU-120 dataset (80/40 split)
python main.py --config ./config/tdsm_ntu120_unseen40.yaml
# Train TDSM on PURLS benchmarks for the NTU-120 dataset (60/60 split)
python main.py --config ./config/tdsm_ntu120_unseen60.yaml
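The config files above all follow one naming pattern, so the eight benchmark runs can be enumerated programmatically, e.g. to print (or launch) every command in sequence. A minimal sketch, assuming only the file names listed above:

```python
# Each entry is (NTU dataset size, number of unseen classes); the seen/unseen
# split is (dataset - unseen)/unseen, matching the commands above.
BENCHMARKS = {
    "SynSE": [(60, 5), (60, 12), (120, 10), (120, 24)],
    "PURLS": [(60, 20), (60, 30), (120, 40), (120, 60)],
}

def config_path(dataset: int, unseen: int) -> str:
    """Config file for an NTU-{dataset} run with `unseen` unseen classes."""
    return f"./config/tdsm_ntu{dataset}_unseen{unseen}.yaml"

for setting, splits in BENCHMARKS.items():
    for dataset, unseen in splits:
        print(f"# {setting}: NTU-{dataset}, {dataset - unseen}/{unseen} split")
        print(f"python main.py --config {config_path(dataset, unseen)}")
```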
Please visit our project page for more experimental results.
The source code can be freely used for research and education purposes only. Any commercial use requires formal permission from the principal investigator (Prof. Munchurl Kim, [email protected]).
This repository is built upon SkateFormer.