- 🔥 [2024.12.10] Our paper is accepted by AAAI 2025!
This repository provides the PyTorch implementation for the paper *Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark*. The paper introduces Repurpose-10K, a large-scale dataset designed to tackle the challenge of long-to-short video repurposing. The dataset contains over 10,000 videos and 120,000+ annotated clips, serving as a benchmark for automatic video repurposing.
With the rise of short-form video platforms like TikTok, Instagram Reels, and YouTube Shorts, there is a growing need to efficiently extract engaging segments from long-form content such as vlogs, interviews, and live streams. Video repurposing involves:
- Identifying highly engaging segments from long videos.
- Ensuring narrative coherence in the repurposed clips.
- Optimizing for direct publishing on social media.
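As a toy illustration of the first point (not the paper's method), segment extraction can be framed as picking the highest-scoring non-overlapping windows from per-second engagement scores; the function, window length, and scores below are all hypothetical.

```python
def top_segments(scores, window=3, k=2):
    """Return start indices of the k highest-scoring non-overlapping
    windows of `window` seconds, given per-second engagement scores."""
    # Score every candidate window by the sum of its per-second scores.
    candidates = [
        (sum(scores[i:i + window]), i)
        for i in range(len(scores) - window + 1)
    ]
    # Greedily keep the best windows that do not overlap earlier picks.
    chosen = []
    for _, start in sorted(candidates, reverse=True):
        if all(abs(start - s) >= window for s in chosen):
            chosen.append(start)
        if len(chosen) == k:
            break
    return sorted(chosen)

# Hypothetical engagement scores for a 10-second video.
scores = [0.1, 0.9, 0.8, 0.7, 0.1, 0.1, 0.6, 0.7, 0.8, 0.2]
print(top_segments(scores))  # → [1, 6]
```

A real model would also have to enforce narrative coherence across the chosen windows, which a purely local score cannot capture.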

To address the lack of large-scale benchmarks for this task, Repurpose-10K was created by collecting real-world user interactions on User Generated Content (UGC). The annotation process involves:
- Initial segmentation using AI-assisted tools.
- User preference voting to mark preferred clips.
- Manual refinement of timestamps by content creators.
This ensures high-quality, human-curated ground truth labels for training video repurposing models.
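The voting step could be aggregated into binary supervision roughly as follows (a hypothetical sketch; the vote threshold and data layout are illustrative, not taken from the paper):

```python
def label_clips(votes, min_votes=2):
    """Mark a clip as a positive example once it has received at least
    `min_votes` user preference votes; all others become negatives."""
    return {clip_id: count >= min_votes for clip_id, count in votes.items()}

# Hypothetical vote counts per annotated clip.
votes = {"clip_001": 3, "clip_002": 0, "clip_003": 2}
print(label_clips(votes))
```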
To ensure a smooth experience running the scripts, set up a dedicated conda environment by executing the following commands in your terminal:

```shell
conda create -n repurpose python=3.9
conda activate repurpose
pip install -r requirements.txt
```
The train/validation/test splits are provided in the `/data` directory. Follow these steps for data preparation:
- Download the source videos using `yt-dlp`.
- Extract the required features as mentioned in our paper using these repositories:
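For the download step, one invocation per video can be assembled like this (a sketch: `-f` and the `-o` output template are standard `yt-dlp` options, but the exact settings used to build the dataset are assumptions):

```python
def build_ytdlp_cmd(video_id, out_dir="videos"):
    """Build a yt-dlp command that saves a YouTube video as <id>.<ext>."""
    return [
        "yt-dlp",
        "-f", "bestvideo+bestaudio/best",   # prefer the best available quality
        "-o", f"{out_dir}/%(id)s.%(ext)s",  # name downloaded files by video ID
        f"https://www.youtube.com/watch?v={video_id}",
    ]

print(" ".join(build_ytdlp_cmd("VIDEO_ID")))
# run with: subprocess.run(build_ytdlp_cmd("VIDEO_ID"), check=True)
```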
To begin training the model, use the command below:

```shell
python main.py --config_path configs/Repurpose.yaml
```
For model evaluation, execute the following command:

```shell
python inference.py --config_path configs/Repurpose.yaml --resume your_ckpt_path
```

Replace `your_ckpt_path` with the actual path to your checkpoint file.
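Both entry points share the same two flags; a minimal parser matching that interface might look like this (a sketch of the assumed CLI, not the repository's actual code):

```python
import argparse

def parse_args(argv=None):
    """Parse the flags shared by main.py and inference.py."""
    parser = argparse.ArgumentParser(description="Repurpose-10K runner")
    parser.add_argument("--config_path", required=True,
                        help="YAML config, e.g. configs/Repurpose.yaml")
    parser.add_argument("--resume", default=None,
                        help="checkpoint path to resume or evaluate from")
    return parser.parse_args(argv)

args = parse_args(["--config_path", "configs/Repurpose.yaml"])
print(args.config_path)
```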
```bibtex
@inproceedings{wu2025video,
  title={Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark},
  author={Wu, Yongliang and Zhu, Wenbo and Cao, Jiawang and Lu, Yi and Li, Bozheng and Chi, Weiheng and Qiu, Zihan and Su, Lirian and Zheng, Haolin and Wu, Jay and others},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={8},
  pages={8487--8495},
  year={2025}
}
```
We would like to extend our gratitude to the authors and contributors of the following repositories, which have been instrumental in the development of our implementation: