- 🔥 [2024.12.10] Our paper is accepted by AAAI 2025!
This repository provides the PyTorch implementation for the paper *Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark*. The paper introduces Repurpose-10K, a large-scale dataset designed to tackle the challenge of long-to-short video repurposing. The dataset contains over 10,000 videos and 120,000+ annotated clips, serving as a benchmark for automatic video repurposing.
With the rise of short-form video platforms like TikTok, Instagram Reels, and YouTube Shorts, there is a growing need to efficiently extract engaging segments from long-form content such as vlogs, interviews, and live streams. Video repurposing involves:
- Identifying highly engaging segments from long videos.
- Ensuring narrative coherence in the repurposed clips.
- Optimizing for direct publishing on social media.
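As a toy illustration of the first point (not the paper's method), segment extraction can be framed as picking the highest-scoring non-overlapping windows from per-second engagement scores; the function, window length, and scores below are all hypothetical.

```python
def top_segments(scores, window=3, k=2):
    """Return start indices of the k highest-scoring non-overlapping
    windows of `window` seconds, given per-second engagement scores."""
    # Score every candidate window by the sum of its per-second scores.
    candidates = [
        (sum(scores[i:i + window]), i)
        for i in range(len(scores) - window + 1)
    ]
    # Greedily keep the best windows that do not overlap earlier picks.
    chosen = []
    for _, start in sorted(candidates, reverse=True):
        if all(abs(start - s) >= window for s in chosen):
            chosen.append(start)
        if len(chosen) == k:
            break
    return sorted(chosen)

# Hypothetical engagement scores for a 10-second video.
scores = [0.1, 0.9, 0.8, 0.7, 0.1, 0.1, 0.6, 0.7, 0.8, 0.2]
print(top_segments(scores))  # → [1, 6]
```

A real model would also have to enforce narrative coherence across the chosen windows, which a purely local score cannot capture.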

To address the lack of large-scale benchmarks for this task, Repurpose-10K was created by collecting real-world user interactions on User Generated Content (UGC). The annotation process involves:
- Initial segmentation using AI-assisted tools.
- User preference voting to mark preferred clips.
- Manual refinement of timestamps by content creators.
This ensures high-quality, human-curated ground truth labels for training video repurposing models.
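The voting step could be aggregated into binary supervision roughly as follows (a hypothetical sketch; the vote threshold and data layout are illustrative, not taken from the paper):

```python
def label_clips(votes, min_votes=2):
    """Mark a clip as a positive example once it has received at least
    `min_votes` user preference votes; all others become negatives."""
    return {clip_id: count >= min_votes for clip_id, count in votes.items()}

# Hypothetical vote counts per annotated clip.
votes = {"clip_001": 3, "clip_002": 0, "clip_003": 2}
print(label_clips(votes))
```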
To ensure a smooth experience running the scripts, set up a dedicated conda environment by executing the following commands in your terminal:

```shell
conda create -n repurpose python=3.9
conda activate repurpose
pip install -r requirements.txt
```
The train/validation/test splits are provided in the `/data` directory. Follow these steps for data preparation:
- Download the source videos using `yt-dlp`.
- Extract the required features as mentioned in our paper using these repositories:
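For the download step, one invocation per video can be assembled like this (a sketch: `-f` and the `-o` output template are standard `yt-dlp` options, but the exact settings used to build the dataset are assumptions):

```python
def build_ytdlp_cmd(video_id, out_dir="videos"):
    """Build a yt-dlp command that saves a YouTube video as <id>.<ext>."""
    return [
        "yt-dlp",
        "-f", "bestvideo+bestaudio/best",   # prefer the best available quality
        "-o", f"{out_dir}/%(id)s.%(ext)s",  # name downloaded files by video ID
        f"https://www.youtube.com/watch?v={video_id}",
    ]

print(" ".join(build_ytdlp_cmd("VIDEO_ID")))
# run with: subprocess.run(build_ytdlp_cmd("VIDEO_ID"), check=True)
```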
To begin training the model, use the command below:

```shell
python main.py --config_path configs/Repurpose.yaml
```
For model evaluation, execute the following command:

```shell
python inference.py --config_path configs/Repurpose.yaml --resume your_ckpt_path
```

Replace `your_ckpt_path` with the actual path to your checkpoint file.
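Both entry points share the same two flags; a minimal parser matching that interface might look like this (a sketch of the assumed CLI, not the repository's actual code):

```python
import argparse

def parse_args(argv=None):
    """Parse the flags shared by main.py and inference.py."""
    parser = argparse.ArgumentParser(description="Repurpose-10K runner")
    parser.add_argument("--config_path", required=True,
                        help="YAML config, e.g. configs/Repurpose.yaml")
    parser.add_argument("--resume", default=None,
                        help="checkpoint path to resume or evaluate from")
    return parser.parse_args(argv)

args = parse_args(["--config_path", "configs/Repurpose.yaml"])
print(args.config_path)
```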
```bibtex
@inproceedings{wu2025video,
  title={Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark},
  author={Wu, Yongliang and Zhu, Wenbo and Cao, Jiawang and Lu, Yi and Li, Bozheng and Chi, Weiheng and Qiu, Zihan and Su, Lirian and Zheng, Haolin and Wu, Jay and others},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={8},
  pages={8487--8495},
  year={2025}
}
```
We would like to extend our gratitude to the authors and contributors of the following repositories, which have been instrumental in the development of our implementation: