ProCLIP: Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval

Abstract:
Enabling efficient text-video retrieval on edge devices is critical for real-world applications. Yet existing methods struggle to balance accuracy and computational efficiency: uniform frame sampling ensures content coverage but incurs prohibitive computational cost, while salient-frame sampling reduces overhead but selects frames without regard to the query, which biases retrieval results. To address this, we propose ProCLIP, a user-centric framework that achieves state-of-the-art accuracy with significantly improved efficiency. We design a prompt-aware frame sampling strategy that uses textual prompts to dynamically guide a lightweight feature extractor toward semantically relevant frames, overcoming the static, query-agnostic selection criteria of existing salient-frame sampling methods. Moreover, we adopt a two-stage candidate pruning strategy that combines rapid coarse filtering by a lightweight module with CLIP-powered fine-grained re-ranking, improving retrieval efficiency while preserving accuracy. Experiments across benchmarks show that ProCLIP reduces latency by 75.3% relative to baselines while maintaining competitive accuracy (R@1 = 49.0 on the MSR-VTT dataset).
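The idea behind prompt-aware frame sampling can be illustrated with a minimal sketch. It assumes every frame has already been embedded by some lightweight extractor and the text prompt by a text encoder into the same vector space. The function names (`cosine`, `prompt_aware_sample`) and the top-k cosine-similarity selection rule are illustrative assumptions, not ProCLIP's actual selection mechanism.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prompt_aware_sample(
    text_emb: Sequence[float], frame_embs: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Indices of the k frames most similar to the text prompt, in temporal order.

    Hypothetical helper: the frame embeddings are assumed to come from a
    lightweight extractor; this is not the authors' implementation.
    """
    if k <= 0:
        return []
    scores = [cosine(text_emb, f) for f in frame_embs]
    # Rank frames by prompt relevance (highest first); ties go to the earlier frame.
    ranked = sorted(range(len(frame_embs)), key=lambda i: (-scores[i], i))
    # Keep the top-k, then restore temporal order for the downstream video encoder.
    return sorted(ranked[:k])


if __name__ == "__main__":
    prompt = [1.0, 0.0]
    frames = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7], [1.0, 0.0], [-1.0, 0.0]]
    print(prompt_aware_sample(prompt, frames, k=2))
```

With the toy embeddings above, the call prints `[1, 3]`: frames 3 and 1 align best with the prompt, while the query-irrelevant frames 0 and 4 are skipped. Re-sorting the selected indices is a choice made in this sketch so that a later video encoder still sees frames in their original order.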

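Two-stage candidate pruning can likewise be sketched as a coarse-to-fine ranking loop. In this hypothetical version, stage 1 scores every video with a cheap dot product over low-dimensional embeddings and keeps the best `m`; stage 2 calls an expensive scorer, passed in as `fine_score` and standing in for CLIP-based matching, only on that shortlist. All names and the dot-product coarse score are assumptions for illustration, not the lightweight module used by ProCLIP.

```python
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here as the cheap coarse-stage score."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_retrieve(
    query_coarse: Sequence[float],
    video_coarse: Dict[str, Sequence[float]],
    fine_score: Callable[[str], float],
    m: int,
    top_k: int,
) -> List[str]:
    """Coarse-filter all videos down to m candidates, then re-rank them with fine_score."""
    # Stage 1: score every video cheaply; ties broken by video id for determinism.
    coarse_ranked = sorted(
        video_coarse, key=lambda vid: (-dot(query_coarse, video_coarse[vid]), vid)
    )
    shortlist = coarse_ranked[:m]
    # Stage 2: the expensive scorer (a stand-in for CLIP) runs only on the shortlist.
    reranked = sorted(shortlist, key=lambda vid: (-fine_score(vid), vid))
    return reranked[:top_k]


if __name__ == "__main__":
    videos = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
    fine = {"a": 0.2, "b": 0.8, "c": 0.99, "d": 0.5}
    calls = []

    def expensive(vid: str) -> float:
        calls.append(vid)  # record how often the costly scorer is invoked
        return fine[vid]

    print(two_stage_retrieve([1.0, 0.0], videos, expensive, m=3, top_k=2))
    print(len(calls))
```

The demo prints `['b', 'd']` and `3`: the expensive scorer runs on 3 of the 4 videos instead of all of them. Video `c` has the highest fine score but is pruned in stage 1, which shows the trade-off in this design: a larger `m` recovers more relevant videos at the cost of more expensive scoring calls.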
tiffylong/ProCLIP