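ICCV 2025 Acceptance Rate of 24% = 2699 / 11239
Note 1: Everyone is welcome to submit issues sharing ICCV 2025 papers and open-source projects!
Note 2: For CV top-conference papers from previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision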
- 3DGS (Gaussian Splatting)
- Agent
- Avatars
- Backbone
- CLIP
- Mamba
- Embodied AI
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- World Model
- OCR
- NeRF
- DETR
- Diffusion Models
- ReID (Re-Identification)
- Long-Tail
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblur
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- 3D Visual Grounding
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Detection
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Low-light Image Enhancement
- Scene Graph Generation
- Style Transfer
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Compressive Sensing
- Datasets
- New Tasks
- Others
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
- Project: https://tiger-ai-lab.github.io/Vamba/
- Paper: https://arxiv.org/abs/2503.11579
- Code: https://github.com/TIGER-AI-Lab/Vamba
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
- Paper: https://arxiv.org/abs/2501.16297
- Code: https://github.com/JiuTian-VL/JiuTian-FALCON
- Project: https://jiutian-vl.github.io/FALCON.github.io/
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning
- Project: https://yijun-yang.github.io/MeWM/
- Paper: https://arxiv.org/abs/2506.02327
- Code: https://github.com/scott-yjyang/MeWM
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Where, What, Why: Towards Explainable Driver Attention Prediction
- Paper: https://arxiv.org/abs/2506.23088
- Code: https://github.com/yuchen2199/Explainable-Driver-Attention-Prediction
- Project: https://github.com/yuchen2199/Explainable-Driver-Attention-Prediction
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones
- Paper: https://arxiv.org/abs/2406.07661
- Code: https://github.com/anuragxel/roadwork-dataset
- Project: https://www.cs.cmu.edu/~ILIM/roadwork_dataset/
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
- Project: https://zhijian11.github.io/DriveMM/
- Paper: https://arxiv.org/abs/2412.07689
- Code: https://github.com/zhijian11/DriveMM
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration
# 3D Visual Grounding
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
- Project: https://eff-edit.github.io
- Paper: https://arxiv.org/abs/2503.10270
- Code: https://github.com/yuriYanZeXuan/EEdit
Music Grounding by Short Video
- Project: https://rucmm.github.io/VMMR/
- Paper: https://arxiv.org/abs/2408.16990
- Code: https://github.com/xxayt/MGSV