
[ICLR2026] Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model



📖TL;DR: Any-to-Bokeh is a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects.

📢 News

  • [2026-02-01] Our paper has been accepted to ICLR 2026! 🎉 🎉 🎉
  • [2025-07-11] We have officially released the model weights for public use!
    You can now download the pretrained weights via Google Drive.

✅ ToDo List for Any-to-Bokeh Release

  • Release the demo inference files
  • Release the inference pipeline
  • Release the model weights
  • Release the training files

🔧 Installation

conda create -n any2bokeh python=3.10 -y
conda activate any2bokeh
# The default CUDA version is 12.4, please modify it according to your configuration.

# Install PyTorch.
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124

# Clone repo
git clone https://github.com/vivoCameraResearch/any-to-bokeh.git
cd any-to-bokeh
pip install -r requirements.txt
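
After installation, a quick sanity check can confirm that PyTorch is importable and whether CUDA is visible. This is a minimal sketch (not part of the repo); it only reports what it finds and does not assume a GPU is present:

```python
def check_environment() -> dict:
    """Report the installed PyTorch version and CUDA availability.

    Returns a small dict so the result is easy to log or assert on.
    """
    info = {"torch": None, "cuda": False}
    try:
        import torch  # installed via the pip command above
        info["torch"] = torch.__version__          # expect 2.4.1 here
        info["cuda"] = torch.cuda.is_available()   # False on CPU-only machines
    except ImportError:
        pass  # torch not installed yet
    return info

if __name__ == "__main__":
    print(check_environment())
```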

🔥 Training

Any-to-Bokeh training is organized into three phases. Simply run the bash scripts for each stage in sequence (see the training scripts directory).

⏬ Demo Inference

We provide 8 demo videos obtained from the DAVIS dataset.

  1. Download the pre-trained weights from Google Drive and place them in the ./checkpoints folder.
  2. Run the demo script: python test/inference_demo.py. The results will be saved in the ./output folder.

🏃 Inference Custom Video

Before bokeh rendering, two data preprocessing steps are required.

Data Preprocessing

1. Get Object Mask.

We recommend using Grounded-SAM to obtain the mask of the focused target. You can generate the mask by adapting the sample code in any2bokeh_sam.py.

2. Depth Prediction.

First, split the video into frames and place them in a folder, using utils/split_mp4.py:

python utils/split_mp4.py input.mp4

Then, install Video Depth Anything and use our script to get depth information for each frame:

python utils/pre_process.py \
    --img_folder path/to/images \
    --mask_folder path/to/masks \  # Path to the mask obtained via sam2
    --disp_dir output/directory

Case 1: Fixed focus plane

Put the folder aif_folder that stores the video frames, the corresponding preprocessed folder disp_folder, and the value k representing the bokeh intensity into a CSV file in the following format (like demo.csv):

aif_folder disp_folder k
demo_dataset/videos/xxx demo_dataset/disp/xxx 16
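
If you have many videos, the CSV can be generated programmatically. A minimal sketch using the standard library (the function name and output path are illustrative; check demo.csv for the exact delimiter the repo expects — csv.writer defaults to commas):

```python
import csv

def write_bokeh_csv(path, rows):
    """Write (aif_folder, disp_folder, k) rows to a CSV config file.

    `rows`: iterable of 3-tuples. Each k may be an int (fixed blur
    strength, Case 1) or the string "change" for per-frame blur (Case 2).
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["aif_folder", "disp_folder", "k"])
        writer.writerows(rows)

# Example (paths are placeholders):
# write_bokeh_csv("csv_file/my_demo.csv",
#                 [("demo_dataset/videos/clip01", "demo_dataset/disp/clip01", 16)])
```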

Then, run the script

python test/inference_demo.py --val_csv_path csv_file/demo.csv

Case 2: Varying blur strength

First, define the blur strength k for each frame. Specifically, the filename of each frame's depth file needs to be modified to encode its k value; we provide a simple modification script for this purpose.
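
The actual naming convention is defined by the repo's modification script; purely as an illustration, a renaming helper might look like this (the `_k<value>` suffix below is a hypothetical convention, not the repo's real one):

```python
def embed_blur_strength(filename: str, k: float) -> str:
    """Return a depth filename with a per-frame blur strength embedded.

    Hypothetical convention for illustration: insert `_k<value>` before
    the file extension, e.g. frame_0001.png -> frame_0001_k12.png.
    Assumes the filename has an extension.
    """
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}_k{k:g}{dot}{ext}"
```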

Next, the case-1 CSV configuration should be updated to the following template (e.g., change_k_demo.csv):

aif_folder disp_folder k
demo_dataset/videos/xxx demo_dataset/disp_change_k/xxx change

Then, run the script

python test/inference_demo.py --val_csv_path csv_file/demo_change_k.csv

Case 3: Varying focus plane

We use the number identified by _zf_ in the depth filename to represent the disparity value of the focus plane. You can customize this value for each frame to adjust the focus plane; we provide a simple modification script for this purpose.
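
As a sketch of what such a filename edit does (the repo's script is the authoritative version; the example filename below is hypothetical), a helper could rewrite the number after the _zf_ tag with a regular expression:

```python
import re

def set_focus_disparity(filename: str, zf: float) -> str:
    """Replace the number following `_zf_` in a depth filename.

    E.g. frame_0001_zf_0.35.png with zf=0.8 -> frame_0001_zf_0.8.png.
    Raises ValueError if no `_zf_` tag is present.
    """
    new_name, n = re.subn(r"(_zf_)\d+(?:\.\d+)?", rf"\g<1>{zf:g}", filename)
    if n == 0:
        raise ValueError(f"no _zf_ tag found in {filename!r}")
    return new_name
```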

Next, the CSV configuration is the same as in case1 (e.g., change_f_demo.csv):

aif_folder disp_folder k
demo_dataset/videos/xxx demo_dataset/disp_change_f/xxx 16

Then, run the script

python test/inference_demo.py --val_csv_path csv_file/demo_change_f.csv

🚩 Metrics

We provide the VEPI metric proposed in the paper in edge_batch.py; it assesses the model's ability to preserve detail at the edges of the focused subject. All other metrics can be obtained by running vid_metrics.py.

📜 Acknowledgement

This codebase builds on SVD_Xtend. Thanks for open-sourcing! We also acknowledge the following great open-source projects:

🌏 Citation

@article{yang2025any,
  title={Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion},
  author={Yang, Yang and Zheng, Siming and Chen, Jinwei and Wu, Boxi and He, Xiaofei and Cai, Deng and Li, Bo and Jiang, Peng-Tao},
  journal={arXiv preprint arXiv:2505.21593},
  year={2025}
}

📧 Contact

If you have any questions or suggestions for improvement, please email Yang Yang (yangyang98@zju.edu.cn) or open an issue.
