
[ICLR2026] Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model



📖TL;DR: Any-to-Bokeh is a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects.

📢 News

  • [2026-02-01] Our paper has been accepted to ICLR 2026! 🎉 🎉 🎉
  • [2025-07-11] We have officially released the model weights for public use!
    You can now download the pretrained weights via Google Drive.

✅ ToDo List for Any-to-Bokeh Release

  • Release the demo inference files
  • Release the inference pipeline
  • Release the model weights
  • Release the training files

🔧 Installation

conda create -n any2bokeh python=3.10 -y
conda activate any2bokeh
# The default CUDA version is 12.4, please modify it according to your configuration.

# Install PyTorch.
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124

# Clone repo
git clone https://github.com/vivoCameraResearch/any-to-bokeh.git
cd any-to-bokeh
pip install -r requirements.txt
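
After installation, a quick sanity check can confirm that PyTorch is importable and whether CUDA is visible. This is a minimal sketch (not part of the repo); it only reports what it finds and does not assume a GPU is present:

```python
def check_environment() -> dict:
    """Report the installed PyTorch version and CUDA availability.

    Returns a small dict so the result is easy to log or assert on.
    """
    info = {"torch": None, "cuda": False}
    try:
        import torch  # installed via the pip command above
        info["torch"] = torch.__version__          # expect 2.4.1 here
        info["cuda"] = torch.cuda.is_available()   # False on CPU-only machines
    except ImportError:
        pass  # torch not installed yet
    return info

if __name__ == "__main__":
    print(check_environment())
```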

🔥 Training

Any-to-Bokeh training is organized into three phases. Simply run the bash scripts for each stage in sequence (see the training scripts directory).

⏬ Demo Inference

We provide 8 demo videos obtained from the DAVIS dataset.

  1. Download the pre-trained weights from Google Drive and place them in the ./checkpoints folder.
  2. Run the demo script: python test/inference_demo.py. The results will be saved in the ./output folder.

🏃 Inference Custom Video

Before bokeh rendering, two data preprocessing steps are required.

Data Preprocessing

1. Get Object Mask.

We recommend using Grounded-SAM to obtain the mask of the focused target. You can generate the mask by adapting the sample code in any2bokeh_sam.py.

2. Depth Prediction.

First, split the video into frames and place them in a folder, using utils/split_mp4.py:

python utils/split_mp4.py input.mp4

Then, install Video Depth Anything and use our script to get depth information for each frame:

python utils/pre_process.py \
    --img_folder path/to/images \
    --mask_folder path/to/masks \  # Path to the mask obtained via sam2
    --disp_dir output/directory

Case 1: Fixed focus plane

Put the folder aif_folder that stores the video frames, the corresponding preprocessed folder disp_folder, and the value k representing the bokeh intensity into a CSV file in the following format (like demo.csv):

aif_folder disp_folder k
demo_dataset/videos/xxx demo_dataset/disp/xxx 16
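
If you have many videos, the CSV can be generated programmatically. A minimal sketch using the standard library (the function name and output path are illustrative; check demo.csv for the exact delimiter the repo expects — csv.writer defaults to commas):

```python
import csv

def write_bokeh_csv(path, rows):
    """Write (aif_folder, disp_folder, k) rows to a CSV config file.

    `rows`: iterable of 3-tuples. Each k may be an int (fixed blur
    strength, Case 1) or the string "change" for per-frame blur (Case 2).
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["aif_folder", "disp_folder", "k"])
        writer.writerows(rows)

# Example (paths are placeholders):
# write_bokeh_csv("csv_file/my_demo.csv",
#                 [("demo_dataset/videos/clip01", "demo_dataset/disp/clip01", 16)])
```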

Then, run the script

python test/inference_demo.py --val_csv_path csv_file/demo.csv

Case 2: Varying blur strength

First, define the blur strength k for each frame. Specifically, the filename of each frame's depth file needs to be modified to encode its k value; we provide a simple modification script for this purpose.
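
The actual naming convention is defined by the repo's modification script; purely as an illustration, a renaming helper might look like this (the `_k<value>` suffix below is a hypothetical convention, not the repo's real one):

```python
def embed_blur_strength(filename: str, k: float) -> str:
    """Return a depth filename with a per-frame blur strength embedded.

    Hypothetical convention for illustration: insert `_k<value>` before
    the file extension, e.g. frame_0001.png -> frame_0001_k12.png.
    Assumes the filename has an extension.
    """
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}_k{k:g}{dot}{ext}"
```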

Next, the case-1 CSV configuration should be updated to the following template (e.g., change_k_demo.csv):

aif_folder disp_folder k
demo_dataset/videos/xxx demo_dataset/disp_change_k/xxx change

Then, run the script

python test/inference_demo.py --val_csv_path csv_file/demo_change_k.csv

Case 3: Varying focus plane

We use the number identified by _zf_ in the depth filename to represent the disparity value of the focus plane. You can customize this value for each frame to adjust the focus plane; we provide a simple modification script for this purpose.
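
As a sketch of what such a filename edit does (the repo's script is the authoritative version; the example filename below is hypothetical), a helper could rewrite the number after the _zf_ tag with a regular expression:

```python
import re

def set_focus_disparity(filename: str, zf: float) -> str:
    """Replace the number following `_zf_` in a depth filename.

    E.g. frame_0001_zf_0.35.png with zf=0.8 -> frame_0001_zf_0.8.png.
    Raises ValueError if no `_zf_` tag is present.
    """
    new_name, n = re.subn(r"(_zf_)\d+(?:\.\d+)?", rf"\g<1>{zf:g}", filename)
    if n == 0:
        raise ValueError(f"no _zf_ tag found in {filename!r}")
    return new_name
```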

Next, the CSV configuration is the same as in case1 (e.g., change_f_demo.csv):

aif_folder disp_folder k
demo_dataset/videos/xxx demo_dataset/disp_change_f/xxx 16

Then, run the script

python test/inference_demo.py --val_csv_path csv_file/demo_change_f.csv

🚩 Metrics

We provide the VEPI metric proposed in the paper in edge_batch.py; it assesses the model's ability to preserve detail at the edges of the focused subject. All other metrics can be obtained by running vid_metrics.py.

📜 Acknowledgement

This codebase builds on SVD_Xtend. Thanks for open-sourcing! We also acknowledge the following great open-source projects:

🌏 Citation

@article{yang2025any,
  title={Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion},
  author={Yang, Yang and Zheng, Siming and Chen, Jinwei and Wu, Boxi and He, Xiaofei and Cai, Deng and Li, Bo and Jiang, Peng-Tao},
  journal={arXiv preprint arXiv:2505.21593},
  year={2025}
}

📧 Contact

If you have any questions or suggestions for improvement, please email Yang Yang (yangyang98@zju.edu.cn) or open an issue.
