This repository represents the official implementation of the paper titled "Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion".
Photogrammetry and Remote Sensing team: Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, Anton Obukhov.
📦 Clone the repository:
```bash
git clone https://github.com/prs-eth/Marigold-DC.git
cd Marigold-DC
```

🐍 Create python environment:

```bash
python -m venv venv/marigold_dc
```

⚡ Activate the environment:

```bash
source venv/marigold_dc/bin/activate
```

💻 Install the dependencies:

```bash
pip install -r requirements.txt
```

The script performs densification of the input sparse depth, provided as a sparse numpy array, and saves the output as a dense numpy array, along with the visualization. Optimal default settings are applied. By default, it processes the teaser image and uses 100-point guidance.
🏃 Simply run as follows:

```bash
python -m marigold_dc
```

🧩 Customize image and sparse depth inputs as follows:
```bash
python -m marigold_dc \
    --in-image <PATH_RGB_IMAGE> \
    --in-depth <PATH_SPARSE_DEPTH> \
    --out-depth <PATH_DENSE_DEPTH>
```
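The sparse depth input is a 2D numpy array, presumably following the same convention as the evaluation datasets described below (values in meters, 0 at pixels without a measurement). As a purely illustrative sketch, not part of the repository, such a file could be produced from an existing dense depth map like this:

```python
import numpy as np

# Purely illustrative: start from some dense metric depth map (H, W) of your own.
dense = np.load("my_dense_depth.npy")

# Keep ~100 randomly chosen valid measurements (matching the default
# 100-point guidance) and set every other pixel to 0 (no measurement).
rng = np.random.default_rng(seed=0)
valid = np.flatnonzero(dense > 0)
keep = rng.choice(valid, size=min(100, valid.size), replace=False)

sparse = np.zeros_like(dense)
sparse.flat[keep] = dense.flat[keep]

np.save("my_sparse_depth.npy", sparse)  # pass this path via --in-depth
```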
🛠️ Customize other settings:

- `--num_inference_steps <int>` specifies the number of diffusion inference steps (default: 50).
- `--ensemble_size <int>` specifies the number of predictions to be ensembled (default: 1).
- `--processing_resolution <int>` specifies the processing resolution for the denoising process (default: 768; 0 means processing at the original resolution).
- `--checkpoint <path>` allows overriding the base monocular depth estimation model checkpoint; can be a local path or a Hugging Face repository.
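For example, these options can be combined in a single call; the values below are only an illustration, not recommended settings:

```bash
python -m marigold_dc \
    --in-image <PATH_RGB_IMAGE> \
    --in-depth <PATH_SPARSE_DEPTH> \
    --out-depth <PATH_DENSE_DEPTH> \
    --num_inference_steps 50 \
    --ensemble_size 3 \
    --processing_resolution 0
```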
No training is required: the method operates purely at test time; please refer to the paper for more details.
By default, the checkpoint is stored in the Hugging Face cache, which defaults to a location under the home directory on Linux and macOS. This is often problematic in cluster environments. The cache location is controlled by the `HF_HOME` environment variable and can be overridden, e.g.:

```bash
export HF_HOME=/large_volume/cache
```
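To make the setting persistent across sessions, it can be appended to the shell startup file; a minimal sketch assuming a bash shell:

```bash
# Assuming bash; adapt to your shell of choice.
echo 'export HF_HOME=/large_volume/cache' >> ~/.bashrc
```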
Set the data directory variable (needed in evaluation scripts) and download the evaluation datasets there, following the instructions in DATASETS.md to create the sparse depth maps in a reproducible way.
```bash
export BASE_DATA_DIR=<YOUR_DATA_DIR>  # e.g., ~/Marigold-DC/datasets/
```

Each dataset in the data directory should have the following format:
```
dataset_name/
├── rgb/      # RGB images (png, jpg, or jpeg)
│   ├── image_001.png
│   └── ...
├── sparse/   # Sparse depth arrays in meters (.npy files); missing values are 0
│   ├── image_001.npy
│   └── ...
└── gt/       # Ground-truth dense depth arrays in meters (.npy files)
    ├── image_001.npy
    └── ...
```
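As a quick sanity check, a small helper along the following lines can confirm that a dataset folder matches this layout. It is not part of the repository, just an illustrative sketch:

```python
import sys
from pathlib import Path

def check_dataset(root: Path) -> None:
    """Check that rgb/, sparse/, and gt/ contain matching sample names."""
    rgb = {p.stem for p in (root / "rgb").iterdir()
           if p.suffix.lower() in {".png", ".jpg", ".jpeg"}}
    sparse = {p.stem for p in (root / "sparse").glob("*.npy")}
    gt = {p.stem for p in (root / "gt").glob("*.npy")}
    assert rgb, f"no RGB images found under {root / 'rgb'}"
    assert rgb == sparse == gt, (
        f"mismatch in {root.name}: {len(rgb)} rgb / "
        f"{len(sparse)} sparse / {len(gt)} gt files"
    )
    print(f"{root.name}: {len(rgb)} samples look consistent")

if __name__ == "__main__":
    # e.g., python check_dataset.py $BASE_DATA_DIR/dataset_name
    check_dataset(Path(sys.argv[1]))
```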
Run inference and evaluation scripts, for example:

```bash
# ScanNet
bash script/eval/11_infer_scannet.sh  # Run inference
bash script/eval/12_eval_scannet.sh   # Evaluate predictions
```

All scripts with the correct inference parameters are available in the `script/eval/` directory.
By default, the code runs with bfloat16 precision on supported GPUs, enabling faster inference with negligible accuracy loss while preserving sufficient gradient precision for backpropagation.
Full-precision inference in float32 can be re-enabled by specifying the `--use_full_precision` flag.
In addition, a lightweight Tiny VAE (the suggested option for CPU processing) can also be enabled on GPU by setting the `--use_tiny_vae` flag; note that this comes at a slight cost in prediction quality.
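For example (the input and output arguments are omitted here for brevity; add them as shown above):

```bash
python -m marigold_dc --use_full_precision   # float32 inference
python -m marigold_dc --use_tiny_vae         # lightweight Tiny VAE
```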
Compiling the model can further improve inference speed, again at a slight cost in prediction quality. This is most beneficial when the same pipeline instance is used repeatedly, and can be achieved by calling `torch.compile` after the pipeline has been loaded:

```python
pipe.vae = torch.compile(pipe.vae, mode="reduce-overhead", fullgraph=True)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

Below, we report the average runtime on an A100 GPU and the performance on the 100 NYUv2 samples used for ablation in the main paper.
| Variant | Time (50 Steps) | Speed | MAE | RMSE |
|---|---|---|---|---|
| float32 | 18.03 sec | 2.77 iter/s | 0.066 | 0.171 |
| bfloat16 (*) | 10.71 sec | 4.67 iter/s | 0.066 | 0.171 |
| bfloat16 + compile | 8.85 sec | 5.65 iter/s | 0.067 | 0.172 |
| Tiny VAE | 9.99 sec | 5.00 iter/s | 0.068 | 0.174 |
| Tiny VAE + bfloat16 | 5.63 sec | 8.88 iter/s | 0.069 | 0.173 |
| Tiny VAE + bfloat16 + compile | 3.70 sec | 13.53 iter/s | 0.070 | 0.175 |
(*) Used by default
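Keep in mind that `torch.compile` is lazy: compilation is triggered by the first forward pass, so the first prediction after loading incurs a one-time overhead, and the speedup only pays off when the same compiled pipeline is reused for subsequent predictions.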
Depth completion upgrades sparse depth measurements into dense depth maps, guided by a conventional image. Existing methods for this highly ill-posed task operate in tightly constrained settings, and tend to struggle when applied to images outside the training domain, as well as when the available depth measurements are sparse, irregularly distributed, or of varying density. Inspired by recent advances in monocular depth estimation, we reframe depth completion as image-conditional depth map generation, guided by a sparse set of measurements. Our method, Marigold-DC, builds on a pretrained latent diffusion model (LDM) for depth estimation and injects the depth observations as test-time guidance, via an optimization scheme that runs in tandem with the iterative inference of denoising diffusion. The method exhibits excellent zero-shot generalization across a diverse range of environments and handles even extremely sparse guidance effectively. Our results suggest that contemporary monodepth priors greatly robustify depth completion: it may be better to view the task as recovering dense depth from (dense) image pixels, guided by sparse depth, rather than as inpainting (sparse) depth, guided by an image.
- 2025-10-16: Added options to speed up inference.
- 2025-10-08: Evaluation code is released.
- 2025-07-23: The paper is accepted at ICCV 2025.
- 2024-12-19: ArXiv paper and demo release.
- 2024-12-18: Code release (this repository).
```bibtex
@misc{viola2024marigolddc,
  title={Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion},
  author={Massimiliano Viola and Kevin Qu and Nando Metzger and Bingxin Ke and Alexander Becker and Konrad Schindler and Anton Obukhov},
  year={2024},
  eprint={2412.13389},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
}
```

The code of this work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
