
Commit 0873026

Share the code with metric depth to support streaming metric depth (#93)

* merge depth files with metric depth
* save point cloud for metric
* update metric readme
* remove duplicate files
* Update README.md
* Update README.md
* optimize the point cloud saving
1 parent 02f3bd7 commit 0873026

File tree

25 files changed: +80 −2891 lines

README.md

Lines changed: 28 additions & 9 deletions
@@ -21,17 +21,27 @@ This work presents **Video Depth Anything** based on [Depth Anything V2](https:/
 ![teaser](assets/teaser_video_v2.png)
 
 ## News
+- **2025-09-12:** Support streaming mode for metric depth models.
 - **2025-08-28:** Release ViT-base model for relative depth and ViT-small/base models for video metric depth.
 - **2025-07-03:** 🚀🚀🚀 Release an experimental version of training-free **streaming video depth estimation**.
 - **2025-07-03:** Release our implementation of [training loss](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/loss).
-- **2025-04-25:** 🌟🌟🌟 Release [metric depth model](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/metric_depth) based on Video-Depth-Anything-Large.
+- **2025-04-25:** 🌟🌟🌟 Release metric depth model based on Video-Depth-Anything-Large.
 - **2025-04-05:** Our paper has been accepted for a **highlight** presentation at [CVPR 2025](https://cvpr.thecvf.com/) (13.5% of the accepted papers).
 - **2025-03-11:** Add full dataset inference and evaluation [scripts](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/benchmark).
 - **2025-02-08:** Enable autocast inference. Support grayscale video, NPZ and EXR output formats.
 - **2025-01-21:** Paper, project page, code, models, and demo are all released.
 
 
 ## Release Notes
+- **2025-08-28:** 🚀🚀🚀 Metric depth models released
+
+| δ1 | MoGe-2-L | UniDepthV2-L | DepthPro | VDA-S-Metric | VDA-B-Metric | VDA-L-Metric |
+|:-|:-:|:-:|:-:|:-:|:-:|:-:|
+| KITTI | 0.415 | **0.982** | 0.822 | 0.877 | 0.887 | *0.910* |
+| NYUv2 | *0.967* | **0.989** | 0.953 | 0.850 | 0.883 | 0.908 |
+| **TAE** | | | | | | |
+| Scannet | 2.56 | 1.41 | 2.73 | 1.48 | *1.26* | **1.09** |
+
 - **2025-02-08:** 🚀🚀🚀 Inference speed and memory usage improvement
 <table>
 <thead>
@@ -67,13 +77,14 @@ This work presents **Video Depth Anything** based on [Depth Anything V2](https:/
 The Latency and GPU VRAM results are obtained on a single A100 GPU with input of shape 1 x 32 x 518 × 518.
 
 ## Pre-trained Models
-We provide **sevaral models** of varying scales for robust and consistent video depth estimation. For the usage of metric depth models, please refer to [Metric Depth](./metric_depth/README.md).
+We provide **several models** of varying scales for robust and consistent video depth estimation.
 
-| Model | Params | Checkpoint |
+| Relative Depth Model | Params | Checkpoint |
 |:-|-:|:-:|
 | Video-Depth-Anything-Small | 28.4M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Small/resolve/main/video_depth_anything_vits.pth?download=true) |
 | Video-Depth-Anything-Base | 113.1M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Base/blob/main/video_depth_anything_vitb.pth) |
 | Video-Depth-Anything-Large | 381.8M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Large/resolve/main/video_depth_anything_vitl.pth?download=true) |
+| **Metric Depth Model** | **Params** | **Checkpoint** |
 | Metric-Video-Depth-Anything-Small | 28.4M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Small/blob/main/metric_video_depth_anything_vits.pth) |
 | Metric-Video-Depth-Anything-Base | 113.1M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Base/blob/main/metric_video_depth_anything_vitb.pth) |
 | Metric-Video-Depth-Anything-Large | 381.8M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Large/resolve/main/metric_video_depth_anything_vitl.pth) |
@@ -94,9 +105,14 @@ Download the checkpoints listed [here](#pre-trained-models) and put them under t
 bash get_weights.sh
 ```
 
-### Inference a video
+### Run inference on a video
+We support both relative depth and metric depth:
 ```bash
+# For relative depth
 python3 run.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs --encoder vitl
+
+# For metric depth
+python3 run.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs --encoder vitl --metric
 ```
 
 Options:
@@ -107,17 +123,22 @@ Options:
 - `--encoder` (optional): `vits` for Video-Depth-Anything-Small, `vitb` for Video-Depth-Anything-Base, `vitl` for Video-Depth-Anything-Large.
 - `--max_len` (optional): maximum length of the input video, `-1` means no limit
 - `--target_fps` (optional): target fps of the input video, `-1` means the original fps
+- `--metric` (optional): use metric depth models trained on Virtual KITTI and IRS datasets
 - `--fp32` (optional): Use `fp32` precision for inference. By default, we use `fp16`.
 - `--grayscale` (optional): Save the grayscale depth map, without applying color palette.
 - `--save_npz` (optional): Save the depth map in `npz` format.
 - `--save_exr` (optional): Save the depth map in `exr` format.
 
-### Inference a video using streaming mode (Experimental features)
+### Run inference on a video using streaming mode (experimental feature)
 We implement an experimental streaming mode **without training**. In detail, we cache the hidden states of the temporal attention for each frame, and send only a single frame into our video depth model during inference, reusing these past hidden states in the temporal attention. We hack our pipeline to align with the original inference setting of the offline mode. Due to the inevitable gap between training and testing, we observe a **performance drop** between the streaming model and the offline model (e.g., the `d1` of ScanNet drops from `0.926` to `0.836`). Fine-tuning the model in the streaming mode would greatly improve the performance. We leave this for future work.
 
 To run the streaming model:
 ```bash
+# For relative depth
 python3 run_streaming.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs_streaming --encoder vitl
+
+# For metric depth
+python3 run_streaming.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs_streaming --encoder vitl --metric
 ```
 Options:
 - `--input_video`: path of input video
@@ -127,15 +148,13 @@ Options:
 - `--encoder` (optional): `vits` for Video-Depth-Anything-Small, `vitb` for Video-Depth-Anything-Base, `vitl` for Video-Depth-Anything-Large.
 - `--max_len` (optional): maximum length of the input video, `-1` means no limit
 - `--target_fps` (optional): target fps of the input video, `-1` means the original fps
+- `--metric` (optional): use metric depth models trained on Virtual KITTI and IRS datasets
 - `--fp32` (optional): Use `fp32` precision for inference. By default, we use `fp16`.
 - `--grayscale` (optional): Save the grayscale depth map, without applying color palette.
 
 ## Training Loss
 Our training loss is in the `loss/` directory. Please see `loss/test_loss.py` for usage.
 
-## Fine-tuning to a metric-depth video model
-Please refer to [Metric Depth](./metric_depth/README.md).
-
 ## Benchmark
 Please refer to [Benchmark](./benchmark/README.md).
 
@@ -154,4 +173,4 @@ If you find this project useful, please consider citing:
 
 
 ## LICENSE
-Video-Depth-Anything-Small model is under the Apache-2.0 license. Video-Depth-Anything-Large model is under the CC-BY-NC-4.0 license. For business cooperation, please send an email to Hengkai Guo at guohengkaighk@gmail.com.
+Video-Depth-Anything-Small model is under the Apache-2.0 license. Video-Depth-Anything-Base/Large model is under the CC-BY-NC-4.0 license. For business cooperation, please send an email to Hengkai Guo at guohengkaighk@gmail.com.
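The streaming mode described in the diff above (caching the hidden states of the temporal attention and feeding one frame at a time) can be sketched with a minimal toy in NumPy. Everything here is illustrative: the class name, the single-head attention, and the fixed cache window are assumptions for exposition, not the repository's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class StreamingTemporalAttention:
    """Toy single-head temporal attention with a sliding cache of past frames
    (hypothetical sketch of the streaming idea, not the repo's API)."""

    def __init__(self, dim, window=32, seed=0):
        rng = np.random.default_rng(seed)
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.window = window
        self.k_cache, self.v_cache = [], []  # hidden states of past frames

    def step(self, frame_feat):
        # frame_feat: (tokens, dim) features of the CURRENT frame only
        q = frame_feat @ self.wq
        k = frame_feat @ self.wk
        v = frame_feat @ self.wv
        # append the current frame, keep only the most recent `window` frames
        self.k_cache.append(k)
        self.v_cache.append(v)
        self.k_cache = self.k_cache[-self.window:]
        self.v_cache = self.v_cache[-self.window:]
        ks = np.concatenate(self.k_cache)  # (window*tokens, dim)
        vs = np.concatenate(self.v_cache)
        # current frame attends over all cached frames, including itself
        attn = softmax(q @ ks.T / np.sqrt(q.shape[-1]))
        return attn @ vs  # (tokens, dim)

attn = StreamingTemporalAttention(dim=16, window=4)
for t in range(6):  # feed frames one at a time, as in streaming inference
    out = attn.step(np.random.default_rng(t).standard_normal((8, 16)))
print(out.shape, len(attn.k_cache))  # cache never grows past the window
```

Because only one frame enters the model per step while the cache bounds memory, per-frame cost stays constant; the training/testing mismatch noted in the README (offline attention sees the full clip jointly) is exactly what this reuse of past states cannot reproduce.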

metric_depth/README.md

Lines changed: 0 additions & 47 deletions
This file was deleted.

metric_depth/depth_to_pointcloud.py

Lines changed: 0 additions & 71 deletions
This file was deleted.
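Per the commit message, point-cloud saving was merged into the main metric-depth scripts, so this standalone file was removed. The underlying operation is the standard pinhole unprojection of a metric depth map; a minimal sketch, where the intrinsics (`fx`, `fy`, `cx`, `cy`) are illustrative values, not the repository's defaults:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Unproject a metric depth map (H, W), in meters, to an (H*W, 3) point cloud
    in the camera frame using the pinhole model: X = (u - cx) * Z / fx."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example: a flat wall 2 m away, seen by a 640x480 camera (assumed intrinsics)
depth = np.full((480, 640), 2.0)
pts = depth_to_pointcloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(pts.shape)  # (307200, 3); every point has z = 2.0
```

The same unprojection only yields metrically correct geometry when the depth is metric (as from the `--metric` models); relative depth would give a point cloud that is correct only up to an unknown scale and shift.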

metric_depth/run.py

Lines changed: 0 additions & 82 deletions
This file was deleted.
