This work presents **Video Depth Anything** based on [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2).
## News

- **2025-09-12:** Support streaming mode for metric depth models.
- **2025-08-28:** Release ViT-base model for relative depth and ViT-small/base models for video metric depth.
- **2025-07-03:** 🚀🚀🚀 Release an experimental version of training-free **streaming video depth estimation**.
- **2025-07-03:** Release our implementation of [training loss](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/loss).
- **2025-04-25:** 🌟🌟🌟 Release [metric depth model](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/metric_depth) based on Video-Depth-Anything-Large.
- **2025-04-05:** Our paper has been accepted for a **highlight** presentation at [CVPR 2025](https://cvpr.thecvf.com/) (13.5% of the accepted papers).
- **2025-03-11:** Add full dataset inference and evaluation [scripts](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/benchmark).
- **2025-02-08:** 🚀🚀🚀 Improve inference speed and memory usage. Enable autocast inference. Support grayscale video, NPZ and EXR output formats.
- **2025-01-21:** Paper, project page, code, models, and demo are all released.

The latency and GPU VRAM results are obtained on a single A100 GPU with an input of shape 1 × 32 × 518 × 518.

## Pre-trained Models

We provide **several models** of varying scales for robust and consistent video depth estimation. For the usage of metric depth models, please refer to [Metric Depth](./metric_depth/README.md).

- `--encoder` (optional): `vits` for Video-Depth-Anything-Small, `vitb` for Video-Depth-Anything-Base, `vitl` for Video-Depth-Anything-Large.
- `--max_len` (optional): maximum length of the input video; `-1` means no limit.
- `--target_fps` (optional): target FPS of the input video; `-1` means the original FPS.
- `--metric` (optional): use metric depth models trained on the Virtual KITTI and IRS datasets.
- `--fp32` (optional): use `fp32` precision for inference; by default, `fp16` is used.
- `--grayscale` (optional): save the grayscale depth map without applying a color palette.
- `--save_npz` (optional): save the depth map in `npz` format.
- `--save_exr` (optional): save the depth map in `exr` format.

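As a sketch, the options above can be combined in a single invocation. The script name `run.py` and the `--input_video`/`--output_dir` flags are assumptions; check the repository for the exact entry point and input arguments.

```bash
# Hypothetical invocation; only the listed optional flags are taken from this README.
python3 run.py \
  --input_video path/to/video.mp4 \
  --output_dir ./outputs \
  --encoder vitl \
  --max_len -1 \
  --target_fps -1 \
  --save_npz
```
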
### Run inference on a video using streaming mode (experimental)

We implement an experimental streaming mode **without training**. In detail, we cache the hidden states of the temporal attention layers for each frame, and send only a single frame into our video depth model at each inference step, reusing these cached hidden states in the temporal attention layers. We hack our pipeline to align with the original offline inference setting. Due to the inevitable gap between training and testing, we observe a **performance drop** between the streaming model and the offline model (e.g., `d1` on ScanNet drops from `0.926` to `0.836`). Fine-tuning the model in streaming mode would greatly improve performance; we leave this for future work.
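The caching idea above can be sketched in a few lines. This is a toy single-head attention in numpy, not the repository's code: per-frame key/value hidden states are cached, and each new frame attends over all cached frames, so only one frame passes through the model per step. The class name, cache bound, and random projections are illustrative assumptions.

```python
# Toy sketch of streaming temporal attention with a hidden-state cache.
# Not the repository's implementation; shapes and names are illustrative.
import numpy as np

class StreamingTemporalAttention:
    """Single-head temporal attention that caches per-frame K/V states."""

    def __init__(self, dim, max_cache=32, seed=0):
        rng = np.random.default_rng(seed)
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.max_cache = max_cache          # bound memory for long videos
        self.k_cache, self.v_cache = [], []

    def step(self, x):
        """x: (dim,) features of the newest frame; returns attended features."""
        q = x @ self.wq
        self.k_cache.append(x @ self.wk)    # reuse past hidden states later
        self.v_cache.append(x @ self.wv)
        if len(self.k_cache) > self.max_cache:
            self.k_cache.pop(0)             # drop the oldest frame
            self.v_cache.pop(0)
        K, V = np.stack(self.k_cache), np.stack(self.v_cache)
        scores = K @ q / np.sqrt(q.size)
        w = np.exp(scores - scores.max())
        w /= w.sum()                        # softmax over cached frames
        return w @ V

attn = StreamingTemporalAttention(dim=8)
# Feed 5 frames one at a time, as a streaming pipeline would.
outs = [attn.step(np.full(8, float(t))) for t in range(5)]
```

Bounding the cache keeps memory constant for arbitrarily long videos, at the cost of forgetting frames beyond the window.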
- `--encoder` (optional): `vits` for Video-Depth-Anything-Small, `vitb` for Video-Depth-Anything-Base, `vitl` for Video-Depth-Anything-Large.
- `--max_len` (optional): maximum length of the input video; `-1` means no limit.
- `--target_fps` (optional): target FPS of the input video; `-1` means the original FPS.
- `--metric` (optional): use metric depth models trained on the Virtual KITTI and IRS datasets.
- `--fp32` (optional): use `fp32` precision for inference; by default, `fp16` is used.
- `--grayscale` (optional): save the grayscale depth map without applying a color palette.

## Training Loss

Our training loss is in the `loss/` directory. Please see `loss/test_loss.py` for usage.
## Fine-tuning to a metric-depth video model
Please refer to [Metric Depth](./metric_depth/README.md).
## Benchmark
Please refer to [Benchmark](./benchmark/README.md).
## LICENSE

The Video-Depth-Anything-Small model is under the Apache-2.0 license. The Video-Depth-Anything-Base/Large models are under the CC-BY-NC-4.0 license. For business cooperation, please send an email to Hengkai Guo at guohengkaighk@gmail.com.