
Commit 3faeb65

Update the README (#88)
* Update README.md
* Update README.md for metric depth
* Update README.md to remove V2
* Remove V2
* Update README.md
1 parent 393a32f commit 3faeb65

File tree: 2 files changed (+25, -28 lines)


README.md

Lines changed: 12 additions & 12 deletions
````diff
@@ -21,7 +21,7 @@ This work presents **Video Depth Anything** based on [Depth Anything V2](https:/
 ![teaser](assets/teaser_video_v2.png)
 
 ## News
-- **2025-08-28:** Release Video-Depth-Anything-Base and corresponding metric model.
+- **2025-08-28:** Release ViT-base model for relative depth and ViT-small/base models for video metric depth.
 - **2025-07-03:** 🚀🚀🚀 Release an experimental version of training-free **streaming video depth estimation**.
 - **2025-07-03:** Release our implementation of [training loss](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/loss).
 - **2025-04-25:** 🌟🌟🌟 Release [metric depth model](https://github.com/DepthAnything/Video-Depth-Anything/tree/main/metric_depth) based on Video-Depth-Anything-Large.
````
````diff
@@ -49,14 +49,14 @@ This work presents **Video Depth Anything** based on [Depth Anything V2](https:/
 </thead>
 <tbody>
 <tr>
-<td>Video-Depth-Anything-V2-Small</td>
+<td>Video-Depth-Anything-Small</td>
 <td>9.1</td>
 <td><strong>7.5</strong></td>
 <td>7.3</td>
 <td><strong>6.8</strong></td>
 </tr>
 <tr>
-<td>Video-Depth-Anything-V2-Large</td>
+<td>Video-Depth-Anything-Large</td>
 <td>67</td>
 <td><strong>14</strong></td>
 <td>26.7</td>
````
````diff
@@ -67,16 +67,16 @@ This work presents **Video Depth Anything** based on [Depth Anything V2](https:/
 The Latency and GPU VRAM results are obtained on a single A100 GPU with input of shape 1 x 32 x 518 × 518.
 
 ## Pre-trained Models
-We provide **two models** of varying scales for robust and consistent video depth estimation:
+We provide **several models** of varying scales for robust and consistent video depth estimation. For the usage of metric depth models, please refer to [Metric Depth](./metric_depth/README.md).
 
 | Model | Params | Checkpoint |
 |:-|-:|:-:|
-| Video-Depth-Anything-V2-Small | 28.4M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Small/resolve/main/video_depth_anything_vits.pth?download=true) |
-| Video-Depth-Anything-V2-Base | 113.1M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Base/blob/main/video_depth_anything_vitb.pth) |
-| Video-Depth-Anything-V2-Large | 381.8M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Large/resolve/main/video_depth_anything_vitl.pth?download=true) |
-| Video-Depth-Anything-V2-Small-Metric | 28.4M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Small/blob/main/metric_video_depth_anything_vits.pth) |
-| Video-Depth-Anything-V2-Base-Metric | 113.1M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Base/blob/main/metric_video_depth_anything_vitb.pth) |
-| Video-Depth-Anything-V2-Large-Metric | 381.8M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Large/resolve/main/metric_video_depth_anything_vitl.pth) |
+| Video-Depth-Anything-Small | 28.4M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Small/resolve/main/video_depth_anything_vits.pth?download=true) |
+| Video-Depth-Anything-Base | 113.1M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Base/blob/main/video_depth_anything_vitb.pth) |
+| Video-Depth-Anything-Large | 381.8M | [Download](https://huggingface.co/depth-anything/Video-Depth-Anything-Large/resolve/main/video_depth_anything_vitl.pth?download=true) |
+| Metric-Video-Depth-Anything-Small | 28.4M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Small/blob/main/metric_video_depth_anything_vits.pth) |
+| Metric-Video-Depth-Anything-Base | 113.1M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Base/blob/main/metric_video_depth_anything_vitb.pth) |
+| Metric-Video-Depth-Anything-Large | 381.8M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Large/resolve/main/metric_video_depth_anything_vitl.pth) |
 
 
 ## Usage
````
````diff
@@ -104,7 +104,7 @@ Options:
 - `--output_dir`: path to save the output results
 - `--input_size` (optional): By default, we use input size `518` for model inference.
 - `--max_res` (optional): By default, we use maximum resolution `1280` for model inference.
-- `--encoder` (optional): `vits` for Video-Depth-Anything-V2-Small, `vitl` for Video-Depth-Anything-V2-Large.
+- `--encoder` (optional): `vits` for Video-Depth-Anything-Small, `vitb` for Video-Depth-Anything-Base, `vitl` for Video-Depth-Anything-Large.
 - `--max_len` (optional): maximum length of the input video, `-1` means no limit
 - `--target_fps` (optional): target fps of the input video, `-1` means the original fps
 - `--fp32` (optional): Use `fp32` precision for inference. By default, we use `fp16`.
````
````diff
@@ -124,7 +124,7 @@ Options:
 - `--output_dir`: path to save the output results
 - `--input_size` (optional): By default, we use input size `518` for model inference.
 - `--max_res` (optional): By default, we use maximum resolution `1280` for model inference.
-- `--encoder` (optional): `vits` for Video-Depth-Anything-V2-Small, `vitl` for Video-Depth-Anything-V2-Large.
+- `--encoder` (optional): `vits` for Video-Depth-Anything-Small, `vitb` for Video-Depth-Anything-Base, `vitl` for Video-Depth-Anything-Large.
 - `--max_len` (optional): maximum length of the input video, `-1` means no limit
 - `--target_fps` (optional): target fps of the input video, `-1` means the original fps
 - `--fp32` (optional): Use `fp32` precision for inference. By default, we use `fp16`.
````
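After this commit, `--encoder` accepts `vits`, `vitb`, and `vitl`. As an illustrative sketch (not part of the commit itself), a ViT-Base run might look like the following, where `<YOUR_VIDEO_PATH>` and `./outputs` are placeholders and the `vitb` checkpoint is assumed to have been downloaded beforehand:

```bash
# Hypothetical invocation of the newly documented vitb option;
# the video path and output directory are placeholders.
python3 run.py \
  --input_video <YOUR_VIDEO_PATH> \
  --output_dir ./outputs \
  --encoder vitb
```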

metric_depth/README.md

Lines changed: 13 additions & 16 deletions
````diff
@@ -2,46 +2,43 @@
 We here provide a simple demo for our fine-tuned Video-Depth-Anything metric model. We fine-tune our pre-trained model on Virtual KITTI and IRS datasets for metric depth estimation.
 
 # Pre-trained Models
-We provide our large model:
+We provide three models for metric video depth estimation:
 
 | Base Model | Params | Checkpoint |
 |:-|-:|:-:|
-| Metric-Video-Depth-Anything-V2-Large | 381.8M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Large/resolve/main/metric_video_depth_anything_vitl.pth) |
-| Metric-Video-Depth-Anything-V2-base | 113.1M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Base/blob/main/metric_video_depth_anything_vitb.pth) |
-| Metric-Video-Depth-Anything-V2-Small | 28.4M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Small/blob/main/metric_video_depth_anything_vits.pth) |
+| Metric-Video-Depth-Anything-Small | 28.4M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Small/blob/main/metric_video_depth_anything_vits.pth) |
+| Metric-Video-Depth-Anything-Base | 113.1M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Base/blob/main/metric_video_depth_anything_vitb.pth) |
+| Metric-Video-Depth-Anything-Large | 381.8M | [Download](https://huggingface.co/depth-anything/Metric-Video-Depth-Anything-Large/resolve/main/metric_video_depth_anything_vitl.pth) |
 
 # Metric depth evaluation
-We evaluate our model on KITTI and NYU datasets for video metric depth. The evaluation results are as follows.
+We evaluate our models for video metric depth without aligning the scale. The evaluation results are as follows.
 
-| δ1 | MogeV2-L | UnidepthV2-L | DepthPro | VDA-S-Metric | VDA-B-Metric | VDA-L-Metric |
+| δ1 | MoGe-2-L | UniDepthV2-L | DepthPro | VDA-S-Metric | VDA-B-Metric | VDA-L-Metric |
 |:-|:-:|:-:|:-:|:-:|:-:|:-:|
 | KITTI | 0.415 | **0.982** | 0.822 | 0.877 | 0.887 | *0.910* |
-| NYU_v2 | *0.967* | **0.989** | 0.953 | 0.850| 0.883 | 0.908 |
+| NYUv2 | *0.967* | **0.989** | 0.953 | 0.850 | 0.883 | 0.908 |
 
-| tae | MogeV2-L | UnidepthV2-L | DepthPro | VDA-S-Metric | VDA-B-Metric | VDA-L-Metric |
+| TAE | MoGe-2-L | UniDepthV2-L | DepthPro | VDA-S-Metric | VDA-B-Metric | VDA-L-Metric |
 |:-|:-:|:-:|:-:|:-:|:-:|:-:|
 | Scannet | 2.56 | 1.41 | 2.73 | 1.48 | *1.26* | **1.09** |
 
 
 # Usage
 ## Preparation
-```bash
-git clone https://github.com/DepthAnything/Video-Depth-Anything.git
-cd Video-Depth-Anything
-pip3 install -r requirements.txt
-cd metric_depth
-```
-Download the checkpoints and put them under the `checkpoints` directory.
+Download the checkpoints and put them under the `metric_depth/checkpoints` directory.
 
 ## Use our models
 ### Running script on video
 ```bash
+cd metric_depth
 python3 run.py \
   --input_video <YOUR_VIDEO_PATH> \
-  --output_dir <YOUR_OUTPUT_DIR>
+  --output_dir <YOUR_OUTPUT_DIR> \
+  --encoder vitl
 ```
 ### Project video to point clouds
 ```bash
+cd metric_depth
 python3 depth_to_pointcloud.py \
   --input_video <YOUR_VIDEO_PATH> \
   --output_dir <YOUR_OUTPUT_DIR> \
````
