Video Depth Estimation Rankings and Stereo Video Conversion Rankings

Researchers, if you have found your way here, please consider developing a new Stereo Video Conversion model at a resolution of 1080p and 24 fps, based on the Wan2.5 backbone, which should soon be available to everyone.

Recently, three research papers describing new Stereo Video Conversion models appeared almost simultaneously, two of them based on the Wan2.1-T2V-1.3B backbone. Unfortunately, the low resolution of 480p at 16 fps does not provide full immersion, even though the Wan backbone offers significantly higher quality than the Stable Video Diffusion (SVD) backbone (see the comparison). However, I must emphasise that I am very pleased that the new models are raising the bar in terms of quality once again. So pleased, in fact, that I want more...

| Backbone | Open source | Resolution | FPS | Video length | Stereo Video Conversion models |
|---|---|---|---|---|---|
| Wan2.1-T2V-1.3B | Yes | 832×480 | 16 | 5s | HairGuard, StereoPilot, StereoWorld |
| Wan2.2-TI2V-5B | Yes | 1280×704 | 24 | 5s | - |
| Wan2.2-T2V-A14B | Yes | 1280×720 | 16 | 5s | - |
| Wan2.5 | Yes? | 1080p | 24 | 10s | - |
| Wan2.6 | No | 1080p | 24 | 15s | - |

Awesome Stereo Video Conversion

The following list includes all Stereo Video Conversion methods from the last 10 months, from 1 April 2025 to 1 February 2026. This list was created because there is a significant problem with public access to the latest Stereo Video Conversion models, which makes it difficult for researchers to compare their work with the current state of the art and to use the same test set. Consequently, this also makes it difficult to present a single ranking that showcases all of the best models.

| Method | Backbone | Submitted on (arXiv) | Venue | Official repository | Code (website) |
|---|---|---|---|---|---|
| HairGuard | VACE (based on Wan2.1-1.3B) | 6 Jan 2026 | arXiv | - | - |
| StereoPilot | Wan2.1-T2V-1.3B | 18 Dec 2025 | arXiv | GitHub | GitHub |
| Elastic3D | SVD | 16 Dec 2025 | arXiv | - | GitHub |
| StereoWorld | Wan2.1-T2V-1.3B | 10 Dec 2025 | arXiv | - | GitHub |
| Restereo | StereoCrafter (based on SVD) | 6 Jun 2025 | arXiv | - | - |
| M2SVid | SVD | 22 May 2025 | 3DV | - | GitHub |
| Eye2Eye | Lumiere | 30 Apr 2025 | arXiv | - | GitHub |

Awesome Synthetic RGB-D Image Datasets for Training HD Video Depth Estimation Models

Although video depth estimation models should be trained mainly on synthetic RGB-D video datasets, I decided to add two synthetic RGB-D image datasets because of their unique features.

| # | Dataset | Venue | Resolution | Unique features |
|---|---|---|---|---|
| 1 | SynthHuman 📌 Human faces 😍 | ICCV | 384×512 | The dataset contains 98,040 samples featuring the face, 99,976 featuring the full body and 99,992 featuring the upper body. DAViD, trained on this dataset alone, achieved better depth estimation results than Depth Anything V2 Large, Depth Pro and even Sapiens-2B on the Goliath-Face test set. See the results in Table 1. |
| 2 | MegaSynth | CVPR | 512×512 | Huge size: 700K scenes. Note the remarkable improvement in the depth estimation results of a Depth Anything V2 ViT-B model fine-tuned on MegaSynth and evaluated on Hypersim. See the results in Table 6. |

Awesome Synthetic RGB-D Video Datasets for Training and Testing HD Video Depth Estimation Models

The following list contains only synthetic RGB-D datasets in which at least some of the images can be composited into a video sequence of at least 32 frames. The minimum number of frames was chosen based on the ablation studies in Table 5 of the Video Depth Anything paper.

Most datasets contain ready-to-use video sequences of appropriately numbered images in individual folders. In the case of the PLT-D3 dataset, however, images from at least two folders have to be combined to make a longer video sequence, and in the case of the ClaraVid dataset, images have to be arranged in the correct order to make a 32-frame video sequence, for example in the order given in Appendix 4: Notes for "Awesome Synthetic RGB-D Video Datasets for Training and Testing HD Video Depth Estimation Models".

Researchers, if you are going to use the following list to select datasets to train your models, check their quality very carefully and choose the best ones. I have only visually checked a few of them, and have marked on the list two datasets to check particularly carefully and two datasets that, in my opinion, are not suitable for training video depth estimation models. The reasons for these markings are given in the same Appendix 4.

When selecting the best datasets, comparisons of their quality can be very helpful, such as those in Table 9, Table 6 and another Table 6 for depth estimation models, and TABLE V plus TABLE IV for stereo matching models, although a similar technique can also be used for depth estimation models.

Dataset       Venue       Resolution   (11 per-model columns: T = used for training, E = used for testing)
1 BEDLAM2.0
📌 Human poses 😍
NeurIPS 1280×720 - - - - - - - - - - -
2 OmniWorld-Game
📌 18,515K frames 😍
ICLR 1280×720 - - - - - - - - - - -
3 C3I-SynFace
📌 Human faces 😍
DIB 640×480 - - - - - - - - - - -
4 ClaraVid ICCV 4032×3024 - - - - - - - - - - -
5 Spring CVPR 1920×1080 T T E E T - - T - - -
6 HorizonGS CVPR 1920×1080 - - - - - - - - - - -
7 PLT-D3 HD 1920×1080 - - - - - - - - - - -
8 MVS-Synth CVPR 1920×1080 T T T T T - - - - - -
9 SYNTHIA-SF BMVC 1920×1080 - - - - - - - - - - -
10 SynDrone
Check before use!
ICCVW 1920×1080 - - - - - - - - - - -
11 Mid-Air CVPRW 1024×1024 - T T - - - - - - - -
12 MatrixCity ICCV 1000×1000 T T T - - T - - - - -
13 StereoCarla arXiv 1600×900 - - - - - - - - - - -
14 LightwheelOcc - 1600×900 - - - - - - - - - - -
15 SAIL-VOS 3D CVPR 1280×800 - - - T - - - - - - -
16 SHIFT CVPR 1280×800 - - - - - - - - - - -
17 SYNTHIA-Seqs
🚫 Do not use! 🚫
CVPR 1280×760 - T T - - - - - - - -
18 BEDLAM CVPR 1280×720 - - - T T T - - - - -
19 Dynamic Replica CVPR 1280×720 - T - T T T - - T - -
20 Infinigen SV arXiv 1280×720 - - - - - - - - - - -
21 Infinigen CVPR 1280×720 - - - - - - - - - - -
22 DigiDogs
🚫 Do not use! 🚫
WACVW 1280×720 - - - - - - - - - - -
23 Aria Synthetic Environments
Check before use!
- 704×704 - - - - - - - - - - -
24 TartanGround IROS 640×640 - - - - - - - - - - -
25 TartanAir V2 - 640×640 - - - - - - - - - - -
26 BlinkVision ECCV 960×540 - - - - - - - T - - -
27 PointOdyssey ICCV 960×540 T - - - T T T T T E -
28 DyDToF CVPR 960×540 - - - - - - - - - E -
29 IRS ICME 960×540 T T T T - - T - - - -
30 Scene Flow CVPR 960×540 - E - - - - - - - - -
31 THUD++ arXiv 730×530 - - - - - - - - - - -
32 TAU Agent TCI 1024×512 T - - - - - - - - - -
33 TransPhy3D arXiv 512×512 - - - - - - - - - - -
34 3D Ken Burns TOG 512×512 T T T T - - - - - - -
35 SynPhoRest - 848×480 - - - - - - - - - - -
36 TartanAir IROS 640×480 T T T T T T T T T T -
37 ParallelDomain-4D ECCV 640×480 - - - - - - - - T - -
38 EDEN WACV 640×480 T - T T - T - - - - -
39 GTA-SfM RAL 640×480 T T T - - - - - - - -
40 InteriorNet BMVC 640×480 - - - - - - - - - - -
41 SYNTHIA-AL ICCVW 640×480 - - - - - - - - - - -
42 MPI Sintel ECCV 1024×436 E E E E E E E E E - E
43 Virtual KITTI 2 arXiv 1242×375 T T - T T - T - - - -
44 TartanAir Shibuya ICRA 640×360 - - - - - - - - - - E
Total: T (training) 11 11 9 9 7 6 4 4 4 1 0
Total: E (testing) 1 2 2 2 1 1 1 1 1 2 2

List of Rankings

Stereo Video Conversion Rankings

  1. Stereo4D (400 video clips with 16 frames each at 5 fps): LPIPS<=0.242
  2. StereoWorld-11M (1000 video clips with 81 frames each at 12 fps): LPIPS<=0.1869

Video Depth Estimation Rankings

  1. Infinigen: OPW<=0.054
  2. ScanNet (170 frames): TAE<=2.2
  3. Bonn RGB-D Dynamic (5 video clips with 110 frames each): δ1>=0.979
  4. Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.052

Single Image Depth Estimation Rankings

  1. DIODE: δ1>=0.953
  2. NYU-Depth V2: δ1>=0.983
  3. NYU-Depth V2: AbsRel<=0.0421
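For reference, the two single-image metrics above follow the standard monocular-depth definitions: AbsRel is the mean absolute relative error and δ1 is the fraction of pixels whose prediction/ground-truth ratio stays below 1.25. A minimal NumPy sketch (the function names are mine, and the per-paper alignment and scaling steps that precede metric computation are omitted):

```python
import numpy as np

def absrel(pred, gt):
    """Mean absolute relative error (AbsRel), lower is better."""
    mask = gt > 0  # ignore invalid ground-truth pixels
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def delta1(pred, gt, thresh=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) < thresh, higher is better."""
    mask = gt > 0
    ratio = np.maximum(pred[mask] / gt[mask], gt[mask] / pred[mask])
    return float(np.mean(ratio < thresh))
```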

Appendices


Stereo4D (400 video clips with 16 frames each at 5 fps): LPIPS<=0.242

| RK | Model | Venue | Repository | LPIPS ↓ {Input fr.} (M2SVid, 3DV, Table 1) |
|---|---|---|---|---|
| 1 | M2SVid | 3DV | - | 0.180 {MF} |
| 2 | SVG | ICLR | GitHub | 0.217 {MF} |
| 3 | StereoCrafter | arXiv | GitHub | 0.242 {MF} |

Back to Top Back to the List of Rankings

StereoWorld-11M (1000 video clips with 81 frames each at 12 fps): LPIPS<=0.1869

| RK | Model | Venue | Repository | LPIPS ↓ {Input fr.} (StereoWorld, arXiv, Table 2) |
|---|---|---|---|---|
| 1 | StereoWorld | arXiv | - | 0.0952 {MF} |
| 2 | StereoCrafter | arXiv | GitHub | 0.1869 {MF} |

Back to Top Back to the List of Rankings

Infinigen: OPW<=0.054

| RK | Model | Venue | Repository | OPW ↓ {Input fr.} (StableDPT, arXiv, Table 2) |
|---|---|---|---|---|
| 1 | StableDPT | arXiv | - | 0.023 {MF} |
| 2 | VDA-L | CVPR | GitHub | 0.026 {MF} |
| 3 | Depth Anything V2 Large | NeurIPS | GitHub | 0.039 {1} |
| 4 | FlashDepth | ICCV | GitHub | 0.054 {MF} |

Back to Top Back to the List of Rankings

ScanNet (170 frames): TAE<=2.2

| RK | Model | Venue | Repository | TAE ↓ {Input fr.} (VDA, CVPR, Table 1) |
|---|---|---|---|---|
| 1 | VDA-L | CVPR | GitHub | 0.570 {MF} |
| 2 | DepthCrafter | CVPR | GitHub | 0.639 {MF} |
| 3 | Depth Any Video | ICLR | GitHub | 0.967 {MF} |
| 4 | ChronoDepth | CVPR | GitHub | 1.022 {MF} |
| 5 | Depth Anything V2 Large | NeurIPS | GitHub | 1.140 {1} |
| 6 | NVDS | ICCV | GitHub | 2.176 {4} |

Back to Top Back to the List of Rankings

Bonn RGB-D Dynamic (5 video clips with 110 frames each): δ1>=0.979

📝 Note: See Figure 4

| RK | Model | Venue | Repository | δ1 ↑ {Input fr.} (ST2, ICCV, Table 2) | δ1 ↑ {Input fr.} (Uni4D, CVPR, Table 2) | δ1 ↑ {Input fr.} (VDA, CVPR, Table S1) |
|---|---|---|---|---|---|---|
| 1 | SpatialTrackerV2 | ICCV | GitHub | 0.988 {MF} | - | - |
| 2 | Depth Pro | ICLR | GitHub | - | 0.986 {1} | - |
| 3-4 | Metric3D v2 | TPAMI | GitHub | - | 0.985 {1} | - |
| 3-4 | UniDepth | CVPR | GitHub | - | 0.985 {1} | - |
| 5 | Uni4D | CVPR | GitHub | - | 0.983 {MF} | - |
| 6 | VDA-L | CVPR | GitHub | 0.982 {MF} | - | 0.972 {MF} |
| 7 | Depth Any Video | ICLR | GitHub | - | - | 0.981 {MF} |
| 8 | DepthCrafter | CVPR | GitHub | 0.979 {MF} | 0.976 {MF} | 0.979 {MF} |

Back to Top Back to the List of Rankings

Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.052

📝 Note: See Figure 4

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (ST2, ICCV, Table 2) | AbsRel ↓ {Input fr.} (Uni4D, CVPR, Table 2) | AbsRel ↓ {Input fr.} (MoRE, arXiv, Table 7) | AbsRel ↓ {Input fr.} (π3, arXiv, Table 5) | AbsRel ↓ {Input fr.} (VDA, CVPR, Table S1) |
|---|---|---|---|---|---|---|---|---|
| 1 | SpatialTrackerV2 | ICCV | GitHub | 0.028 {MF} | - | - | - | - |
| 2 | MegaSaM | CVPR | GitHub | 0.037 {MF} | - | - | - | - |
| 3 | Uni4D | CVPR | GitHub | - | 0.038 {MF} | - | - | - |
| 4 | UniDepth | CVPR | GitHub | - | 0.040 {1} | - | - | - |
| 5 | MoRE | arXiv | GitHub | - | - | 0.042 {MF} | - | - |
| 6 | π3 | arXiv | GitHub | - | - | 0.043 {MF} | 0.043 {MF} | - |
| 7 | Metric3D v2 | TPAMI | GitHub | - | 0.044 {1} | - | - | - |
| 8-9 | Depth Pro | ICLR | GitHub | - | 0.049 {1} | - | - | - |
| 8-9 | VDA-L | CVPR | GitHub | 0.049 {MF} | - | - | - | 0.053 {MF} |
| 10 | Depth Any Video | ICLR | GitHub | - | - | - | - | 0.051 {MF} |
| 11 | VGGT | CVPR | GitHub | 0.056 {MF} | - | 0.052 {MF} | 0.052 {MF} | - |

Back to Top Back to the List of Rankings

DIODE: δ1>=0.953

| RK | Model | Venue | Repository | δ1 ↑ {Input fr.} (DA3, arXiv, Table 4) | δ1 ↑ {Input fr.} (GC, ICCV, Table 2) | δ1 ↑ {Input fr.} (PPD, NeurIPS, Table 1) | δ1 ↑ {Input fr.} (DA V2, NeurIPS, Table 2) |
|---|---|---|---|---|---|---|---|
| 1 | DA3 Teacher | arXiv | GitHub | 0.966 {1} | - | - | - |
| 2 | GeometryCrafter(D) | ICCV | GitHub | - | 0.962 {1} | - | - |
| 3 | Pixel-Perfect Depth 1024 | NeurIPS | GitHub | - | - | 0.959 {1} | - |
| 4 | Depth Anything V2 Giant | NeurIPS | GitHub | - | - | - | 0.954 {1} |
| 5 | VGGT | CVPR | GitHub | 0.953 {1} | - | - | - |

Back to Top Back to the List of Rankings

NYU-Depth V2: δ1>=0.983

| RK | Model | Venue | Repository | δ1 ↑ {Input fr.} (MoGe, CVPR, Table A2) | δ1 ↑ {Input fr.} (MoGe-2, arXiv, Table B.4) | δ1 ↑ {Input fr.} (DAD, arXiv, Table 5) | δ1 ↑ {Input fr.} (BriGeS, arXiv, Table 1) | δ1 ↑ {Input fr.} (DA, CVPR, Table 2) | δ1 ↑ {Input fr.} (DA V2, NeurIPS, Table 2) |
|---|---|---|---|---|---|---|---|---|---|
| 1-2 | UniDepth | CVPR | GitHub | 0.987 {1} | 0.987 {1} | - | - | - | - |
| 1-2 | UniDepthV2 | arXiv | GitHub | - | 0.987 {1} | - | - | - | - |
| 3-4 | MoGe | CVPR | GitHub | 0.986 {1} | 0.986 {1} | - | - | - | - |
| 3-4 | MoGe-2 | arXiv | GitHub | - | 0.986 {1} | - | - | - | - |
| 5 | Distill Any Depth | arXiv | GitHub | - | - | 0.985 {1} | - | - | - |
| 6-7 | DA Large + BriGeS | arXiv | - | - | - | - | 0.984 {1} | - | - |
| 6-7 | Depth Anything Large | CVPR | GitHub | 0.984 {1} | 0.984 {1} | - | 0.981 {1} | 0.981 {1} | 0.981 {1} |
| 8 | Depth Anything V2 Large | NeurIPS | GitHub | 0.983 {1} | 0.983 {1} | 0.979 {1} | 0.979 {1} | - | 0.979 {1} |

Back to Top Back to the List of Rankings

NYU-Depth V2: AbsRel<=0.0421

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (MoGe-2, arXiv, Table B.4) | AbsRel ↓ {Input fr.} (MoGe, CVPR, Table A2) | AbsRel ↓ {Input fr.} (BriGeS, arXiv, Table 1) | AbsRel ↓ {Input fr.} (BRIDGE, arXiv, Table 1) | AbsRel ↓ {Input fr.} (FE2E, arXiv, Table 2) | AbsRel ↓ {Input fr.} (PPD, NeurIPS, Table 1) | AbsRel ↓ {Input fr.} (DA V2, NeurIPS, Table 2) | AbsRel ↓ {Input fr.} (BD, NeurIPS, Table 2) | AbsRel ↓ {Input fr.} (DA, CVPR, Table 2) | AbsRel ↓ {Input fr.} (M3D v2, arXiv, Table 4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MoGe-2 | arXiv | GitHub | 0.0335 {1} | - | - | - | - | - | - | - | - | - |
| 2-3 | MoGe | CVPR | GitHub | 0.0338 {1} | 0.0338 {1} | - | - | - | - | - | - | - | - |
| 2-3 | UniDepthV2 | arXiv | GitHub | 0.0338 {1} | - | - | - | - | - | - | - | - | - |
| 4 | UniDepth | CVPR | GitHub | 0.0378 {1} | 0.0378 {1} | - | - | - | - | - | - | - | - |
| 5 | DA Large + BriGeS | arXiv | - | - | - | 0.040 {1} | - | - | - | - | - | - | - |
| 6-8 | BRIDGE | arXiv | GitHub | - | - | - | 0.041 {1} | - | - | - | - | - | - |
| 6-8 | FE2E | arXiv | GitHub | - | - | - | - | 0.041 {1} | - | - | - | - | - |
| 6-8 | Pixel-Perfect Depth 1024 | NeurIPS | GitHub | - | - | - | - | - | 0.041 {1} | - | - | - | - |
| 9 | Depth Anything V2 Large | NeurIPS | GitHub | 0.0414 {1} | 0.0414 {1} | 0.043 {1} | 0.045 {1} | 0.045 {1} | 0.045 {1} | 0.045 {1} | - | - | - |
| 10-12 | BetterDepth | NeurIPS | - | - | - | - | - | - | - | - | 0.042 {1} | - | - |
| 10-12 | Depth Anything Large | CVPR | GitHub | 0.0420 {1} | 0.0420 {1} | 0.042 {1} | - | 0.043 {1} | - | 0.043 {1} | 0.043 {1} | 0.043 {1} | 0.043 {1} |
| 10-12 | Metric3D v2 ViT-Large | TPAMI | GitHub | 0.134 {1} | 0.134 {1} | - | 0.058 {1} | - | - | - | - | - | 0.042 {1} |
| 13 | Depth Pro | ICLR | GitHub | 0.0421 {1} | - | - | 0.245 {1} | - | - | - | - | - | - |

Back to Top Back to the List of Rankings

Appendix 4: Notes for "Awesome Synthetic RGB-D Video Datasets for Training and Testing HD Video Depth Estimation Models"

📝 Note 1: Example of arranging images in the correct order to make a 32-frame video sequence for the ClaraVid dataset:

```
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00360.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00320.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00280.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00240.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00200.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00160.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00120.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00080.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00040.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00000.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00001.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00002.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00003.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00004.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00005.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00006.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00007.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00008.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00009.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00010.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00011.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00012.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00013.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00014.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00015.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00016.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00017.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00018.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00019.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00059.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00099.jpg
<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h/00139.jpg
```
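The ordering above can be generated programmatically; a small sketch of my own (the `<your-data-path>` placeholder and folder layout are copied from the note and left as-is):

```python
# Build the 32-frame ClaraVid ordering listed in Note 1.
base = "<your-data-path>/008_urban_dense_1/left_rgb/45deg_low_h"

indices = list(range(360, -1, -40))   # 00360 down to 00000 in steps of 40 (10 frames)
indices += list(range(1, 20))         # 00001 .. 00019 (19 frames)
indices += [59, 99, 139]              # three final frames, 40 apart (3 frames)

frames = [f"{base}/{i:05d}.jpg" for i in indices]
assert len(frames) == 32
```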

📝 Note 2: Do not use the SYNTHIA-Seqs dataset for training HD video depth estimation models! The depth maps in this dataset do not match the corresponding RGB images. This is particularly evident in the tree leaves in:
`<your-data-path>/SYNTHIA-SEQS-01-SPRING/Depth/Stereo_Left/Omni_F/000071.png`
`<your-data-path>/SYNTHIA-SEQS-01-SPRING/RGB/Stereo_Left/Omni_F/000071.png`

📝 Note 3: Do not use the DigiDogs dataset for training HD video depth estimation models! The depth maps in this dataset do not match the corresponding RGB images. See the objects behind the campfire, the shifting position of the vegetation on the left, and the clear banding in the depth map:
`<your-data-path>/DigiDogs2024_full/09_22_2022/00054/images/img_00012.tiff`

📝 Note 4: Check the SynDrone dataset carefully before using it to train HD video depth estimation models! The depth maps in this dataset have large white areas of unknown depth, which should not happen with a synthetic dataset. Example depth map:
`<your-data-path>/Town01_Opt_120_depth/Town01_Opt_120/ClearNoon/height20m/depth/00031.png`

📝 Note 5: Check the Aria Synthetic Environments dataset carefully before using it to train HD video depth estimation models! The depth maps in this dataset have large white areas of unknown depth, which should not happen with a synthetic dataset. Example depth map:
`<your-data-path>/75/depth/depth0000109.png`
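For Notes 4 and 5, maps with large unknown regions can be flagged automatically before training. This is only a sketch of my own, assuming single-channel integer depth PNGs in which the maximum ("white") value encodes unknown depth:

```python
import numpy as np

def unknown_depth_fraction(depth, white=None):
    """Fraction of pixels at the maximum ("white") value, i.e. unknown depth.

    `depth` is an integer HxW array loaded from a depth PNG (e.g. via PIL or
    OpenCV); `white` defaults to the dtype maximum (65535 for 16-bit maps).
    """
    if white is None:
        white = np.iinfo(depth.dtype).max
    return float(np.mean(depth == white))
```

A dataset whose maps routinely score well above zero here deserves the "check before use" label.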

Back to Top Back to the List of Rankings

Appendix 5: List of all research papers from the above rankings

| Method | Abbr. | Paper | Venue (Alt link) | Official repository |
|---|---|---|---|---|
| BetterDepth | BD | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | NeurIPS | - |
| BRIDGE | - | BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation | arXiv | GitHub |
| BriGeS | - | Bridging Geometric and Semantic Foundation Models for Generalized Monocular Depth Estimation | arXiv | - |
| ChronoDepth | - | Learning Temporally Consistent Video Depth from Video Diffusion Priors | CVPR | GitHub |
| Depth Any Video | DAV | Depth Any Video with Scalable Synthetic Data | ICLR | GitHub |
| Depth Anything | DA | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | CVPR | GitHub |
| Depth Anything 3 | DA3 | Depth Anything 3: Recovering the Visual Space from Any Views | arXiv | GitHub |
| Depth Anything V2 | DA V2 | Depth Anything V2 | NeurIPS | GitHub |
| Depth Pro | DP | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | ICLR | GitHub |
| DepthCrafter | DC | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | CVPR | GitHub |
| Distill Any Depth | DAD | Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | arXiv | GitHub |
| FE2E | - | From Editor to Dense Geometry Estimator | arXiv | GitHub |
| FlashDepth | - | FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | ICCV | GitHub |
| GeometryCrafter | GC | GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors | ICCV | GitHub |
| M2SVid | - | M2SVid: End-to-End Inpainting and Refinement for Monocular-to-Stereo Video Conversion | 3DV | - |
| MegaSaM | - | MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | CVPR | GitHub |
| Metric3D v2 | M3D v2 | Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | TPAMI (arXiv) | GitHub |
| MoGe | - | MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision | CVPR | GitHub |
| MoGe-2 | Mo2 | MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details | arXiv | GitHub |
| MoRE | - | MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts | arXiv | GitHub |
| NVDS | - | Neural Video Depth Stabilizer | ICCV | GitHub |
| Pixel-Perfect Depth | PPD | Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers | NeurIPS | GitHub |
| SpatialTrackerV2 | ST2 | SpatialTrackerV2: 3D Point Tracking Made Easy | ICCV | GitHub |
| StableDPT | - | StableDPT: Temporal Stable Monocular Video Depth Estimation | arXiv | - |
| StereoCrafter | - | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | arXiv | GitHub |
| StereoWorld | - | StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation | arXiv | - |
| SVG | - | SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix | ICLR | GitHub |
| Uni4D | - | Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video | CVPR | GitHub |
| UniDepth | - | UniDepth: Universal Monocular Metric Depth Estimation | CVPR | GitHub |
| UniDepthV2 | UD2 | UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler | arXiv | GitHub |
| VGGT | - | VGGT: Visual Geometry Grounded Transformer | CVPR | GitHub |
| Video Depth Anything | VDA | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | CVPR | GitHub |
| π3 | - | π3: Scalable Permutation-Equivariant Visual Geometry Learning | arXiv | GitHub |

Back to Top Back to the List of Rankings

List of research papers to be added to the rankings

| Method | Abbr. | Paper | Venue (Alt link) | Official repository |
|---|---|---|---|---|
| HairGuard | - | Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views | arXiv | - |
| StereoPilot | - | StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors | arXiv | GitHub |
| Elastic3D | - | Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding | arXiv | - |
| Restereo | - | Restereo: Diffusion stereo video generation and restoration | arXiv | - |
| Eye2Eye | - | Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | arXiv | - |

Back to Top Back to the List of Rankings
