[Leheng Li](https://len-li.github.io/)<sup>1</sup>,
[Kaiqiang Zhou]()<sup>3</sup>,
[Hongbo Zhang]()<sup>3</sup>,
[Bingbing Liu]()<sup>3</sup>,<br>
[Ying-Cong Chen](https://www.yingcong.me/)<sup>1,4✉</sup>

<span class="author-block"><sup>1</sup>HKUST(GZ)</span>

We present **Lotus**, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. "Avg. Rank" indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used.

## 📢 News
- 2025-04-03: The training code of Lotus (Generative & Discriminative) is now available!
- 2025-01-17: Please check out our latest models ([lotus-normal-g-v1-1](https://huggingface.co/jingheya/lotus-normal-g-v1-1), [lotus-normal-d-v1-1](https://huggingface.co/jingheya/lotus-normal-d-v1-1)), which were trained with aligned surface normals, leading to improved performance!
- 2024-11-13: The demo now supports video depth estimation!
- 2024-11-13: The Lotus disparity models ([Generative](https://huggingface.co/jingheya/lotus-depth-g-v2-0-disparity) & [Discriminative](https://huggingface.co/jingheya/lotus-depth-d-v2-0-disparity)) are now available and achieve better performance!

To launch the demo for normal estimation:
```
python app.py normal
```

## 🔥 Training
1. Initialize your Accelerate environment with:
   ```
   accelerate config --config_file=$PATH_TO_ACCELERATE_CONFIG_FILE
   ```
   Please make sure the `accelerate` package is installed; our training scripts have been tested with `accelerate` version 0.29.3. A generic sketch of how such a configuration is consumed during training is shown after this list.

2. Prepare your training data:
- [Hypersim](https://github.com/apple/ml-hypersim):
  - Download this [script](https://github.com/apple/ml-hypersim/blob/main/contrib/99991/download.py) into your `$PATH_TO_RAW_HYPERSIM_DATA` directory; it is used to download the data.
  - Run the following commands to download the data:
    ```
    cd $PATH_TO_RAW_HYPERSIM_DATA

    # Download the tone-mapped images
    python ./download.py --contains scene_cam_ --contains final_preview --contains tonemap.jpg --silent

    # Download the depth maps
    python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains depth_meters --silent

    # Download the normal maps
    python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains normal --silent
    ```
  - Download the split file from [here](https://github.com/apple/ml-hypersim/blob/main/evermotion_dataset/analysis/metadata_images_split_scene_v1.csv) and put it in the `$PATH_TO_RAW_HYPERSIM_DATA` directory.
  - Process the data with `bash utils/process_hypersim.sh` (a minimal depth-reading sketch is given after this list).
- [Virtual KITTI](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/):
  - Download the [RGB](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_rgb.tar), [depth](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_depth.tar), and [text ground-truth](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_textgt.tar.gz) archives into the `$PATH_TO_VKITTI_DATA` directory and extract them.
  - Make sure the directory structure is as follows:
    ```
    SceneX/Y/frames/rgb/Camera_Z/rgb_%05d.jpg
    SceneX/Y/frames/depth/Camera_Z/depth_%05d.png
    SceneX/Y/colors.txt
    SceneX/Y/extrinsic.txt
    SceneX/Y/intrinsic.txt
    SceneX/Y/info.txt
    SceneX/Y/bbox.txt
    SceneX/Y/pose.txt
    ```
    where $X \in \{01, 02, 06, 18, 20\}$ denotes one of the five scenes (locations), $Y \in \{\texttt{15-deg-left}, \texttt{15-deg-right}, \texttt{30-deg-left}, \texttt{30-deg-right}, \texttt{clone}, \texttt{fog}, \texttt{morning}, \texttt{overcast}, \texttt{rain}, \texttt{sunset}\}$ denotes the scene variation, and $Z \in \{0, 1\}$ denotes the left or right camera. Note that all indices start from 0.
  - Generate the normal maps with `bash utils/depth2normal.sh` (see the depth-decoding and normal-computation sketch after this list).
3. Run the training command! 🚀
   - `bash train_scripts/train_lotus_g_${TASK}.sh` for training Lotus Generative models;
   - `bash train_scripts/train_lotus_d_${TASK}.sh` for training Lotus Discriminative models.

   Here, `${TASK}` is the target task, e.g., `depth` or `normal`.
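
For context on step 1: the snippet below is a generic sketch of how a configuration produced by `accelerate config` is consumed in a training loop. It is illustrative only; the model, optimizer, and data are placeholders, and it does not reproduce the actual Lotus training scripts.

```python
# Generic Accelerate training-loop skeleton (illustrative; not the Lotus training code).
# Launch with: accelerate launch --config_file=$PATH_TO_ACCELERATE_CONFIG_FILE this_script.py
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the configuration created by `accelerate config`

model = torch.nn.Linear(16, 1)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randn(64, 1)),
    batch_size=8,
)

# Accelerate wraps these objects for the configured device / distributed setup.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```
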
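For the Hypersim data in step 2: the sketch below shows one way to read a downloaded `depth_meters` HDF5 file and convert Hypersim's distance-to-camera values into planar depth, which is the kind of conversion the processing script performs. The file path and the focal length (886.81 px is a commonly used value for Hypersim's 1024×768 frames) are assumptions for illustration, not values taken from the repository.

```python
# Minimal sketch: read a Hypersim depth_meters HDF5 file and convert the stored
# distance-to-camera ("ray length") into planar depth along the optical axis.
import h5py
import numpy as np

# Hypothetical example path; adjust to your downloaded scene/frame.
path = "ai_001_001/images/scene_cam_00_geometry_hdf5/frame.0000.depth_meters.hdf5"

with h5py.File(path, "r") as f:
    dist = np.array(f["dataset"], dtype=np.float32)  # distance from the camera center, in meters

H, W = dist.shape
focal = 886.81  # assumed focal length in pixels for 1024x768 Hypersim frames

# Pixel coordinates on the image plane, centered at the principal point.
u = np.linspace(-0.5 * W + 0.5, 0.5 * W - 0.5, W)
v = np.linspace(-0.5 * H + 0.5, 0.5 * H - 0.5, H)
uu, vv = np.meshgrid(u, v)

# Planar depth = distance * cos(angle between the viewing ray and the optical axis).
planar_depth = dist / np.sqrt(uu**2 + vv**2 + focal**2) * focal
```
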
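For the Virtual KITTI data in step 2: the sketch below illustrates the kind of computation behind `utils/depth2normal.sh`, namely decoding a 16-bit depth PNG (stored in centimeters, with sky clamped to 655.35 m) and deriving camera-frame surface normals from the back-projected points. The example path and intrinsics are placeholders; this is a simplified illustration, not the repository's implementation.

```python
# Simplified sketch: Virtual KITTI depth PNG -> surface normals (camera frame).
import cv2
import numpy as np

# Example path following the directory layout above; intrinsics should come from intrinsic.txt.
depth_cm = cv2.imread("Scene01/clone/frames/depth/Camera_0/depth_00000.png", cv2.IMREAD_ANYDEPTH)
depth = depth_cm.astype(np.float32) / 100.0  # centimeters -> meters

fx, fy, cx, cy = 725.0, 725.0, 620.5, 187.0  # placeholder intrinsics
H, W = depth.shape
u, v = np.meshgrid(np.arange(W), np.arange(H))

# Back-project every pixel to a 3D point in the camera frame.
X = (u - cx) / fx * depth
Y = (v - cy) / fy * depth
points = np.stack([X, Y, depth], axis=-1)

# Normals from the cross product of local tangent vectors (finite differences).
dx = np.gradient(points, axis=1)
dy = np.gradient(points, axis=0)
normals = np.cross(dx, dy)
normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8
```
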
## 🕹️ Inference
### Testing on your images
1. Place your images in a directory, for example, under `assets/in-the-wild_example` (where we have prepared several examples).
2. Run the inference command: `bash infer.sh`.