
Commit cb30743: update README.md
1 parent a061298


README.md: 50 additions & 2 deletions
@@ -14,7 +14,7 @@
[Leheng Li](https://len-li.github.io/)<sup>1</sup>,
[Kaiqiang Zhou]()<sup>3</sup>,
[Hongbo Zhang]()<sup>3</sup>,
[Bingbing Liu]()<sup>3</sup>,<br>
[Ying-Cong Chen](https://www.yingcong.me/)<sup>1,4&#9993;</sup>

<span class="author-block"><sup>1</sup>HKUST(GZ)</span>
@@ -32,6 +32,7 @@
We present **Lotus**, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. "Avg. Rank" indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used.

## 📢 News
- 2025-04-03: The training code of Lotus (Generative & Discriminative) is now available!
- 2025-01-17: Please check out our latest models ([lotus-normal-g-v1-1](https://huggingface.co/jingheya/lotus-normal-g-v1-1), [lotus-normal-d-v1-1](https://huggingface.co/jingheya/lotus-normal-d-v1-1)), which were trained with aligned surface normals, leading to improved performance!
- 2024-11-13: The demo now supports video depth estimation!
- 2024-11-13: The Lotus disparity models ([Generative](https://huggingface.co/jingheya/lotus-depth-g-v2-0-disparity) & [Discriminative](https://huggingface.co/jingheya/lotus-depth-d-v2-0-disparity)) are now available, which achieve better performance!
@@ -68,7 +69,54 @@ pip install -r requirements.txt
```
python app.py normal
```

## 🔥 Training
1. Initialize your Accelerate environment with:
   ```
   accelerate config --config_file=$PATH_TO_ACCELERATE_CONFIG_FILE
   ```
   Please make sure the `accelerate` package is installed; our training scripts have been tested with `accelerate` 0.29.3.
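   If you prefer to skip the interactive prompts, such a file can also be written by hand. The sketch below is a minimal single-GPU example of ours (the field values are illustrative, not the project's official settings); the same path can later be passed to `accelerate launch --config_file=...`:
   ```
   # Illustrative: write a minimal single-GPU Accelerate config non-interactively.
   # Adjust mixed_precision / num_processes to your hardware.
   printf '%s\n' \
     'compute_environment: LOCAL_MACHINE' \
     "distributed_type: 'NO'" \
     'mixed_precision: fp16' \
     'num_processes: 1' > "$PATH_TO_ACCELERATE_CONFIG_FILE"
   ```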
2. Prepare your training data:
   - [Hypersim](https://github.com/apple/ml-hypersim):
     - Download this [script](https://github.com/apple/ml-hypersim/blob/main/contrib/99991/download.py) into your `$PATH_TO_RAW_HYPERSIM_DATA` directory for downloading the data.
     - Run the following commands to download the data:
       ```
       cd $PATH_TO_RAW_HYPERSIM_DATA

       # Download the tone-mapped images
       python ./download.py --contains scene_cam_ --contains final_preview --contains tonemap.jpg --silent

       # Download the depth maps
       python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains depth_meters --silent

       # Download the normal maps
       python ./download.py --contains scene_cam_ --contains geometry_hdf5 --contains normal --silent
       ```
     - Download the split file from [here](https://github.com/apple/ml-hypersim/blob/main/evermotion_dataset/analysis/metadata_images_split_scene_v1.csv) and put it in the `$PATH_TO_RAW_HYPERSIM_DATA` directory.
     - Process the data with: `bash utils/process_hypersim.sh`.
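     A quick file count helps to verify that the raw download completed (our suggestion, not part of the official pipeline; the patterns mirror the download filters above):
       ```
       # Count each asset type under $PATH_TO_RAW_HYPERSIM_DATA.
       find . -path '*final_preview*' -name '*tonemap.jpg' | wc -l
       find . -path '*geometry_hdf5*' -name '*depth_meters*' | wc -l
       find . -path '*geometry_hdf5*' -name '*normal*' | wc -l
       ```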
   - [Virtual KITTI](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/):
     - Download the [rgb](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_rgb.tar), [depth](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_depth.tar), and [textgt](https://download.europe.naverlabs.com//virtual_kitti_2.0.3/vkitti_2.0.3_textgt.tar.gz) archives into the `$PATH_TO_VKITTI_DATA` directory and unpack them.
     - Make sure the directory structure is as follows:
       ```
       SceneX/Y/frames/rgb/Camera_Z/rgb_%05d.jpg
       SceneX/Y/frames/depth/Camera_Z/depth_%05d.png
       SceneX/Y/colors.txt
       SceneX/Y/extrinsic.txt
       SceneX/Y/intrinsic.txt
       SceneX/Y/info.txt
       SceneX/Y/bbox.txt
       SceneX/Y/pose.txt
       ```
       where $X \in \{01, 02, 06, 18, 20\}$ denotes one of the five locations, $Y \in \{\texttt{15-deg-left}, \texttt{15-deg-right}, \texttt{30-deg-left}, \texttt{30-deg-right}, \texttt{clone}, \texttt{fog}, \texttt{morning}, \texttt{overcast}, \texttt{rain}, \texttt{sunset}\}$ denotes one of the ten variations, and $Z \in \{0, 1\}$ selects the left or right camera. Note that all indices start from 0; a quick spot-check follows below.
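       For instance (illustrative), the first frame of the `clone` variation of `Scene01`, camera 0, should be present at:
       ```
       # Spot-check one concrete instance of the pattern above.
       ls "$PATH_TO_VKITTI_DATA/Scene01/clone/frames/rgb/Camera_0/rgb_00000.jpg"
       ls "$PATH_TO_VKITTI_DATA/Scene01/clone/frames/depth/Camera_0/depth_00000.png"
       ```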
     - Generate the normal maps with: `bash utils/depth2normal.sh`.
3. Run the training command! 🚀
   - `bash train_scripts/train_lotus_g_{$TASK}.sh` for training Lotus Generative models;
   - `bash train_scripts/train_lotus_d_{$TASK}.sh` for training Lotus Discriminative models (see the example below).
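   For example, assuming `$TASK` expands to a task name such as `depth` or `normal` (our assumption; check `train_scripts/` for the exact file names):
   ```
   # Hypothetical invocation: train the generative depth model.
   bash train_scripts/train_lotus_g_depth.sh
   ```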
## 🕹️ Inference

### Testing on your images
1. Place your images in a directory, for example under `assets/in-the-wild_example` (where we have prepared several examples).
2. Run the inference command: `bash infer.sh`.
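Put together (paths illustrative):
```
# Copy a few of your own photos into the prepared example directory, then run inference.
mkdir -p assets/in-the-wild_example
cp /path/to/your/photos/*.jpg assets/in-the-wild_example/
bash infer.sh
```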
