@@ -29,6 +29,11 @@
     <link rel="stylesheet" href="./static/css/index.css" />
     <link rel="icon" href="./static/images/favicon.svg" />

+    <script
+      id="MathJax-script"
+      async
+      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"
+    ></script>
     <script defer src="./static/js/fontawesome.all.min.js"></script>
   </head>
   <body>
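
Note on the MathJax setup above: the v3 CDN build loaded here recognizes only \( ... \) as inline-math delimiters out of the box, which is why the caption in the next hunk also switches from $...$ to \(...\). If keeping single-dollar delimiters were preferred instead, MathJax can be configured before the loader script runs; a minimal sketch (same CDN URL as above, configuration keys from the standard MathJax v3 API):

<script>
  // The configuration object must be defined before the loader script
  // below executes; MathJax reads window.MathJax once at startup.
  window.MathJax = {
    tex: {
      // Accept both \( ... \) and $ ... $ as inline-math delimiters.
      inlineMath: [
        ["\\(", "\\)"],
        ["$", "$"],
      ],
    },
  };
</script>
<script
  id="MathJax-script"
  async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"
></script>

Single-dollar support is off by default because a literal dollar sign in page text would otherwise be misread as a math delimiter, so the delimiter switch made in this commit is the more conservative fix.
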
@@ -170,15 +175,35 @@ <h1 class="title is-1 publication-title"> |
             style="font-size: 0.9em; margin-top: 10px; text-align: left"
           >
             <strong>DITR architecture overview.</strong>
-            We extract 2D image features from a frozen DINOv2 model (blue)
+            We extract 2D image features from a frozen DINOv2 model
+            <span
+              style="
+                display: inline-block;
+                width: 10px;
+                height: 10px;
+                background-color: #dbeafe;
+                border: 1px solid #51a2ff;
+                border-radius: 20%;
+              "
+            ></span>
             and unproject them (2D-to-3D) onto the 3D point cloud. The
             unprojected features are subsequently max-pooled to create a
             multi-scale feature hierarchy. The raw point cloud is fed
-            through a 3D backbone (yellow) and the unprojected image
-            features are added to the skip connection between the encoder
-            $\mathcal{E}_l$ and decoder $\mathcal{D}_l$ block on each
-            level. The model is then trained with the regular segmentation
-            loss.
+            through a 3D backbone
+            <span
+              style="
+                display: inline-block;
+                width: 10px;
+                height: 10px;
+                background-color: #FEF3C6;
+                border: 1px solid #FFB900;
+                border-radius: 20%;
+              "
+            ></span>
+            and the unprojected image features are added to the skip
+            connection between the encoder \(\mathcal{E}_l\) and decoder
+            \(\mathcal{D}_l\) block on each level. The model is then
+            trained with the regular segmentation loss.
           </figcaption>
         </div>
       </div>
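
As an aside on the caption text: the per-level injection it describes can be written compactly. A sketch, using \(F_l\) as an assumed symbol (not in the page) for the max-pooled, unprojected DINOv2 features at level \(l\):

\[
\mathcal{D}_l\!\left(\mathcal{E}_l + F_l\right)
\quad\text{instead of the plain skip connection}\quad
\mathcal{D}_l\!\left(\mathcal{E}_l\right),
\]

i.e. the image features are summed into every encoder-to-decoder skip path, and only the regular segmentation loss supervises the result.
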
@@ -229,9 +254,7 @@ <h2 class="title is-3">Abstract</h2> |
           >
             <strong>DITR (a) and D-DITR (b).</strong> In addition to our
             DITR injection approach, we also present D-DITR to distill
-            DINOv2 features into 3D semantic segmentation models that yields
-            state-of-the-art results across indoor and outdoor 3D
-            benchmarks.
+            DINOv2 features into 3D semantic segmentation models.
           </figcaption>
         </figure>
       </div>