We learn a unified representation by predicting general occupancy, ego occupancy, and distilled high-level features from a vision foundation model in a continuous 4D field, in a self-supervised manner.
In doing so, we learn a representation better aligned with multiple downstream tasks in autonomous driving.
We build upon UnO <d-cite key="agro2024uno"></d-cite> and learn to predict 4D occupancy fields. While lidar supervision alone teaches us a lot about geometry and dynamics, we also want to learn high-level semantic features, as they are crucial for downstream tasks. We therefore distill high-level features from a vision foundation model and predict them in the same way as the occupancy fields. This way, we learn a unified representation that captures both the geometric and semantic structure of the environment.
<figcaption><b>Fig 2.:</b> Overview of GASP.</figcaption>
</figure>
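The recipe above can be sketched as a single multi-task objective evaluated at continuous space-time query points. This is only an illustrative sketch: the function names, tensor shapes, and the equal loss weighting are our assumptions, not the paper's actual implementation.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gasp_loss(occ_logits, occ_targets, ego_logits, ego_targets, feats, feat_targets):
    """Sketch of a unified 4D-field objective: general occupancy, ego
    occupancy, and distilled foundation-model features are all predicted
    at the same continuous (x, y, z, t) query points."""
    def bce(logits, targets):
        p = _sigmoid(logits)
        return -np.mean(targets * np.log(p + 1e-8) + (1 - targets) * np.log(1 - p + 1e-8))

    l_occ = bce(occ_logits, occ_targets)            # lidar-supervised occupancy
    l_ego = bce(ego_logits, ego_targets)            # supervised by the recorded ego trajectory
    l_feat = np.mean((feats - feat_targets) ** 2)   # distillation from a frozen vision model
    return l_occ + l_ego + l_feat                   # equal weighting is an assumption
```

In practice each term would be computed with a deep network and automatic differentiation; the point here is only that all three signals share one set of continuous query points.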
# Results
## Qualitative Results
**Fun things first.** Let's look at some visualizations of the learned representation. In all of these visualizations, we've reduced the high-dimensional semantic features to RGB using PCA and then projected them to the camera view and holistic view.
<figcaption><b>Fig 3.:</b> Semantic features probed around the ego-vehicle, shown from BEV. Images and lidar input are shown for reference.</figcaption>
</div>
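The PCA-to-RGB reduction used in these visualizations can be sketched in a few lines; the per-point feature layout is a hypothetical assumption for illustration.

```python
import numpy as np

def features_to_rgb(features):
    """Reduce high-dimensional semantic features to RGB via PCA (sketch).

    features: (N, D) array of per-point semantic features (assumed layout).
    Returns an (N, 3) array of display colors in [0, 1].
    """
    centered = features - features.mean(axis=0)
    # Principal directions from the SVD of the centered feature matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:3].T                      # project onto top-3 components
    # Min-max normalize each channel to [0, 1] for display
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    return (proj - lo) / (hi - lo + 1e-8)
```

The same colors can then be splatted into the camera view or a bird's-eye view, which is what the figures above show.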
---
We can also visualize the ego path probability in a three-way intersection together with the PCA-reduced semantic features. This visualization shows that the model has learned a multimodal ego path probability, assigning mass to each plausible route through the intersection.
<figcaption><b>Fig 4.:</b> Ego path probability in a three-way intersection together with PCA-reduced semantic features.</figcaption>
</figure>
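Because the field is continuous, producing such a bird's-eye-view probability map amounts to probing the model on a dense grid of query points at a fixed future time. A minimal sketch, where `field` is a hypothetical callable standing in for the trained ego-occupancy decoder:

```python
import numpy as np

def ego_path_probability_bev(field, extent=40.0, resolution=0.5, t=2.0, z=0.0):
    """Probe a continuous ego-occupancy field on a BEV grid at future time t.

    field: callable mapping (..., 4) query points (x, y, z, t) to
    probabilities in [0, 1] (a stand-in for the trained decoder).
    """
    xs = np.arange(-extent, extent, resolution)
    ys = np.arange(-extent, extent, resolution)
    grid = np.stack(np.meshgrid(xs, ys, indexing="ij"), axis=-1)   # (H, W, 2)
    pts = np.concatenate(
        [grid,
         np.full(grid.shape[:2] + (1,), z),    # fixed height slice
         np.full(grid.shape[:2] + (1,), t)],   # fixed future time
        axis=-1,
    )                                          # (H, W, 4) continuous queries
    return field(pts)                          # (H, W) probability map
```

In a multimodal scene such as a three-way intersection, the returned map shows several high-probability ridges, one per plausible route.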
---
Lastly, let's look at how GASP predicts the evolution of the environment into the future. Here we show the occupancy prediction and how it evolves 3 seconds into the future.
<figcaption><b>Fig 5.:</b> Occupancy prediction overlaid with lidar input, showing predictions up to 3 seconds into the future. Images and lidar input are shown for reference.</figcaption>
## Quantitative Results

To show that semantic information is <b>indeed crucial</b> for downstream tasks, we evaluate GASP on several downstream autonomous driving tasks, such as semantic occupancy forecasting, online mapping, and ego trajectory prediction.
We compare GASP to UnO <d-cite key="agro2024uno"></d-cite> and to training from scratch. We show consistent improvements across all tasks, demonstrating the effectiveness of GASP.
<figcaption><b>Fig 6.:</b> Green is GASP, blue is UnO, and yellow is training from scratch.</figcaption>
</figure>
</div>
</div>
---
# BibTeX
```bibtex
@article{ljungbergh2025gasp,
title = {GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving},
author = {Ljungbergh, William and Lilja, Adam and Tonderski, Adam and Laveno Ling, Arvid and Lindstr{\"o}m, Carl and Verbeke, Willem and Fu, Junsheng and Petersson, Christoffer and Hammarstrand, Lars and Felsberg, Michael},
  year   = {2025},
}
```