
Commit 03d0f10

Merge pull request #30 from zenseact/gasp
Update with gasp arxiv link

2 parents: 4b26ae3 + edebf8b

File tree: 1 file changed (+21, -20 lines)


_publications/gasp/gasp.md

Lines changed: 21 additions & 20 deletions
@@ -16,19 +16,19 @@ authors:
  - Felsberg
code: https://github.com/LiljaAdam/gasp
date: 2025-03-14 00:00:00 +00:00
-#arxiv: https://arxiv.org/abs/2411.16816
+arxiv: https://arxiv.org/abs/2503.15672
n_equal_contrib: 2
thumbnail-img: thumbnail.gif
---
<div style="text-align: center; margin-bottom: 1em;">
<h1>TLDR</h1>
-<p style="font-weight: 500; width: 70%; margin: 0 auto;">
+<p style="font-weight: 500; width: 70%; margin: 0 auto; min-width: 400px;">
We learn a unified representation by predicting general occupancy, ego occupancy, and distilled high-level features from a vision foundation model in a continuous 4D field in a self-supervised manner.
In doing so, we learn a representation better aligned with multiple downstream tasks in autonomous driving.
</p>
</div>

-<figure class="figure__background">
+<figure class="figure__background" style="margin: 0;">
<img style="width: 100%; margin: 0 auto 1em auto; mix-blend-mode: multiply;" src="assets/frontfig.png"/>
</figure>

@@ -42,12 +42,11 @@ Self-supervised pre-training based on next-token prediction has enabled large la
We build upon UnO <d-cite key="agro2024uno"></d-cite> and learn to predict 4D occupancy fields. While we can learn a lot about geometry and dynamics from lidar supervision alone, we also want to learn high-level semantic features, as they are crucial for downstream tasks. We therefore distill high-level features from a vision foundation model and predict them in the same way as the occupancy fields. In this way, we learn a unified representation that captures both the geometric and semantic structure of the environment.
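A minimal sketch of how such a combined objective could look. The function, its arguments, and the plain BCE-plus-L2 weighting are illustrative assumptions, not the exact losses used by GASP:

```python
import numpy as np

def combined_pretraining_loss(occ_logits, occ_targets, pred_feats, teacher_feats, w_feat=1.0):
    """Toy GASP-style objective: occupancy BCE + feature-distillation L2.

    occ_logits:    (N,) predicted occupancy logits at 4D (x, y, z, t) queries
    occ_targets:   (N,) lidar-derived occupancy labels in {0, 1}
    pred_feats:    (N, D) features predicted at the same query points
    teacher_feats: (N, D) frozen vision-foundation-model features
    """
    # Binary cross-entropy on per-query occupancy.
    p = 1.0 / (1.0 + np.exp(-occ_logits))
    bce = -np.mean(occ_targets * np.log(p + 1e-8)
                   + (1.0 - occ_targets) * np.log(1.0 - p + 1e-8))
    # Regress the predicted features toward the teacher's features.
    feat_l2 = np.mean(np.sum((pred_feats - teacher_feats) ** 2, axis=-1))
    return bce + w_feat * feat_l2

rng = np.random.default_rng(0)
loss = combined_pretraining_loss(
    rng.normal(size=128),
    rng.integers(0, 2, size=128).astype(float),
    rng.normal(size=(128, 32)),
    rng.normal(size=(128, 32)),
)
```

Both terms share the same query points, which is what makes the representation unified rather than two heads trained on disjoint data.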
<figure class="figure__background">
<img style="width: 100%; margin: 1em auto; mix-blend-mode: multiply;" src="assets/method.png"/>
-<figcaption><b>Fig 2.:</b> Overview of GASP.</figcaption>
</figure>

# Results
## Qualitative Results
-**Fun things first.** Lets look at some visualizations of the learned representation. In all of these visualizations we've reduced the 4D semantic features to 2D using PCA and then projected them to the camera view and holistic view.
+**Fun things first.** Let's look at some visualizations of the learned representation. In all of these visualizations, we've reduced the high-dimensional semantic features to RGB using PCA and then projected them into the camera view and a holistic view.
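For the curious, a minimal sketch of the PCA-to-RGB reduction behind these visualizations. It assumes per-point feature vectors and is not the exact code used here:

```python
import numpy as np

def features_to_rgb(feats: np.ndarray) -> np.ndarray:
    """Map (N, D) semantic features to (N, 3) RGB colors in [0, 1] via PCA."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    # Principal directions from the SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rgb = centered @ vt[:3].T  # scores along the top-3 components
    # Min-max normalize each channel into a displayable [0, 1] range.
    rgb -= rgb.min(axis=0, keepdims=True)
    rgb /= rgb.max(axis=0, keepdims=True) + 1e-8
    return rgb

# Hypothetical example: 1,000 points with 64-dim distilled features.
colors = features_to_rgb(np.random.default_rng(0).normal(size=(1000, 64)))
```

The resulting per-point colors can then be splatted into the camera or BEV view, which is why semantically similar regions share similar hues.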
<div>
<video controls autoplay loop muted style="width: 100%;">
@@ -65,31 +64,34 @@ We build upon UnO <d-cite key="agro2024uno"></d-cite> and learn to predict 4D oc
<figcaption><b>Fig 2.:</b> PCA reduced 4D semantic features projected to camera view and holistic view. Images are shown for reference.</figcaption>
</div>

-<figure class="figure__background">
-<img style="width: 100%; margin: 1em auto; mix-blend-mode: multiply;" src="assets/qual-path.png"/>
-<figcaption><b>Fig 3.:</b> Ego path probability in a three-way intersection together with PCA reduced semantic features.</figcaption>
-</figure>
<div>
<video controls autoplay loop muted style="width: 100%;">
-<source src="assets/gasp.mp4" type="video/mp4">
+<source src="assets/bev-view.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
-<figcaption><b>Fig 4.:</b> Occupancy prediction overlayed with lidar input. Shows predictions into future time (3 seconds). Images and lidar input are shown for reference.</figcaption>
+<figcaption><b>Fig 3.:</b> Semantic features probed around the ego-vehicle, shown from BEV. Images and lidar input are shown for reference.</figcaption>
</div>
+---

+We can also show the ego path probability in a three-way intersection together with the PCA-reduced semantic features. This visualization shows how the model has learned to predict a multimodal ego path probability.
+<figure>
+<img style="width: 100%; margin: 0 auto; mix-blend-mode: multiply;" src="assets/qual-path.png"/>
+<figcaption><b>Fig 4.:</b> Ego path probability in a three-way intersection together with PCA-reduced semantic features.</figcaption>
+</figure>
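Conceptually, such a BEV path-probability map comes from probing the continuous field on a grid of (x, y) locations at a chosen future time. A minimal sketch, where `probe_ego_path_bev` and the `field(queries)` interface are hypothetical stand-ins for the learned model:

```python
import numpy as np

def probe_ego_path_bev(field, t, extent=40.0, resolution=0.5):
    """Probe a continuous ego-occupancy field on a BEV grid at future time t.

    `field` maps an (N, 3) array of (x, y, t) queries to (N,) probabilities;
    its name and signature are assumptions for illustration.
    """
    xs = np.arange(-extent, extent, resolution)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    queries = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, t)], axis=-1)
    return field(queries).reshape(gx.shape)  # (H, W) path-probability heatmap

# Toy field: probability decays with lateral offset from a straight-ahead path.
toy_field = lambda q: np.exp(-(q[:, 1] ** 2) / 8.0)
heatmap = probe_ego_path_bev(toy_field, t=2.0)
```

Because the field is continuous in both space and time, the same probing works at any resolution and any future horizon without retraining.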
+---
+Lastly, let's look at how GASP predicts the evolution of the environment. Here we show occupancy predictions and how they evolve 3 seconds into the future.
<div>
<video controls autoplay loop muted style="width: 100%;">
-<source src="assets/bev-view.mp4" type="video/mp4">
+<source src="assets/gasp.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
-<figcaption><b>Fig 4.:</b> Semantic features probed around the ego-vehicle, shown from BEV. Images and lidar input are shown for reference.</figcaption>
+<figcaption><b>Fig 5.:</b> Occupancy prediction overlaid with lidar input, showing predictions 3 seconds into the future. Images and lidar input are shown for reference.</figcaption>
</div>

## Quantitative Results
-<div style="display: flex; justify-content: space-between; width=100%; align-items: top;">
-<div style="flex: 1;margin-right: 2em;">
+<div style="display: flex; justify-content: space-between; width: 100%; align-items: top; flex-wrap: wrap;">
+<div style="flex: 1; margin-right: 2em; min-width: 300px;">
<p>
To show that semantic information is <b>indeed crucial</b> for downstream tasks, we evaluate GASP on several downstream autonomous driving tasks, such as semantic occupancy forecasting, online mapping, and ego trajectory prediction.
We compare GASP to UnO <d-cite key="agro2024uno"></d-cite> and to training from scratch. We show consistent improvements across all tasks, demonstrating the effectiveness of GASP.
@@ -103,24 +105,23 @@ We build upon UnO <d-cite key="agro2024uno"></d-cite> and learn to predict 4D oc
</ul>
</div>

-<div style="width:30%; display: flex; justify-content: center; min-width: 200px;">
+<div style="width: 100%; max-width: 300px; margin: 0 auto; text-align: center;">
<figure class="figure__background" style="margin: 0 auto;">
<img style="width: 100%; mix-blend-mode: multiply;" src="assets/radar-chart.png"/>
-<figcaption><b>Fig 2.:</b> Green is GASP, blue is UnO, and yellow is from scratch.</figcaption>
+<figcaption><b>Fig 6.:</b> Green is GASP, blue is UnO, and yellow is from scratch.</figcaption>
</figure>
</div>
</div>

---

# BibTeX
```bibtex
@article{ljungbergh2025gasp,
  title = {GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving},
  author = {Ljungbergh, William and Lilja, Adam and Tonderski, Adam and Laveno Ling, Arvid and Lindstr{\"o}m, Carl and Verbeke, Willem and Fu, Junsheng and Petersson, Christoffer and Hammarstrand, Lars and Felsberg, Michael},
-  journal = {arXiv preprint arXiv:2411.16816},
+  journal = {arXiv preprint arXiv:2503.15672},
  year = {2025}
}
```
