
Commit 03d0f10

Merge pull request #30 from zenseact/gasp
Update with gasp arxiv link

2 parents: 4b26ae3 + edebf8b

File tree: 1 file changed (+21, -20 lines)


_publications/gasp/gasp.md

Lines changed: 21 additions & 20 deletions
@@ -16,19 +16,19 @@ authors:
  - Felsberg
code: https://github.com/LiljaAdam/gasp
date: 2025-03-14 00:00:00 +00:00
-#arxiv: https://arxiv.org/abs/2411.16816
+arxiv: https://arxiv.org/abs/2503.15672
n_equal_contrib: 2
thumbnail-img: thumbnail.gif
---
<div style="text-align: center; margin-bottom: 1em;">
<h1>TLDR</h1>
-<p style="font-weight: 500; width: 70%; margin: 0 auto;">
+<p style="font-weight: 500; width: 70%; margin: 0 auto; min-width: 400px;">
We learn a unified representation by predicting general occupancy, ego occupancy, and distilled high-level features from a vision foundation model in a continuous 4D field in a self-supervised manner.
In doing so, we learn a representation better aligned with multiple downstream tasks in autonomous driving.
</p>
</div>

-<figure class="figure__background">
+<figure class="figure__background" style="margin: 0;">
<img style="width: 100%; margin: 0 auto 1em auto; mix-blend-mode: multiply;" src="assets/frontfig.png"/>
</figure>

@@ -42,12 +42,11 @@ Self-supervised pre-training based on next-token prediction has enabled large la
We build upon UnO <d-cite key="agro2024uno"></d-cite> and learn to predict 4D occupancy fields. While we can learn a lot about geometry and dynamics from lidar supervision alone, we also want to learn high-level semantic features, as they are crucial for downstream tasks. We therefore distill high-level features from a vision foundation model and predict them in the same way as the occupancy fields. In this way, we learn a unified representation that captures both the geometric and semantic structure of the environment.
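A minimal sketch of how such a combined objective could look. The function, its arguments, and the plain BCE-plus-L2 weighting are illustrative assumptions, not the exact losses used by GASP:

```python
import numpy as np

def combined_pretraining_loss(occ_logits, occ_targets, pred_feats, teacher_feats, w_feat=1.0):
    """Toy GASP-style objective: occupancy BCE + feature-distillation L2.

    occ_logits:    (N,) predicted occupancy logits at 4D (x, y, z, t) queries
    occ_targets:   (N,) lidar-derived occupancy labels in {0, 1}
    pred_feats:    (N, D) features predicted at the same query points
    teacher_feats: (N, D) frozen vision-foundation-model features
    """
    # Binary cross-entropy on per-query occupancy.
    p = 1.0 / (1.0 + np.exp(-occ_logits))
    bce = -np.mean(occ_targets * np.log(p + 1e-8)
                   + (1.0 - occ_targets) * np.log(1.0 - p + 1e-8))
    # Regress the predicted features toward the teacher's features.
    feat_l2 = np.mean(np.sum((pred_feats - teacher_feats) ** 2, axis=-1))
    return bce + w_feat * feat_l2

rng = np.random.default_rng(0)
loss = combined_pretraining_loss(
    rng.normal(size=128),
    rng.integers(0, 2, size=128).astype(float),
    rng.normal(size=(128, 32)),
    rng.normal(size=(128, 32)),
)
```

Both terms share the same query points, which is what makes the representation unified rather than two heads trained on disjoint data.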
<figure class="figure__background">
<img style="width: 100%; margin: 1em auto; mix-blend-mode: multiply;" src="assets/method.png"/>
-<figcaption><b>Fig 2.:</b> Overview of GASP.</figcaption>
</figure>

# Results
## Qualitative Results
-**Fun things first.** Lets look at some visualizations of the learned representation. In all of these visualizations we've reduced the 4D semantic features to 2D using PCA and then projected them to the camera view and holistic view.
+**Fun things first.** Let's look at some visualizations of the learned representation. In all of these visualizations, we've reduced the high-dimensional semantic features to RGB using PCA and then projected them into the camera view and a holistic view.
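For the curious, a minimal sketch of the PCA-to-RGB reduction behind these visualizations. It assumes per-point feature vectors and is not the exact code used here:

```python
import numpy as np

def features_to_rgb(feats: np.ndarray) -> np.ndarray:
    """Map (N, D) semantic features to (N, 3) RGB colors in [0, 1] via PCA."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    # Principal directions from the SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rgb = centered @ vt[:3].T  # scores along the top-3 components
    # Min-max normalize each channel into a displayable [0, 1] range.
    rgb -= rgb.min(axis=0, keepdims=True)
    rgb /= rgb.max(axis=0, keepdims=True) + 1e-8
    return rgb

# Hypothetical example: 1,000 points with 64-dim distilled features.
colors = features_to_rgb(np.random.default_rng(0).normal(size=(1000, 64)))
```

The resulting per-point colors can then be splatted into the camera or BEV view, which is why semantically similar regions share similar hues.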
<div>
<video controls autoplay loop muted style="width: 100%;">
@@ -65,31 +64,34 @@ We build upon UnO <d-cite key="agro2024uno"></d-cite> and learn to predict 4D oc
<figcaption><b>Fig 2.:</b> PCA reduced 4D semantic features projected to camera view and holistic view. Images are shown for reference.</figcaption>
</div>

-<figure class="figure__background">
-<img style="width: 100%; margin: 1em auto; mix-blend-mode: multiply;" src="assets/qual-path.png"/>
-<figcaption><b>Fig 3.:</b> Ego path probability in a three-way intersection together with PCA reduced semantic features.</figcaption>
-</figure>
<div>
<video controls autoplay loop muted style="width: 100%;">
-<source src="assets/gasp.mp4" type="video/mp4">
+<source src="assets/bev-view.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
-<figcaption><b>Fig 4.:</b> Occupancy prediction overlayed with lidar input. Shows predictions into future time (3 seconds). Images and lidar input are shown for reference.</figcaption>
+<figcaption><b>Fig 3.:</b> Semantic features probed around the ego-vehicle, shown from BEV. Images and lidar input are shown for reference.</figcaption>
</div>
+---

+We can also show the ego path probability in a three-way intersection together with the PCA-reduced semantic features. This visualization shows how the model has learned to predict a multimodal ego path probability.
+<figure>
+<img style="width: 100%; margin: 0 auto; mix-blend-mode: multiply;" src="assets/qual-path.png"/>
+<figcaption><b>Fig 4.:</b> Ego path probability in a three-way intersection together with PCA-reduced semantic features.</figcaption>
+</figure>
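Conceptually, such a BEV path-probability map comes from probing the continuous field on a grid of (x, y) locations at a chosen future time. A minimal sketch, where `probe_ego_path_bev` and the `field(queries)` interface are hypothetical stand-ins for the learned model:

```python
import numpy as np

def probe_ego_path_bev(field, t, extent=40.0, resolution=0.5):
    """Probe a continuous ego-occupancy field on a BEV grid at future time t.

    `field` maps an (N, 3) array of (x, y, t) queries to (N,) probabilities;
    its name and signature are assumptions for illustration.
    """
    xs = np.arange(-extent, extent, resolution)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    queries = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, t)], axis=-1)
    return field(queries).reshape(gx.shape)  # (H, W) path-probability heatmap

# Toy field: probability decays with lateral offset from a straight-ahead path.
toy_field = lambda q: np.exp(-(q[:, 1] ** 2) / 8.0)
heatmap = probe_ego_path_bev(toy_field, t=2.0)
```

Because the field is continuous in both space and time, the same probing works at any resolution and any future horizon without retraining.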
+---
+Lastly, let's look at how GASP predicts the evolution of the environment. Here we show occupancy predictions and how they evolve 3 seconds into the future.
<div>
<video controls autoplay loop muted style="width: 100%;">
-<source src="assets/bev-view.mp4" type="video/mp4">
+<source src="assets/gasp.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
-<figcaption><b>Fig 4.:</b> Semantic features probed around the ego-vehicle, shown from BEV. Images and lidar input are shown for reference.</figcaption>
+<figcaption><b>Fig 5.:</b> Occupancy prediction overlaid with lidar input, showing predictions 3 seconds into the future. Images and lidar input are shown for reference.</figcaption>
</div>

## Quantitative Results
-<div style="display: flex; justify-content: space-between; width=100%; align-items: top;">
-<div style="flex: 1;margin-right: 2em;">
+<div style="display: flex; justify-content: space-between; width: 100%; align-items: top; flex-wrap: wrap;">
+<div style="flex: 1; margin-right: 2em; min-width: 300px;">
<p>
To show that semantic information is <b>indeed crucial</b> for downstream tasks, we evaluate GASP on several downstream autonomous driving tasks, such as semantic occupancy forecasting, online mapping, and ego trajectory prediction.
We compare GASP to UnO <d-cite key="agro2024uno"></d-cite> and to training from scratch. We show consistent improvements across all tasks, demonstrating the effectiveness of GASP.
@@ -103,24 +105,23 @@ We build upon UnO <d-cite key="agro2024uno"></d-cite> and learn to predict 4D oc
</ul>
</div>

-<div style="width:30%; display: flex; justify-content: center; min-width: 200px;">
+<div style="width: 100%; max-width: 300px; margin: 0 auto; text-align: center;">
<figure class="figure__background" style="margin: 0 auto;">
<img style="width: 100%; mix-blend-mode: multiply;" src="assets/radar-chart.png"/>
-<figcaption><b>Fig 2.:</b> Green is GASP, blue is UnO, and yellow is from scratch.</figcaption>
+<figcaption><b>Fig 6.:</b> Green is GASP, blue is UnO, and yellow is from scratch.</figcaption>
</figure>
</div>
</div>

---

# BibTeX
```bibtex
@article{ljungbergh2025gasp,
  title = {GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving},
  author = {Ljungbergh, William and Lilja, Adam and Tonderski, Adam and Laveno Ling, Arvid and Lindstr{\"o}m, Carl and Verbeke, Willem and Fu, Junsheng and Petersson, Christoffer and Hammarstrand, Lars and Felsberg, Michael},
-  journal = {arXiv preprint arXiv:2411.16816},
+  journal = {arXiv preprint arXiv:2503.15672},
  year = {2025}
}
```
