
Commit a9a3cfa

Exchanged image carousel for frontpage figure
1 parent 6e721df commit a9a3cfa

File tree

2 files changed: +7 -6 lines changed


figures/pipeline.png

184 KB

index.md

Lines changed: 7 additions & 6 deletions
@@ -43,7 +43,8 @@ carousels:
   - image: figures/other/whale.png
 ---
 
-{% include carousel.html height="300" unit="px" number="1" %}
+![Tokenization pipeline](figures/pipeline.png)
+*Figure 1: (Left) A standard ViT splits the image into a fixed grid of non-overlapping patches. (Right) With SPoT, an adaptively chosen subset of subpixel-precise patches is extracted.*
 
 Sparsity - the fine art of doing more with less - is an attractive prospect in systems design and modeling.
 As models grow ever larger, sparse features alleviate the computational demands of a model, providing lower latency, lower memory overhead, and higher throughput - all indispensable properties for real-time applications.
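The new caption contrasts fixed-grid tokenization with subpixel-precise patch extraction. A minimal sketch of both, assuming single-channel images and bilinear interpolation (this is an illustrative implementation, not SPoT's actual code):

```python
import numpy as np

def grid_patches(img, p):
    """Standard ViT tokenization: split an image into a fixed grid of
    non-overlapping p x p patches (assumes H and W are divisible by p)."""
    h, w = img.shape
    return (img.reshape(h // p, p, w // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p, p))

def subpixel_patch(img, y, x, p):
    """Extract one p x p patch whose top-left corner sits at a
    subpixel-precise position (y, x), via bilinear interpolation."""
    ys = y + np.arange(p)
    xs = x + np.arange(p)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    wy, wx = ys - y0, xs - x0          # fractional offsets per row/col
    y1 = np.clip(y0 + 1, 0, img.shape[0] - 1)
    x1 = np.clip(x0 + 1, 0, img.shape[1] - 1)
    y0 = np.clip(y0, 0, img.shape[0] - 1)
    x0 = np.clip(x0, 0, img.shape[1] - 1)
    # Interpolate horizontally on the two bracketing rows, then vertically.
    top = (1 - wx)[None, :] * img[np.ix_(y0, x0)] + wx[None, :] * img[np.ix_(y0, x1)]
    bot = (1 - wx)[None, :] * img[np.ix_(y1, x0)] + wx[None, :] * img[np.ix_(y1, x1)]
    return (1 - wy)[:, None] * top + wy[:, None] * bot
```

At integer positions `subpixel_patch` reduces to plain slicing, so grid tokens remain available as a special case.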
@@ -81,7 +82,7 @@ Three specific issues arise from the ViT sparse sampling problem:
 These issues hinder efficient optimization of SFS under standard tokenization - in other words, we posit that **grids cannot align every salient region**.
 
 ![Issues with Grid Tokenization](/figures/nocover.png)
-*Figure 1: A $5 \times 5$ patch grid (gray) with three optimal region placements for sparse feature selection. **(a)** The green patch is well aligned (A), yellow straddles two cells (B), and red lies on a corner (C) and leaks into four cells. Translating the grid only swaps which peak is misaligned---one patch is always bad. **(b)** Our subpixel tokenizer drops fixed-size windows (\textcolor{ok}{green} squares) directly on each peak, eliminating the alignment trade-off while still allowing conventional grid tokens when they \emph{are} well aligned.*
+*Figure 2: A $5 \times 5$ patch grid (gray) with three optimal region placements for sparse feature selection. **(a)** The green patch is well aligned (A), yellow straddles two cells (B), and red lies on a corner (C) and leaks into four cells. Translating the grid only swaps which peak is misaligned - one patch is always bad. **(b)** Our subpixel tokenizer drops fixed-size windows (**green** squares) directly on each peak, eliminating the alignment trade-off while still allowing conventional grid tokens when they *are* well aligned.*
 
 
 ## Methodology: SPoT in a Nutshell
@@ -120,7 +121,7 @@ We compare several spatial priors, each encoding different assumptions about fea
 - *Salient*: encodes object-centric bias by placing tokens based on regions identified as visually salient from a pretrained saliency model.
 
 ![Spatial Priors](/figures/spatialprior.png)
-*Figure 2: An illustration of different spatial priors investigated with SPoT.*
+*Figure 3: An illustration of different spatial priors investigated with SPoT.*
 
 ### Exploring Oracle Neighbourhoods with SPoT-ON
 In addition to investigating different spatial priors, we also directly explore differentiable optimization for token placement.
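A spatial prior here is just a distribution over token positions. A hedged sketch of how such priors might generate $(y, x)$ placements (the prior names, the grid/uniform/center choices, and all parameters below are illustrative assumptions, not SPoT's exact configuration):

```python
import numpy as np

def sample_positions(prior, n, h, w, rng=None):
    """Sample n (y, x) token positions in an h x w image under a spatial
    prior. Priors and parameters are illustrative assumptions only."""
    rng = np.random.default_rng(rng)
    if prior == "grid":       # regular grid, like standard tokenization
        side = int(np.ceil(np.sqrt(n)))
        ys, xs = np.meshgrid(np.linspace(0, h, side, endpoint=False),
                             np.linspace(0, w, side, endpoint=False))
        pts = np.stack([ys.ravel(), xs.ravel()], axis=1)[:n]
    elif prior == "uniform":  # no spatial bias
        pts = rng.uniform([0, 0], [h, w], size=(n, 2))
    elif prior == "center":   # center-biased Gaussian prior
        pts = rng.normal([h / 2, w / 2], [h / 6, w / 6], size=(n, 2))
        pts = np.clip(pts, 0, [h - 1, w - 1])
    else:
        raise ValueError(f"unknown prior: {prior}")
    return pts
```

A saliency-based prior would replace the sampling distribution with a saliency map from a pretrained model, as the *Salient* bullet above describes.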
@@ -141,7 +142,7 @@ Gradient optimization provides *Oracle Neighborhood guided* (ON) adjustments
 SPoT-ON reveals which locations are optimal for classifying each image, which allows us to ascertain the existence of an optimal set of positions $S$ for each image and to estimate an upper bound on the performance gain from effective token sampling.
 
 ![Oracle Placements](/figures/placements.png)
-*Figure 3: Illustration of oracle placements with 25 tokens with SPoT-ON. By optimizing our oracle-neighborhood search through the model, the oracle discovers optimal placement of points, yielding an accuracy of $90.9\%$ on ImageNet1k with only $\sim12.5\%$ of the tokens. Trajectories are colored with dark purple for initial points, and endpoints colored bright yellow.*
+*Figure 4: Illustration of oracle placements with 25 tokens using SPoT-ON. By optimizing our oracle-neighborhood search through the model, the oracle discovers optimal point placements, yielding an accuracy of $90.9\%$ on ImageNet1k with only $\sim12.5\%$ of the tokens. Trajectories are colored dark purple at initial points and bright yellow at endpoints.*
 
 

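The oracle search above moves continuous token positions uphill on a gradient. A toy stand-in for that loop, assuming an analytic score in place of the model's classification loss (in SPoT-ON the gradient flows through the model itself; everything below is an illustrative assumption):

```python
import numpy as np

def refine_positions(grad_fn, pts, lr=1.0, steps=100):
    """Gradient-ascent refinement of token positions: repeatedly nudge
    each (y, x) point uphill on a score surface, recording trajectories
    like those plotted in the oracle-placement figure."""
    traj = [pts.copy()]
    for _ in range(steps):
        pts = pts + lr * grad_fn(pts)  # move each point uphill
        traj.append(pts.copy())
    return pts, np.stack(traj)

# Toy "informativeness" score: a Gaussian bump centered at c. Its gradient
# pulls points toward the peak, mimicking how the oracle drifts tokens
# toward informative regions.
c = np.array([10.0, 20.0])
score = lambda p: np.exp(-((p - c) ** 2).sum(axis=-1) / 400.0)
grad = lambda p: score(p)[..., None] * (c - p) / 200.0

pts0 = np.array([[0.0, 0.0], [15.0, 25.0]])
pts, traj = refine_positions(grad, pts0, lr=5.0, steps=500)
```

Each trajectory in `traj` starts at the initial points and drifts toward the score peak, which is the qualitative behavior the figure's purple-to-yellow paths visualize.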
@@ -213,9 +214,9 @@ Our results show that center-bias in spatial priors is beneficial in sparse regi
 #### Performance Gap
 
 ![Performance Gap](/figures/gap.png)
-*Figure 4: We show ImageNet1k accuracy vs throughput with 5 models at four sparsity levels. The ceiling area denotes performance unlikely to be achieved given the intrinsic label noise in ImageNet. The gap highlights the margin between SPoT with optimal configuration and SPoT-ON, illustrating possible performance gain through better token placement.*
+*Figure 5: ImageNet1k accuracy versus throughput for 5 models at four sparsity levels. The ceiling area denotes performance unlikely to be achieved given the intrinsic label noise in ImageNet. The gap highlights the margin between SPoT with its optimal configuration and SPoT-ON, illustrating the possible performance gain from better token placement.*
 
-Figure 4 shows image throughput versus accuracy, comparing SPoT with the baselines across varying sparsity levels.
+Figure 5 shows image throughput versus accuracy, comparing SPoT with the baselines across varying sparsity levels.
 As sparsity increases, throughput improves significantly, albeit with an associated trade-off in accuracy.
 Notably, SPoT achieves the most favorable trade-off, maintaining substantially more of the full-model accuracy while enabling higher throughput than competing approaches.
 We observe only slight variation in throughput between the models at each sparsity level, indicating that SPoT incurs minimal computational overhead compared to the baselines.
