Using Franca's "Removal of Absolute Spatial Attributes" (RASA) post-training (Venkataramanan et al., 2025) to remove positional bias from other SSL-pretrained ViTs is simple and improves downstream performance. I evaluated the models with OverClustering (Ziegler & Asano, 2022) and obtained a 1-3% performance boost on the validation set across different ViT model sizes. Since the original RASA codebase is not easy to reuse, I adjusted it so that any pre-trained encoder can easily be plugged in. I observed a performance increase for both the DINOv2 and DINOv3 encoders.
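To illustrate the idea of positional debiasing (not the actual RASA procedure, which follows Venkataramanan et al., 2025): one simple way to remove a linearly position-predictable component from patch features is to least-squares fit the features from normalized patch coordinates and subtract the fitted part. The function name and arguments below are my own, for illustration only.

```python
import numpy as np

def remove_positional_component(feats, grid_h, grid_w):
    """Subtract the linearly position-predictable part of patch features.

    feats: (N, D) patch embeddings for an N = grid_h * grid_w patch grid.
    NOTE: this is only a sketch of the idea behind positional debiasing,
    not the RASA algorithm itself.
    """
    rows, cols = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    # normalized (row, col) coordinates plus a bias column
    pos = np.stack([rows.ravel() / (grid_h - 1), cols.ravel() / (grid_w - 1)], axis=1)
    pos = np.concatenate([pos, np.ones((pos.shape[0], 1))], axis=1)
    # fit features from positions, then remove the fitted component
    coef, *_ = np.linalg.lstsq(pos, feats, rcond=None)  # (3, D)
    return feats - pos @ coef
```

After this step, the residual features are orthogonal to the (linear) positional design matrix, so a linear probe can no longer recover patch position from them.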
This image displays the patch cosine similarity between a selected patch token and all other patches, as on page 4 of the DINOv3 paper (Siméoni et al., 2025). This qualitative evaluation shows how well the model can distinguish between object types in the image. In the visualization.ipynb notebook I evaluate how encoder size and RASA post-training affect these similarity maps; smaller models still struggle to produce good cosine similarities. See the visualization.ipynb notebook or try it yourself in Google Colab.
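The patch-similarity visualization can be sketched in a few lines: normalize the patch tokens, take the dot product with the selected query token, and reshape to the patch grid. The function name and arguments below are illustrative, not the notebook's exact code.

```python
import numpy as np

def patch_similarity_map(patch_tokens, query_idx, grid_h, grid_w):
    """Cosine similarity between one selected patch token and all patches.

    patch_tokens: (N, D) patch embeddings from a ViT encoder,
    with N = grid_h * grid_w. Returns a (grid_h, grid_w) map
    that can be upsampled and overlaid on the input image.
    """
    # L2-normalize so the dot product equals cosine similarity
    normed = patch_tokens / np.linalg.norm(patch_tokens, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]
    return sims.reshape(grid_h, grid_w)
```

The query patch itself always gets similarity 1; well-separated features give high values on patches of the same object and low values elsewhere.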
Install the packages using the requirements.txt file:

```shell
# using conda
conda create --name dino python=3.11
conda activate dino
pip install -r requirements.txt
```

```shell
# Run the code; adjust ./configs/rasa.yml or the argparse flags
python main.py --exp_name "rasa_vits"
```
Pascal VOC2012
Performance on the validation set with the DINOv3 ViT-S encoder for OverClustering (Ziegler & Asano, 2022) with k = {21, 100, 300}, evaluated on 40 of the 90 validation batches (batch size 16) due to compute constraints.
k | Validation mIoU | After RASA Post-Training Validation mIoU | Δ vs Original |
---|---|---|---|
21 | 15.67% | 16.12% | +0.45% |
100 | 46.56% | 47.64% | +1.08% |
300 | 59.14% | 59.94% | +0.80% |
Performance on the validation set with the DINOv3 ViT-B encoder for OverClustering (Ziegler & Asano, 2022) with k = {21, 100, 300}, evaluated on 40 of the 90 validation batches (batch size 16) due to compute constraints.
k | Validation mIoU | After RASA Post-Training Validation mIoU | Δ vs Original |
---|---|---|---|
21 | 20.07% | 21.56% | +1.49% |
100 | 51.30% | 54.22% | +2.92% |
300 | 68.03% | 66.60% | -1.43% |
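The overclustering protocol used for these numbers can be sketched as follows: cluster the patch features into k clusters (k at or above the number of classes), assign each cluster to the ground-truth class it overlaps most (many-to-one matching), then compute mean IoU over classes. This is a simplified illustration in my own naming; the repo follows Ziegler & Asano (2022).

```python
import numpy as np

def overclustering_miou(pred_clusters, gt_labels, num_classes):
    """Many-to-one overclustering evaluation (illustrative sketch).

    pred_clusters: (N,) cluster ids, e.g. from k-means over patch features.
    gt_labels:     (N,) ground-truth class ids per patch/pixel.
    """
    # map each cluster to its majority ground-truth class
    mapped = np.empty_like(pred_clusters)
    for c in np.unique(pred_clusters):
        mask = pred_clusters == c
        mapped[mask] = np.bincount(gt_labels[mask], minlength=num_classes).argmax()
    # per-class IoU, averaged over classes present in prediction or ground truth
    ious = []
    for cls in range(num_classes):
        inter = np.sum((mapped == cls) & (gt_labels == cls))
        union = np.sum((mapped == cls) | (gt_labels == cls))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

With k much larger than the number of classes (e.g. k = 300 vs 21 VOC classes), many small, pure clusters map onto each class, which is why mIoU rises with k in the tables above.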
I picked the best weights based on an intermediate evaluation with OverClustering.
Venkataramanan, S., Pariza, V., Salehi, M., Knobel, L., Gidaris, S., Ramzi, E., Bursuc, A., & Asano, Y. M. (2025). Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning (No. arXiv:2507.14137). arXiv. https://doi.org/10.48550/arXiv.2507.14137
Siméoni, O., Vo, H. V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., Massa, F., Haziza, D., Wehrstedt, L., Wang, J., Darcet, T., Moutakanni, T., Sentana, L., Roberts, C., Vedaldi, A., … Bojanowski, P. (2025). DINOv3 (No. arXiv:2508.10104). arXiv. https://doi.org/10.48550/arXiv.2508.10104
Ziegler, A., & Asano, Y. M. (2022). Self-Supervised Learning of Object Parts for Semantic Segmentation (No. arXiv:2204.13101). arXiv. https://doi.org/10.48550/arXiv.2204.13101