Investigations into the information content and causal impact of CLS tokens

This work is motivated by an observation from my Test-Time-Training experiments: the later layers of vision transformer (ViT) models do not appear to use the information in the CLS token. Shuffling the CLS tokens among the samples of large test-set batches has no effect on classification accuracy.
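The shuffling test can be illustrated with a small, framework-free sketch. This is not the code from the notebooks: the scalar "tokens", the two toy readouts (`head_ignoring_cls`, `head_using_cls`), and the helper names are all hypothetical stand-ins, chosen only to show the logic of the test. If a readout does not depend on the CLS token, permuting CLS tokens across the batch leaves its accuracy unchanged; if it does depend on the CLS token, accuracy degrades.

```python
import random


def shuffle_cls(batch, rng):
    """Return a copy of `batch` with the CLS token (index 0) permuted across samples.

    Each sample is a list of tokens: [cls, patch_1, ..., patch_n].
    Patch tokens stay with their original sample.
    """
    perm = list(range(len(batch)))
    rng.shuffle(perm)
    return [[batch[src][0]] + list(sample[1:]) for sample, src in zip(batch, perm)]


def accuracy(predict, batch, labels):
    """Fraction of samples for which `predict(tokens)` matches the label."""
    preds = [predict(tokens) for tokens in batch]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical stand-ins for "later layers + classification head".
def head_ignoring_cls(tokens):
    # Reads only the patch tokens, never the CLS token.
    return int(sum(tokens[1:]) > 0)


def head_using_cls(tokens):
    # Reads only the CLS token.
    return int(tokens[0] > 0)


# Toy batch (an assumption for illustration): both the CLS token and the
# patch tokens carry the label in their sign.
rng = random.Random(0)
labels = [i % 2 for i in range(256)]
batch = []
for y in labels:
    sign = 1.0 if y == 1 else -1.0
    patches = [sign * rng.uniform(0.5, 1.5) for _ in range(4)]
    batch.append([sign] + patches)

shuffled = shuffle_cls(batch, rng)

for name, head in [("ignores CLS", head_ignoring_cls), ("uses CLS", head_using_cls)]:
    clean = accuracy(head, batch, labels)
    perturbed = accuracy(head, shuffled, labels)
    print(f"{name}: clean={clean:.3f} shuffled={perturbed:.3f}")
```

In this toy setting the label is decodable from the CLS token in both cases, but only the second readout is causally affected by it, which is the distinction between available and used information that the notebooks investigate.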

Part 1:

Decodable vs. causal information in the CLS tokens of ViTs: https://lrast.github.io/science/2026/01/20/lab_notes.html

scripts
src
.gitignore
0.1-available_vs_used_info.ipynb
0.1.5-model_experiments.ipynb
1.0-randomization_and_readout.ipynb
1.1-multiple_readouts.ipynb
1.1.5-decoder_train.ipynb
metadata.json
readme.md
requirements.txt

lrast/CLSTokenAnalysis
