mattf1n/do-not-attend

Plan

  1. Identify multi-token words in a document, e.g. [mul][ti][ple]. Then, for each layer and attention head:
  2. Find the maximum attention score on [mul] over all tokens following [ple].
  3. Find the maximum attention score on [ple] over all tokens following [ple].

We expect attention on [ple] to be higher. Is this true?
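As a starting point, the three steps can be prototyped in a few lines with Hugging Face `transformers`. This is a sketch under assumptions the plan leaves open: the model (GPT-2 here), the framework, the example sentence, and the Ġ-based word-boundary heuristic are all illustrative choices, not the repo's actual setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)
model.eval()

enc = tok("Tokenization of antidisestablishmentarianism is hard.",
          return_tensors="pt")
ids = enc["input_ids"][0]

# Step 1: group positions into words. GPT-2's BPE marks word-initial tokens
# with a leading Ġ (space), so a multi-token word is a run of positions whose
# continuation tokens lack that marker. (Crude heuristic: punctuation sticks
# to the preceding word.)
words, cur = [], [0]
for i in range(1, len(ids)):
    if tok.convert_ids_to_tokens(int(ids[i])).startswith("Ġ"):
        words.append(cur)
        cur = [i]
    else:
        cur.append(i)
words.append(cur)

with torch.no_grad():
    attns = model(**enc).attentions  # per layer: (1, heads, seq, seq)

# Steps 2-3: for each layer/head, the max attention that tokens after the
# word pay to its first subword ([mul]) vs. its last subword ([ple]).
for word in (w for w in words if len(w) > 1):
    first, last = word[0], word[-1]
    if last + 1 >= len(ids):
        continue  # nothing follows the word
    for layer, a in enumerate(attns):
        a = a[0]  # (heads, queries, keys)
        on_first = a[:, last + 1:, first].max(dim=-1).values  # per-head max
        on_last = a[:, last + 1:, last].max(dim=-1).values
        frac = (on_last > on_first).float().mean().item()
        print(f"{tok.decode(ids[word])!r} layer {layer}: "
              f"{frac:.0%} of heads attend more to the final token")
```

Aggregating with a max over the following query positions follows steps 2 and 3 literally; averaging over queries would be a natural variant.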

References

Feucht, Sheridan, David Atkinson, Byron C. Wallace, and David Bau. 2024. “Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs.” EMNLP, 9727–39. https://aclanthology.org/2024.emnlp-main.543.

Kallini, Julie, Shikhar Murty, Christopher D. Manning, Christopher Potts, and Róbert Csordás. 2025. “MrT5: Dynamic Token Merging for Efficient Byte-Level Language Models.” The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=VYWBMq1L7H.

Kamoda, Go, Benjamin Heinzerling, Tatsuro Inaba, Keito Kudo, Keisuke Sakaguchi, and Kentaro Inui. 2025. “Weight-Based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference.” NAACL (Findings), 6324–43. https://aclanthology.org/2025.findings-naacl.355/.

Lad, Vedang, Jin Hwa Lee, Wes Gurnee, and Max Tegmark. 2025. “The Remarkable Robustness of LLMs: Stages of Inference?” https://arxiv.org/abs/2406.19384.

Liu, Alisa, Jonathan Hayase, Valentin Hofmann, Sewoong Oh, Noah A. Smith, and Yejin Choi. 2025. “SuperBPE: Space Travel for Language Models.” Second Conference on Language Modeling. https://openreview.net/forum?id=lcDRvffeNP.

Park, Kiho, Yo Joong Choe, Yibo Jiang, and Victor Veitch. 2025. “The Geometry of Categorical and Hierarchical Concepts in Large Language Models.” The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=bVTM2QKYuA.
