Linear_Reasoning_Features (ACL 2025 Findings)

This repository contains the data and code for the experiments in the paper titled "The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction".

Paper (arXiv): https://arxiv.org/abs/2503.23084

Overview

This project provides code and datasets to reproduce the experiments in the paper. It investigates the interplay between reasoning and memorization in large language models (LLMs) by identifying linear reasoning features (LiReFs) in the models' residual stream that mediate the balance between the two.
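As a rough illustration of what a "linear feature in the residual stream" means, the sketch below computes a candidate direction as the difference between the mean hidden state on reasoning-heavy inputs and the mean hidden state on memorization-heavy inputs at one layer, then scores hidden states by projecting them onto that direction. It assumes the hidden states are already available as NumPy arrays; the difference-of-means construction, the function names, and the synthetic data are illustrative assumptions, not necessarily the exact procedure used in the paper or this repository.

```python
import numpy as np


def mean_difference_direction(reasoning_states: np.ndarray,
                              memory_states: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the memorization mean to the reasoning mean.

    Both inputs have shape (n_examples, d_model) and hold hidden states from
    a single layer (hypothetical layout, assumed for illustration only).
    """
    direction = reasoning_states.mean(axis=0) - memory_states.mean(axis=0)
    return direction / np.linalg.norm(direction)


def project(states: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Scalar projection of each hidden state onto the unit direction."""
    return states @ direction


# Synthetic stand-in data: two clusters that differ along the first axis.
rng = np.random.default_rng(0)
d_model = 16
offset = np.zeros(d_model)
offset[0] = 5.0
reasoning = rng.normal(size=(200, d_model)) + offset
memory = rng.normal(size=(200, d_model)) - offset

r_hat = mean_difference_direction(reasoning, memory)
# Reasoning examples score higher along the direction than memorization ones.
print(project(reasoning, r_hat).mean() > project(memory, r_hat).mean())  # True
```

By construction, the gap between the two groups' mean projections equals the norm of the mean difference, so the comparison above is always positive for distinct means.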

How to Run

Unzip the dataset
```
unzip dataset.zip
```
Store Hidden States of Models on Certain Tasks
Run the notebook: ./reasoning_representation/LiReFs_storing_hs.ipynb
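For orientation, Hugging Face `transformers` models return per-layer hidden states when called with `output_hidden_states=True`: a tuple with one tensor of shape (batch, seq_len, d_model) for the embedding output plus one per layer. The sketch below shows one plausible way to reduce such a tuple to a per-example, per-layer, last-token array. The helper name, the choice of the last token, and the assumption of no right padding are illustrative and not taken from the notebook; NumPy arrays stand in for the model's tensors so the snippet runs without downloading a model.

```python
import numpy as np


def last_token_states(hidden_states) -> np.ndarray:
    """Stack the final-position hidden state from every layer.

    `hidden_states` is a sequence of arrays, each shaped
    (batch, seq_len, d_model), e.g. the `hidden_states` tuple of a
    transformers model output converted to NumPy. Assumption: no right
    padding, so position -1 is the last real token of every example.
    Returns an array of shape (batch, n_layers, d_model).
    """
    return np.stack([layer[:, -1, :] for layer in hidden_states], axis=1)


# Fake output for a hypothetical model with 4 layers (+1 embedding layer),
# a batch of 2 sequences of length 7, and d_model = 8.
fake = tuple(np.random.rand(2, 7, 8) for _ in range(5))
states = last_token_states(fake)
print(states.shape)  # (2, 5, 8)
```

With a real model, each tensor would first need converting, e.g. `h.float().cpu().numpy()`, and the stacked result could then be written to disk with `np.save`.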
Create PCA and Other Figures
Run the notebook: ./reasoning_representation/Figures_Interp_Reason&Memory.ipynb
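The PCA figures project high-dimensional hidden states onto their top principal components. A minimal NumPy version, using the SVD of the mean-centered data, is sketched below; the notebook may use a library such as scikit-learn instead, and the two-cluster data is synthetic and purely illustrative.

```python
import numpy as np


def pca_project(states: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows of `states` (n_examples, d_model) onto the top principal components."""
    centered = states - states.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for reasoning vs. memorization hidden states.
rng = np.random.default_rng(1)
offset = np.zeros(32)
offset[3] = 4.0
reasoning = rng.normal(size=(100, 32)) + offset
memory = rng.normal(size=(100, 32)) - offset

coords = pca_project(np.vstack([reasoning, memory]))
print(coords.shape)  # (200, 2)

# The two groups land on opposite sides of PC1 (the sign of PC1 is arbitrary).
pc1 = coords[:, 0]
print(pc1[:100].mean() * pc1[100:].mean() < 0)  # True
```

Plotting `coords` as a scatter plot colored by group gives the kind of two-dimensional view a PCA figure shows.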

Run Intervention Experiments

```
cd Intervention
python features_intervention.py
```
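Interventions of this kind are commonly implemented by adding a scaled copy of a feature direction to the residual stream. The sketch below shows only that arithmetic on NumPy arrays; the function name, the scale `alpha`, and the layer at which the addition would be applied are assumptions, and `features_intervention.py` may implement the intervention differently (for example through forward hooks on a specific layer).

```python
import numpy as np


def steer(states: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha times the unit-normalized direction to every hidden state.

    states: (..., d_model); direction: (d_model,). A positive alpha pushes
    states along the direction; a negative alpha pushes them the other way.
    Hypothetical helper, shown for illustration only.
    """
    unit = direction / np.linalg.norm(direction)
    return states + alpha * unit


rng = np.random.default_rng(2)
h = rng.normal(size=(3, 10))  # three hypothetical residual-stream vectors
r = rng.normal(size=10)       # hypothetical feature direction
h_steered = steer(h, r, alpha=2.0)

unit_r = r / np.linalg.norm(r)
# Each projection onto the direction increases by exactly alpha (up to rounding).
print(np.allclose(h_steered @ unit_r - h @ unit_r, 2.0))  # True
```

Because only a multiple of the unit direction is added, components orthogonal to the direction are left unchanged.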

How to Cite

```
@misc{hong2025reasoningmemorizationinterplaylanguagemodels,
      title={The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction},
      author={Yihuai Hong and Dian Zhou and Meng Cao and Lei Yu and Zhijing Jin},
      year={2025},
      eprint={2503.23084},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.23084},
}
```
