Most work interpreting reasoning models studies only a single chain-of-thought (CoT), yet these models define distributions over many possible CoTs. We argue that studying a single sample is inadequate for understanding causal influence and the underlying computation. We present case studies using resampling to investigate model decisions. Overall, studying distributions via resampling enables reliable causal analysis, clearer narratives of model reasoning, and principled CoT interventions.
See more:
- 📄 Paper: https://arxiv.org/abs/2510.27484
- 📊 Datasets: https://huggingface.co/datasets/uzaymacar/blackmail-rollouts and https://huggingface.co/datasets/uzaymacar/whistleblower-rollouts
You can download our blackmail rollouts dataset and whistleblower rollouts dataset or resample your own data.
Here's a quick rundown of the main scripts in this repository and what they do:
blackmail/generate_blackmail_rolloutsandwhistleblower/generate_whistleblower_rollouts.pyrespectively creates base rollouts for the blackmail and whistleblower scenarios. Our datasets were generated with them.blackmail/prompts.pyandwhistleblower/prompts.pyincludes the input prompts used andblackmail/utils.pyandwhistleblower/utils.pycontains helper functions.blackmail/analyze_rollouts.pyandwhistleblower/analyze_rollouts.pycreates thechunks_labeled.jsonfiles in the respective data folders.blackmail/onpolicy_chain_disruption.pyandwhistleblower/onpolicy_chain_disruption.pycreates on-policy chain-of-thought interventions via resampling.blackmail/measure_determination.pyandwhistleblower/measure_determination.pycreates off-policy chain-of-thought interventions via hand-written edits and same/cross-model insertions.faithfulness/andresume_analysis/folders respectively contains all experiments run in the paper for chain-of-thought faithfulness and resume analysis.
Please cite our work if you are using our code or datasets.
@misc{macar2025thoughtbranchesinterpretingllm,
title={Thought Branches: Interpreting LLM Reasoning Requires Resampling},
author={Uzay Macar and Paul C. Bogdan and Senthooran Rajamanoharan and Neel Nanda},
year={2025},
eprint={2510.27484},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.27484},
}
For any questions, thoughts, or feedback, please reach out to uzaymacar@gmail.com and paulcbogdan@gmail.com.