This is the official repository for Knowing When to Stop: Efficient Context Processing via Latent Sufficiency Signals (NeurIPS 2025). The repo contains the original implementation of the paper, including both the datasets and source code. Check out the project website for more information.
⭐ If you find our work helpful, please consider citing ⭐:

```bibtex
@article{xie2025when,
  title={Knowing When to Stop: Efficient Context Processing via Latent Sufficiency Signals},
  author={Xie, Roy and Wang, Junlin and Rosu, Paul and Deng, Chunyuan and Sun, Bolun and Lin, Zihao and Dhingra, Bhuwan},
  journal={Advances in Neural Information Processing Systems},
  year={2025}
}
```

Python 3.10 is recommended. You can create a virtual environment and install the required packages as follows:
```sh
python -m venv when-to-stop-env
source when-to-stop-env/bin/activate
pip install -r requirement.txt
```

Running the pipeline also requires an OpenAI API key for evaluation. Export it as follows:

```sh
export OPENAI_API_KEY="<your OpenAI api key>"
```

You can adjust configuration parameters, such as the number of data points, the number of heads, the classification threshold, or the evaluation method, by editing the following file:
```
src/config.py
```

Run the full pipeline with the command below: it first probes the model to select the heads most predictive of context sufficiency, then trains the classifier on the selected heads, and finally evaluates the results. The following example uses the Llama 3.2 1B model:
```sh
python src/run.py --model_name meta-llama/Llama-3.2-1B-Instruct --data_dir "data/short" --output_dir "./results"
```

For questions or issues, please open an issue on GitHub or contact the authors directly.
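As a rough picture of the knobs in `src/config.py` (number of data points, number of heads, classification threshold, evaluation method), the sketch below shows what such a config might look like. The variable names and default values here are illustrative assumptions, not the repo's actual contents:

```python
# Hypothetical sketch of the kinds of options src/config.py exposes.
# All names and defaults below are assumptions, not the repo's actual values.
NUM_DATA_POINTS = 1000           # examples used for probing and training
NUM_HEADS = 10                   # attention heads kept after probing
CLASSIFICATION_THRESHOLD = 0.5   # sufficiency-classifier decision threshold
EVAL_METHOD = "openai"           # evaluation backend (needs OPENAI_API_KEY)
```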
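To make the three pipeline stages (probe, train, evaluate/stop) concrete, here is a minimal NumPy sketch of the probe-select-classify idea on synthetic activations. The array shapes, the simple mean-difference probe, and the number of selected heads are illustrative assumptions for exposition, not the paper's actual method or values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-head activations: (examples, heads, head_dim).
# In the real pipeline these would come from the probed LLM.
acts = rng.normal(size=(200, 16, 8))
labels = rng.integers(0, 2, size=200)  # 1 = context already sufficient

# Stage 1 (probing): score each head by how well a simple linear probe
# (mean-difference direction) separates sufficient from insufficient examples.
def probe_score(x, y):
    direction = x[y == 1].mean(0) - x[y == 0].mean(0)
    proj = x @ direction
    preds = proj > proj.mean()
    return (preds == y).mean()

scores = np.array([probe_score(acts[:, h], labels) for h in range(acts.shape[1])])
top_heads = np.argsort(scores)[-4:]  # keep the 4 most predictive heads

# Stage 2 (training): fit one linear classifier on the concatenated
# features of the selected heads.
features = acts[:, top_heads].reshape(len(acts), -1)
w = features[labels == 1].mean(0) - features[labels == 0].mean(0)
bias = -(features @ w).mean()

# Stage 3 (inference): stop processing further context once the score
# clears the configured classification threshold.
stop = (features @ w + bias) > 0.0
print(f"would stop early on {stop.mean():.0%} of examples")
```

The point of the sketch is only the control flow: per-head probing narrows the search, a single classifier reads the surviving heads, and its thresholded score is the early-stopping signal.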