Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability
This is the official repository for our paper *Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability*. Full code and results will be released later; please stay tuned. In this work, we conduct a large-scale crowdsourced evaluation of automated interpretability methods for describing neurons in vision models.
- While previous work has focused only on measuring whether an explanation matches the highest neuron activations, we instead measure the correlation between neuron activations and the explanation's concept, giving a more complete picture of explanation quality.
- To make our evaluation economically feasible, we introduce Model-Guided Importance Sampling (MG-IS) to select the most important inputs to show raters, yielding a ∼15× reduction in labeling cost over uniform sampling.
- We develop Bayes Rater Aggregation (BRAgg) to aggregate the predictions of different raters and handle noisy labels, further reducing the number of ratings required to reach a given accuracy by ∼3×.
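To make the evaluation setup concrete, here is a minimal sketch of the two ideas in the first two bullets: selecting inputs to rate with probability proportional to an importance score, then computing an importance-weighted correlation between neuron activations and crowd concept ratings. The function names (`mg_is_select`, `weighted_correlation`), the specific importance scores, and the exact estimator are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def mg_is_select(importance, n_samples, rng=None):
    """Pick inputs to show raters with probability proportional to a
    model-guided importance score (the score itself is a placeholder
    here; the paper's MG-IS construction is not reproduced)."""
    rng = np.random.default_rng(rng)
    p = importance / importance.sum()
    idx = rng.choice(len(importance), size=n_samples, replace=False, p=p)
    return idx, p[idx]

def weighted_correlation(acts, ratings, weights):
    """Weighted Pearson correlation between neuron activations and
    concept ratings; with inverse-probability weights this corrects
    for the non-uniform sampling above."""
    w = weights / weights.sum()
    mx, my = np.sum(w * acts), np.sum(w * ratings)
    cov = np.sum(w * (acts - mx) * (ratings - my))
    sx = np.sqrt(np.sum(w * (acts - mx) ** 2))
    sy = np.sqrt(np.sum(w * (ratings - my) ** 2))
    return cov / (sx * sy)
```

A typical use would be `idx, p = mg_is_select(scores, 50)` followed by `weighted_correlation(acts[idx], ratings[idx], 1.0 / p)`, so that rarely-sampled inputs are up-weighted in the final estimate.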
T. Oikarinen, G. Yan, A. Kulkarni and T.-W. Weng, Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability, arXiv preprint, 2025.
@misc{oikarinen2025rethinking,
      title={Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability},
      author={Tuomas Oikarinen and Ge Yan and Akshay Kulkarni and Tsui-Wei Weng},
      year={2025},
      eprint={2506.07985},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.07985},
}
