Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability

This is the official repository for our paper Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability. Full code and results will be released later; please stay tuned. In this work, we conduct a large-scale crowdsourced evaluation of automated interpretability methods for describing neurons in vision models.

  • While previous work has focused only on measuring whether the explanation matches the highest neuron activations, we instead measure the correlation between neuron activations and the explanation concept, giving a more complete picture.
  • To make our evaluation economically feasible, we introduce Model-Guided Importance Sampling (MG-IS) to select the most important inputs to show raters, leading to a ∼15× reduction in labeling cost over uniform sampling (see the sketch after this list).
  • We develop Bayes Rater Aggregation (BRAgg) to combine the predictions of different raters and handle noisy labels, further reducing the number of ratings required to reach a given accuracy by ∼3×.
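
Until the full code is released, the sketch below is only a rough illustration of the evaluation idea described above: sample inputs with probability proportional to an importance score, then compute a weighted correlation between neuron activations and binary rater judgments of the explanation concept. It is not the paper's MG-IS or BRAgg implementation; the function names, the choice of activation magnitude as the importance score, and the toy data are all assumptions made for illustration.

import numpy as np

def sample_inputs(importance_scores, n_samples, seed=0):
    """Sample input indices with probability proportional to an importance
    score; return the indices and inverse-propensity weights so that
    downstream estimates stay approximately unbiased."""
    rng = np.random.default_rng(seed)
    probs = importance_scores / importance_scores.sum()
    idx = rng.choice(len(probs), size=n_samples, replace=True, p=probs)
    weights = 1.0 / (len(probs) * probs[idx])
    return idx, weights

def weighted_correlation(activations, ratings, weights):
    """Weighted Pearson correlation between neuron activations and rater
    judgments of whether each input matches the explanation concept."""
    w = weights / weights.sum()
    mu_a, mu_r = np.sum(w * activations), np.sum(w * ratings)
    cov = np.sum(w * (activations - mu_a) * (ratings - mu_r))
    var_a = np.sum(w * (activations - mu_a) ** 2)
    var_r = np.sum(w * (ratings - mu_r) ** 2)
    return cov / np.sqrt(var_a * var_r)

# Toy usage with synthetic activations and 0/1 concept ratings.
acts = np.random.default_rng(1).random(10_000)
ratings = (acts > 0.8).astype(float)                 # pretend raters mark top-activating inputs as on-concept
idx, w = sample_inputs(acts + 1e-6, n_samples=300)   # importance score: activation magnitude (an assumption)
print(weighted_correlation(acts[idx], ratings[idx], w))

The inverse-propensity weights keep the correlation estimate comparable to one computed over uniformly sampled inputs, while rater effort is concentrated on the inputs the importance score deems most informative.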

Overview figure

Cite this work

T. Oikarinen, G. Yan, A. Kulkarni and T.-W. Weng, Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability, arXiv preprint, 2025.

@misc{oikarinen2025rethinking,
      title={Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability},
      author={Tuomas Oikarinen and Ge Yan and Akshay Kulkarni and Tsui-Wei Weng},
      year={2025},
      eprint={2506.07985},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.07985},
}
