Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?

Published in ACL, 2020

Attention mechanisms have dramatically improved the accuracy of recent RNN-based NLP methods. Recent claims that attention adds interpretability have sparked debate about how interpretable these mechanisms actually are. In this paper, we seek an answer to this question by incorporating human validation into the definition of interpretability. To this end, we collect a large dataset of human attention annotations, allowing us to evaluate the degree to which human attention maps agree with machine attention maps across a variety of attention mechanisms. We find that different architectures align with human annotations to different degrees, lending insight into which mechanisms should be considered more or less interpretable. We also release our newly collected dataset alongside this paper so that future researchers may use it to evaluate their attention models.
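
The paper's exact agreement metric is not spelled out in this summary; as a rough illustration only (the metric choices below are assumptions, not necessarily those used in the paper), one could compare a human attention map and a machine attention map over the same tokens with a rank correlation and a distributional distance:

    # Hypothetical sketch: comparing human vs. machine attention over the same tokens.
    # Spearman correlation and Jensen-Shannon distance are illustrative choices.
    import numpy as np
    from scipy.spatial.distance import jensenshannon
    from scipy.stats import spearmanr

    def normalize(weights):
        """Turn non-negative per-token weights into a probability distribution."""
        w = np.asarray(weights, dtype=float)
        return w / w.sum()

    # Example sentence: "the service was painfully slow"
    human_map = normalize([0, 1, 0, 1, 1])                  # binary human highlights
    machine_map = normalize([0.05, 0.4, 0.05, 0.2, 0.3])    # model attention weights

    rho, _ = spearmanr(human_map, machine_map)              # rank agreement
    jsd = jensenshannon(human_map, machine_map)             # distributional distance

    print(f"Spearman rho: {rho:.3f}, Jensen-Shannon distance: {jsd:.3f}")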

@inproceedings{sen2020human,
  title={Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?},
  author={Sen, Cansu and Hartvigsen, Thomas and Yin, Biao and Kong, Xiangnan and Rundensteiner, Elke},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}