Abstract
Saliency methods, techniques that identify how important each input feature is to a model's output, are a common step in understanding neural network behavior.
However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis.
To address these concerns, we present Shared Interest: metrics for comparing model reasoning (via saliency) to human reasoning (via ground truth annotations).
By providing quantitative descriptors, Shared Interest enables ranking, sorting, and aggregating inputs, thereby facilitating large-scale systematic analysis of model behavior.
We use Shared Interest to identify eight recurring patterns in model behavior, such as cases where contextual features or a subset of ground truth features are most important to the model.
Working with representative real-world users, we show how Shared Interest can be used to decide if a model is trustworthy, uncover issues missed in manual analyses, and enable interactive probing.
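As a rough sketch of the kind of comparison Shared Interest performs, the Python snippet below computes overlap scores between a binary saliency mask and a binary ground-truth annotation mask. The function name, example masks, and exact score definitions here are illustrative assumptions, not the paper's reference implementation; see the paper for the precise metrics.

import numpy as np

def shared_interest_scores(saliency_mask, ground_truth_mask):
    """Overlap scores between a binary saliency mask and a binary
    ground-truth annotation mask (hypothetical helper; the paper
    defines the actual Shared Interest metrics)."""
    s = saliency_mask.astype(bool)
    g = ground_truth_mask.astype(bool)
    intersection = np.logical_and(s, g).sum()
    union = np.logical_or(s, g).sum()
    return {
        # IoU-style score: how closely the two feature sets coincide overall.
        "iou": intersection / union if union else 0.0,
        # Fraction of ground-truth features the model also finds salient.
        "ground_truth_coverage": intersection / g.sum() if g.sum() else 0.0,
        # Fraction of salient features that fall inside the ground truth.
        "saliency_coverage": intersection / s.sum() if s.sum() else 0.0,
    }

# Example: saliency that highlights only part of the annotated region.
saliency = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]])
ground_truth = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
print(shared_interest_scores(saliency, ground_truth))

Because each score reduces an input to a single number, inputs can be ranked, sorted, and aggregated by it, which is the large-scale systematic analysis the abstract describes.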
Citation
@inproceedings{boggust2022shared,
  title={Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior},
  author={Boggust, Angie and Hoover, Benjamin and Satyanarayan, Arvind and Strobelt, Hendrik},
  booktitle={{CHI} Conference on Human Factors in Computing Systems},
  year={2022}
}