Topological representations of local explanations
A topology-based framework for comparing and understanding local explainability methods in machine learning.
Local explainability methods, those that seek to generate an explanation for each prediction, are becoming increasingly prevalent due to the need for practitioners to rationalize their model outputs. However, comparing local explainability methods is difficult because their outputs differ in scale and dimension. Furthermore, due to the stochastic nature of some explainability methods, different runs of the same method can produce contradictory explanations for a given observation. In this paper, we propose a topology-based framework to extract a simplified representation from a set of local explanations. We do so by first modeling the relationship between the explanation space and the model predictions as a scalar function. Then, we compute the topological skeleton of this function. This topological skeleton acts as a signature for such functions, which we use to compare different explanation methods. We demonstrate that our framework not only reliably identifies differences between explainability techniques but also provides stable representations. We then show how our framework can be used to identify appropriate parameters for local explainability methods. Our framework is simple, does not require complex optimizations, and can be broadly applied to most local explanation methods. We believe the practicality and versatility of our approach will help promote topology-based approaches as a tool for understanding and comparing explanation methods.
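The two steps described in the abstract (treat the model prediction as a scalar function over the explanation space, then extract a skeleton of that function) can be illustrated with a small sketch. The abstract does not say which skeleton construction is used, so this example assumes a Mapper-style graph: cover the prediction range with overlapping intervals, cluster the explanations that fall in each interval, and connect clusters that share observations. The function names, the parameters (`n_intervals`, `overlap`, `eps`), the single-linkage clustering rule, and the toy data are all hypothetical choices made for illustration, not the authors' implementation.

```python
from itertools import combinations
from math import dist


def _components(indices, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbour graph."""
    parent = {i: i for i in indices}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(indices, 2):
        if dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    clusters = {}
    for i in indices:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper_skeleton(explanations, predictions, n_intervals=4, overlap=0.25, eps=0.5):
    """Mapper-style graph of the scalar function: explanation -> prediction.

    Assumed construction (not taken from the source):
      nodes = clusters of explanations inside each overlapping prediction interval
      edges = pairs of nodes that share at least one observation
    """
    lo, hi = min(predictions), max(predictions)
    width = (hi - lo) / n_intervals or 1.0  # fall back if all predictions are equal
    pad = overlap * width

    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - pad
        b = lo + (k + 1) * width + pad
        members = [i for i, p in enumerate(predictions) if a <= p <= b]
        nodes.extend(_components(members, explanations, eps))

    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


# Toy data, made up for illustration: one attribution value per observation
# and the corresponding model output.
explanations = [(0.0,), (0.2,), (0.4,), (0.6,), (0.8,), (1.0,)]
predictions = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
nodes, edges = mapper_skeleton(explanations, predictions, n_intervals=2, eps=0.25)
```

Under these assumptions, the resulting graph is a compact signature of one explanation method on one dataset. Graphs built from different methods, or from repeated runs of a stochastic method, could then be compared with a graph distance of the reader's choice.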
Latest publications
Identifying interpretable subspaces in image representations
An interpretability framework for explaining features of image representations using contrasting concepts and captions.
ICML
Towards ground truth explainability on tabular data
Using copulas to generate synthetic tabular data with ground truth explanations for enhanced interpretability of AI models.
ICML
Dynamic guardian models: realtime content moderation with user-defined policies
Specialized classifiers that evaluate text based on predefined trustworthiness objectives.
ICML