Towards ground truth explainability on tabular data
Using copulas to generate synthetic tabular data with ground truth explanations for enhanced interpretability of AI models.
In data science, there is a long history of using synthetic data for method development, feature selection and feature engineering. Our current interest in synthetic data comes from recent work in explainability. Today's datasets are typically larger and more complex, requiring less interpretable models. In the setting of post hoc explainability, there is no ground truth for explanations. Inspired by recent work in explaining image classifiers that does provide ground truth, we propose a similar solution for tabular data. With copulas, which give a concise specification of the desired statistical properties of a dataset, users can build intuition around explainability through controlled datasets and experimentation. The current capabilities are demonstrated on three use cases: one-dimensional logistic regression, the impact of correlation from informative features, and the impact of correlation from redundant variables.
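To make the copula idea concrete, here is a minimal sketch of how such a dataset could be generated. It is an illustration under stated assumptions, not the authors' implementation. It assumes a two-feature Gaussian copula, an exponential and a uniform marginal, and a logistic labelling rule with known weights. The function names (`sample_gaussian_copula`, `make_dataset`) and all parameter values are hypothetical. Because the second feature has weight zero but is correlated with the first, the known weights act as a ground truth against which an explanation method can be checked.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs of Uniform(0, 1) values coupled by a Gaussian copula.

    rho is the correlation of the underlying bivariate normal.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 so that corr(z1, z2) == rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform marginal.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def make_dataset(n=2000, rho=0.8, weights=(2.0, 0.0), bias=-2.0, seed=0):
    """Return (rows, ground_truth) with rows of the form (x1, x2, y).

    Assumed marginals (illustrative only): x1 ~ Exponential(1),
    x2 ~ Uniform(0, 10). Labels come from a logistic model whose
    weights are known, so they serve as the ground truth explanation.
    """
    rng = random.Random(seed + 1)
    rows = []
    for u1, u2 in sample_gaussian_copula(n, rho, seed):
        # Clamp away from 0 and 1 so the inverse CDF stays finite.
        u1 = min(max(u1, 1e-12), 1.0 - 1e-12)
        x1 = -math.log1p(-u1)   # inverse CDF of Exponential(1)
        x2 = 10.0 * u2          # inverse CDF of Uniform(0, 10)
        logit = weights[0] * x1 + weights[1] * x2 + bias
        p = 1.0 / (1.0 + math.exp(-logit))
        y = 1 if rng.random() < p else 0
        rows.append((x1, x2, y))
    ground_truth = {"weights": weights, "bias": bias, "rho": rho}
    return rows, ground_truth


if __name__ == "__main__":
    rows, truth = make_dataset()
    print(len(rows), truth)
```

With the default weights, only `x1` drives the label. `x2` is a redundant variable that is correlated with `x1` through the copula, so any importance an explanation method assigns to `x2` can be compared with its true weight of zero.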
Latest publications
Zero-shot meta-learning for tabular prediction tasks with adversarially pre-trained transformer
Introducing APT, an Adversarially Pre-trained Transformer achieving SOTA on small tabular tasks.
ICML
Identifying interpretable subspaces in image representations
An interpretability framework for explaining features of image representations using contrasting concepts and captions.
ICML
Topological representations of local explanations
A topology-based framework for comparing and understanding local explainability methods in machine learning.
ICML