Abstract:
Deep Neural Networks (DNNs) constitute powerful models and have successfully been deployed in a wide range of tasks. However, it has proven difficult to extract explanations for DNN decisions that are both human-interpretable and faithful to the underlying model.
For example, while a piece-wise linear model can be summarised exactly by a single linear transform, such transforms are typically noisy and difficult for humans to understand. Conversely, while other methods’ explanations appear more human-interpretable (i.e., of higher visual quality), such gains in visual quality often come at the cost of model-faithfulness (Adebayo et al., NeurIPS 2018).
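To make the first point concrete, consider a small bias-free ReLU network: for any fixed input, its ReLU gates are fixed, so the model collapses exactly to a single input-dependent linear map W(x) with f(x) = W(x)x. The NumPy sketch below illustrates this (the toy weights and shapes are my own, purely for illustration); in practice, the rows of W(x) for standard networks tend to look noisy rather than interpretable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bias-free ReLU network: f(x) = W2 @ relu(W1 @ x)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))

def forward(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

def effective_linear_map(x):
    # For a fixed input, the ReLU gates are fixed 0/1 masks, so the whole
    # network collapses to a single input-dependent matrix W(x).
    gates = (W1 @ x > 0).astype(x.dtype)      # active ReLU units
    return W2 @ (gates[:, None] * W1)         # W(x) = W2 diag(gates) W1

x = rng.standard_normal(8)
W_x = effective_linear_map(x)
assert np.allclose(forward(x), W_x @ x)       # exact local summary: f(x) = W(x) x
```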
Instead of optimising an explanation method for pretrained DNNs, in my work I explore how to integrate the goal of interpretability into DNN training itself and thus optimise DNNs to be inherently interpretable.
Specifically, in my talk I would like to discuss my most recent project, the B-cos Networks, which are built around the B-cos transform, a non-linear transform specifically designed to increase DNN interpretability. This transform makes it possible to faithfully summarise a DNN with a single linear transform and, crucially, introduces ‘alignment pressure’ on the model weights during training. As a result, the induced linear transforms that summarise the entire DNN become highly interpretable and align with task-relevant features. Finally, the B-cos transform is designed to be compatible with existing architectures and can easily be integrated into common models such as VGGs, ResNets, InceptionNets, and DenseNets. Importantly, the resulting models achieve performance similar to their respective baselines while exhibiting significantly higher interpretability.
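For readers curious about the mechanism, here is a minimal NumPy sketch of a B-cos-style unit as I understand it (the function name, shapes, and the choice B=2 are illustrative assumptions, not the reference implementation): the response of a linear unit with unit-norm weights is rescaled by the cosine similarity between input and weights raised to the power B−1, so poorly aligned inputs are suppressed and weight–input alignment is rewarded during training. For B=1 the unit reduces to an ordinary linear transform.

```python
import numpy as np

def b_cos_unit(x, w, B=2.0, eps=1e-12):
    """Sketch of a B-cos-style unit: scale a unit-norm linear response by
    |cos(x, w)|^(B-1), so poorly aligned inputs are dampened."""
    w_hat = w / (np.linalg.norm(w) + eps)          # unit-norm weight vector
    cos = (w_hat @ x) / (np.linalg.norm(x) + eps)  # cosine similarity in [-1, 1]
    return (np.abs(cos) ** (B - 1.0)) * (w_hat @ x)

rng = np.random.default_rng(0)
w = rng.standard_normal(8)
x_aligned = 3.0 * w                 # input perfectly aligned with the weights
x_random = rng.standard_normal(8)   # generic, mostly misaligned input

# The aligned input passes through at full strength (|cos| = 1),
# while the generic input is suppressed.
print(b_cos_unit(x_aligned, w), b_cos_unit(x_random, w))
```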
Bio:
I am a PhD student in the Computer Vision and Machine Learning group at the Max Planck Institute for Informatics in Germany. I am supervised by Prof. Bernt Schiele, and in my research I focus on the interpretability of deep neural networks (DNNs). Specifically, I am interested in designing DNN architectures that are inherently interpretable. Before commencing my doctoral studies, I obtained a Bachelor’s degree in Physics from the Free University of Berlin, Germany, in 2016. I went on to study Computational Neuroscience at the Bernstein Center for Computational Neuroscience in Berlin and obtained my Master’s degree in 2019.