Reading notes: On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Paper: Etmann et al., "On the Connection Between Adversarial Robustness and Saliency Map Interpretability"

  • Advantages and disadvantages of adversarial training?

    While this method – like all known approaches of defense – decreases the accuracy of the classifier, it is also successful in increasing the robustness to adversarial attacks.

  • Connections between the interpretability of saliency maps and robustness?

    Saliency maps of robustified classifiers tend to be far more interpretable, in that structures in the input image also emerge in the corresponding saliency map.

  • How to obtain saliency maps for non-robustified networks?

    In order to obtain a semantically meaningful visualization of the network’s classification decision in non-robustified networks, the saliency map has to be aggregated over many different points in the vicinity of the input image. This can be achieved either via averaging saliency maps of noisy versions of the image (Smilkov et al., 2017) or by integrating along a path (Sundararajan et al., 2017).
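The two aggregation strategies mentioned above can be sketched as follows. This is a minimal illustration, not the implementation from either cited paper: the finite-difference helper `numerical_gradient`, the toy logits, and the defaults `n_samples`, `sigma`, and `n_steps` are all assumptions chosen so the example runs with NumPy alone. A real network would use automatic differentiation instead.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x (illustrative helper)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step.flat[i] = eps
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def smoothgrad(f, x, n_samples=50, sigma=0.1, seed=0):
    """Average the gradient of f over Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    grads = [numerical_gradient(f, x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

def integrated_gradients(f, x, baseline=None, n_steps=50):
    """Midpoint-rule approximation of the gradient integral along the
    straight path from baseline to x, scaled by (x - baseline)."""
    if baseline is None:
        baseline = np.zeros_like(x, dtype=float)  # assumed all-zero baseline
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    grads = [numerical_gradient(f, baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)

# Toy "logits" (hypothetical stand-ins for a network output).
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.5, -1.0, 2.0])
linear_logit = lambda v: float(w @ v)
smooth_logit = lambda v: float(np.tanh(v).sum())

sg = smoothgrad(linear_logit, x)            # for a linear logit this is just w
ig = integrated_gradients(smooth_logit, x)  # attributions sum to f(x) - f(baseline)
```

For a linear logit the averaged gradient reduces to the weight vector, and the integrated-gradient attributions sum (approximately) to the difference between the logit at the input and at the baseline.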

  • Definition of the (adversarial) robustness

    [Image: 2020-07-05_4.53.34.png]

    Put differently, the robustness of a classifier in a point is nothing but the distance to its closest decision boundary.
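As a sketch of the definition this bullet refers to, the "distance to the closest decision boundary" can be written as the size of the smallest perturbation that changes the prediction. The notation is assumed here: $F$ is the classifier, $X$ the normed input space, and $e$ a perturbation.

```latex
\rho(x) \;=\; \inf_{e \in X} \bigl\{ \|e\| \;:\; F(x + e) \neq F(x) \bigr\}
```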

  • Prove: for linear binary cases

    [Image: 2020-07-05_5.31.53.png]

    See OneNote page for details.
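One way the linear binary case can be worked out is sketched below. It assumes the Euclidean norm and a classifier of the form $F(x) = \operatorname{sgn}(\langle x, z\rangle + b)$, where the weight vector $z$ and bias $b$ are assumed notation. The argument is the usual point-to-hyperplane distance computation and may differ in detail from the proof in the paper.

```latex
\begin{aligned}
F(x) &= \operatorname{sgn}\bigl(\langle x, z\rangle + b\bigr), \\
F(x+e) \neq F(x) &\;\Longrightarrow\; |\langle e, z\rangle| \ge |\langle x, z\rangle + b|, \\
|\langle e, z\rangle| &\le \|e\|_2 \,\|z\|_2 \quad\text{(Cauchy--Schwarz)}, \\
\Longrightarrow\quad \|e\|_2 &\ge \frac{|\langle x, z\rangle + b|}{\|z\|_2}, \\
\rho(x) &= \frac{|\langle x, z\rangle + b|}{\|z\|_2},
\quad\text{with the bound reached by } e = -\frac{\langle x, z\rangle + b}{\|z\|_2^2}\, z .
\end{aligned}
```

The perturbation in the last line moves $x$ orthogonally onto the hyperplane $\langle x, z\rangle + b = 0$, which is why the robustness equals the distance to the decision boundary.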

  • Definition of Alignment

    [Image: 2020-07-05_5.36.50.png]

    [Image: 2020-07-05_6.20.50.png]

    • a.e. stands for almost everywhere
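For reference, a sketch of the alignment definition in assumed notation: $\Psi = (\Psi^1, \dots, \Psi^n)$ are the logits, $F$ picks the largest logit almost everywhere, and the saliency map is the gradient of the winning logit with respect to the input.

```latex
F(x) = \arg\max_i \Psi^i(x) \ \text{(a.e.)}, \qquad
\alpha(x) \;=\; \frac{\bigl|\langle x, \nabla \Psi^{F(x)}(x) \rangle\bigr|}{\bigl\|\nabla \Psi^{F(x)}(x)\bigr\|}
```

Read this way, the alignment measures how strongly the saliency map correlates with the input itself, normalized by the length of the saliency map.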
  • Relationship between alignment and linear robustness?

    For a linear binary classifier, the alignment trivially increases with the robustness of the classifier.
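A small numerical check of this statement, under the assumption of a Euclidean norm and a logit of the form `x @ z + b` (the names `z`, `b`, and both helper functions are illustrative). Because the gradient of the logit is `z`, robustness and alignment differ only through the bias, by at most `|b| / ||z||`.

```python
import numpy as np

def linear_robustness(x, z, b):
    """Euclidean distance from x to the hyperplane <x, z> + b = 0."""
    return abs(x @ z + b) / np.linalg.norm(z)

def linear_alignment(x, z):
    """Alignment |<x, grad>| / ||grad||; the gradient of <x, z> + b is z."""
    return abs(x @ z) / np.linalg.norm(z)

rng = np.random.default_rng(0)
z = rng.normal(size=5)           # hypothetical weight vector
b = 0.3                          # hypothetical bias
xs = rng.normal(size=(1000, 5))  # random test points

rho = np.array([linear_robustness(x, z, b) for x in xs])
alpha = np.array([linear_alignment(x, z) for x in xs])

gap_bound = abs(b) / np.linalg.norm(z)  # reverse triangle inequality
max_gap = np.max(np.abs(rho - alpha))

# With zero bias the two quantities coincide exactly.
rho_no_bias = np.array([linear_robustness(x, z, 0.0) for x in xs])
```

Since the gap is bounded by a constant that does not depend on the input, a point with larger robustness necessarily has larger alignment, up to that constant.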

[Image: 2020-07-05_8.01.13.png]

  • Prove Lemma 1

    [Image: 2020-07-05_8.02.56.png]

  • Definition of Linearized Robustness

    [Image: 2020-07-05_6.28.21.png]
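A sketch of the linearized robustness in assumed notation, with $i^* = F(x)$ the predicted class. The idea is to replace each logit difference $\Psi^{i^*} - \Psi^{j}$ by its first-order Taylor expansion around $x$ and then apply the point-to-hyperplane distance from the linear binary case to every competing class $j$.

```latex
\tilde{\rho}(x) \;=\; \min_{j \neq i^*}
\frac{\Psi^{i^*}(x) - \Psi^{j}(x)}{\bigl\|\nabla \Psi^{i^*}(x) - \nabla \Psi^{j}(x)\bigr\|},
\qquad i^* = F(x)
```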

  • Binarized respective alignment

    [Image: 2020-07-05_7.57.31.png]

    where the class index $j^\dagger$ is the minimizer of Eq. (9).
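To make the two quantities concrete, here is a minimal sketch that computes a linearized robustness and the corresponding binarized alignment from a logit vector and its Jacobian. The function names, the argument layout, and the toy bias-free linear model `W @ x` are assumptions for illustration. For such a model each logit equals the inner product of the input with its own gradient, so the two numbers coincide.

```python
import numpy as np

def linearized_robustness(logits, jac):
    """Return (rho_tilde, j_dagger) for logits of shape (n,) and Jacobian of shape (n, d).

    rho_tilde is the smallest ratio of logit margin to gradient-difference norm
    over all classes j other than the predicted one; j_dagger is the minimizer.
    """
    i_star = int(np.argmax(logits))
    best, best_j = np.inf, -1
    for j in range(len(logits)):
        if j == i_star:
            continue
        margin = logits[i_star] - logits[j]
        grad_diff = jac[i_star] - jac[j]
        ratio = margin / np.linalg.norm(grad_diff)
        if ratio < best:
            best, best_j = ratio, j
    return best, best_j

def binarized_alignment(x, logits, jac, j_dagger):
    """Alignment of x with the gradient of the binarized logit Psi^{i*} - Psi^{j_dagger}."""
    i_star = int(np.argmax(logits))
    grad_diff = jac[i_star] - jac[j_dagger]
    return abs(x @ grad_diff) / np.linalg.norm(grad_diff)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))  # hypothetical 4-class, bias-free linear logits
x = rng.normal(size=6)
logits, jac = W @ x, W       # the Jacobian of W @ x is W itself

rho_tilde, j_dagger = linearized_robustness(logits, jac)
alpha_dagger = binarized_alignment(x, logits, jac, j_dagger)
```

With a bias term or a nonlinear model the two values would generally differ, which is the gap the later results in these notes are concerned with.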

  • Positive one-homogeneous function

    Any such function $\Psi$ satisfies $\Psi(ax) = a\,\Psi(x)$ for all $x$ and all $a > 0$.
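A quick numerical illustration, assuming a bias-free two-layer ReLU network as the example (the network, its weights `W1` and `W2`, and the scaling factor `a` are all hypothetical). Such a network is positive one-homogeneous, and wherever it is differentiable it also satisfies Euler's identity: the output equals the inner product of the input with the gradient.

```python
import numpy as np

def relu_net(x, W1, W2):
    """Bias-free two-layer ReLU network with scalar output."""
    return float(W2 @ np.maximum(W1 @ x, 0.0))

def relu_net_grad(x, W1, W2):
    """Gradient with respect to x: W1^T (W2 * 1[W1 x > 0])."""
    active = (W1 @ x > 0).astype(float)
    return W1.T @ (W2 * active)

rng = np.random.default_rng(2)
W1 = rng.normal(size=(8, 5))
W2 = rng.normal(size=8)
x = rng.normal(size=5)
a = 3.7  # any positive scaling factor

base = relu_net(x, W1, W2)
scaled = relu_net(a * x, W1, W2)      # equals a * base by homogeneity
euler = x @ relu_net_grad(x, W1, W2)  # equals base by Euler's identity
```

Adding bias terms to either layer would break both equalities, which is why the bias-free assumption matters here.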

[Image: 2020-07-05_8.03.56.png]

  • Prove Lemma 2

    [Image: 2020-07-05_8.07.37.png]

    Lemma 2 is a direct extension of 3.

[Image: 2020-07-05_8.11.08.png]

  • Prove Theorem 1

    [Image: 2020-07-05_8.58.07.png]

    Proof of Theorem 1

    [Image: 2020-07-05_8.56.53.png]

[Image: 2020-07-05_9.00.48.png]

  • With and for , prove Theorem 2.

    [Image: 2020-07-05_9.09.47.png]

[Image: 2020-07-05_9.16.19.png]

  • Prove Theorem 3

    [Image: 2020-07-05_9.19.44.png]

    Note that the last equality is a typo by the authors, it is actually

  • Connection between Alignment and Interpretability

    For most types of image data an increase in alignment in discriminative regions should coincide with an increase in interpretability.
