Reading notes: On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Etmann et al. Connection between robustness and interpretability

On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Advantages and disadvantages of adversarial training?

While this method, like all known approaches of defense, decreases the accuracy of the classifier, it is also successful in increasing the robustness to adversarial attacks.
Connections between the interpretability of saliency maps and robustness?

Saliency maps of robustified classifiers tend to be far more interpretable, in that structures in the input image also emerge in the corresponding saliency map.
How to obtain saliency maps for non-robustified networks?

In order to obtain a semantically meaningful visualization of the network’s classification decision in non-robustified networks, the saliency map has to be aggregated over many different points in the vicinity of the input image. This can be achieved either via averaging saliency maps of noisy versions of the image (Smilkov et al., 2017) or by integrating along a path (Sundararajan et al., 2017).
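The two aggregation strategies can be sketched on a toy problem. This is only an illustration, not the code of either cited paper: the score function `tanh(<W, x>)`, the weights `W`, the noise level `sigma`, the sample counts, and the all-zero baseline are all assumptions chosen so that the gradient is available in closed form.

```python
import math
import random

W = [0.5, -1.0, 2.0]  # hypothetical weights of a toy score function

def score(x):
    """Toy differentiable class score: tanh(<W, x>)."""
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))

def grad(x):
    """Analytic gradient of score at x; this is the plain saliency map."""
    s = score(x)
    return [(1.0 - s * s) * w for w in W]

def smoothgrad(x, sigma=0.1, n_samples=200, seed=0):
    """Average the saliency map over Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total = [t + g for t, g in zip(total, grad(noisy))]
    return [t / n_samples for t in total]

def integrated_gradients(x, baseline, n_steps=200):
    """Midpoint Riemann sum of the gradient along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(n_steps):
        t = (k + 0.5) / n_steps
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        total = [acc + g for acc, g in zip(total, grad(point))]
    return [(xi - b) * acc / n_steps for xi, b, acc in zip(x, baseline, total)]

x = [1.0, 0.5, 0.25]          # hypothetical input
baseline = [0.0, 0.0, 0.0]    # hypothetical reference point for the path
sg = smoothgrad(x)
ig = integrated_gradients(x, baseline)
```

A useful sanity check for the path integral is that the attributions in `ig` sum (up to discretization error) to `score(x) - score(baseline)`.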
Definition of the (adversarial) robustness

Put differently, the robustness of a classifier in a point is nothing but the distance to its closest decision boundary.
Prove: $\rho(x)=\frac{|\langle x, z\rangle|}{\|z\|}=\frac{\left|\left\langle x, \nabla \Psi_{z}(x)\right\rangle\right|}{\left\|\nabla \Psi_{z}(x)\right\|}$ for the linear binary case

See OneNote page for details.
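
For reference, a short derivation sketch. It assumes the Euclidean norm, a linear score $\Psi_z(x)=\langle x, z\rangle$ with $z\neq 0$, the classifier $F(x)=\operatorname{sgn}\Psi_z(x)$, and robustness taken as the distance to the decision boundary $\{y : \langle y, z\rangle = 0\}$; the perturbation symbol $e$ is introduced here for illustration.

$$\rho(x)=\inf\{\|e\| : \langle x+e, z\rangle = 0\}$$

Any admissible $e$ satisfies $\langle e, z\rangle = -\langle x, z\rangle$, so by Cauchy-Schwarz

$$|\langle x, z\rangle| = |\langle e, z\rangle| \le \|e\|\,\|z\| \quad\Longrightarrow\quad \|e\| \ge \frac{|\langle x, z\rangle|}{\|z\|}.$$

The bound is attained by the orthogonal projection step

$$e^{*} = -\frac{\langle x, z\rangle}{\|z\|^{2}}\, z, \qquad \|e^{*}\| = \frac{|\langle x, z\rangle|}{\|z\|},$$

hence $\rho(x)=\frac{|\langle x, z\rangle|}{\|z\|}$. Since $\nabla \Psi_z(x) = z$ for every $x$, substituting gives the second expression.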
Definition of Alignment
- a.e. stands for almost everywhere
Relationship between alignment and linear robustness?

For a linear binary classifier, the alignment trivially increases with the robustness of the classifier.
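
A small numerical check of this statement, as a sketch only. It assumes the linear score $\Psi_z(x)=\langle x, z\rangle$ from the formula above and takes the alignment to be $|\langle x, \nabla\Psi_z(x)\rangle| / \|\nabla\Psi_z(x)\|$; the vectors `z` and `x` and the finite-difference helper are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

z = [3.0, -4.0]  # hypothetical weight vector of the linear classifier
x = [2.0, 1.0]   # hypothetical input point

def psi(p):
    """Linear score; the classifier is sign(psi(p))."""
    return dot(p, z)

def numerical_gradient(f, p, h=1e-6):
    """Central finite differences, standing in for the saliency map."""
    g = []
    for i in range(len(p)):
        up = p[:i] + [p[i] + h] + p[i + 1:]
        down = p[:i] + [p[i] - h] + p[i + 1:]
        g.append((f(up) - f(down)) / (2 * h))
    return g

# Closed-form distance from x to the hyperplane <p, z> = 0.
robustness = abs(psi(x)) / norm(z)

# Alignment between the input and its saliency map.
saliency = numerical_gradient(psi, x)
alignment = abs(dot(x, saliency)) / norm(saliency)

# Unit direction pointing from x straight toward the decision boundary.
direction = [-math.copysign(1.0, psi(x)) * zi / norm(z) for zi in z]
just_inside = [xi + 0.999 * robustness * di for xi, di in zip(x, direction)]
just_outside = [xi + 1.001 * robustness * di for xi, di in zip(x, direction)]
```

Here `alignment` and `robustness` coincide, and the sign of `psi` flips only once the step length exceeds `robustness`.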

Prove Lemma 1
Definition of Linearized Robustness
Binarized respective alignment

where $j^*$ is the minimizer of Eq. (9).
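
The formulas themselves are not reproduced in these notes, so the sketch below rests on an assumption about their form: linearized robustness as $\min_{j\neq i^*} \frac{\Psi^{i^*}(x)-\Psi^{j}(x)}{\|\nabla\Psi^{i^*}(x)-\nabla\Psi^{j}(x)\|}$ and binarized alignment as $\frac{|\langle x, \nabla\Psi^{i^*}(x)-\nabla\Psi^{j^*}(x)\rangle|}{\|\nabla\Psi^{i^*}(x)-\nabla\Psi^{j^*}(x)\|}$, with $i^*$ the predicted class. The bias-free linear 3-class model and all names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(u):
    return math.sqrt(dot(u, u))

# Hypothetical bias-free linear model: logit_i(x) = <WEIGHTS[i], x>,
# so the gradient of logit i with respect to x is simply WEIGHTS[i].
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

def logits(x):
    return [dot(w, x) for w in WEIGHTS]

def linearized_robustness(x):
    """Return (rho_tilde, j_star, i_star) under the assumed definition."""
    scores = logits(x)
    i_star = max(range(len(scores)), key=scores.__getitem__)
    best = None
    for j in range(len(scores)):
        if j == i_star:
            continue
        grad_diff = sub(WEIGHTS[i_star], WEIGHTS[j])
        ratio = (scores[i_star] - scores[j]) / norm(grad_diff)
        if best is None or ratio < best[0]:
            best = (ratio, j)
    return best[0], best[1], i_star

def binarized_alignment(x, i_star, j_star):
    """Alignment of x with the gradient difference of the two competing logits."""
    grad_diff = sub(WEIGHTS[i_star], WEIGHTS[j_star])
    return abs(dot(x, grad_diff)) / norm(grad_diff)

x = [2.0, 1.0]  # hypothetical input point
rho_tilde, j_star, i_star = linearized_robustness(x)
alpha_plus = binarized_alignment(x, i_star, j_star)
```

For this bias-free linear model the two quantities agree, because the logit margin equals the inner product of `x` with the gradient difference.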
Positive one-homogeneous function

Any such function satisfies $\Psi(ax)=a\Psi(x)$ for all $a>0$ and $x$.
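
A bias-free ReLU network is a standard example of such a function. The sketch below checks the scaling property numerically and also the Euler identity $\langle x, \nabla\Psi(x)\rangle = \Psi(x)$, which holds for positive one-homogeneous functions wherever they are differentiable. The two-layer weights `W1`, `W2` and the input are hypothetical.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def relu(v):
    return [max(0.0, a) for a in v]

# Hypothetical two-layer ReLU network without bias terms.
W1 = [[1.0, -2.0], [0.5, 1.5], [-1.0, 1.0]]
W2 = [[1.0, -1.0, 2.0]]

def psi(x):
    """Scalar network output W2 * relu(W1 * x)."""
    return matvec(W2, relu(matvec(W1, x)))[0]

def grad_psi(x):
    """Gradient of psi at x, valid where no hidden pre-activation is exactly zero."""
    pre = matvec(W1, x)
    active = [1.0 if p > 0 else 0.0 for p in pre]
    return [
        sum(W2[0][k] * active[k] * W1[k][i] for k in range(len(W1)))
        for i in range(len(x))
    ]

x = [1.0, 2.0]
a = 3.5  # any positive scaling factor

scaled = psi([a * xi for xi in x])                       # Psi(a x)
euler = sum(xi * gi for xi, gi in zip(x, grad_psi(x)))   # <x, grad Psi(x)>
negated = psi([-xi for xi in x])                         # Psi(-x), not -Psi(x) in general
```

The property only covers positive factors: for this network `negated` differs from `-psi(x)`.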

Prove Lemma 2

Lemma 2 is a direct extension of 3.

Prove Theorem 1

Proof of Theorem 1

With $\beta^{i}(x):=\langle b, \nabla_{b} \Psi_{\Theta, b}^{i}(x)\rangle$ and $\bar{g}:=g/\|g\|$ for $g\neq 0$, prove Theorem 2.

Prove Theorem 3

Note that the last equality, $\left\langle\xi, \bar{g}^{\dagger}-\bar{g}+\bar{g}\right\rangle$, contains a typo by the authors; it should read $\left\langle\xi, \bar{g}^{\dagger}-\bar{\gamma}+\bar{\gamma}\right\rangle$.
Connection between Alignment and Interpretability

For most types of image data an increase in alignment in discriminative regions should coincide with an increase in interpretability.

PandaCid's Blog
