Certified Adversarial Robustness via Randomized Smoothing

文章目录

概
主要内容
- 定理1
代码

Cohen J., Rosenfeld E., Kolter J. Certified Adversarial Robustness via Randomized Smoothing. International Conference on Machine Learning (ICML), 2019.

@article{cohen2019certified,
title={Certified Adversarial Robustness via Randomized Smoothing},
author={Cohen, Jeremy and Rosenfeld, Elan and Kolter, J Zico},
pages={1310–1320},
year={2019}}

概

Certified robustness 区别于一般的启发式的防御, 其在样本 x x x满足一定的条件下(往往是一个类似于置信度的保证), 可以证明在某个范数球(往往是 ℓ 2 \ell_2 ℓ2, 正如本文)内能够免疫攻击, 即
g ( x + ϵ ) = g ( x ) : = arg ⁡ max ⁡ c ∈ Y z ( x ) , ∀ ϵ ∈ B ( x ; R ) . g(x+\epsilon)=g(x):=\arg \max_{c \in \mathcal{Y}} \: z(x), \forall \epsilon \in \mathcal{B}(x;R). g(x+ϵ)=g(x):=argc∈Ymaxz(x),∀ϵ∈B(x;R).
这些方法给出了一种不同于adversarial training的思路, 虽然到目前为止, 这些方法往往局限于 ℓ 1 , ℓ 2 \ell_1, \ell_2 ℓ1,ℓ2攻击, 在更为常见的 ℓ ∞ \ell_{\infty} ℓ∞的表现并不是特别好.

主要内容

方法很简单, 训练的时候:

Given inputs x i x_i xi,
Generate gaussian noise ϵ i ∼ N ( 0 , σ 2 ) \epsilon_i \sim \mathcal{N}(0, \sigma^2) ϵi∼N(0,σ2);
Use x i + ϵ i x_i+\epsilon_i xi+ϵi to train.

实际上这个训练过程, 不从后面的理论的角度看, 可以把它和adversarial training做类比, 实际上都是一种在样本点周围试探性的训练过程. 大概这样子会让整个的loss landscape更加光滑?

测试的时候就不同了, 首先需要认为地设定一个采样次数 n n n,

Given input x x x
Generate n n n gaussian noise ϵ i , i = 1 , … , n \epsilon_i, i=1, \ldots, n ϵi,i=1,…,n.
For each x + ϵ i x+\epsilon_i x+ϵi, the neural network will give a prediction label c i c_i ci;
Count the prediction labels and find the most frequent one, denoted by c c c.

则 c c c就是最终的预测是输出, 简而言之, 就是在预测的时候需要统计频率, 这个实际上是寻找最大概率点.

定理1

定理1: 假设 f : R d → Y f:\mathbb{R}^d \rightarrow \mathcal{Y} f:Rd→Y 为一个任意的确定性或者随机的函数, ϵ ∼ N ( 0 , σ 2 I ) \epsilon \sim \mathcal{N}(0, \sigma^2I) ϵ∼N(0,σ2I). 定义 g g g为
g ( x ) : = arg ⁡ max ⁡ c ∈ Y P ( f ( x + ϵ ) = c ) . (1) \tag{1} g(x):= \arg \max_{c \in \mathcal{Y}} \mathbb{P}(f(x+\epsilon)=c). g(x):=argc∈YmaxP(f(x+ϵ)=c).(1)
假设 c A ∈ Y c_A \in \mathcal{Y} cA∈Y且 p A ‾ , p B ‾ ∈ [ 0 , 1 ] \underline{p_A}, \overline{p_B} \in [0, 1] pA,pB∈[0,1]满足
P ( f ( x + ϵ ) = c A ) ≥ p A ‾ ≥ p B ‾ ≥ max ⁡ c ≠ c A P ( f ( x + ϵ ) = c ) . (2) \tag{2} \mathbb{P}(f(x+\epsilon)=c_A) \ge \underline{p_A} \ge \overline{p_B} \ge \max_{c \not = c_{A}} \mathbb{P}(f(x+\epsilon)=c). P(f(x+ϵ)=cA)≥pA≥pB≥c=cAmaxP(f(x+ϵ)=c).(2)
则 g ( x + δ ) = c A g(x+\delta)=c_A g(x+δ)=cA 对于任意的 ∥ δ ∥ 2 < R \|\delta\|_2 < R ∥δ∥2<R, 其中
R = σ 2 ( Φ − 1 ( p A ‾ ) − Φ − 1 ( p B ‾ ) ) . (3) \tag{3} R=\frac{\sigma}{2}(\Phi^{-1}(\underline{p_A})- \Phi^{-1}(\overline{p_B})). R=2σ(Φ−1(pA)−Φ−1(pB)).(3)
定理1总结来说就是, 当你的 f ( x + ϵ ) = c A f(x+\epsilon)=c_A f(x+ϵ)=cA的概率比别的类别的概率大得多的时候, 由(1)式所得到的smooth版分类器 g g g就能够在某个半径内免疫 ℓ 2 \ell_2 ℓ2攻击.
但是需要注意的是, 普通的CNN的训练过程可以保证置信度很高但没法保证(2), 所以为了让(2)式成立这才有了上面的一个训练过程, 其中实际上有一个逼近的过程(虽然感觉有一点牵强):

测试过程中统计频率的行为也得到了解释, 实际上就是为了估计最大概率. 最后, 在作者的代码中, 或者说算法中, 测试的predict可能有点麻烦, 实际上这是作者引入了假设检验, 意图大概是为了有些时候没法判断到底哪个对干脆就不判断来保证安全(测试的时候感觉是没有必要的). 当然了, 在certify accuracy的估计中, α \alpha α就是相当有必要了.

代码

原作者代码.