《Towards Black-Box Membership Inference Attack for Diffusion Models》
Abstract
- The challenge of identifying whether an artwork was used to train a diffusion model, focusing on membership inference attacks (MIA) for AI-generated artwork: copyright protection
- A novel black-box attack method that requires no access to internal model components
- Demonstrates superior performance on an evaluation dataset generated with DALL-E.
The authors' claim
previous methods are not yet ready for copyright protection in diffusion models.
Contributions (the paper lists three, but I see only two)
- ReDiffuse: use the model's variation API to alter an image and compare the result with the original.
- A new MIA evaluation dataset: use the image titles from LAION-5B as prompts for DALL-E's API [31] to generate images with the same content but different styles.
Algorithm Design
target model: DDIM
Why force in the concept of copyright protection???
Definition of the black-box variation API:
$$\hat{x}=V_{\theta}(x,t)$$
The details, in summary: $x$ is noised into $x_t$, then denoised back into $\hat{x}$ through successive DDIM steps (see the sketch below).
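A minimal sketch of what such a variation API could look like, assuming hypothetical `add_noise` / `ddim_denoise` helpers exposed by the target model (the paper only assumes black-box access to the input and output images):

```python
import torch

def variation_api(x: torch.Tensor, t: int, model) -> torch.Tensor:
    """Hypothetical black-box variation API: x_hat = V_theta(x, t).

    Forward-diffuses the image x to noise level t, then denoises it
    back with the model's own DDIM sampler. The attacker only sees
    the input x and the output x_hat.
    """
    noise = torch.randn_like(x)
    x_t = model.add_noise(x, noise, t)            # x -> x_t (noising)
    x_hat = model.ddim_denoise(x_t, from_step=t)  # x_t -> x_hat (successive DDIM steps)
    return x_hat
```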
intuition
Our key intuition comes from the reverse SDE dynamics in continuous diffusion models.
One simplified form of the reverse SDE (i.e., the denoising step) is:
$$dX_t=\left(X_t/2-\nabla_x\log p(X_t)\right)dt+dW_t,\quad t\in[0,T]\tag{3}$$
The key guarantee: when the score function is well learned for a data point $x$, the reconstructed image $\hat{x}_i$ is an unbiased estimator of $x$ (arguably another way of saying overfitting).
Hence, averaging over multiple independent samples $\hat{x}_i$ greatly reduces the estimation error (see Theorem 1).
On the other hand, for a non-member image $x'$, the unbiasedness of the denoised image is not guaranteed.
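The averaging argument is plain Monte Carlo variance reduction: if each $\hat{x}_i$ is an unbiased estimate of $x$ with noise level $\sigma$, the error of the mean shrinks as $\sigma/\sqrt{n}$. A quick synthetic check (toy data, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.ones(1024)          # stand-in for a member image
sigma = 0.5                # per-reconstruction noise level

for n in (1, 4, 16, 64):
    # n independent unbiased reconstructions: x_hat_i = x + noise_i
    x_hats = x + sigma * rng.standard_normal((n, x.size))
    rmse = np.sqrt(np.mean((x_hats.mean(axis=0) - x) ** 2))
    print(f"n={n:2d}  RMSE={rmse:.4f}")   # decays roughly as sigma / sqrt(n)
```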
details of the algorithm:
- independently apply the black-box variation API n times with our target image x as input
- average the output images
- compare the averaged result $\hat{x}$ with the original image.
evaluate the difference between the images using an indicator function:
$$f(x)=\mathbb{1}[D(x,\hat{x})<\tau]$$
A sample is classified as a member of the training set if $D(x,\hat{x})$ is smaller than a threshold $\tau$, where $D(x,\hat{x})$ measures the difference between the two images.
ReDiffuse
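A minimal sketch of the ReDiffuse decision rule, reusing the hypothetical `variation_api` from above; the distance $D$ is left generic in the paper, so mean squared error here is just an illustrative choice:

```python
import torch

def rediffuse_distance(x, t, n, variation_api):
    """Average n independent variation-API outputs, then measure D(x, x_hat)."""
    x_hats = torch.stack([variation_api(x, t) for _ in range(n)])
    x_bar = x_hats.mean(dim=0)                  # averaging reduces estimation error
    return torch.mean((x_bar - x) ** 2).item()  # D: MSE as an illustrative distance

def is_member(x, t, n, tau, variation_api):
    """f(x) = 1[D(x, x_hat) < tau]: small reconstruction error suggests a member."""
    return rediffuse_distance(x, t, n, variation_api) < tau
```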
Theoretical Analysis
What exactly is the sampling interval???
MIA on Latent Diffusion Models
Generalize to latent diffusion models, i.e., Stable Diffusion.
ReDiffuse+
The variation API for Stable Diffusion differs from DDIM's, as it includes the encoder-decoder process:
$$z={\rm Encoder}(x),\quad z_t=\sqrt{\overline{\alpha}_t}\,z+\sqrt{1-\overline{\alpha}_t}\,\epsilon,\quad \hat{z}=\Phi_{\theta}(z_t,0),\quad \hat{x}={\rm Decoder}(\hat{z})\tag{4}$$
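A sketch of Eq. (4) with the encoder, decoder, and denoiser $\Phi_\theta$ passed in as callables (hypothetical names; in Stable Diffusion these would be the VAE encoder/decoder and the UNet-based DDIM sampler):

```python
import torch

def latent_variation_api(x, t, alpha_bar, encoder, decoder, phi):
    """Eq. (4): encode, diffuse the latent to step t, denoise to step 0, decode."""
    z = encoder(x)                                 # z = Encoder(x)
    eps = torch.randn_like(z)
    z_t = alpha_bar[t].sqrt() * z + (1.0 - alpha_bar[t]).sqrt() * eps
    z_hat = phi(z_t, 0)                            # z_hat = Phi_theta(z_t, 0)
    return decoder(z_hat)                          # x_hat = Decoder(z_hat)
```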
Modification of the algorithm: independently add random noise to the original image twice, then compare the difference between the two restored images $\hat{x}_1$ and $\hat{x}_2$:
$$f(x)=\mathbb{1}[D(\hat{x}_1,\hat{x}_2)<\tau]$$
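A sketch of the ReDiffuse+ rule under the same assumptions, with `variation_api` standing for the latent variation API above (wrapped over its model arguments); it compares two independent restorations rather than comparing against the original image:

```python
import torch

def is_member_plus(x, t, tau, variation_api):
    """ReDiffuse+: f(x) = 1[D(x_hat_1, x_hat_2) < tau]."""
    x_hat_1 = variation_api(x, t)   # first independent noise-and-restore pass
    x_hat_2 = variation_api(x, t)   # second independent pass
    d = torch.mean((x_hat_1 - x_hat_2) ** 2).item()  # illustrative distance D
    return d < tau
```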
Experiments
Evaluation Metrics
- AUC (area under the ROC curve)
- ASR (attack success rate)
- TPR@1%FPR (true positive rate at 1% false positive rate); see the sketch below
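A minimal sketch of the three metrics with scikit-learn, assuming `scores` are membership scores where higher means more likely a member (e.g., the negated distance $-D$) and `labels` mark true members:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def mia_metrics(scores, labels):
    """Compute AUC, ASR, and TPR@1%FPR for a membership inference attack."""
    auc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    # ASR: best balanced accuracy over all thresholds (assumes balanced sets)
    asr = np.max((tpr + (1.0 - fpr)) / 2.0)
    # TPR at 1% FPR: interpolate the ROC curve at fpr = 0.01
    tpr_at_1fpr = float(np.interp(0.01, fpr, tpr))
    return auc, asr, tpr_at_1fpr
```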
The experimental setup follows previous papers [5, 18].
| target model | DDIM | Stable Diffusion |
|---|---|---|
| version | the checkpoint from 《Are diffusion models vulnerable to membership inference attacks?》 | the original stable-diffusion-v1-5 checkpoint provided by Huggingface |
| dataset | CIFAR10/100, STL10-Unlabeled, Tiny-ImageNet | member set: 500 corresponding images from LAION-5B; non-member set: COCO2017-val and 500 images from DALL-E 3 |
| T | 1000 | 1000 |
| k | 100 | 10 |
| baseline methods | [5] SecMI: 《Are diffusion models vulnerable to membership inference attacks?》 | [18] 《An efficient membership inference attack for the diffusion model by proximal initialization》 | [28] 《Membership inference attacks against diffusion models》 |
|---|---|---|---|
| publication | International Conference on Machine Learning | arXiv preprint | 2023 IEEE Security and Privacy Workshops (SPW) |
Ablation Studies
- The impact of the averaging number $n$
- The impact of diffusion steps
- The impact of sampling intervals