Table of Contents
Kullback–Leibler and Jensen–Shannon Divergence
Generative Adversarial Network (GAN)
What is the optimal value for D?
What is the global optimal?
What does the loss function represent?
Problems in GANs
Hard to achieve Nash equilibrium
Low dimensional supports
Vanishing gradient
Mode collapse
Lack of a proper evaluation metric
Improved GAN Training
Wasserstein GAN (WGAN)
What is Wasserstein distance?
Why Wasserstein is better than JS or KL divergence?
Use Wasserstein distance as GAN loss function
Example: Create New Pokemons!
Generative adversarial networks (GAN) have shown great results in many generative tasks for replicating rich real-world content such as images, human language, and music. They are inspired by game theory: two models, a generator and a critic, compete with each other while making each other stronger at the same time. However, training a GAN model is quite challenging, as people face issues like training instability or failure to converge.
Here I would like to explain the math behind the generative adversarial network framework, why it is hard to train, and finally introduce a modified version of GAN intended to solve the training difficulties.
Kullback–Leibler and Jensen–Shannon Divergence
Before we take a closer look at GANs, let us first review the metrics for quantifying the similarity between two probability distributions.
Some people (Huszar, 2015) believe that one reason behind GANs' big success is switching the loss function from the asymmetric KL divergence in traditional maximum-likelihood approaches to the symmetric JS divergence.
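As a quick recap, the standard definitions are written out below for reference (p and q denote the two distributions being compared):

```latex
% KL divergence: asymmetric, and it blows up where p(x) > 0 but q(x) -> 0.
D_{KL}(p \,\|\, q) = \int_x p(x) \log \frac{p(x)}{q(x)} \, dx

% JS divergence: symmetric and bounded, built from two KL terms
% measured against the mixture (p + q) / 2.
D_{JS}(p \,\|\, q) = \tfrac{1}{2} D_{KL}\!\Big(p \,\Big\|\, \tfrac{p + q}{2}\Big)
                   + \tfrac{1}{2} D_{KL}\!\Big(q \,\Big\|\, \tfrac{p + q}{2}\Big)
```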
Generative Adversarial Network (GAN)
A GAN consists of two models:
- A discriminator D estimates the probability of a given sample coming from the real dataset. It works as a critic and is optimized to tell fake samples from real ones.
- A generator G outputs synthetic samples given a noise variable input z (z brings in potential output diversity). It is trained to capture the real data distribution so that its generated samples can be as real as possible, or in other words, can trick the discriminator into assigning them a high probability.
These two models compete against each other during training:
the generator G tries hard to fool the discriminator, while the critic model D tries hard not to be cheated.
This interesting zero-sum game between the two models motivates both to improve their capabilities.
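To make the two roles concrete, here is a minimal sketch of one training step of this minimax game (assuming PyTorch; the network sizes, optimizer settings, and the train_step helper are illustrative choices, not something prescribed by the original post):

```python
# Minimal GAN training step sketch (PyTorch assumed, shapes are illustrative).
# D is trained to maximize log D(x) + log(1 - D(G(z)));
# G uses the commonly used non-saturating variant, maximizing log D(G(z)).
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2  # illustrative sizes

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: push D(x) toward 1 on real data, D(G(z)) toward 0 on fakes.
    z = torch.randn(batch_size, latent_dim)
    fake_batch = G(z).detach()  # stop gradients from flowing into G here
    d_loss = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: push D(G(z)) toward 1 so that G fools the critic.
    z = torch.randn(batch_size, latent_dim)
    g_loss = bce(D(G(z)), real_labels)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```

Alternating these two steps is exactly the zero-sum game described above; the sections below look at what its optimum is and why reaching it in practice is hard.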
What is the optimal value for D?
What is the global optimal?
What does the loss function represent?
Problems in GANs
Hard to achieve Nash equilibrium
Low dimensional supports
Vanishing gradient
Mode collapse
Lack of a proper evaluation metric
Improved GAN Training
(1) Feature Matching
(2) Minibatch Discrimination
(3) Historical Averaging
(4) One-sided Label Smoothing
(5) Virtual Batch Normalization (VBN)
(6) Adding Noises
(7) Use Better Metric of Distribution Similarity
Wasserstein GAN (WGAN)
What is Wasserstein distance?
Why Wasserstein is better than JS or KL divergence?
Use Wasserstein distance as GAN loss function
Example: Create New Pokemons!
Notes taken from Lil'Log:
From GAN to WGAN: https://lilianweng.github.io/posts/2017-08-20-gan/