Contents
- Introduction
- Class-wise self-knowledge distillation (CS-KD)
- Class-wise regularization
- Effects of class-wise regularization
- Experiments
- Classification accuracy
- References
Introduction
- To mitigate overfitting, the authors propose Class-wise self-knowledge distillation (CS-KD), which distills the predicted class probabilities of other samples from the same class into the model itself, so that the model produces more meaningful and more consistent predictions.
Class-wise self-knowledge distillation (CS-KD)
Class-wise regularization
- Class-wise regularization loss: it forces the predicted class distributions of samples belonging to the same class to be close to each other, which amounts to distilling the model's own dark knowledge (i.e., the knowledge on wrong predictions) into itself:

$$\mathcal L_{\text{cls}}(\mathbf x, \mathbf x'; \theta, T) := \mathrm{KL}\!\left(P(y \mid \mathbf x'; \tilde\theta, T) \,\|\, P(y \mid \mathbf x; \theta, T)\right)$$

where $\mathbf x, \mathbf x'$ are distinct samples from the same class, $P(y \mid \mathbf{x}; \theta, T)=\frac{\exp\left(f_y(\mathbf{x}; \theta)/T\right)}{\sum_{i=1}^C \exp\left(f_i(\mathbf{x}; \theta)/T\right)}$, and $T$ is the temperature. Note that $\tilde\theta$ is a fixed copy of the parameters $\theta$: gradients are not back-propagated through $\tilde\theta$, which prevents model collapse (cf. Miyato et al.).
- Total training loss: the cross-entropy on $\mathbf x$ plus the class-wise regularization term (see the PyTorch sketch below):

$$\mathcal L_{\text{tot}}(\mathbf x, \mathbf x', y; \theta, T) := \mathcal L_{\text{CE}}(\mathbf x, y; \theta) + \lambda_{\text{cls}} \cdot T^2 \cdot \mathcal L_{\text{cls}}(\mathbf x, \mathbf x'; \theta, T)$$
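A minimal PyTorch sketch of this objective, assuming each sample `x` in the batch is already paired with a different sample `x'` of the same class. This is not the official implementation; the function name `cs_kd_loss` and the default values of `T` and `lambda_cls` are illustrative.

```python
import torch
import torch.nn.functional as F

def cs_kd_loss(logits_x, logits_x_prime, targets, T=4.0, lambda_cls=1.0):
    """Cross-entropy on x plus class-wise self-distillation from x' (same class as x).

    logits_x:       model outputs for samples x          (B, C)
    logits_x_prime: model outputs for paired samples x'  (B, C), same class as x
    targets:        ground-truth labels for x            (B,)
    """
    # Standard cross-entropy term L_CE(x, y; theta)
    ce = F.cross_entropy(logits_x, targets)

    # Softened distributions with temperature T.
    # detach() plays the role of the fixed copy \tilde{theta}:
    # no gradient flows through the predictions on x'.
    log_p_x = F.log_softmax(logits_x / T, dim=1)
    p_x_prime = F.softmax(logits_x_prime.detach() / T, dim=1)

    # KL( P(y | x'; theta~, T) || P(y | x; theta, T) ), averaged over the batch
    kl = F.kl_div(log_p_x, p_x_prime, reduction="batchmean")

    # T^2 rescales the gradient of the softened term, as in standard knowledge distillation
    return ce + lambda_cls * (T ** 2) * kl
```

In practice this requires a class-aware batch sampler that draws (at least) two samples per class in each mini-batch so that the pairs $(\mathbf x, \mathbf x')$ exist; the authors' batching code is available in the repository linked in the references.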
Effects of class-wise regularization
- Reducing the intra-class variations.
- Preventing overconfident predictions. CS-KD avoids overconfident predictions by using the predicted class distribution of another sample from the same class as the soft target; these soft labels are more 'realistic' than those produced by generic label smoothing (see the sketch below).
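A rough illustration of the difference, with hypothetical numbers (`num_classes`, `eps`, the label `target`, and the logits are all made up):

```python
import torch
import torch.nn.functional as F

num_classes, eps, target = 5, 0.1, 2  # hypothetical setup

# Label smoothing: spreads a fixed, uniform amount of mass over all classes,
# independent of what the input actually looks like.
ls_target = torch.full((num_classes,), eps / num_classes)
ls_target[target] += 1.0 - eps
# -> tensor([0.0200, 0.0200, 0.9200, 0.0200, 0.0200])

# CS-KD: the soft target is the model's own (softened) prediction on another
# sample of the same class, so the mass on wrong classes reflects which classes
# actually look similar (dark knowledge), e.g. a "cat" image may put more mass
# on "dog" than on "truck".
logits_x_prime = torch.tensor([0.5, 2.0, 4.0, 0.2, -1.0])  # made-up logits
cskd_target = F.softmax(logits_x_prime / 4.0, dim=0)        # temperature T = 4
```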
Experiments
Classification accuracy
- Comparison with output regularization methods.
- Comparison with self-distillation methods.
- Evaluation on large-scale datasets.
- Compatibility with other regularization methods.
- Ablation study.
(1) Feature embedding analysis.
(2) Hierarchical image classification.
- Calibration effects.
References
- Yun, Sukmin, et al. “Regularizing class-wise predictions via self-knowledge distillation.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
- code: https://github.com/alinlab/cs-kd