Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations
Contribution
- Applying strong augmentations directly distorts the structure of images / skeleton sequences and destroys semantic information, which destabilizes training. This paper therefore proposes a design that hierarchically integrates strong augmentations, yielding the hierarchical consistent contrastive learning framework (HiCLR).
- An asymmetric hierarchical learning scheme enforces consistency between positive-pair features: features of strongly augmented skeletons are pulled one-way toward features of weakly augmented skeletons, so that the richer information carried by strong augmentations improves the learned representations.
- Three strong augmentations are proposed: Random Mask, Drop/Add Edges, and SkeleAdaIN. Notably, plugging these strong augmentations directly into a classical contrastive learning framework hurts performance, whereas combining them with the proposed hierarchical consistent contrastive framework (HiCLR) brings clear gains.
Method
HiCLR builds on the classical contrastive learning framework MoCo v2 (corresponding to the bottom two branches in the figure).
Strong Augmentation for Skeleton
The paper defines three augmentation sets:
1. Basic Augmentation Set (BA): a spatial transformation (Shear) and a temporal transformation (Crop).
2. Normal Augmentation Set (NA): Spatial Flip, Rotation, Gaussian Noise, Gaussian Blur, and Channel Mask.
3. Strong Augmentation Set:
- Random Mask
- A random mask for the spatial-temporal 3D coordinate data of the joints. It can be viewed as a random perturbation of the joint coordinates.
- Code: https://github.com/JHang2020/HiCLR/blob/49ffdf85231f19c1c7795ec63fb8d25ea96d37cf/processor/utils.py#L42-L57
- Drop/Add Edges (DAE)
- We randomly drop/add connections between different joints in each information aggregation layer. The target to be augmented is the predefined or learnable adjacency matrix for the graph convolution layer and the attention map for the transformer block.
- Code: https://github.com/JHang2020/HiCLR/blob/49ffdf85231f19c1c7795ec63fb8d25ea96d37cf/net/st_gcn.py#L71-L100
- SkeleAdaIN
- Inspired by the practice of style transfer (Huang and Belongie 2017; Karras, Laine, and Aila 2019), we exchange statistics of two skeleton samples on the spatial-temporal dimension, i.e., the mean and the variance of the style sample are transferred to the content sample, to generate the augmented views. Since this transformation does not change the relative order of joint coordinates, we maintain the semantics of skeleton sequences unchanged.
- Code: https://github.com/JHang2020/HiCLR/blob/49ffdf85231f19c1c7795ec63fb8d25ea96d37cf/net/skeletonAdaIN.py
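A minimal NumPy sketch of the Random Mask and SkeleAdaIN ideas; the shapes, ratios, and function names here are illustrative assumptions, not the repo's exact implementation (see the linked code above):

```python
import numpy as np

def random_mask(x, mask_ratio=0.1, seed=None):
    """Random Mask: zero out a random subset of spatial-temporal
    positions, perturbing the 3D joint coordinates.

    x: skeleton sequence of shape (C, T, V) = (channels, frames, joints).
    """
    rng = np.random.default_rng(seed)
    _, T, V = x.shape
    keep = rng.random((T, V)) >= mask_ratio  # keep with prob. 1 - mask_ratio
    return x * keep[None, :, :]

def skele_adain(content, style, eps=1e-6):
    """SkeleAdaIN: transfer per-channel mean/std statistics of the
    style sample (over the spatial-temporal axes) onto the content
    sample, leaving the relative order of joint coordinates intact.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean
```

After `skele_adain`, the augmented view carries the style sample's per-channel statistics while keeping the content sample's normalized motion pattern.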
Gradual growing augmentation
In the figure, the augmentations of the third through first branches are each built by adding operations on top of the previous branch's pipeline.
Key code:
https://github.com/JHang2020/HiCLR/blob/49ffdf85231f19c1c7795ec63fb8d25ea96d37cf/feeder/ntu_feeder.py#L82-L118
https://github.com/JHang2020/HiCLR/blob/main/processor/pretrain_hiclr.py#L100
https://github.com/JHang2020/HiCLR/blob/49ffdf85231f19c1c7795ec63fb8d25ea96d37cf/net/hiclr.py#L113-L117
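The growing composition can be sketched as follows. The ops here are toy stand-ins (assumed, not the repo's implementations); only the pattern mirrors the paper: each branch's pipeline extends the previous one.

```python
import numpy as np

def compose(ops):
    """Chain a list of augmentation callables into one transform."""
    def apply(x):
        for op in ops:
            x = op(x)
        return x
    return apply

rng = np.random.default_rng(0)

def crop(x):            # placeholder for BA's temporal Crop
    return x

def gaussian_noise(x):  # from the Normal set
    return x + 0.01 * rng.standard_normal(x.shape)

def random_mask(x):     # from the Strong set
    return x * (rng.random(x.shape) >= 0.1)

# Each branch reuses the previous branch's pipeline and appends more:
basic  = [crop]                     # weakest branch (BA)
normal = basic + [gaussian_noise]   # BA + NA
strong = normal + [random_mask]     # BA + NA + SA

views = [compose(ops)(np.ones((3, 50, 25))) for ops in (basic, normal, strong)]
```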
Asymmetric hierarchical learning
Features of strongly augmented skeletons are pulled one-way toward features of weakly augmented skeletons.
sim() can be any similarity-measuring function; the paper uses the KL divergence.
Key code: https://github.com/JHang2020/HiCLR/blob/49ffdf85231f19c1c7795ec63fb8d25ea96d37cf/net/hiclr.py#L180-L187
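A PyTorch sketch of the one-way KL consistency term (my reading of the asymmetry; variable names and shapes are illustrative). The weak branch's distribution is detached, so gradients only flow through the strongly augmented branch:

```python
import torch
import torch.nn.functional as F

def directional_kl(logits_strong, logits_weak, temperature=0.1):
    """KL(p_weak || p_strong) over contrastive similarity logits.

    logits_*: (N, 1+K) similarities of each query against its positive
    and the queue negatives. The weak branch is detached (stop-gradient),
    so only the strong branch is pulled toward the other.
    """
    p_weak = F.softmax(logits_weak.detach() / temperature, dim=1)
    log_p_strong = F.log_softmax(logits_strong / temperature, dim=1)
    return F.kl_div(log_p_strong, p_weak, reduction="batchmean")
```

The detach is what makes the learning asymmetric: the weak branch serves as a fixed target rather than being dragged toward the noisier strong view.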
Total Loss
The InfoNCE loss is applied only to the pairs generated by the third and fourth branches.
Code: https://github.com/JHang2020/HiCLR/blob/main/processor/pretrain_hiclr.py#L101-L109
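A MoCo-style InfoNCE sketch for the weakly augmented (third/fourth-branch) pair, with illustrative shapes; the repo's actual loss lives at the linked line range. The total objective then combines this term with the directional KL terms from the stronger branches:

```python
import torch
import torch.nn.functional as F

def info_nce(q, k, queue, temperature=0.07):
    """MoCo-style InfoNCE.

    q, k:  (N, D) L2-normalized query/key features (the weakly
           augmented pair from the bottom two branches).
    queue: (K, D) L2-normalized negatives from the memory bank.
    """
    l_pos = (q * k).sum(dim=1, keepdim=True)   # (N, 1) positive logits
    l_neg = q @ queue.t()                      # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)  # positives sit at index 0
    return F.cross_entropy(logits, labels)

# Total loss (schematic): InfoNCE on the weak pair plus the one-way
# KL consistency terms for the more strongly augmented branches:
#   loss = info_nce(q, k, queue) + sum(kl_terms)
```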