- CVPR 2023
- https://github.com/XPixelGroup/HAT?tab=readme-ov-file
- Problem motivation:
– current transformer-based SR models have an insufficient "receptive field";
– analysis: transformer-based methods were believed to outperform CNN-based ones because they can exploit more long-range information, but through LAM analysis the authors find that SwinIR does not actually use more long-range information; it performs better because it models local information more effectively, so the "receptive field" needs to be enlarged. They also find blocking artifacts in the intermediate features, which shows that the shift window mechanism does not achieve cross-window information interaction well;
- Method of this paper:
– for the "receptive field" problem: Hybrid Attention Transformer (HAT), which combines channel attention & window-based self-attention;
– for the cross-window information interaction / blocking-artifact problem: an overlapping cross-attention module that strengthens the interaction between neighboring window features;
– pre-training: a same-task pre-training strategy, i.e. pre-training on a large-scale dataset for the same task;
- Network architecture
– shallow feature extraction + deep feature extraction + image reconstruction
– $\mathcal{I}_{LR}\in \mathcal{R}^{H\times W\times C_{in}} \xrightarrow[shallow\ feature\ extraction]{conv\times 1} F_0\in \mathcal{R}^{H\times W\times C} \xrightarrow[deep\ feature\ extraction]{residual\ hybrid\ attention\ groups\ (RHAG)\times n\ +\ conv\times 1} F_D\in\mathcal{R}^{H\times W\times C} + F_0 \xrightarrow[reconstruction\ module]{} \mathcal{I}_{HR}$
– the pixel shuffle in the reconstruction module performs the upsampling (see the sketch below);
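For reference, a minimal sketch of a pixel-shuffle upsampling head; the channel counts and scale factor are illustrative, not the repo's exact reconstruction module:

```python
import torch
import torch.nn as nn

r = 4                                        # upscaling factor (illustrative)
recon = nn.Sequential(
    nn.Conv2d(64, 3 * r * r, 3, padding=1),  # C=64 features -> 3*r^2 channels
    nn.PixelShuffle(r),                      # (B, 3*r^2, H, W) -> (B, 3, r*H, r*W)
)
feat = torch.randn(1, 64, 48, 48)            # stands in for F_D + F_0
print(recon(feat).shape)                     # torch.Size([1, 3, 192, 192])
```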
– each RHAG consists of HABs and an OCAB
– HAB (Hybrid Attention Block):
$$
X_N = LN(X) \\
X_M = (S)W\text{-}MSA(X_N) + \alpha\, CAB(X_N) + X \\
Y = MLP(LN(X_M)) + X_M
$$
– (S)W-MSA: window-based multi-head self-attention: the input is first partitioned into non-overlapping windows of size $M\times M$, self-attention is computed within each window, and a shift window partition approach is applied at intervals (in alternating blocks) so that information can cross window borders;
– CAB: a convolution first reduces the channel count to $\frac{1}{\beta}$ of the original, a second conv restores the original channel count, and a channel attention module follows; a combined sketch of W-MSA, CAB, and HAB is given below;
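A minimal PyTorch sketch of HAB following the equations above, assuming a plain window self-attention without the shift and the relative position bias, and with illustrative defaults for `alpha`, `beta`, the MLP ratio, and the head count (none of these are the repo's exact values):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over (B, C, H, W)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.body(x)          # reweight channels by their global statistics

class CAB(nn.Module):
    """Conv squeezes C -> C/beta, conv restores C, then channel attention."""
    def __init__(self, channels, beta=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // beta, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels // beta, channels, 3, padding=1),
            ChannelAttention(channels),
        )

    def forward(self, x):
        return self.body(x)

class WindowSelfAttention(nn.Module):
    """W-MSA on (B, H, W, C); shift and relative position bias omitted for brevity."""
    def __init__(self, dim, window_size, num_heads):
        super().__init__()
        self.ws, self.heads = window_size, num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, H, W, C = x.shape             # H, W assumed divisible by the window size
        ws = self.ws
        # partition into non-overlapping ws x ws windows -> (B*nW, ws^2, C)
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        qkv = self.qkv(x).view(-1, ws * ws, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each (B*nW, heads, ws^2, hd)
        attn = ((q * self.scale) @ k.transpose(-2, -1)).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(-1, ws * ws, C)
        out = self.proj(out)
        # merge windows back to (B, H, W, C)
        out = out.view(B, H // ws, W // ws, ws, ws, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)

class HAB(nn.Module):
    """X_M = W-MSA(LN(X)) + alpha * CAB(LN(X)) + X;  Y = MLP(LN(X_M)) + X_M."""
    def __init__(self, dim, window_size=8, num_heads=4, alpha=0.01):
        super().__init__()
        self.alpha = alpha
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = WindowSelfAttention(dim, window_size, num_heads)
        self.cab = CAB(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim))

    def forward(self, x):                # x: (B, H, W, C)
        xn = self.norm1(x)
        conv = self.cab(xn.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)  # CAB works on NCHW
        xm = self.attn(xn) + self.alpha * conv + x
        return self.mlp(self.norm2(xm)) + xm
```

Applied to a `(1, 48, 48, 96)` tensor, `HAB(dim=96)` returns a tensor of the same shape.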
– Overlapping Cross-Attention Block (OCAB): cross-attention in which queries come from the standard non-overlapping window partition while keys/values are taken from larger, overlapping windows, so that attention can cross window boundaries (see the sketch below);
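A minimal sketch of the overlapping cross-attention computation, assuming `torch.nn.functional.unfold` is used to extract the enlarged key/value windows; the class name and the `overlap_ratio` default are taken loosely from the paper's description, and the relative position bias is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OverlappingCrossAttention(nn.Module):
    """Queries from plain ws x ws windows; keys/values from larger overlapping windows."""
    def __init__(self, dim, window_size=8, overlap_ratio=0.5, num_heads=4):
        super().__init__()
        self.ws = window_size
        self.ows = int(window_size * (1 + overlap_ratio))   # enlarged window size
        self.heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                # x: (B, H, W, C), H/W divisible by window size
        B, H, W, C = x.shape
        ws, ows = self.ws, self.ows
        pad = (ows - ws) // 2            # ows - ws assumed even

        # queries from the standard non-overlapping partition -> (B*nW, ws^2, C)
        q = self.q(x).view(B, H // ws, ws, W // ws, ws, C)
        q = q.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

        # keys/values from overlapping ows x ows windows centered on each query window
        kv = self.kv(x).permute(0, 3, 1, 2)                 # (B, 2C, H, W)
        kv = F.unfold(kv, kernel_size=ows, stride=ws, padding=pad)
        nW = kv.shape[-1]                                   # = (H/ws) * (W/ws)
        kv = kv.view(B, 2 * C, ows * ows, nW).permute(0, 3, 2, 1)
        k, v = kv.reshape(-1, ows * ows, 2 * C).chunk(2, dim=-1)

        # each ws^2 query window cross-attends to its ows^2 overlapping window
        def heads(t):
            return t.view(t.shape[0], t.shape[1], self.heads, -1).transpose(1, 2)
        q, k, v = heads(q), heads(k), heads(v)
        attn = ((q * self.scale) @ k.transpose(-2, -1)).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(-1, ws * ws, C)

        # merge windows back to (B, H, W, C)
        out = out.view(B, H // ws, W // ws, ws, ws, C).permute(0, 1, 3, 2, 4, 5)
        return self.proj(out.reshape(B, H, W, C))
```

Because the unfold stride equals the query window size while its kernel is larger, neighboring key/value windows overlap, which is what lets information flow across window borders.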
- Experiments:
– pre-training: ImageNet
– training: DIV2K + Flickr2K