AdaLN
AdaLN is an Adaptive Layer Normalization: it achieves a more flexible normalization by combining two different layer-normalization strategies with an adaptive weight-adjustment (gating) mechanism.
Core idea:
- Apply two different Layer Normalizations, one to each input tensor (the activations and the conditioning signal).
- Use a gamma gating mechanism to dynamically adjust the weighting of the normalized results.
- Add a skip connection to strengthen the network's expressive power (the three steps combine into the formula sketched after this list).
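Writing a_hat and s_hat for the two normalized tensors, the whole module computes the following (notation mine, inferred from the code below):

    AdaLN(a, s) = sigmoid(W_gamma * s_hat + b_gamma) ⊙ a_hat + W_skip * s_hat

where a_hat = LayerNorm(a) with no learnable affine parameters, and s_hat = LayerNorm(s) with a learnable scale but no offset.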
Source code:
import torch.nn as nn

# LayerNorm, Linear, and LinearNoBias are project-specific wrappers,
# not torch.nn classes; a minimal sketch of them follows the class.

class AdaLN(nn.Module):
    """Adaptive Layer Normalization."""

    def __init__(self, normalized_shape):
        super().__init__()
        # Plain layer norm for the activations: no learnable scale or offset
        self.a_layer_norm = LayerNorm(  # equivalent to scale=False, offset=False in Haiku
            normalized_shape,
            elementwise_affine=False,
            bias=False
        )
        # Layer norm for the conditioning signal: learnable scale, no offset
        self.s_layer_norm = LayerNorm(  # equivalent to scale=True, offset=False in Haiku
            normalized_shape,
            elementwise_affine=True,
            bias=False
        )
        # Linear layers for gating and the skip connection
        dim = normalized_shape if isinstance(normalized_shape, int) else normalized_shape[-1]
        self.to_gamma = nn.Sequential(
            Linear(dim, dim, init='gating'),
            nn.Sigmoid()
        )
        self.skip_linear = LinearNoBias(dim, dim, init='final')

    def forward(self, a, s):
        # Normalize the activations and the conditioning signal separately
        a = self.a_layer_norm(a)
        s = self.s_layer_norm(s)
        # Gate the normalized activations with gamma(s), then add the
        # skip projection of the normalized conditioning signal
        return self.to_gamma(s) * a + self.skip_linear(s)
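LayerNorm, Linear, and LinearNoBias above come from the surrounding project. A minimal sketch of what they might look like, assuming init='gating' means zero weights with the bias set to one (so the sigmoid gate starts mostly open) and init='final' means zero weights (so the branch contributes nothing at initialization); these init semantics are assumptions, not the project's actual code:

import torch
import torch.nn as nn

def LayerNorm(normalized_shape, elementwise_affine=True, bias=True):
    # On PyTorch >= 2.1 this is just nn.LayerNorm, which accepts a bias flag
    return nn.LayerNorm(normalized_shape,
                        elementwise_affine=elementwise_affine,
                        bias=bias)

def Linear(dim_in, dim_out, init=None):
    # nn.Linear plus a named weight-initialization scheme (assumed semantics)
    linear = nn.Linear(dim_in, dim_out)
    if init == 'gating':
        nn.init.zeros_(linear.weight)  # gate starts at sigmoid(1) ~ 0.73
        nn.init.ones_(linear.bias)
    elif init == 'final':
        nn.init.zeros_(linear.weight)  # branch starts as a no-op
        nn.init.zeros_(linear.bias)
    return linear

def LinearNoBias(dim_in, dim_out, init=None):
    linear = nn.Linear(dim_in, dim_out, bias=False)
    if init == 'final':
        nn.init.zeros_(linear.weight)  # skip projection starts at zero
    return linear

With those definitions in place, the module can be exercised on dummy tensors:

# Activations a and conditioning signal s share the feature dimension
ada_ln = AdaLN(128)
a = torch.randn(4, 32, 128)
s = torch.randn(4, 32, 128)
out = ada_ln(a, s)  # same shape as a: (4, 32, 128)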