Positional Encoding
flyfish
The Transformer does not use a recurrent neural network, so it cannot learn position information from the sequence itself; moreover, it is a parallel architecture that does not process the sequence position by position. A positional encoding is therefore added to the input sequence, injecting each word's position into its word embedding.
If the natural numbers were used as the positional encoding, the encoding would be linear, and the difference between adjacent positions would stay constant across the entire sequence. Position embeddings generated with sine and cosine functions, on the other hand, are periodic and orthogonal, yielding position embeddings that are distinguishable at every scale; this tends to perform better at capturing long-distance dependencies.
\begin{aligned}
\text{PE}(pos, 2i) &= \sin\left(pos/10000^{2i/d_\text{model}}\right) \\
\text{PE}(pos, 2i+1) &= \cos\left(pos/10000^{2i/d_\text{model}}\right)
\end{aligned}
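One useful consequence, noted in the original Transformer paper, is that for any fixed offset k, PE(pos+k) is a fixed linear transformation (a rotation) of PE(pos) within each sine/cosine pair, which is part of why the encoding helps the model attend to relative positions. A minimal sketch verifying this for one pair (my own illustration, not part of the original post; w, pos, and k are arbitrary choices):

import numpy as np

w = 1 / 10000 ** (2 * 0 / 512)       # frequency of the first sin/cos pair (2i = 0)
pos, k = 5, 3                        # an arbitrary position and offset
pe_pos = np.array([np.sin(w * pos), np.cos(w * pos)])
rotation = np.array([[ np.cos(w * k), np.sin(w * k)],
                     [-np.sin(w * k), np.cos(w * k)]])
# rotation @ pe_pos equals [sin(w*(pos+k)), cos(w*(pos+k))], i.e. the pair at pos+k
print(np.allclose(rotation @ pe_pos,
                  [np.sin(w * (pos + k)), np.cos(w * (pos + k))]))  # True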
import torch
import torch.nn as nn
import numpy as np

# Build a sinusoidal positional-encoding table, used to inject position
# information in a Transformer
def get_sin_enc_table(n_position, embedding_dim):
    #------------------------- Shape info -----------------------------
    # n_position: maximum length of the input sequence
    # embedding_dim: dimensionality of the word embeddings
    #------------------------------------------------------------------
    # Initialize the table from the position and dimension information
    sinusoid_table = np.zeros((n_position, embedding_dim))
    # Compute the angle for every position and dimension
    for pos_i in range(n_position):
        for hid_j in range(embedding_dim):
            angle = pos_i / np.power(10000, 2 * (hid_j // 2) / embedding_dim)
            sinusoid_table[pos_i, hid_j] = angle
    # Apply sine to the even dimensions and cosine to the odd dimensions
    sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2])  # dim 2i (even)
    sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2])  # dim 2i+1 (odd)
    #------------------------- Shape info -----------------------------
    # sinusoid_table has shape [n_position, embedding_dim]
    #------------------------------------------------------------------
    return torch.FloatTensor(sinusoid_table)  # return the positional-encoding table
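For reference, the same table can be built without the double loop. The following vectorized version is my own sketch, not part of the original post; it produces the same result as get_sin_enc_table above.

# Vectorized equivalent of get_sin_enc_table (sketch, for reference only)
def get_sin_enc_table_vectorized(n_position, embedding_dim):
    positions = np.arange(n_position)[:, None]  # shape [n_position, 1]
    # 10000^(2*(j//2)/d) for every dimension j, shape [embedding_dim]
    divisors = np.power(10000, 2 * (np.arange(embedding_dim) // 2) / embedding_dim)
    table = positions / divisors                # shape [n_position, embedding_dim]
    table[:, 0::2] = np.sin(table[:, 0::2])     # dim 2i (even)
    table[:, 1::2] = np.cos(table[:, 1::2])     # dim 2i+1 (odd)
    return torch.FloatTensor(table)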
sentences = [
    ['like tree like fruit', '羊毛 出在 羊身上'],
    ['East west home is best', '金窝 银窝 不如 自己的 草窝'],
]
for sentence in sentences:
    r = sentence[0].split()
    print(r)
# Maximum source-sentence length, plus 1 to leave room for the <pad> token
src_len = max(len(sentence[0].split()) for sentence in sentences) + 1
print(src_len)
d_embedding = 3  # embedding dimensionality
r = get_sin_enc_table(src_len + 1, d_embedding)
print(r)
Result:
['like', 'tree', 'like', 'fruit']
['East', 'west', 'home', 'is', 'best']
6
tensor([[ 0.0000,  1.0000,  0.0000],
        [ 0.8415,  0.5403,  0.0022],
        [ 0.9093, -0.4161,  0.0043],
        [ 0.1411, -0.9900,  0.0065],
        [-0.7568, -0.6536,  0.0086],
        [-0.9589,  0.2837,  0.0108],
        [-0.2794,  0.9602,  0.0129]])
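In a Transformer this table is typically loaded into a frozen embedding layer and added to the word embeddings. A minimal sketch of that pattern (assumed typical usage, not code from the original post; vocab_size and the token ids are hypothetical values for illustration):

vocab_size = 32  # hypothetical vocabulary size
src_emb = nn.Embedding(vocab_size, d_embedding)
pos_emb = nn.Embedding.from_pretrained(get_sin_enc_table(src_len + 1, d_embedding),
                                       freeze=True)    # positions are not trained
tokens = torch.tensor([[1, 2, 1, 3, 0, 0]])            # [batch, seq_len], 0 = <pad>
positions = torch.arange(tokens.size(1)).unsqueeze(0)  # [1, seq_len]
x = src_emb(tokens) + pos_emb(positions)               # [batch, seq_len, d_embedding]
print(x.shape)  # torch.Size([1, 6, 3])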
If the embedding dimension is d_embedding = 512, there are 256 sine/cosine pairs:
\begin{aligned}
\text{PE}(\text{pos}, 0) &= \sin\left( \dfrac{\text{pos}}{10000^{0/512}} \right) \\
\text{PE}(\text{pos}, 1) &= \cos\left( \dfrac{\text{pos}}{10000^{0/512}} \right) \\
\text{PE}(\text{pos}, 2) &= \sin\left( \dfrac{\text{pos}}{10000^{2/512}} \right) \\
\text{PE}(\text{pos}, 3) &= \cos\left( \dfrac{\text{pos}}{10000^{2/512}} \right) \\
\text{PE}(\text{pos}, 4) &= \sin\left( \dfrac{\text{pos}}{10000^{4/512}} \right) \\
\text{PE}(\text{pos}, 5) &= \cos\left( \dfrac{\text{pos}}{10000^{4/512}} \right) \\
&\ \ \vdots \\
\text{PE}(\text{pos}, 510) &= \sin\left( \dfrac{\text{pos}}{10000^{510/512}} \right) \\
\text{PE}(\text{pos}, 511) &= \cos\left( \dfrac{\text{pos}}{10000^{510/512}} \right)
\end{aligned}
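Each sin/cos pair has wavelength 2π · 10000^{2i/512}, so the wavelengths form a geometric progression from 2π up toward 10000 · 2π: low dimensions oscillate quickly (fine-grained position differences) while high dimensions oscillate slowly (coarse, long-range differences). A short check (my own illustration, not part of the original post):

import numpy as np

# Wavelengths of selected sin/cos pairs for d_model = 512
d_model = 512
for two_i in [0, 2, 4, 510]:
    wavelength = 2 * np.pi * 10000 ** (two_i / d_model)
    print(f"dims ({two_i}, {two_i + 1}): wavelength = {wavelength:,.2f} positions")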