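240711_MindSpore Study Check-in - Day 23 - LSTM+CRF Sequence Labeling (2)
Today's entry covers the second part of LSTM+CRF sequence labeling. These are brief notes only.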
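Score computation
First we compute the score of the correct tag sequence. Note that besides the transition probability matrix $\mathbf{P}$, we also maintain two vectors of size $|T|$, which serve as the transition probabilities at the start and at the end of the sequence. We also introduce a mask matrix $mask$ so that the padding values added when several sequences are packed into one batch are ignored, and the Score computation covers only the valid tokens.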
import mindspore.numpy as mnp

def compute_score(emissions, tags, seq_ends, mask, trans, start_trans, end_trans):
    # emissions: (seq_length, batch_size, num_tags)
    # tags: (seq_length, batch_size)
    # mask: (seq_length, batch_size)
    # seq_ends: (batch_size,), index of the last valid token in each sequence

    seq_length, batch_size = tags.shape
    mask = mask.astype(emissions.dtype)

    # Initialize score with the start transition probability
    # shape: (batch_size,)
    score = start_trans[tags[0]]
    # score += emission probability of the first token
    # shape: (batch_size,)
    score += emissions[0, mnp.arange(batch_size), tags[0]]

    for i in range(1, seq_length):
        # Transition probability from tag i-1 to tag i (counted only when mask == 1)
        # shape: (batch_size,)
        score += trans[tags[i - 1], tags[i]] * mask[i]

        # Emission probability of the predicted tags[i] (counted only when mask == 1)
        # shape: (batch_size,)
        score += emissions[i, mnp.arange(batch_size), tags[i]] * mask[i]

    # End transition: pick the tag at the last valid position of each sequence
    # shape: (batch_size,)
    last_tags = tags[seq_ends, mnp.arange(batch_size)]
    # score += end transition probability
    # shape: (batch_size,)
    score += end_trans[last_tags]

    return score
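To make the bookkeeping in compute_score easier to follow, here is a minimal pure-Python sketch for a single sequence (no batch dimension). The helper name sequence_score and all toy values are hypothetical, and the last valid position is derived from the mask here rather than passed in as seq_ends.

```python
def sequence_score(emissions, tags, mask, trans, start_trans, end_trans):
    """Score of one tag sequence; padded steps (mask == 0) contribute nothing."""
    # Start transition plus the first emission (step 0 is assumed to be valid).
    score = start_trans[tags[0]] + emissions[0][tags[0]]
    last = 0  # index of the last valid token seen so far
    for i in range(1, len(tags)):
        if mask[i]:
            # Transition from the previous tag plus the emission of the current tag.
            score += trans[tags[i - 1]][tags[i]] + emissions[i][tags[i]]
            last = i
    # End transition, taken from the last valid tag.
    return score + end_trans[tags[last]]


# Hypothetical toy values: 2 tags, 3 time steps, the last step is padding.
emissions = [[1.0, 0.5], [0.2, 2.0], [9.0, 9.0]]
trans = [[0.1, 0.4], [0.3, 0.2]]
start_trans = [0.5, 0.0]
end_trans = [0.0, 1.0]
tags = [0, 1, 0]
mask = [1, 1, 0]

score = sequence_score(emissions, tags, mask, trans, start_trans, end_trans)
print(score)
```

The padded step carries large emission values on purpose: because of the mask they never reach the result, and the total is 0.5 + 1.0 + 0.4 + 2.0 + 1.0.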
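Normalizer computation
The Normalizer is the log-sum-exp of the Scores of all possible output sequences for $x$. Computing it by brute-force enumeration would require scoring every possible output sequence, which gives $|T|^n$ results for a sequence of length $n$. Here we use dynamic programming instead, reusing intermediate results to improve efficiency.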
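For a sense of what the brute-force approach involves, the following sketch enumerates every tag sequence with itertools.product and takes the log-sum-exp of their scores. It handles a single unpadded sequence; the function names and toy values are hypothetical and only meant as an illustration.

```python
import itertools
import math


def path_score(emissions, tags, trans, start_trans, end_trans):
    """Score of one complete tag sequence (no padding)."""
    score = start_trans[tags[0]] + emissions[0][tags[0]]
    for i in range(1, len(tags)):
        score += trans[tags[i - 1]][tags[i]] + emissions[i][tags[i]]
    return score + end_trans[tags[-1]]


def brute_force_normalizer(emissions, trans, start_trans, end_trans):
    """Log-sum-exp over all num_tags ** seq_length tag sequences."""
    num_tags = len(start_trans)
    seq_length = len(emissions)
    scores = [
        path_score(emissions, tags, trans, start_trans, end_trans)
        for tags in itertools.product(range(num_tags), repeat=seq_length)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores)), len(scores)


# Hypothetical toy values: 3 tags, 4 time steps.
emissions = [[1.0, 0.5, 0.1], [0.2, 2.0, 0.3], [0.4, 0.1, 1.5], [0.7, 0.2, 0.9]]
trans = [[0.1, 0.4, 0.2], [0.3, 0.2, 0.5], [0.6, 0.1, 0.3]]
start_trans = [0.5, 0.0, 0.2]
end_trans = [0.0, 1.0, 0.4]

log_z, num_paths = brute_force_normalizer(emissions, trans, start_trans, end_trans)
print(num_paths, log_z)
```

Even this tiny case visits 3 ** 4 = 81 paths, and the count grows exponentially with the sequence length.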
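Suppose we need the scores $\text{Score}_i$ of all possible output sequences from token $0$ to token $i$. We can first compute the scores $\text{Score}_{i-1}$ of all possible output sequences from token $0$ to token $i-1$. The Normalizer can therefore be rewritten in the following form:

$$\log\Big(\sum_{y'_{0,\dots,i}} \exp(\text{Score}_i)\Big) = \log\Big(\sum_{y'_{0,\dots,i-1}} \exp(\text{Score}_{i-1} + h_i + \mathbf{P})\Big)$$

where $h_i$ is the emission probability of token $i$ and $\mathbf{P}$ is the transition matrix. Since the emission probability matrix $h$ and the transition probability matrix $\mathbf{P}$ are computed independently of the sequence path of $y$, they can be pulled out, which gives:

$$\log\Big(\sum_{y'_{0,\dots,i}} \exp(\text{Score}_i)\Big) = \log\Big(\sum_{y'_{0,\dots,i-1}} \exp(\text{Score}_{i-1})\Big) + h_i + \mathbf{P} \qquad (7)$$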
import mindspore.ops as ops
import mindspore.numpy as mnp

def compute_normalizer(emissions, mask, trans, start_trans, end_trans):
    # emissions: (seq_length, batch_size, num_tags)
    # mask: (seq_length, batch_size)

    seq_length = emissions.shape[0]

    # Initialize score with the start transition probability plus the first emission probability
    # shape: (batch_size, num_tags)
    score = start_trans + emissions[0]

    for i in range(1, seq_length):
        # Expand the dimensions of score for the total score computation
        # shape: (batch_size, num_tags, 1)
        broadcast_score = score.expand_dims(2)

        # Expand the dimensions of the emissions for the total score computation
        # shape: (batch_size, 1, num_tags)
        broadcast_emissions = emissions[i].expand_dims(1)

        # Compute score_i according to formula (7).
        # At this point broadcast_score holds the log_sum_exp of the scores of
        # all possible paths from token 0 up to the previous token.
        # shape: (batch_size, num_tags, num_tags)
        next_score = broadcast_score + trans + broadcast_emissions

        # Apply log_sum_exp over the previous tag to get score_i,
        # which is used for the next token's score computation
        # shape: (batch_size, num_tags)
        next_score = ops.logsumexp(next_score, axis=1)

        # score only changes where mask == 1
        # shape: (batch_size, num_tags)
        score = mnp.where(mask[i].expand_dims(1), next_score, score)

    # Finally add the end transition probability
    # shape: (batch_size, num_tags)
    score += end_trans
    # Take log_sum_exp over the scores of all possible paths
    # shape: (batch_size,)
    return ops.logsumexp(score, axis=1)
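As a sanity check on the recursion, the sketch below re-implements the same forward pass in pure Python for a single sequence and compares it with brute-force enumeration over the valid tokens. All function names and toy values are hypothetical. The emission of the current tag is added outside the log-sum-exp over the previous tag, while the transition term stays inside because it is indexed by the previous tag; this is algebraically the same as what the broadcasted sum in compute_normalizer computes.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_normalizer(emissions, mask, trans, start_trans, end_trans):
    num_tags = len(start_trans)
    # score[k]: log-sum-exp over all paths that end in tag k at the current step.
    score = [start_trans[k] + emissions[0][k] for k in range(num_tags)]
    for i in range(1, len(emissions)):
        if not mask[i]:
            continue  # padded step: keep the previous score unchanged
        score = [
            logsumexp([score[j] + trans[j][k] for j in range(num_tags)]) + emissions[i][k]
            for k in range(num_tags)
        ]
    return logsumexp([score[k] + end_trans[k] for k in range(num_tags)])


def brute_force_normalizer(emissions, trans, start_trans, end_trans):
    num_tags = len(start_trans)
    scores = []
    for tags in itertools.product(range(num_tags), repeat=len(emissions)):
        s = start_trans[tags[0]] + emissions[0][tags[0]]
        for i in range(1, len(tags)):
            s += trans[tags[i - 1]][tags[i]] + emissions[i][tags[i]]
        scores.append(s + end_trans[tags[-1]])
    return logsumexp(scores)


# Hypothetical toy values: 3 tags, 4 valid steps followed by 1 padded step.
emissions = [[1.0, 0.5, 0.1], [0.2, 2.0, 0.3], [0.4, 0.1, 1.5],
             [0.7, 0.2, 0.9], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 1, 0]
trans = [[0.1, 0.4, 0.2], [0.3, 0.2, 0.5], [0.6, 0.1, 0.3]]
start_trans = [0.5, 0.0, 0.2]
end_trans = [0.0, 1.0, 0.4]

dp = forward_normalizer(emissions, mask, trans, start_trans, end_trans)
brute = brute_force_normalizer(emissions[:4], trans, start_trans, end_trans)
print(dp, brute)
```

The two values agree up to floating-point error, while the dynamic-programming version does work proportional to seq_length * num_tags ** 2 instead of num_tags ** seq_length.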
Check-in screenshot: