I am only mildly interested in this line of research.
Table of Contents
- Rethinking Dense Retrieval’s Few-Shot Ability
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
- PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction
- Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking
Rethinking Dense Retrieval’s Few-Shot Ability
The authors construct a standardized FewDR dataset and evaluation protocol for few-shot dense retrieval. The dataset is built on the Wikipedia corpus and contains 41,420 samples across 60 fine-grained classes.
In terms of the concrete method, it does not feel substantially different from other dense retrieval approaches.
Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
Traditionally, most seq2seq tasks are solved with the encoder-decoder framework, which requires an encoder to encode the source sequence and a decoder to generate the target text.
This paper aims to address this gap by conducting a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework through the analysis of a regularized encoder-decoder structure.
Key problems:
1. Which architecture is more advantageous: encoder-decoder or decoder-only?
2. The paper reveals an attention degeneration problem in language models: as the number of generation steps grows, less and less attention is focused on the source sequence.
The paper works with a regularized variant of the traditional ED structure, named the Regularized Encoder-Decoder (RED) framework:
1. To avoid the attention degeneration problem, a uni-directional cross-attention is proposed that attends to both the source and target matrices;
2. Continuous position encoding: the position indices of the target sequence continue from those of the source sequence, rather than restarting from zero in the target (a small sketch follows this list).
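A minimal sketch of the continuous position-encoding idea, written as my own illustration (the function name and the plain-Python form are assumptions, not the authors' code): target positions simply continue the source's indices instead of restarting at zero.

```python
def position_ids(src_len: int, tgt_len: int, continuous: bool = True):
    """Return position ids for source and target tokens.

    With continuous=True (the RED-style scheme described above), target
    positions continue after the source; otherwise they restart at 0,
    as in a vanilla encoder-decoder.
    """
    src_pos = list(range(src_len))
    if continuous:
        tgt_pos = list(range(src_len, src_len + tgt_len))
    else:
        tgt_pos = list(range(tgt_len))
    return src_pos, tgt_pos


# Example: a 4-token source and a 3-token target.
print(position_ids(4, 3, continuous=True))   # ([0, 1, 2, 3], [4, 5, 6])
print(position_ids(4, 3, continuous=False))  # ([0, 1, 2, 3], [0, 1, 2])
```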
PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction
Phonological and visual similarity knowledge is important for this task. PLOME models such knowledge with GRU networks over each character's pinyin and strokes.
The proposed model takes the strokes and pinyin of every character as input, which lets PLOME model the similarity between arbitrary characters.
PLOME learns misspelling knowledge at both the character level and the phonetic level by jointly recovering the true characters and the pronunciations of masked tokens.
(Model architecture diagram)
- some percentage of the input tokens are randomly masked and then recovered during pre-training
- 15% of the tokens in the corpus are masked; in addition, a dynamic masking strategy is used
- the final embedding of each character is the sum of its character embedding, position embedding, phonic embedding and shape embedding (see the sketch after this list)
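A rough PyTorch sketch of that embedding sum, with assumed sizes and module names (not PLOME's released code); the phonic and shape vectors, which PLOME produces with GRUs over pinyin letters and stroke sequences, are passed in precomputed here.

```python
import torch
import torch.nn as nn


class PlomeStyleInputEmbedding(nn.Module):
    """Sum character, position, phonic and shape embeddings per token."""

    def __init__(self, vocab_size=21128, max_len=512, hidden=768):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)

    def forward(self, char_ids, phonic_vecs, shape_vecs):
        # char_ids: (batch, seq); phonic_vecs / shape_vecs: (batch, seq, hidden)
        positions = torch.arange(char_ids.size(1), device=char_ids.device)
        return (self.char_emb(char_ids)
                + self.pos_emb(positions)
                + phonic_vecs
                + shape_vecs)
```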
The probability of the character predicted for the i-th token in a given sentence is defined as a softmax over a linear projection of the token's final hidden state, $p_c(y_i = j \mid X) = \mathrm{softmax}(W_c h_i + b_c)_j$.
The probability of pronunciation prediction is defined analogously with a separate output projection, $p_p(z_i = k \mid X) = \mathrm{softmax}(W_p h_i + b_p)_k$.
Loss function: the overall objective is the sum of the character-prediction loss and the pronunciation-prediction loss over the masked positions.
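A compact sketch of that joint objective, under the assumption that both terms are plain cross-entropies over the masked positions (tensor names are mine):

```python
import torch.nn.functional as F


def plome_style_loss(char_logits, pron_logits, char_labels, pron_labels):
    """Joint loss: character prediction + pronunciation prediction.

    Positions labeled -100 are ignored, as in common masked-LM setups.
    """
    loss_c = F.cross_entropy(char_logits.reshape(-1, char_logits.size(-1)),
                             char_labels.reshape(-1), ignore_index=-100)
    loss_p = F.cross_entropy(pron_logits.reshape(-1, pron_logits.size(-1)),
                             pron_labels.reshape(-1), ignore_index=-100)
    return loss_c + loss_p
```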
Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking
As mentioned above, the common error types in Chinese characters fall into two categories: phonological (pinyin) errors and visual (glyph) errors.
(Model architecture diagram)
The Semantic Encoder
The input tokens $X = (x_1, \ldots, x_N)$ are first projected into $H_t^0$ through the input embedding. Then the computation of the Transformer (Vaswani et al., 2017) encoder layers can be formulated as $H_t^{i} = \mathrm{TransformerLayer}(H_t^{i-1})$.
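In the paper this semantic encoder is a pretrained BERT; as a stand-in, here is a minimal sketch using a generic Transformer encoder with assumed BERT-like dimensions:

```python
import torch
import torch.nn as nn

# Embedding layer produces H_t^0; stacked encoder layers produce H_t^L.
vocab_size, hidden, n_layers = 21128, 768, 12
embed = nn.Embedding(vocab_size, hidden)
layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

x = torch.randint(0, vocab_size, (1, 16))   # token ids, (batch=1, N=16)
h0 = embed(x)                               # H_t^0
ht = encoder(h0)                            # H_t^L, shape (1, 16, 768)
```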
The Phonetic Encoder (pinyin encoder)
The 5 kinds of tones (taking the final "a" as an example: {ā, á, ǎ, à, a}) can be mapped to the numbers {1, 2, 3, 4, 0}.
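A small helper illustrating that tone-to-number mapping via Unicode decomposition (my own example, not code from the paper):

```python
import unicodedata

# Combining diacritics for tones 1-4; a bare vowel means the neutral tone (0).
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def split_tone(syllable: str):
    """Strip the tone mark from a pinyin syllable, returning (letters, tone)."""
    tone = 0
    letters = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    return "".join(letters), tone


print(split_tone("mā"))  # ('ma', 1)
print(split_tone("ma"))  # ('ma', 0)
```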
The Character-level Encoder
A single-layer uni-directional GRU (Cho et al., 2014) encodes the pinyin letter sequence of the i-th character $x_i$; its last hidden state serves as the character's phonetic representation.
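A sketch of that step, assuming 768-dimensional letter embeddings and taking the GRU's final hidden state as the phonetic vector (all names, ids and sizes here are my assumptions):

```python
import torch
import torch.nn as nn

letter_emb = nn.Embedding(32, 768)                    # letters + tone digits (rough vocabulary)
gru = nn.GRU(input_size=768, hidden_size=768,
             num_layers=1, batch_first=True)          # single-layer, uni-directional

# Pinyin of one character, e.g. "ma1" mapped to arbitrary letter/tone ids.
letter_ids = torch.tensor([[13, 1, 27]])              # (batch=1, len=3)
_, h_last = gru(letter_emb(letter_ids))               # h_last: (1, 1, 768)
phonetic_vec = h_last[-1]                             # the character's phonetic representation
```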
The Graphic Encoder
**Fused module**
The fusion of the embeddings from the different encoders is implemented with a gating mechanism; a simplified sketch follows.
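The exact gating in the paper may differ; this is a minimal gated-fusion sketch over the three modality embeddings, with assumed shapes and gate placement:

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Fuse semantic, phonetic and graphic embeddings with sigmoid gates
    computed from the semantic representation."""

    def __init__(self, hidden=768):
        super().__init__()
        self.gate_p = nn.Linear(hidden * 2, hidden)
        self.gate_g = nn.Linear(hidden * 2, hidden)

    def forward(self, h_sem, h_pho, h_gra):
        # Each input: (batch, seq, hidden)
        g_p = torch.sigmoid(self.gate_p(torch.cat([h_sem, h_pho], dim=-1)))
        g_g = torch.sigmoid(self.gate_g(torch.cat([h_sem, h_gra], dim=-1)))
        return h_sem + g_p * h_pho + g_g * h_gra
```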