🌻 Personal homepage: 相洋同学
🥇 Learning comes from action, review, and persistence. Let's keep at it together!
Contents
Chain Rule (链式法则)
Dimensionality Reduction (降维)
Long Short-Term Memory (LSTM) (长短期记忆网络)
Gradient Explosion (梯度爆炸)
Gradient Vanishing (梯度消失)
Dropout (Dropout)
Seq2Seq (Seq2Seq)
One-Hot Encoding (One-Hot 编码)
Self-Attention Mechanism (自注意力机制)
Multi-Head Attention Mechanism (多头注意力机制)
Chain Rule (链式法则)
The Chain Rule is a fundamental principle in calculus used to compute the derivative of a composite function. It states that when one function is applied to the result of another, the derivative of the composite function is the derivative of the outer function, evaluated at the inner function, multiplied by the derivative of the inner function.
- fundamental(基本的、根本的)
- calculus (微积分)
- derivative (导数)
- composite function (复合函数)
- function (函数)
- multiplied (乘以)
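As a quick numerical sanity check (my own sketch in plain Python, not from any textbook), take f(g(x)) = sin(x²): the chain-rule derivative cos(x²)·2x should agree with a finite-difference estimate.

```python
import math

def g(x):                      # inner function: g(x) = x^2
    return x * x

def f(u):                      # outer function: f(u) = sin(u)
    return math.sin(u)

def composite(x):              # the composite function f(g(x)) = sin(x^2)
    return f(g(x))

def chain_rule_derivative(x):
    # d/dx f(g(x)) = f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(g(x)) * 2 * x

x, h = 1.3, 1e-6
numeric = (composite(x + h) - composite(x - h)) / (2 * h)   # central difference
print(chain_rule_derivative(x), numeric)                    # the two values agree closely
```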
Dimensionality Reduction (降维)
Dimensionality Reduction refers to the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It is often used in machine learning and statistics to simplify models, speed up computation, and reduce noise in the data.
- refers to(概念、指的是)
- random variables (随机变量)
- principal variables (主要变量)
- statistics (统计学)
- simplify (简化)
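A minimal sketch of one common dimensionality-reduction technique, PCA computed via SVD. This is illustrative code of my own (the helper name `pca_reduce` is made up), assuming NumPy is available.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)                  # center each feature
    # SVD of the centered data; the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T                     # coordinates in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                        # 100 samples, 5 features
X_2d = pca_reduce(X, k=2)                            # reduced to 2 dimensions
print(X_2d.shape)                                    # (100, 2)
```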
Long Short-Term Memory (LSTM) (长短期记忆网络)
Long Short-Term Memory networks, or LSTMs, are a special kind of Recurrent Neural Network (RNN) capable of learning long-term dependencies. LSTMs are designed to avoid the long-term dependency problem, allowing them to remember information for long periods.
- long-term dependencies (长期依赖)
- long-term dependency problem (长期依赖问题)
- periods (时间段、时期)
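For intuition, here is a single LSTM time step written from scratch in NumPy (an illustrative sketch, not a production cell; the gate layout inside `W` is my own convention). The forget, input, and output gates decide what to erase, what to write, and what to expose from the cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4*hidden, input_dim + hidden)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b       # all four gate pre-activations at once
    f = sigmoid(z[0*hidden:1*hidden])             # forget gate
    i = sigmoid(z[1*hidden:2*hidden])             # input gate
    o = sigmoid(z[2*hidden:3*hidden])             # output gate
    g = np.tanh(z[3*hidden:4*hidden])             # candidate cell state
    c = f * c_prev + i * g                        # new cell state (long-term memory)
    h = o * np.tanh(c)                            # new hidden state (short-term output)
    return h, c

input_dim, hidden = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * hidden, input_dim + hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=input_dim), h, c, W, b)
print(h.shape, c.shape)                           # (4,) (4,)
```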
Gradient Explosion (梯度爆炸)
Gradient Explosion refers to a problem in training deep neural networks where gradients of the network's loss function become too large, causing updates to the network's weights to be so large that they overshoot the optimal values, leading to an unstable training process and divergence.
- overshoot (超过)
- optimal values (最优值)
- unstable (不稳定)
- divergence (发散)
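A toy NumPy illustration of why explosion happens: backpropagation through many layers or time steps multiplies the gradient by the same weight matrix repeatedly, so if its largest singular value is above 1 the gradient norm grows exponentially. Gradient clipping is sketched as one common remedy; all the numbers here are made up for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=1.5, size=(8, 8))      # deliberately large weights
grad = np.ones(8)
for step in range(50):
    grad = W.T @ grad                        # one step of backprop through the same weights
print(np.linalg.norm(grad))                  # a huge number: the gradient has "exploded"

def clip_by_norm(g, max_norm=5.0):
    """Rescale the gradient when its norm exceeds a threshold."""
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g

print(np.linalg.norm(clip_by_norm(grad)))    # clipped back to 5.0
```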
Gradient Vanishing (梯度消失)
Gradient Vanishing is a problem encountered in training deep neural networks, where the gradients of the network's loss function become too small, significantly slowing down the training process or stopping it altogether, as the network weights fail to update in a meaningful way.
- encountered (遇到)
- significantly (显著地)
- altogether (完全)
- meaningful way (有意义的方式)
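A toy illustration of the opposite effect: if each layer contributes a local derivative well below 1 (a sigmoid's derivative is at most 0.25), the gradient shrinks geometrically with depth. The constants below are made up purely for illustration.

```python
# With saturating activations, the gradient is multiplied at every layer by a
# factor smaller than 1, so after enough layers it is effectively zero.
depth = 50
grad = 1.0
for layer in range(depth):
    sigmoid_derivative = 0.25          # best case, at the sigmoid's midpoint
    weight = 0.8                       # modest weight magnitude
    grad *= sigmoid_derivative * weight
print(grad)                            # ~1e-35: the gradient has "vanished"
```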
Dropout (Dropout)
Dropout is a regularization technique used in training neural networks to prevent overfitting. By randomly omitting a subset of neurons during the training process, dropout forces the network to learn more robust features that are not dependent on any single set of neurons.
- regularization technique (正则化技术)
- prevent (防止)
- omitting (省略)
- subset (子集)
- robust features (健壮的特征)
- dependent (依赖)
- single set (单一集合)
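A minimal sketch of inverted dropout in NumPy (illustrative only; deep-learning frameworks provide this as a built-in layer). The `p_drop` parameter and the 1/keep_prob scaling are the usual convention, but the code itself is my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: randomly zero a subset of neurons during training."""
    if not training or p_drop == 0.0:
        return activations                       # no-op at inference time
    keep_prob = 1.0 - p_drop
    mask = rng.random(activations.shape) < keep_prob
    # Scale by 1/keep_prob so the expected activation stays unchanged.
    return activations * mask / keep_prob

h = np.ones((2, 8))                              # a small batch of activations
print(dropout(h))                                # roughly half the entries zeroed, the rest scaled to 2.0
```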
Seq2Seq (Seq2Seq)
Seq2Seq, or Sequence to Sequence, is a model used in machine learning that transforms a given sequence of elements, such as words in a sentence, into another sequence. This model is widely used in tasks like machine translation, where an input sentence in one language is converted into an output sentence in another language.
- Sequence to Sequence (序列到序列)
- transforms (转换)
- sequence (序列)
- elements (元素)
- converted into(将某物变换或转换成)
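A toy, untrained encoder-decoder sketch in NumPy to show the data flow: the encoder compresses the whole source sequence into a context vector, and the decoder emits target tokens one at a time conditioned on it. All names, vocabulary sizes, and parameters here are made-up illustrations, not a real translation model.

```python
import numpy as np

rng = np.random.default_rng(0)
src_vocab, tgt_vocab, hidden = 10, 12, 16

# Randomly initialised toy parameters (untrained, shapes only).
E_src = rng.normal(scale=0.1, size=(src_vocab, hidden))   # source embeddings
E_tgt = rng.normal(scale=0.1, size=(tgt_vocab, hidden))   # target embeddings
W_enc = rng.normal(scale=0.1, size=(hidden, 2 * hidden))
W_dec = rng.normal(scale=0.1, size=(hidden, 2 * hidden))
W_out = rng.normal(scale=0.1, size=(tgt_vocab, hidden))

def encode(src_ids):
    """Compress the source token sequence into one context vector."""
    h = np.zeros(hidden)
    for tok in src_ids:
        h = np.tanh(W_enc @ np.concatenate([E_src[tok], h]))
    return h

def decode(context, max_len=5, start_id=0):
    """Generate target tokens one at a time, conditioned on the context."""
    h, tok, out = context, start_id, []
    for _ in range(max_len):
        h = np.tanh(W_dec @ np.concatenate([E_tgt[tok], h]))
        tok = int(np.argmax(W_out @ h))       # greedy choice of the next token
        out.append(tok)
    return out

print(decode(encode([3, 1, 4, 1, 5])))        # a 5-token output sequence (random, since untrained)
```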
One-Hot Encoding (One-Hot 编码)
One-Hot Encoding is a process in which categorical variables are converted into a numerical form that ML algorithms can use for prediction. It represents each category with a vector that has one element set to 1 and all other elements set to 0.
- categorical variables (类别变量)
- converted (转换)
- ML algorithms (机器学习算法)
- represents (表示)
- category (类别)
- element (元素)
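A small self-contained sketch (the helper `one_hot` is my own; libraries such as scikit-learn provide this functionality out of the box):

```python
import numpy as np

def one_hot(labels, categories):
    """Map each label to a vector with a single 1 at its category's index."""
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)), dtype=int)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1
    return encoded

colors = ["red", "green", "blue"]
print(one_hot(["green", "blue", "green", "red"], colors))
# [[0 1 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```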
Self-Attention Mechanism (自注意力机制)
The Self-Attention Mechanism allows a model to weigh the importance of different parts of the input data differently. It is an essential component of Transformer models, enabling them to dynamically prioritize which parts of the input to focus on as they process data.
- weigh (权衡)
- essential component (重要组成部分)
- dynamically (动态地)
- prioritize (优先考虑)
- process data (处理数据)
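A NumPy sketch of scaled dot-product self-attention over a single sequence (my own minimal version; real implementations add masking, batching, and wrap the projections in learned layers):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # how strongly each position attends to the others
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)
```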
Multi-Head Attention Mechanism (多头注意力机制)
The Multi-Head Attention Mechanism is a technique used in Transformer models that allows the model to attend to information from different representation subspaces at different positions. It performs multiple self-attention operations in parallel, enhancing the model's ability to focus on various aspects of the input data simultaneously.
- attend to (关注)
- representation subspaces (表示子空间)
- positions (位置)
- performs (执行)
- self-attention operations (自注意力操作)
- parallel (并行)
- enhancing (增强)
- various aspects (各个方面)
- simultaneously (同时)
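Building on the previous sketch, here is a minimal multi-head version: each head runs its own scaled dot-product attention in a smaller subspace, and the head outputs are concatenated (a final linear projection, omitted here, usually follows). Again an illustrative NumPy sketch of my own, not a framework implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, heads):
    """heads: list of (Wq, Wk, Wv) triples; each head attends in its own subspace."""
    outputs = []
    for Wq, Wk, Wv in heads:                     # conceptually these run in parallel
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        outputs.append(softmax(scores) @ V)
    return np.concatenate(outputs, axis=-1)      # concatenate the heads along the feature axis

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))
heads = [tuple(rng.normal(scale=0.1, size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
print(multi_head_attention(X, heads).shape)      # (4, 8): two heads of width 4, concatenated
```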
That's all.
The gentleman sits and discusses the Way; the young rise and put it into practice. Let's encourage each other.