Outline
- Matrix calculus: a specialized notation for multivariable calculus, used in particular over spaces of matrices
- Inverse matrix
- Matrix decompositions: eigendecomposition (also called spectral decomposition), LU decomposition, singular value decomposition (SVD), QR decomposition, Cholesky decomposition
- Determinant: in Euclidean space, the determinant describes how a linear transformation scales "volume"
- Eigenvector: a nonzero vector $v$ with $Av = \lambda v$, where $\lambda$ is an eigenvalue of $A$ and $v$ is a corresponding eigenvector. The set of all eigenvalues of $A$ is called the spectrum of $A$, written $\lambda(A)$
- Trace: $\operatorname{tr}(\mathbf{A}) = \mathbf{A}_{1,1} + \cdots + \mathbf{A}_{n,n}$. The trace of a matrix equals the sum of its eigenvalues, counted with multiplicity
- Orthogonal matrix: a real square matrix whose row vectors and column vectors are orthonormal, so that its transpose is its inverse: $QQ^T = Q^TQ = I$
- Positive-definite and positive semi-definite matrices: an $n \times n$ real symmetric matrix $M$ is positive definite if and only if $\mathbf{z}^T M \mathbf{z} > 0$ for every nonzero real vector $\mathbf{z}$, where $\mathbf{z}^T$ denotes the transpose of $\mathbf{z}$. It is positive semi-definite if $\mathbf{z}^T M \mathbf{z} \geq 0$ for every real vector $\mathbf{z}$
- Adjugate matrix: if a matrix is invertible, its inverse and its adjugate differ only by a scalar factor, $A^{-1} = \operatorname{adj}(A) / \det(A)$
- Hermitian matrix: a complex square matrix that equals its own conjugate transpose, $A = A^*$. The conjugate transpose is obtained by transposing the matrix and then taking the complex conjugate of every entry (flipping the sign of the imaginary part)
- Conjugate transpose (Hermitian transpose): $A^* = (\overline{A})^T = \overline{A^T}$, where $\overline{A}$ denotes taking the complex conjugate of every entry of $A$
- Unitary matrix: a complex square matrix whose conjugate transpose is its inverse, $U^*U = UU^* = I_n$
- Real symmetric matrix: a symmetric matrix whose entries are all real
- Diagonal matrix: a matrix whose entries off the main diagonal are all 0, often written $\operatorname{diag}(a_1, a_2, \ldots, a_n)$
- Jacobian matrix: $\mathbf{J} = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}$
- Hessian matrix: the square matrix of all second-order partial derivatives of a real-valued function of several variables, $\mathbf{H}_{ij} = \dfrac{\partial^2 f}{\partial x_i \partial x_j}$
- Matrix norm
I. Matrix calculus
The derivative of a vector with respect to a vector is called the Jacobian matrix. For $y$ with $n$ components and $x$ with $m$ components:
$$
J = \frac{\partial y_{(n)}}{\partial x_{(m)}} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_m} \end{pmatrix}_{n \times m}
$$
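The derivative of a scalar with respect to a vector and the derivative of a vector with respect to a scalar are the special cases in which the corresponding vector is one-dimensional.
The form above is called the numerator layout. The alternative, which writes every matrix (or vector) derivative as the transpose of the form used here, is called the denominator layout. In denominator layout the rules below no longer take such an easily remembered form.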
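To make the numerator-layout shape concrete, the following is a minimal sketch that approximates a Jacobian with central differences. The helper name `numerical_jacobian` and the example map `f` are assumptions made for illustration; they do not come from the original notes.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x with central differences.

    Numerator layout: the result has one row per output component y_i
    and one column per input component x_j.
    """
    m = len(x)       # number of inputs
    n = len(f(x))    # number of outputs
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus = f(x_plus)
        y_minus = f(x_minus)
        for i in range(n):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return J


def f(x):
    # Hypothetical example map from R^2 to R^3.
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]


J = numerical_jacobian(f, [2.0, 3.0])
# The analytic Jacobian at (2, 3) is [[3, 2], [1, 1], [4, 0]], a 3-by-2 matrix.
for row in J:
    print(row)
```

Transposing `J` would give the denominator-layout form of the same derivative.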
Compared with scalar calculus:
- The sum rule is unchanged: $\frac{\partial (y + z)}{\partial x} = \frac{\partial y}{\partial x} + \frac{\partial z}{\partial x}$
- The chain rule is unchanged: $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x}$
- The product rule keeps its form, $\frac{\partial (y \otimes z)}{\partial x} = y \otimes \frac{\partial z}{\partial x} + z \otimes \frac{\partial y}{\partial x}$, where $\otimes$ stands for whichever product is being differentiated:
  - Inner product of vectors: $\frac{\partial (y^T z)}{\partial x} = y^T \cdot \frac{\partial z}{\partial x} + z^T \cdot \frac{\partial y}{\partial x}$
  - Matrix product ($A$ does not depend on $x$): $\frac{\partial (Ay)}{\partial x} = A \cdot \frac{\partial y}{\partial x}$
  - Scalar multiple of a vector ($y$ or $z$ is a scalar): $\frac{\partial (yz)}{\partial x} = y \cdot \frac{\partial z}{\partial x} + z \cdot \frac{\partial y}{\partial x}$
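The chain rule in numerator layout can be checked numerically. The sketch below compares the Jacobian of a composition with the product of the two individual Jacobians. The functions `g` and `h` and the helper names are hypothetical choices for illustration only.

```python
import math


def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout (rows: outputs)."""
    n = len(f(x))
    J = [[0.0] * len(x) for _ in range(n)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        yp, ym = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (yp[i] - ym[i]) / (2 * eps)
    return J


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def g(x):
    # Hypothetical inner map y = g(x).
    return [x[0] * x[1], x[0] + 2.0 * x[1]]


def h(y):
    # Hypothetical outer map z = h(y).
    return [math.sin(y[0]), y[0] * y[1]]


x0 = [0.5, -1.0]
lhs = jacobian(lambda x: h(g(x)), x0)               # dz/dx computed directly
rhs = matmul(jacobian(h, g(x0)), jacobian(g, x0))   # (dz/dy) (dy/dx)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)  # expected to be small, limited by finite-difference error
```

Note the order of the factors: in numerator layout the outer Jacobian comes first, which is the order in which the dimensions match.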
1. Notation
- $\mathbf{A}, \mathbf{X}, \mathbf{Y}$, etc.: bold uppercase letters denote matrices;
- $\mathbf{a}, \mathbf{x}, \mathbf{y}$, etc.: bold lowercase letters denote vectors;
- $a, x, y$, etc.: italic lowercase letters denote scalars;
- $\mathbf{X}^T$: the transpose of the matrix $\mathbf{X}$;
- $\mathbf{X}^H$: the conjugate transpose of the matrix $\mathbf{X}$;
- $\lvert \mathbf{X} \rvert$: the determinant of the square matrix $\mathbf{X}$;
- $\lVert \mathbf{x} \rVert$: the norm of the vector $\mathbf{x}$;
- $\mathbf{I}$: the identity matrix.
2. Vector derivatives
2.1 Vector-by-scalar
The derivative of a column-vector function $\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^T$ with respect to a scalar $x$ is called the tangent vector of $\mathbf{y}$. In numerator layout it is written
$$
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}_{m \times 1}
$$
and in denominator layout
$$
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}_{1 \times m}
$$
2.2 Scalar-by-vector
The derivative of a scalar function $y$ with respect to a column vector $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T$ is written in numerator layout as
$$
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}_{1 \times n}
$$
and in denominator layout as
$$
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}_{n \times 1}
$$
2.3 Vector-by-vector
The derivative of a column-vector function $\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^T$ with respect to a column vector $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T$ is written in numerator layout as
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}_{m \times n}
$$
In denominator layout it is the transpose, with row $i$ holding the derivatives with respect to $x_i$:
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}_{n \times m}
$$
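As a small worked example (the function is chosen only for illustration and is not part of the original notes), take $\mathbf{y} = \begin{bmatrix} x_1 x_2 & x_1 + x_2 & x_1^2 \end{bmatrix}^T$ and $\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}^T$, so that $m = 3$ and $n = 2$. The numerator layout gives a $3 \times 2$ matrix and the denominator layout its $2 \times 3$ transpose:

$$
\left. \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right|_{\text{numerator}} = \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \\ 2 x_1 & 0 \end{bmatrix}_{3 \times 2},
\qquad
\left. \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right|_{\text{denominator}} = \begin{bmatrix} x_2 & 1 & 2 x_1 \\ x_1 & 1 & 0 \end{bmatrix}_{2 \times 3}
$$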
3. Matrix derivatives
3.1 Matrix-by-scalar
The derivative of an $m \times n$ matrix function $\mathbf{Y}$ with respect to a scalar $x$ is called the tangent matrix of $\mathbf{Y}$. In numerator layout it is written
$$
\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}_{m \times n}
$$
3.2 Scalar-by-matrix
The derivative of a scalar function $y$ with respect to a $p \times q$ matrix $\mathbf{X}$ is written in numerator layout as
$$
\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}_{q \times p}
$$
and in denominator layout, where the result has the same shape as $\mathbf{X}$, as
$$
\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}_{p \times q}
$$
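4. Identities
In the identities below, unless a note says otherwise, the factors of the expression being differentiated are assumed not to be functions of the variable of differentiation.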
4.1. Vector-by-vector
Expression | Numerator layout | Denominator layout | Notes |
---|---|---|---|
$\frac{\partial \mathbf{a}}{\partial \mathbf{x}} =$ | $\mathbf{0}$ | $\mathbf{0}$ | |
$\frac{\partial \mathbf{x}}{\partial \mathbf{x}} =$ | $\mathbf{I}$ | $\mathbf{I}$ | |
$\frac{\partial \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} =$ | $\mathbf{A}$ | $\mathbf{A}^T$ | |
$\frac{\partial \mathbf{x}^T \mathbf{A}}{\partial \mathbf{x}} =$ | $\mathbf{A}^T$ | $\mathbf{A}$ | |
$\frac{\partial a \mathbf{u}}{\partial \mathbf{x}} =$ | $a \frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ | $a \frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ | $\mathbf{u} = \mathbf{u}(\mathbf{x})$ |
$\frac{\partial v \mathbf{u}}{\partial \mathbf{x}} =$ | $v \frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \mathbf{u} \frac{\partial v}{\partial \mathbf{x}}$ | $v \frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}} \mathbf{u}^T$ | $v = v(\mathbf{x}), \mathbf{u} = \mathbf{u}(\mathbf{x})$ |
$\frac{\partial \mathbf{A} \mathbf{u}}{\partial \mathbf{x}} =$ | $\mathbf{A} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A}^T$ | $\mathbf{u} = \mathbf{u}(\mathbf{x})$ |
$\frac{\partial (\mathbf{u} + \mathbf{v})}{\partial \mathbf{x}} =$ | $\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ | $\mathbf{u} = \mathbf{u}(\mathbf{x}), \mathbf{v} = \mathbf{v}(\mathbf{x})$ |
$\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial \mathbf{x}} =$ | $\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}} \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{u}}{\partial \mathbf{x}} \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ | $\mathbf{u} = \mathbf{u}(\mathbf{x})$ |
4.2. Scalar-by-vector
Expression | Numerator layout | Denominator layout | Notes |
---|---|---|---|
$\frac{\partial a}{\partial \mathbf{x}} =$ | $\mathbf{0}^T$ | $\mathbf{0}$ | |
$\frac{\partial a u}{\partial \mathbf{x}} =$ | $a \frac{\partial u}{\partial \mathbf{x}}$ | $a \frac{\partial u}{\partial \mathbf{x}}$ | $u = u(\mathbf{x})$ |
$\frac{\partial (u + v)}{\partial \mathbf{x}} =$ | $\frac{\partial u}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}}$ | $\frac{\partial u}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}}$ | $u = u(\mathbf{x}), v = v(\mathbf{x})$ |
$\frac{\partial u v}{\partial \mathbf{x}} =$ | $u \frac{\partial v}{\partial \mathbf{x}} + v \frac{\partial u}{\partial \mathbf{x}}$ | $u \frac{\partial v}{\partial \mathbf{x}} + v \frac{\partial u}{\partial \mathbf{x}}$ | $u = u(\mathbf{x}), v = v(\mathbf{x})$ |
$\frac{\partial f(g(u))}{\partial \mathbf{x}} =$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf{x}}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf{x}}$ | $u = u(\mathbf{x})$ |
$\frac{\partial (\mathbf{u} \cdot \mathbf{v})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}^T \mathbf{v}}{\partial \mathbf{x}} =$ | $\mathbf{u}^T \frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^T \frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{u} + \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{v}$ | $\mathbf{u} = \mathbf{u}(\mathbf{x}), \mathbf{v} = \mathbf{v}(\mathbf{x})$ |
$\frac{\partial (\mathbf{u} \cdot \mathbf{A} \mathbf{v})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}^T \mathbf{A} \mathbf{v}}{\partial \mathbf{x}} =$ | $\mathbf{u}^T \mathbf{A} \frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^T \mathbf{A}^T \frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{A}^T \mathbf{u} + \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A} \mathbf{v}$ | $\mathbf{u} = \mathbf{u}(\mathbf{x}), \mathbf{v} = \mathbf{v}(\mathbf{x})$ |
$\frac{\partial (\mathbf{a} \cdot \mathbf{u})}{\partial \mathbf{x}} = \frac{\partial \mathbf{a}^T \mathbf{u}}{\partial \mathbf{x}} =$ | $\mathbf{a}^T \frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{a}$ | $\mathbf{u} = \mathbf{u}(\mathbf{x})$ |
$\frac{\partial \mathbf{b}^T \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} =$ | $\mathbf{b}^T \mathbf{A}$ | $\mathbf{A}^T \mathbf{b}$ | |
$\frac{\partial \mathbf{x}^T \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} =$ | $\mathbf{x}^T (\mathbf{A} + \mathbf{A}^T)$ | $(\mathbf{A} + \mathbf{A}^T) \mathbf{x}$ | |
$\frac{\partial^2 \mathbf{x}^T \mathbf{A} \mathbf{x}}{\partial \mathbf{x} \partial \mathbf{x}^T} =$ | $\mathbf{A} + \mathbf{A}^T$ | $\mathbf{A} + \mathbf{A}^T$ | |
$\frac{\partial \mathbf{a}^T \mathbf{x} \mathbf{x}^T \mathbf{b}}{\partial \mathbf{x}} =$ | $\mathbf{x}^T (\mathbf{a} \mathbf{b}^T + \mathbf{b} \mathbf{a}^T)$ | $(\mathbf{a} \mathbf{b}^T + \mathbf{b} \mathbf{a}^T) \mathbf{x}$ | |
$\frac{\partial (\mathbf{A} \mathbf{x} + \mathbf{b})^T \mathbf{C} (\mathbf{D} \mathbf{x} + \mathbf{e})}{\partial \mathbf{x}} =$ | $(\mathbf{A} \mathbf{x} + \mathbf{b})^T \mathbf{C} \mathbf{D} + (\mathbf{D} \mathbf{x} + \mathbf{e})^T \mathbf{C}^T \mathbf{A}$ | $\mathbf{D}^T \mathbf{C}^T (\mathbf{A} \mathbf{x} + \mathbf{b}) + \mathbf{A}^T \mathbf{C} (\mathbf{D} \mathbf{x} + \mathbf{e})$ | |
$\frac{\partial \lVert \mathbf{x} \rVert^2}{\partial \mathbf{x}} = \frac{\partial (\mathbf{x} \cdot \mathbf{x})}{\partial \mathbf{x}} =$ | $2 \mathbf{x}^T$ | $2 \mathbf{x}$ | |
$\frac{\partial \lVert \mathbf{x} - \mathbf{a} \rVert}{\partial \mathbf{x}} =$ | $\frac{(\mathbf{x} - \mathbf{a})^T}{\lVert \mathbf{x} - \mathbf{a} \rVert}$ | $\frac{\mathbf{x} - \mathbf{a}}{\lVert \mathbf{x} - \mathbf{a} \rVert}$ | |
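One of the most frequently used entries in this table is the gradient of the quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$. The sketch below checks the denominator-layout result $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$ against a finite-difference gradient. The matrix, the evaluation point, and the helper names are arbitrary assumptions for illustration.

```python
def quad_form(A, x):
    """Evaluate the scalar x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function, as a flat list."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad


A = [[1.0, 2.0], [0.0, 3.0]]   # deliberately not symmetric
x0 = [1.0, -2.0]

numeric = numerical_gradient(lambda x: quad_form(A, x), x0)
# Denominator layout: (A + A^T) x, a column vector with the shape of x.
analytic = [sum((A[i][j] + A[j][i]) * x0[j] for j in range(2)) for i in range(2)]

print(numeric)
print(analytic)  # [-2.0, -10.0]
```

The non-symmetric `A` is intentional: with a symmetric matrix the result collapses to $2\mathbf{A}\mathbf{x}$, which would hide the $\mathbf{A}^T$ term.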
4.3. Vector-by-scalar
Expression | Numerator layout | Denominator layout | Notes |
---|---|---|---|
$\frac{\partial \mathbf{a}}{\partial x} =$ | $\mathbf{0}$ | $\mathbf{0}$ | |
$\frac{\partial a \mathbf{u}}{\partial x} =$ | $a \frac{\partial \mathbf{u}}{\partial x}$ | $a \frac{\partial \mathbf{u}}{\partial x}$ | $\mathbf{u} = \mathbf{u}(x)$ |
$\frac{\partial \mathbf{A} \mathbf{u}}{\partial x} =$ | $\mathbf{A} \frac{\partial \mathbf{u}}{\partial x}$ | $\frac{\partial \mathbf{u}}{\partial x} \mathbf{A}^T$ | $\mathbf{u} = \mathbf{u}(x)$ |
$\frac{\partial \mathbf{u}^T}{\partial x} =$ | $\left( \frac{\partial \mathbf{u}}{\partial x} \right)^T$ | $\left( \frac{\partial \mathbf{u}}{\partial x} \right)^T$ | $\mathbf{u} = \mathbf{u}(x)$ |
$\frac{\partial (\mathbf{u} + \mathbf{v})}{\partial x} =$ | $\frac{\partial \mathbf{u}}{\partial x} + \frac{\partial \mathbf{v}}{\partial x}$ | $\frac{\partial \mathbf{u}}{\partial x} + \frac{\partial \mathbf{v}}{\partial x}$ | $\mathbf{u} = \mathbf{u}(x), \mathbf{v} = \mathbf{v}(x)$ |
$\frac{\partial (\mathbf{u}^T \times \mathbf{v})}{\partial x} =$ | $\left( \frac{\partial \mathbf{u}}{\partial x} \right)^T \times \mathbf{v} + \mathbf{u}^T \times \frac{\partial \mathbf{v}}{\partial x}$ | $\frac{\partial \mathbf{u}}{\partial x} \times \mathbf{v} + \mathbf{u}^T \times \left( \frac{\partial \mathbf{v}}{\partial x} \right)^T$ | $\mathbf{u} = \mathbf{u}(x), \mathbf{v} = \mathbf{v}(x)$ |
$\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial x} =$ | $\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}} \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial x}$ | $\frac{\partial \mathbf{u}}{\partial x} \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ | $\mathbf{u} = \mathbf{u}(x)$ |
$\frac{\partial (\mathbf{U} \times \mathbf{v})}{\partial x} =$ | $\frac{\partial \mathbf{U}}{\partial x} \times \mathbf{v} + \mathbf{U} \times \frac{\partial \mathbf{v}}{\partial x}$ | $\mathbf{v}^T \times \frac{\partial \mathbf{U}}{\partial x} + \frac{\partial \mathbf{v}}{\partial x} \times \mathbf{U}^T$ | $\mathbf{U} = \mathbf{U}(x), \mathbf{v} = \mathbf{v}(x)$ |
4.4. Scalar-by-matrix
Expression | Numerator layout | Denominator layout | Notes |
---|---|---|---|
$\frac{\partial a}{\partial \mathbf{X}} =$ | $\mathbf{0}^T$ | $\mathbf{0}$ | |
$\frac{\partial a u}{\partial \mathbf{X}} =$ | $a \frac{\partial u}{\partial \mathbf{X}}$ | $a \frac{\partial u}{\partial \mathbf{X}}$ | $u = u(\mathbf{X})$ |
$\frac{\partial (u + v)}{\partial \mathbf{X}} =$ | $\frac{\partial u}{\partial \mathbf{X}} + \frac{\partial v}{\partial \mathbf{X}}$ | $\frac{\partial u}{\partial \mathbf{X}} + \frac{\partial v}{\partial \mathbf{X}}$ | $u = u(\mathbf{X}), v = v(\mathbf{X})$ |
$\frac{\partial u v}{\partial \mathbf{X}} =$ | $u \frac{\partial v}{\partial \mathbf{X}} + v \frac{\partial u}{\partial \mathbf{X}}$ | $u \frac{\partial v}{\partial \mathbf{X}} + v \frac{\partial u}{\partial \mathbf{X}}$ | $u = u(\mathbf{X}), v = v(\mathbf{X})$ |
$\frac{\partial f(g(u))}{\partial \mathbf{X}} =$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf{X}}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf{X}}$ | $u = u(\mathbf{X})$ |
$\frac{\partial \mathbf{a}^T \mathbf{X} \mathbf{b}}{\partial \mathbf{X}} =$ | $\mathbf{b} \mathbf{a}^T$ | $\mathbf{a} \mathbf{b}^T$ | |
$\frac{\partial \mathbf{a}^T \mathbf{X}^T \mathbf{b}}{\partial \mathbf{X}} =$ | $\mathbf{a} \mathbf{b}^T$ | $\mathbf{b} \mathbf{a}^T$ | |
$\frac{\partial (\mathbf{X} \mathbf{a} + \mathbf{b})^T \mathbf{C} (\mathbf{X} \mathbf{a} + \mathbf{b})}{\partial \mathbf{X}} =$ | $[(\mathbf{C} + \mathbf{C}^T)(\mathbf{X} \mathbf{a} + \mathbf{b}) \mathbf{a}^T]^T$ | $(\mathbf{C} + \mathbf{C}^T)(\mathbf{X} \mathbf{a} + \mathbf{b}) \mathbf{a}^T$ | |
$\frac{\partial (\mathbf{X} \mathbf{a})^T \mathbf{C} (\mathbf{X} \mathbf{b})}{\partial \mathbf{X}} =$ | $(\mathbf{C} \mathbf{X} \mathbf{b} \mathbf{a}^T + \mathbf{C}^T \mathbf{X} \mathbf{a} \mathbf{b}^T)^T$ | $\mathbf{C} \mathbf{X} \mathbf{b} \mathbf{a}^T + \mathbf{C}^T \mathbf{X} \mathbf{a} \mathbf{b}^T$ | |
$\frac{\partial \lvert \mathbf{X} \rvert}{\partial \mathbf{X}} =$ | $\lvert \mathbf{X} \rvert \mathbf{X}^{-1}$ | $\lvert \mathbf{X} \rvert (\mathbf{X}^{-1})^T$ | |
$\frac{\partial \ln \lvert a \mathbf{X} \rvert}{\partial \mathbf{X}} =$ | $\mathbf{X}^{-1}$ | $(\mathbf{X}^{-1})^T$ | |
$\frac{\partial \lvert \mathbf{A} \mathbf{X} \mathbf{B} \rvert}{\partial \mathbf{X}} =$ | $\lvert \mathbf{A} \mathbf{X} \mathbf{B} \rvert \mathbf{X}^{-1}$ | $\lvert \mathbf{A} \mathbf{X} \mathbf{B} \rvert (\mathbf{X}^{-1})^T$ | |
$\frac{\partial \lvert \mathbf{X}^n \rvert}{\partial \mathbf{X}} =$ | $n \lvert \mathbf{X}^n \rvert \mathbf{X}^{-1}$ | $n \lvert \mathbf{X}^n \rvert (\mathbf{X}^{-1})^T$ | |
$\frac{\partial \ln \lvert \mathbf{X}^T \mathbf{X} \rvert}{\partial \mathbf{X}} =$ | $2 \mathbf{X}^+$ | $2 (\mathbf{X}^+)^T$ | $\mathbf{X}^+$ is the pseudoinverse of $\mathbf{X}$ |
$\frac{\partial \ln \lvert \mathbf{X}^T \mathbf{X} \rvert}{\partial \mathbf{X}^+} =$ | $-2 \mathbf{X}$ | $-2 \mathbf{X}^T$ | $\mathbf{X}^+$ is the pseudoinverse of $\mathbf{X}$ |
$\frac{\partial \lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert}{\partial \mathbf{X}} =$ | $2 \lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert \mathbf{X}^{-1} = 2 \lvert \mathbf{X}^T \rvert \lvert \mathbf{A} \rvert \lvert \mathbf{X} \rvert \mathbf{X}^{-1}$ | $2 \lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert (\mathbf{X}^{-1})^T$ | $\mathbf{X}$ is square and invertible |
$\frac{\partial \lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert}{\partial \mathbf{X}} =$ | $2 \lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert (\mathbf{X}^T \mathbf{A}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{A}^T$ | $2 \lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert \mathbf{A} \mathbf{X} (\mathbf{X}^T \mathbf{A} \mathbf{X})^{-1}$ | $\mathbf{A}$ is symmetric |
$\frac{\partial \lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert}{\partial \mathbf{X}} =$ | $\lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert [(\mathbf{X}^T \mathbf{A} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{A} + (\mathbf{X}^T \mathbf{A}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{A}^T]$ | $\lvert \mathbf{X}^T \mathbf{A} \mathbf{X} \rvert [\mathbf{A} \mathbf{X} (\mathbf{X}^T \mathbf{A} \mathbf{X})^{-1} + \mathbf{A}^T \mathbf{X} (\mathbf{X}^T \mathbf{A}^T \mathbf{X})^{-1}]$ | |
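The determinant identity $\partial \lvert \mathbf{X} \rvert / \partial \mathbf{X} = \lvert \mathbf{X} \rvert (\mathbf{X}^{-1})^T$ (denominator layout) can be checked on a $2 \times 2$ matrix. The sketch below is illustrative only: the helper names and the test matrix are assumptions, and the closed-form $2 \times 2$ inverse is used to keep the example free of external libraries.

```python
def det2(X):
    """Determinant of a 2-by-2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def inv2(X):
    """Inverse of a 2-by-2 matrix via the adjugate."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]


def grad_wrt_matrix(f, X, eps=1e-6):
    """Denominator layout: entry (i, j) approximates d f / d x_ij."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += eps
            Xm[i][j] -= eps
            G[i][j] = (f(Xp) - f(Xm)) / (2 * eps)
    return G


X = [[2.0, 1.0], [0.5, 3.0]]
numeric = grad_wrt_matrix(det2, X)

X_inv = inv2(X)
d = det2(X)
# |X| (X^{-1})^T: note the swapped indices, which implement the transpose.
analytic = [[d * X_inv[j][i] for j in range(2)] for i in range(2)]

max_diff = max(abs(numeric[i][j] - analytic[i][j])
               for i in range(2) for j in range(2))
print(max_diff)
```

Dropping the transpose in `analytic` would give the numerator-layout column of the table instead.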
4.5. Matrix-by-scalar
Expression | Numerator layout | Notes |
---|---|---|
$\frac{\partial a \mathbf{U}}{\partial x} =$ | $a \frac{\partial \mathbf{U}}{\partial x}$ | $\mathbf{U} = \mathbf{U}(x)$ |
$\frac{\partial \mathbf{A} \mathbf{U} \mathbf{B}}{\partial x} =$ | $\mathbf{A} \frac{\partial \mathbf{U}}{\partial x} \mathbf{B}$ | $\mathbf{U} = \mathbf{U}(x)$ |
$\frac{\partial (\mathbf{U} + \mathbf{V})}{\partial x} =$ | $\frac{\partial \mathbf{U}}{\partial x} + \frac{\partial \mathbf{V}}{\partial x}$ | $\mathbf{U} = \mathbf{U}(x), \mathbf{V} = \mathbf{V}(x)$ |
$\frac{\partial (\mathbf{U} \mathbf{V})}{\partial x} =$ | $\mathbf{U} \frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x} \mathbf{V}$ | $\mathbf{U} = \mathbf{U}(x), \mathbf{V} = \mathbf{V}(x)$ |
$\frac{\partial (\mathbf{U} \otimes \mathbf{V})}{\partial x} =$ | $\mathbf{U} \otimes \frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x} \otimes \mathbf{V}$ | $\mathbf{U} = \mathbf{U}(x), \mathbf{V} = \mathbf{V}(x)$; $\otimes$ denotes the Kronecker product |
$\frac{\partial (\mathbf{U} \circ \mathbf{V})}{\partial x} =$ | $\mathbf{U} \circ \frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x} \circ \mathbf{V}$ | $\mathbf{U} = \mathbf{U}(x), \mathbf{V} = \mathbf{V}(x)$; $\circ$ denotes the Hadamard product |
$\frac{\partial \mathbf{U}^{-1}}{\partial x} =$ | $-\mathbf{U}^{-1} \frac{\partial \mathbf{U}}{\partial x} \mathbf{U}^{-1}$ | $\mathbf{U} = \mathbf{U}(x)$ |
$\frac{\partial^2 \mathbf{U}^{-1}}{\partial x \partial y} =$ | $\mathbf{U}^{-1} \left( \frac{\partial \mathbf{U}}{\partial x} \mathbf{U}^{-1} \frac{\partial \mathbf{U}}{\partial y} - \frac{\partial^2 \mathbf{U}}{\partial x \partial y} + \frac{\partial \mathbf{U}}{\partial y} \mathbf{U}^{-1} \frac{\partial \mathbf{U}}{\partial x} \right) \mathbf{U}^{-1}$ | $\mathbf{U} = \mathbf{U}(x, y)$ |
$\frac{\partial g(x \mathbf{A})}{\partial x} =$ | $\mathbf{A} g'(x \mathbf{A}) = g'(x \mathbf{A}) \mathbf{A}$ | If $g(\cdot)$ is applied elementwise, the products are Hadamard products; if $g$ is a matrix function defined by a power series (such as the matrix exponential), they are ordinary matrix products. See the next row |
$\frac{\partial e^{x \mathbf{A}}}{\partial x} =$ | $\mathbf{A} e^{x \mathbf{A}} = e^{x \mathbf{A}} \mathbf{A}$ | |
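The derivative of an inverse, $\partial \mathbf{U}^{-1} / \partial x = -\mathbf{U}^{-1} (\partial \mathbf{U} / \partial x) \mathbf{U}^{-1}$, is easy to get wrong by a sign or by the order of factors, so a numerical check is useful. In the sketch below, the matrix function `U`, its derivative `dU_dx`, and the helpers are hypothetical and chosen only to keep the example small.

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix via the adjugate."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def U(x):
    # Hypothetical matrix-valued function of the scalar x.
    return [[2.0 + x, x], [x * x, 3.0]]


def dU_dx(x):
    # Entry-by-entry derivative of U with respect to x.
    return [[1.0, 1.0], [2.0 * x, 0.0]]


x0, eps = 0.5, 1e-6
inv_plus = inv2(U(x0 + eps))
inv_minus = inv2(U(x0 - eps))
numeric = [[(inv_plus[i][j] - inv_minus[i][j]) / (2 * eps) for j in range(2)]
           for i in range(2)]

U_inv = inv2(U(x0))
product = matmul(matmul(U_inv, dU_dx(x0)), U_inv)
analytic = [[-product[i][j] for j in range(2)] for i in range(2)]

max_diff = max(abs(numeric[i][j] - analytic[i][j])
               for i in range(2) for j in range(2))
print(max_diff)
```

The identity follows from differentiating $\mathbf{U} \mathbf{U}^{-1} = \mathbf{I}$ with the matrix product rule from the same table.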
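II. Matrix decompositions
- QR decomposition: $M = QR$, with $Q$ orthogonal and $R$ upper triangular.
- Singular value decomposition (SVD): $M = U \Sigma V^T$, with $U$ and $V$ orthogonal and $\Sigma$ diagonal with nonnegative entries.
- Eigendecomposition, also called spectral decomposition: $S = Q \Lambda Q^T$, with $S$ symmetric, $Q$ orthogonal, and $\Lambda$ diagonal.
- Polar decomposition: $M = QS$, with $Q$ orthogonal and $S$ symmetric positive semi-definite.
- Cholesky decomposition: $\mathbf{A} = \mathbf{L} \mathbf{L}^*$, where $\mathbf{L}$ is lower triangular with positive real diagonal entries and $\mathbf{L}^*$ denotes the conjugate transpose of $\mathbf{L}$. Every Hermitian positive-definite matrix has a unique Cholesky decomposition.
- LU decomposition: $A = LU$, with $L$ lower triangular and $U$ upper triangular.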
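1. Cholesky decomposition
The Cholesky decomposition is mainly used to solve linear systems $\mathbf{A} \mathbf{x} = \mathbf{b}$. If $\mathbf{A}$ is symmetric positive definite, first compute $\mathbf{A} = \mathbf{L} \mathbf{L}^T$, then solve $\mathbf{L} \mathbf{y} = \mathbf{b}$ for $\mathbf{y}$ by forward substitution (since $\mathbf{L}$ is lower triangular), and finally solve $\mathbf{L}^T \mathbf{x} = \mathbf{y}$ for $\mathbf{x}$ by back substitution to obtain the solution.
An alternative that avoids the square roots needed to compute $\mathbf{L} \mathbf{L}^T$ is to compute $\mathbf{A} = \mathbf{L} \mathbf{D} \mathbf{L}^T$, then solve $\mathbf{L} \mathbf{y} = \mathbf{b}$ for $\mathbf{y}$, and finally solve $\mathbf{D} \mathbf{L}^T \mathbf{x} = \mathbf{y}$.
For linear systems that can be rewritten with a symmetric (positive-definite) matrix, the Cholesky decomposition and its LDL variant offer high efficiency and good numerical stability. Compared with LU decomposition, they are roughly twice as efficient.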
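The solve procedure described above can be sketched in plain Python as follows. This is a minimal textbook version under the assumption that the input is a real symmetric positive-definite matrix. The function names are illustrative, and a production code would normally call a tested linear-algebra library instead.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                diag = A[i][i] - s
                if diag <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(diag)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L, working from the first row down."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def back_substitution_transposed(L, y):
    """Solve L^T x = y, working from the last row up (L^T is upper triangular)."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
b = [1.0, 2.0, 3.0]

L = cholesky(A)   # for this matrix, L = [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]
x = back_substitution_transposed(L, forward_substitution(L, b))
residual = [sum(A[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
print(L)
print(x)
```

The `ValueError` branch doubles as a simple positive-definiteness test: the factorization succeeds only when every pivot is strictly positive.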
2. Singular value decomposition (SVD)
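III. Types of matrices
1. Positive-definite and positive semi-definite matrices
Example: the covariance matrix of a multivariate normal distribution must be positive semi-definite.
[Definition 1] Given an $n \times n$ real symmetric matrix $A$, if $\boldsymbol{x}^T A \boldsymbol{x} > 0$ holds for every nonzero vector $\boldsymbol{x}$ of length $n$, then $A$ is a positive-definite matrix.
[Definition 2] Given an $n \times n$ real symmetric matrix $A$, if $\boldsymbol{x}^T A \boldsymbol{x} \geq 0$ holds for every vector $\boldsymbol{x}$ of length $n$, then $A$ is a positive semi-definite matrix.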
Intuition:
Given any positive-definite matrix $A \in \mathbb{R}^{n \times n}$ and a nonzero vector $\boldsymbol{x} \in \mathbb{R}^{n}$, the angle between the product $\boldsymbol{y} = A \boldsymbol{x} \in \mathbb{R}^{n}$ and $\boldsymbol{x}$ is always less than $\frac{\pi}{2}$ (equivalently, $\boldsymbol{x}^T A \boldsymbol{x} > 0$).
Given any positive semi-definite matrix $A \in \mathbb{R}^{n \times n}$ and a vector $\boldsymbol{x} \in \mathbb{R}^{n}$, the angle between the product $\boldsymbol{y} = A \boldsymbol{x} \in \mathbb{R}^{n}$ and $\boldsymbol{x}$ is always less than or equal to $\frac{\pi}{2}$ whenever both vectors are nonzero (equivalently, $\boldsymbol{x}^T A \boldsymbol{x} \geq 0$).
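The angle interpretation can be illustrated numerically. The sketch below samples random vectors and records the smallest cosine of the angle between $\boldsymbol{x}$ and $A\boldsymbol{x}$. The matrix is an arbitrary positive-definite example (eigenvalues 1 and 3) chosen for illustration, and sampling is only a sanity check, not a proof.

```python
import math
import random


def matvec(A, x):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


A = [[2.0, -1.0], [-1.0, 2.0]]   # symmetric, eigenvalues 1 and 3

rng = random.Random(0)           # fixed seed so the run is repeatable
min_cos = 1.0
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    if dot(x, x) == 0.0:
        continue                 # the angle is undefined for the zero vector
    y = matvec(A, x)
    # cos(angle) = x^T A x / (|x| |Ax|); positive means the angle is below pi/2.
    cos_angle = dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
    min_cos = min(min_cos, cos_angle)

print(min_cos)  # stays strictly positive for a positive-definite A
```

For an indefinite matrix such as `[[1.0, 0.0], [0.0, -1.0]]`, the same loop would produce negative cosines, since some directions are mapped to the opposite half-space.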
1.1 Why the covariance matrix is positive semi-definite
For any multivariate random variable $\boldsymbol{t}$ with mean $\bar{\boldsymbol{t}}$, the covariance matrix is
$$
C = \mathbb{E}\left[(\boldsymbol{t} - \bar{\boldsymbol{t}})(\boldsymbol{t} - \bar{\boldsymbol{t}})^T\right]
$$
For an arbitrary vector $\boldsymbol{x}$,
$$
\boldsymbol{x}^T C \boldsymbol{x} = \boldsymbol{x}^T \mathbb{E}\left[(\boldsymbol{t} - \bar{\boldsymbol{t}})(\boldsymbol{t} - \bar{\boldsymbol{t}})^T\right] \boldsymbol{x} = \mathbb{E}\left[\boldsymbol{x}^T (\boldsymbol{t} - \bar{\boldsymbol{t}})(\boldsymbol{t} - \bar{\boldsymbol{t}})^T \boldsymbol{x}\right] = \mathbb{E}(s^2) = \sigma_s^2
$$
where $s = \boldsymbol{x}^T (\boldsymbol{t} - \bar{\boldsymbol{t}}) = (\boldsymbol{t} - \bar{\boldsymbol{t}})^T \boldsymbol{x}$ is a scalar random variable with zero mean, so $\mathbb{E}(s^2)$ is its variance $\sigma_s^2$. Since $\sigma_s^2 \geq 0$, we have $\boldsymbol{x}^T C \boldsymbol{x} \geq 0$, and the covariance matrix $C$ is positive semi-definite.
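The argument can be illustrated with a sample covariance matrix. In the sketch below the third coordinate is deliberately set to the sum of the first two, an assumption made so that the covariance is singular: positive semi-definite but not positive definite. All names and data are illustrative.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable

# 200 samples in R^3; the third coordinate is an exact linear combination.
samples = []
for _ in range(200):
    t1, t2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    samples.append([t1, t2, t1 + t2])

n, d = len(samples), 3
mean = [sum(s[k] for s in samples) / n for k in range(d)]
# Sample version of C = E[(t - mean)(t - mean)^T].
C = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
      for j in range(d)] for i in range(d)]


def quad_form(M, x):
    """Evaluate x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


# x^T C x is the variance of s = x^T (t - mean), so it should never be negative.
values = [quad_form(C, [rng.uniform(-1.0, 1.0) for _ in range(d)])
          for _ in range(1000)]
# Along x = (1, 1, -1), s = t1 + t2 - t3 vanishes, so the quadratic form is ~0.
degenerate = quad_form(C, [1.0, 1.0, -1.0])

print(min(values))
print(degenerate)
```

The degenerate direction shows why the general statement is "semi-definite": a covariance matrix is positive definite only when no nontrivial linear combination of the coordinates is constant.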
2. Inverse matrix
Inverse identity for a block matrix:
$$
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} M & -M B D^{-1} \\ -D^{-1} C M & D^{-1} + D^{-1} C M B D^{-1} \end{pmatrix}
$$
where $M = (A - B D^{-1} C)^{-1}$, assuming $D$ and $A - B D^{-1} C$ are invertible.
If $A$ and $C$ are invertible square matrices, then
$$
(A + BCD)^{-1} = A^{-1} - A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1}
$$
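The second identity can be checked numerically in its smallest nontrivial case. The sketch below assumes $A$ is $2 \times 2$, $B$ is a column, $C$ is a $1 \times 1$ matrix (a scalar), and $D$ is a row, so that the inner inverse reduces to a scalar division. The numbers are arbitrary and the helper is illustrative.

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix via the adjugate."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]     # B, stored as a column vector (2 x 1)
c = 0.5            # C, a 1 x 1 matrix stored as a scalar
d = [3.0, -1.0]    # D, stored as a row vector (1 x 2)

# Left-hand side: invert A + B C D directly.
lhs = inv2([[A[i][j] + b[i] * c * d[j] for j in range(2)] for i in range(2)])

# Right-hand side: A^{-1} - A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1}.
A_inv = inv2(A)
A_inv_b = [sum(A_inv[i][k] * b[k] for k in range(2)) for i in range(2)]   # A^{-1} B
d_A_inv = [sum(d[k] * A_inv[k][j] for k in range(2)) for j in range(2)]   # D A^{-1}
inner = sum(d[k] * A_inv_b[k] for k in range(2)) + 1.0 / c                # scalar here
rhs = [[A_inv[i][j] - A_inv_b[i] * d_A_inv[j] / inner for j in range(2)]
       for i in range(2)]

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

The practical appeal of the identity is visible even here: once $A^{-1}$ is known, the update only requires inverting a matrix the size of $C$, which in this case is a single number.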
Tools
- Matrix Calculus: an online tool for computing matrix derivatives