Matrix Theory

Outline

  • Matrix calculus: a specialized notation for multivariable calculus, particularly suited to working over spaces of matrices
  • Inverse matrix
  • Matrix decompositions: eigendecomposition (also called spectral decomposition); LU decomposition; singular value decomposition (SVD); QR decomposition; Cholesky decomposition
  • Determinant: in Euclidean space, the determinant describes how a linear transformation scales "volume"
  • Eigenvector: $Av=\lambda v$, where $\lambda$ is an eigenvalue of $A$ and $v$ is a corresponding eigenvector; the set of all eigenvalues of $A$ is called the spectrum of $A$, written $\lambda(A)$
  • Trace: $\operatorname{tr}(\mathbf{A}) = \mathbf{A}_{1,1} + \cdots + \mathbf{A}_{n,n}$; the trace of a matrix equals the sum of its eigenvalues
  • Orthogonal matrix: a square matrix whose rows and columns are orthonormal vectors, so that its transpose is its inverse, $QQ^T=I$
  • Positive definite and positive semi-definite matrices: an $n\times n$ real symmetric matrix $M$ is positive definite if and only if $\mathbf{z}^{T}M\mathbf{z} > 0$ for every nonzero real vector $\mathbf{z}$, where $\mathbf{z}^{T}$ denotes the transpose of $\mathbf{z}$
  • Adjugate matrix: if a matrix is invertible, its inverse and its adjugate differ only by a scalar factor
  • Hermitian matrix (also called a self-adjoint matrix): a matrix equal to its own conjugate transpose, that is, transposing it and then taking the complex conjugate of every entry (flipping the sign of the imaginary part) returns the original matrix
  • Conjugate transpose (Hermitian transpose): $A^* = (\overline{A})^\mathrm{T} = \overline{A^\mathrm{T}}$, where $\overline{A}$ denotes the entrywise complex conjugate of $A$
  • Unitary matrix: a complex square matrix whose conjugate transpose is its inverse, $U^{*}U=UU^{*}=I_{n}$
  • Real symmetric matrix: a symmetric matrix whose entries are all real
  • Diagonal matrix: a matrix whose entries off the main diagonal are all 0, often written diag(a1,a2,…,an)
  • Jacobian matrix: $\mathbf{J} = \begin{bmatrix}\dfrac{\partial \mathbf{f}}{\partial x_{1}} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_{n}}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial f_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}}\end{bmatrix}$
  • Hessian matrix: the square matrix of all second-order partial derivatives of a real-valued function of several variables, $\mathbf{H}_{ij} = \frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}$
  • Matrix norm

I. Matrix Calculus

The derivative of a vector with respect to a vector is called the Jacobian matrix:
$$J = \frac{\partial y_{(n)}}{\partial x_{(m)}} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_m} \end{pmatrix}_{n \times m}$$
The derivative of a scalar with respect to a vector and the derivative of a vector with respect to a scalar are the special cases in which the corresponding vector is one-dimensional.
The convention used here is called numerator layout. The alternative, which writes every matrix (or vector) derivative as the transpose of the form above, is called denominator layout. In denominator layout the rules below no longer take such easy-to-remember forms.
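
The two layouts can be made concrete with a small numerical sketch. It approximates the Jacobian by central finite differences and stores it in numerator layout; the helper `numerical_jacobian` and the test function `f` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout: J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(f(x), dtype=float)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return J

def f(x):
    # Hypothetical test function from R^3 to R^2.
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0]])

x0 = np.array([1.0, 2.0, 0.5])
J_num = numerical_jacobian(f, x0)      # numerator layout, shape (2, 3)
J_exact = np.array([[x0[1], x0[0], 0.0],
                    [1.0, 0.0, np.cos(x0[2])]])
J_den = J_num.T                        # denominator layout, shape (3, 2)
```

The output dimension indexes the rows in numerator layout and the columns in denominator layout; the entries themselves are identical.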

Compared with scalar calculus:

  • The sum rule is unchanged: $\frac{\partial (y + z)}{\partial x} = \frac{\partial y}{\partial x} + \frac{\partial z}{\partial x}$

  • The chain rule is unchanged: $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x}$

  • The product rule keeps its form, with $\otimes$ standing for the product in question: $\frac{\partial (y \otimes z)}{\partial x} = y \otimes \frac{\partial z}{\partial x} + z \otimes \frac{\partial y}{\partial x}$

    • Inner product of vectors: $\frac{\partial (y^T z)}{\partial x} = y^T \cdot \frac{\partial z}{\partial x} + z^T \cdot \frac{\partial y}{\partial x}$
    • Matrix product ($A$ independent of $x$): $\frac{\partial (Ay)}{\partial x} = A \cdot \frac{\partial y}{\partial x}$
    • Scalar multiplication of a vector ($y$ or $z$ is a scalar): $\frac{\partial (yz)}{\partial x} = y \cdot \frac{\partial z}{\partial x} + z \cdot \frac{\partial y}{\partial x}$
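
As a sanity check of the inner-product rule in numerator layout, the sketch below compares the formula with central finite differences. The choices $y = Bx$ and $z = Cx$, and the names `B`, `C`, `inner`, are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3))   # y = B x, so dy/dx = B
C = rng.standard_normal((4, 3))   # z = C x, so dz/dx = C
x = rng.standard_normal(3)

def inner(x):
    return (B @ x) @ (C @ x)      # the scalar y^T z

# Right-hand side of the rule: y^T dz/dx + z^T dy/dx (a row vector of length 3).
y, z = B @ x, C @ x
rule = y @ C + z @ B

# Central finite differences as an independent check.
eps = 1e-6
fd = np.array([(inner(x + eps * e) - inner(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
```

Both `rule` and `fd` hold the three partial derivatives of $y^T z$ with respect to the components of $x$.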

$$\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$$

1. Notation

  • $\mathbf{A}, \mathbf{X}, \mathbf{Y}$, etc.: bold uppercase letters denote matrices;
  • $\mathbf a, \mathbf x, \mathbf y$, etc.: bold lowercase letters denote vectors;
  • $a, x, y$, etc.: italic lowercase letters denote scalars;
  • $\mathbf X^T$: the transpose of the matrix $\mathbf X$;
  • $\mathbf X^H$: the conjugate transpose of the matrix $\mathbf X$;
  • $\lvert \mathbf X \rvert$: the determinant of the square matrix $\mathbf X$;
  • $\lVert \mathbf x \rVert$: the norm of the vector $\mathbf x$;
  • $\mathbf I$: the identity matrix.

2. Vector derivatives

2.1 Vector-by-scalar

The derivative of a column-vector function $\mathbf y = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^T$ with respect to a scalar $x$ is called the tangent vector of $\mathbf y$. In numerator layout it is written
$$\frac{\partial \mathbf y}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x}\end{bmatrix}_{m \times 1}$$

In denominator layout it is written
$$\frac{\partial \mathbf y}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x}\end{bmatrix}_{1 \times m}$$

2.2 Scalar-by-vector

The derivative of a scalar function $y$ with respect to a column vector $\mathbf x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T$ is written in numerator layout as
$$\frac{\partial y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n}\end{bmatrix}_{1 \times n}$$

In denominator layout it is written
$$\frac{\partial y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n}\end{bmatrix}_{n \times 1}$$

2.3 Vector-by-vector

The derivative of a column-vector function $\mathbf y = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^T$ with respect to a column vector $\mathbf x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T$ is written in numerator layout as
$$\frac{\partial \mathbf y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}\end{bmatrix}_{m \times n}$$

In denominator layout it is written
$$\frac{\partial \mathbf y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}\end{bmatrix}_{n \times m}$$

3. Matrix derivatives

3.1 Matrix-by-scalar

The derivative of an $m \times n$ matrix function $\mathbf Y$ with respect to a scalar $x$ is called the tangent matrix of $\mathbf Y$. In numerator layout it is written
$$\frac{\partial \mathbf Y}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x}\end{bmatrix}_{m \times n}$$

3.2 Scalar-by-matrix

The derivative of a scalar function $y$ with respect to a $p \times q$ matrix $\mathbf X$ is written in numerator layout as

$$\frac{\partial y}{\partial \mathbf X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}}\end{bmatrix}_{q \times p}$$
In denominator layout it is written
$$\frac{\partial y}{\partial \mathbf X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}}\end{bmatrix}_{p \times q}$$

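4. Identities

In the identities below, unless a note says otherwise, the factors of the expression being differentiated are assumed not to be functions of the variable of differentiation.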

4.1. Vector-by-vector

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial \mathbf a}{\partial \mathbf x}$ | $\mathbf 0$ | $\mathbf 0$ | |
| $\frac{\partial \mathbf x}{\partial \mathbf x}$ | $\mathbf I$ | $\mathbf I$ | |
| $\frac{\partial \mathbf A \mathbf x}{\partial \mathbf x}$ | $\mathbf A$ | $\mathbf A^T$ | |
| $\frac{\partial \mathbf x^T \mathbf A}{\partial \mathbf x}$ | $\mathbf A^T$ | $\mathbf A$ | |
| $\frac{\partial a \mathbf u}{\partial \mathbf x}$ | $a \frac{\partial \mathbf u}{\partial \mathbf x}$ | $a \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial v \mathbf u}{\partial \mathbf x}$ | $v \frac{\partial \mathbf u}{\partial \mathbf x} + \mathbf u \frac{\partial v}{\partial \mathbf x}$ | $v \frac{\partial \mathbf u}{\partial \mathbf x} + \frac{\partial v}{\partial \mathbf x} \mathbf u^T$ | $v = v(\mathbf x), \mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial \mathbf A \mathbf u}{\partial \mathbf x}$ | $\mathbf A \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} \mathbf A^T$ | $\mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial (\mathbf u + \mathbf v)}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} + \frac{\partial \mathbf v}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} + \frac{\partial \mathbf v}{\partial \mathbf x}$ | $\mathbf u = \mathbf u(\mathbf x), \mathbf v = \mathbf v(\mathbf x)$ |
| $\frac{\partial \mathbf f(\mathbf g(\mathbf u))}{\partial \mathbf x}$ | $\frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g}$ | $\mathbf u = \mathbf u(\mathbf x)$ |

4.2. Scalar-by-vector

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial a}{\partial \mathbf x}$ | $\mathbf 0^T$ | $\mathbf 0$ | |
| $\frac{\partial a u}{\partial \mathbf x}$ | $a \frac{\partial u}{\partial \mathbf x}$ | $a \frac{\partial u}{\partial \mathbf x}$ | $u = u(\mathbf x)$ |
| $\frac{\partial (u + v)}{\partial \mathbf x}$ | $\frac{\partial u}{\partial \mathbf x} + \frac{\partial v}{\partial \mathbf x}$ | $\frac{\partial u}{\partial \mathbf x} + \frac{\partial v}{\partial \mathbf x}$ | $u = u(\mathbf x), v = v(\mathbf x)$ |
| $\frac{\partial u v}{\partial \mathbf x}$ | $u \frac{\partial v}{\partial \mathbf x} + v \frac{\partial u}{\partial \mathbf x}$ | $u \frac{\partial v}{\partial \mathbf x} + v \frac{\partial u}{\partial \mathbf x}$ | $u = u(\mathbf x), v = v(\mathbf x)$ |
| $\frac{\partial f(g(u))}{\partial \mathbf x}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf x}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf x}$ | $u = u(\mathbf x)$ |
| $\frac{\partial (\mathbf u \cdot \mathbf v)}{\partial \mathbf x} = \frac{\partial \mathbf u^T \mathbf v}{\partial \mathbf x}$ | $\mathbf u^T \frac{\partial \mathbf v}{\partial \mathbf x} + \mathbf v^T \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf v}{\partial \mathbf x} \mathbf u + \frac{\partial \mathbf u}{\partial \mathbf x} \mathbf v$ | $\mathbf u = \mathbf u(\mathbf x), \mathbf v = \mathbf v(\mathbf x)$ |
| $\frac{\partial (\mathbf u \cdot \mathbf A \mathbf v)}{\partial \mathbf x} = \frac{\partial \mathbf u^T \mathbf A \mathbf v}{\partial \mathbf x}$ | $\mathbf u^T \mathbf A \frac{\partial \mathbf v}{\partial \mathbf x} + \mathbf v^T \mathbf A^T \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf v}{\partial \mathbf x} \mathbf A^T \mathbf u + \frac{\partial \mathbf u}{\partial \mathbf x} \mathbf A \mathbf v$ | $\mathbf u = \mathbf u(\mathbf x), \mathbf v = \mathbf v(\mathbf x)$ |
| $\frac{\partial (\mathbf a \cdot \mathbf u)}{\partial \mathbf x} = \frac{\partial \mathbf a^T \mathbf u}{\partial \mathbf x}$ | $\mathbf a^T \frac{\partial \mathbf u}{\partial \mathbf x}$ | $\frac{\partial \mathbf u}{\partial \mathbf x} \mathbf a$ | $\mathbf u = \mathbf u(\mathbf x)$ |
| $\frac{\partial \mathbf b^T \mathbf A \mathbf x}{\partial \mathbf x}$ | $\mathbf b^T \mathbf A$ | $\mathbf A^T \mathbf b$ | |
| $\frac{\partial \mathbf x^T \mathbf A \mathbf x}{\partial \mathbf x}$ | $\mathbf x^T (\mathbf A + \mathbf A^T)$ | $(\mathbf A + \mathbf A^T) \mathbf x$ | |
| $\frac{\partial^2 \mathbf x^T \mathbf A \mathbf x}{\partial \mathbf x \partial \mathbf x^T}$ | $\mathbf A + \mathbf A^T$ | $\mathbf A + \mathbf A^T$ | |
| $\frac{\partial \mathbf a^T \mathbf x \mathbf x^T \mathbf b}{\partial \mathbf x}$ | $\mathbf x^T (\mathbf a \mathbf b^T + \mathbf b \mathbf a^T)$ | $(\mathbf a \mathbf b^T + \mathbf b \mathbf a^T) \mathbf x$ | |
| $\frac{\partial (\mathbf A \mathbf x + \mathbf b)^T \mathbf C (\mathbf D \mathbf x + \mathbf e)}{\partial \mathbf x}$ | $(\mathbf A \mathbf x + \mathbf b)^T \mathbf C \mathbf D + (\mathbf D \mathbf x + \mathbf e)^T \mathbf C^T \mathbf A$ | $\mathbf D^T \mathbf C^T (\mathbf A \mathbf x + \mathbf b) + \mathbf A^T \mathbf C (\mathbf D \mathbf x + \mathbf e)$ | |
| $\frac{\partial \lVert \mathbf x \rVert^2}{\partial \mathbf x} = \frac{\partial (\mathbf x \cdot \mathbf x)}{\partial \mathbf x}$ | $2 \mathbf x^T$ | $2 \mathbf x$ | |
| $\frac{\partial \lVert \mathbf x - \mathbf a \rVert}{\partial \mathbf x}$ | $\frac{(\mathbf x - \mathbf a)^T}{\lVert \mathbf x - \mathbf a \rVert}$ | $\frac{\mathbf x - \mathbf a}{\lVert \mathbf x - \mathbf a \rVert}$ | |
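
One identity from this group, the derivative of the quadratic form $\mathbf x^T \mathbf A \mathbf x$, can be checked numerically. The sketch below is an illustration under assumed random inputs; the names `quad`, `grad_den`, and `grad_num` are not from the original.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))          # deliberately not symmetric
x = rng.standard_normal(3)

def quad(x):
    return x @ A @ x                     # the scalar x^T A x

grad_den = (A + A.T) @ x                 # denominator layout (column vector)
grad_num = x @ (A + A.T)                 # numerator layout (row vector)

# Central finite differences as an independent check.
eps = 1e-6
fd = np.array([(quad(x + eps * e) - quad(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
```

NumPy stores both results as one-dimensional arrays, so the two layouts contain the same numbers here; the distinction only shows up once row and column shapes are tracked explicitly.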

4.3. Vector-by-scalar

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial \mathbf a}{\partial x}$ | $\mathbf 0$ | $\mathbf 0$ | |
| $\frac{\partial a \mathbf u}{\partial x}$ | $a \frac{\partial \mathbf u}{\partial x}$ | $a \frac{\partial \mathbf u}{\partial x}$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial \mathbf A \mathbf u}{\partial x}$ | $\mathbf A \frac{\partial \mathbf u}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} \mathbf A^T$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial \mathbf u^T}{\partial x}$ | $\left( \frac{\partial \mathbf u}{\partial x} \right)^T$ | $\left( \frac{\partial \mathbf u}{\partial x} \right)^T$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial (\mathbf u + \mathbf v)}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} + \frac{\partial \mathbf v}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} + \frac{\partial \mathbf v}{\partial x}$ | $\mathbf u = \mathbf u(x), \mathbf v = \mathbf v(x)$ |
| $\frac{\partial (\mathbf u^T \times \mathbf v)}{\partial x}$ | $\left( \frac{\partial \mathbf u}{\partial x} \right)^T \times \mathbf v + \mathbf u^T \times \frac{\partial \mathbf v}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} \times \mathbf v + \mathbf u^T \times \left( \frac{\partial \mathbf v}{\partial x} \right)^T$ | $\mathbf u = \mathbf u(x), \mathbf v = \mathbf v(x)$ |
| $\frac{\partial \mathbf f(\mathbf g(\mathbf u))}{\partial x}$ | $\frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf u}{\partial x}$ | $\frac{\partial \mathbf u}{\partial x} \frac{\partial \mathbf g(\mathbf u)}{\partial \mathbf u} \frac{\partial \mathbf f(\mathbf g)}{\partial \mathbf g}$ | $\mathbf u = \mathbf u(x)$ |
| $\frac{\partial (\mathbf U \times \mathbf v)}{\partial x}$ | $\frac{\partial \mathbf U}{\partial x} \times \mathbf v + \mathbf U \times \frac{\partial \mathbf v}{\partial x}$ | $\mathbf v^T \times \frac{\partial \mathbf U}{\partial x} + \frac{\partial \mathbf v}{\partial x} \times \mathbf U^T$ | $\mathbf U = \mathbf U(x), \mathbf v = \mathbf v(x)$ |

4.4. Scalar-by-matrix

| Expression | Numerator layout | Denominator layout | Notes |
|---|---|---|---|
| $\frac{\partial a}{\partial \mathbf X}$ | $\mathbf 0^T$ | $\mathbf 0$ | |
| $\frac{\partial a u}{\partial \mathbf X}$ | $a \frac{\partial u}{\partial \mathbf X}$ | $a \frac{\partial u}{\partial \mathbf X}$ | $u = u(\mathbf X)$ |
| $\frac{\partial (u + v)}{\partial \mathbf X}$ | $\frac{\partial u}{\partial \mathbf X} + \frac{\partial v}{\partial \mathbf X}$ | $\frac{\partial u}{\partial \mathbf X} + \frac{\partial v}{\partial \mathbf X}$ | $u = u(\mathbf X), v = v(\mathbf X)$ |
| $\frac{\partial u v}{\partial \mathbf X}$ | $u \frac{\partial v}{\partial \mathbf X} + v \frac{\partial u}{\partial \mathbf X}$ | $u \frac{\partial v}{\partial \mathbf X} + v \frac{\partial u}{\partial \mathbf X}$ | $u = u(\mathbf X), v = v(\mathbf X)$ |
| $\frac{\partial f(g(u))}{\partial \mathbf X}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf X}$ | $\frac{\partial f(g)}{\partial g} \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial \mathbf X}$ | $u = u(\mathbf X)$ |
| $\frac{\partial \mathbf a^T \mathbf X \mathbf b}{\partial \mathbf X}$ | $\mathbf b \mathbf a^T$ | $\mathbf a \mathbf b^T$ | |
| $\frac{\partial \mathbf a^T \mathbf X^T \mathbf b}{\partial \mathbf X}$ | $\mathbf a \mathbf b^T$ | $\mathbf b \mathbf a^T$ | |
| $\frac{\partial (\mathbf X \mathbf a + \mathbf b)^T \mathbf C (\mathbf X \mathbf a + \mathbf b)}{\partial \mathbf X}$ | $[(\mathbf C + \mathbf C^T)(\mathbf X \mathbf a + \mathbf b) \mathbf a^T]^T$ | $(\mathbf C + \mathbf C^T)(\mathbf X \mathbf a + \mathbf b) \mathbf a^T$ | |
| $\frac{\partial (\mathbf X \mathbf a)^T \mathbf C (\mathbf X \mathbf b)}{\partial \mathbf X}$ | $(\mathbf C \mathbf X \mathbf b \mathbf a^T + \mathbf C^T \mathbf X \mathbf a \mathbf b^T)^T$ | $\mathbf C \mathbf X \mathbf b \mathbf a^T + \mathbf C^T \mathbf X \mathbf a \mathbf b^T$ | |
| $\frac{\partial \lvert \mathbf X \rvert}{\partial \mathbf X}$ | $\lvert \mathbf X \rvert \mathbf X^{-1}$ | $\lvert \mathbf X \rvert (\mathbf X^{-1})^T$ | |
| $\frac{\partial \ln \lvert a \mathbf X \rvert}{\partial \mathbf X}$ | $\mathbf X^{-1}$ | $(\mathbf X^{-1})^T$ | |
| $\frac{\partial \lvert \mathbf A \mathbf X \mathbf B \rvert}{\partial \mathbf X}$ | $\lvert \mathbf A \mathbf X \mathbf B \rvert \mathbf X^{-1}$ | $\lvert \mathbf A \mathbf X \mathbf B \rvert (\mathbf X^{-1})^T$ | |
| $\frac{\partial \lvert \mathbf X^n \rvert}{\partial \mathbf X}$ | $n \lvert \mathbf X^n \rvert \mathbf X^{-1}$ | $n \lvert \mathbf X^n \rvert (\mathbf X^{-1})^T$ | |
| $\frac{\partial \ln \lvert \mathbf X^T \mathbf X \rvert}{\partial \mathbf X}$ | $2 \mathbf X^+$ | $2 (\mathbf X^+)^T$ | $\mathbf X^+$ is the pseudoinverse of $\mathbf X$ |
| $\frac{\partial \ln \lvert \mathbf X^T \mathbf X \rvert}{\partial \mathbf X^+}$ | $-2 \mathbf X$ | $-2 \mathbf X^T$ | $\mathbf X^+$ is the pseudoinverse of $\mathbf X$ |
| $\frac{\partial \lvert \mathbf X^T \mathbf A \mathbf X \rvert}{\partial \mathbf X}$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert \mathbf X^{-1} = 2 \lvert \mathbf X^T \rvert \lvert \mathbf A \rvert \lvert \mathbf X \rvert \mathbf X^{-1}$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert (\mathbf X^{-1})^T$ | $\mathbf X$ square and invertible |
| $\frac{\partial \lvert \mathbf X^T \mathbf A \mathbf X \rvert}{\partial \mathbf X}$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert (\mathbf X^T \mathbf A^T \mathbf X)^{-1} \mathbf X^T \mathbf A^T$ | $2 \lvert \mathbf X^T \mathbf A \mathbf X \rvert \mathbf A \mathbf X (\mathbf X^T \mathbf A \mathbf X)^{-1}$ | $\mathbf A$ symmetric |
| $\frac{\partial \lvert \mathbf X^T \mathbf A \mathbf X \rvert}{\partial \mathbf X}$ | $\lvert \mathbf X^T \mathbf A \mathbf X \rvert [(\mathbf X^T \mathbf A \mathbf X)^{-1} \mathbf X^T \mathbf A + (\mathbf X^T \mathbf A^T \mathbf X)^{-1} \mathbf X^T \mathbf A^T]$ | $\lvert \mathbf X^T \mathbf A \mathbf X \rvert [\mathbf A \mathbf X (\mathbf X^T \mathbf A \mathbf X)^{-1} + \mathbf A^T \mathbf X (\mathbf X^T \mathbf A^T \mathbf X)^{-1}]$ | |
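
The determinant identity $\partial \lvert \mathbf X \rvert / \partial \mathbf X$ makes the difference between the two layouts visible, because the result is a full matrix. The sketch below checks it entry by entry with finite differences; the matrix `X` is an arbitrary non-symmetric example chosen for illustration.

```python
import numpy as np

# Arbitrary non-symmetric example with determinant 25.
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

det = np.linalg.det(X)
grad_den = det * np.linalg.inv(X).T    # denominator layout: entry (i, j) is d|X| / dX[i, j]
grad_num = det * np.linalg.inv(X)      # numerator layout: the transpose of the above

# Central finite differences, one entry of X at a time.
eps = 1e-6
fd = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = eps
        fd[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * eps)
```

The finite-difference matrix `fd` is indexed like `X` itself, so it matches the denominator-layout result directly and the numerator-layout result after a transpose.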

4.5. Matrix-by-scalar

| Expression | Numerator layout | Notes |
|---|---|---|
| $\frac{\partial a \mathbf U}{\partial x}$ | $a \frac{\partial \mathbf U}{\partial x}$ | $\mathbf U = \mathbf U(x)$ |
| $\frac{\partial \mathbf A \mathbf U \mathbf B}{\partial x}$ | $\mathbf A \frac{\partial \mathbf U}{\partial x} \mathbf B$ | $\mathbf U = \mathbf U(x)$ |
| $\frac{\partial (\mathbf U + \mathbf V)}{\partial x}$ | $\frac{\partial \mathbf U}{\partial x} + \frac{\partial \mathbf V}{\partial x}$ | $\mathbf U = \mathbf U(x), \mathbf V = \mathbf V(x)$ |
| $\frac{\partial (\mathbf U \mathbf V)}{\partial x}$ | $\mathbf U \frac{\partial \mathbf V}{\partial x} + \frac{\partial \mathbf U}{\partial x} \mathbf V$ | $\mathbf U = \mathbf U(x), \mathbf V = \mathbf V(x)$ |
| $\frac{\partial (\mathbf U \otimes \mathbf V)}{\partial x}$ | $\mathbf U \otimes \frac{\partial \mathbf V}{\partial x} + \frac{\partial \mathbf U}{\partial x} \otimes \mathbf V$ | $\mathbf U = \mathbf U(x), \mathbf V = \mathbf V(x)$; $\otimes$ denotes the Kronecker product |
| $\frac{\partial (\mathbf U \circ \mathbf V)}{\partial x}$ | $\mathbf U \circ \frac{\partial \mathbf V}{\partial x} + \frac{\partial \mathbf U}{\partial x} \circ \mathbf V$ | $\mathbf U = \mathbf U(x), \mathbf V = \mathbf V(x)$; $\circ$ denotes the Hadamard product |
| $\frac{\partial \mathbf U^{-1}}{\partial x}$ | $-\mathbf U^{-1} \frac{\partial \mathbf U}{\partial x} \mathbf U^{-1}$ | $\mathbf U = \mathbf U(x)$ |
| $\frac{\partial^2 \mathbf U^{-1}}{\partial x \partial y}$ | $\mathbf U^{-1} \left( \frac{\partial \mathbf U}{\partial x} \mathbf U^{-1} \frac{\partial \mathbf U}{\partial y} - \frac{\partial^2 \mathbf U}{\partial x \partial y} + \frac{\partial \mathbf U}{\partial y} \mathbf U^{-1} \frac{\partial \mathbf U}{\partial x} \right) \mathbf U^{-1}$ | $\mathbf U = \mathbf U(x, y)$ |
| $\frac{\partial g(x \mathbf A)}{\partial x}$ | $\mathbf A g'(x \mathbf A) = g'(x \mathbf A) \mathbf A$ | With ordinary matrix products this holds when $g(\cdot)$ is a matrix function defined by a polynomial or power series, such as the matrix exponential in the next row; if $g(\cdot)$ is applied elementwise, the product should be the Hadamard product, $\mathbf A \circ g'(x \mathbf A)$ |
| $\frac{\partial e^{x \mathbf A}}{\partial x}$ | $\mathbf A e^{x \mathbf A} = e^{x \mathbf A} \mathbf A$ | |
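
The derivative of the inverse, $\partial \mathbf U^{-1} / \partial x = -\mathbf U^{-1} \frac{\partial \mathbf U}{\partial x} \mathbf U^{-1}$, can be verified for a simple matrix function. The linear form $\mathbf U(x) = \mathbf A + x \mathbf B$ and the specific matrices below are assumptions chosen for illustration.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
B = np.array([[0.5, -1.0], [1.5, 2.0]])

def U(x):
    return A + x * B          # hypothetical matrix function with dU/dx = B

x0 = 0.3
U_inv = np.linalg.inv(U(x0))
analytic = -U_inv @ B @ U_inv

# Central finite difference of the inverse with respect to the scalar x.
eps = 1e-6
fd = (np.linalg.inv(U(x0 + eps)) - np.linalg.inv(U(x0 - eps))) / (2 * eps)
```

The order of the factors matters: $\mathbf U^{-1}$ and $\partial \mathbf U / \partial x$ do not commute in general, so the scalar rule $-u'/u^2$ cannot be copied verbatim.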

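II. Matrix Decompositions

  • QR decomposition: $M = QR$, with $Q$ orthogonal and $R$ upper triangular.
  • Singular value decomposition (SVD): $M = U \Sigma V^T$, with $U$ and $V$ orthogonal and $\Sigma$ diagonal with nonnegative entries.
  • Eigendecomposition, also called spectral decomposition: $S = Q \Lambda Q^T$, with $S$ symmetric, $Q$ orthogonal, and $\Lambda$ diagonal.
  • Polar decomposition: $M = QS$, with $Q$ orthogonal and $S$ symmetric positive semi-definite.
  • Cholesky decomposition: $\mathbf{A} = \mathbf{L}\mathbf{L}^{*}$, where $\mathbf{L}$ is lower triangular with strictly positive real diagonal entries and $\mathbf{L}^{*}$ is the conjugate transpose of $\mathbf{L}$. Every Hermitian positive-definite matrix has a unique Cholesky decomposition.
  • LU decomposition: $A = LU$, with $L$ lower triangular and $U$ upper triangular.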
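
Most of the factorizations listed above are available in `numpy.linalg`. The sketch below computes them for randomly generated test matrices (an assumption for illustration) and assembles the polar decomposition from the SVD. LU is omitted because NumPy itself does not expose an LU routine.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
S = M + M.T                      # symmetric, for the eigendecomposition
P = M @ M.T + 4 * np.eye(4)      # symmetric positive definite, for Cholesky

Q, R = np.linalg.qr(M)                 # M = Q R
U, sigma, Vt = np.linalg.svd(M)        # M = U diag(sigma) V^T
lam, Qs = np.linalg.eigh(S)            # S = Qs diag(lam) Qs^T
L = np.linalg.cholesky(P)              # P = L L^T with L lower triangular

# Polar decomposition assembled from the SVD: M = (U V^T)(V diag(sigma) V^T).
Q_polar = U @ Vt
S_polar = Vt.T @ np.diag(sigma) @ Vt
```

Each factorization can be checked by multiplying the factors back together and comparing with the input matrix.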

1. Cholesky decomposition

The Cholesky decomposition is mainly used to solve linear systems $\mathbf{Ax} = \mathbf{b}$. If $\mathbf{A}$ is symmetric positive definite, we first compute $\mathbf{A} = \mathbf{LL}^{\mathrm{T}}$, then solve $\mathbf{Ly} = \mathbf{b}$ for $\mathbf{y}$ by forward substitution, and finally solve $\mathbf{L}^{\mathrm{T}}\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$ by back substitution.
An alternative that avoids the square roots needed to compute $\mathbf{LL}^{\mathrm{T}}$ is to compute $\mathbf{A} = \mathbf{LDL}^{\mathrm{T}}$, then solve $\mathbf{Ly} = \mathbf{b}$ for $\mathbf{y}$, and finally solve $\mathbf{DL}^{\mathrm{T}}\mathbf{x} = \mathbf{y}$.
For linear systems that can be rewritten in symmetric form, the Cholesky decomposition and its LDL variant offer high efficiency and good numerical stability; they are roughly twice as efficient as the LU decomposition.
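
The solve procedure can be sketched in plain Python: factor $\mathbf A = \mathbf{LL}^{\mathrm T}$, solve $\mathbf{Ly} = \mathbf b$ by forward substitution (top row first, since $\mathbf L$ is lower triangular), then solve $\mathbf L^{\mathrm T}\mathbf x = \mathbf y$ by back substitution. The function names and the example system are illustrative assumptions, and the code does no pivoting or input validation.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b via Cholesky, forward substitution, then back substitution."""
    n = len(A)
    L = cholesky(A)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y (entry (i, k) of L^T is L[k][i]).
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
b = [8.0, 10.0, 11.0]
x = solve_spd(A, b)   # the exact solution of this system is [1, 1, 1]
```

Only the lower triangle of `A` is read, which reflects where the roughly twofold saving over LU comes from.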

2. Singular value decomposition (SVD)


III. Types of Matrices

1. Positive definite and positive semi-definite matrices

Example: the covariance matrix of a multivariate normal distribution is required to be positive semi-definite.

[Definition 1] Given an $n\times n$ real symmetric matrix $A$, if $\boldsymbol{x}^TA\boldsymbol{x}>0$ holds for every nonzero vector $\boldsymbol{x}$ of length $n$, then $A$ is a positive definite matrix.


[Definition 2] Given an $n\times n$ real symmetric matrix $A$, if $\boldsymbol{x}^TA\boldsymbol{x}\geq0$ holds for every vector $\boldsymbol{x}$ of length $n$, then $A$ is a positive semi-definite matrix.

Intuition:
Given any positive definite matrix $A\in\mathbb{R}^{n\times n}$ and any nonzero vector $\boldsymbol{x}\in\mathbb{R}^{n}$, the angle between the product $\boldsymbol{y}=A\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{x}$ is always less than $\frac{\pi}{2}$ (equivalently, $\boldsymbol{x}^TA\boldsymbol{x}>0$).
Given any positive semi-definite matrix $A\in\mathbb{R}^{n\times n}$ and any vector $\boldsymbol{x}\in\mathbb{R}^{n}$, the angle between the product $\boldsymbol{y}=A\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{x}$ is always less than or equal to $\frac{\pi}{2}$ (equivalently, $\boldsymbol{x}^TA\boldsymbol{x}\geq0$).
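
In practice the definitions are rarely tested vector by vector. A common alternative, not stated in the text above, uses the fact that a real symmetric matrix is positive definite exactly when all its eigenvalues are positive, and positive semi-definite exactly when they are all nonnegative. The helper names and the tolerance `tol` below are assumptions for illustration.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """True if the symmetric matrix A has all eigenvalues > tol."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def is_positive_semidefinite(A, tol=1e-12):
    """True if the symmetric matrix A has all eigenvalues >= -tol."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A_pd = np.array([[2.0, -1.0], [-1.0, 2.0]])    # eigenvalues 1 and 3
A_psd = np.array([[1.0, 1.0], [1.0, 1.0]])     # eigenvalues 0 and 2
A_indef = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues -1 and 3

x = np.array([1.0, -1.0])
print(x @ A_psd @ x)   # 0.0: x^T A x can vanish for a nonzero x when A is only semi-definite
```

The tolerance guards against rounding error in the computed eigenvalues; a suitable value depends on the scale of the matrix.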

1.1 Why the covariance matrix is positive semi-definite

For any multivariate random variable $\boldsymbol{t}$ with mean $\bar{\boldsymbol{t}}$, the covariance matrix is
$$C=\mathbb{E}\left[(\boldsymbol{t}-\bar{\boldsymbol{t}})(\boldsymbol{t}-\bar{\boldsymbol{t}})^T\right]$$

Now take any vector $\boldsymbol{x}$. Then
$$\boldsymbol{x}^TC\boldsymbol{x}=\boldsymbol{x}^T\mathbb{E}\left[(\boldsymbol{t}-\bar{\boldsymbol{t}})(\boldsymbol{t}-\bar{\boldsymbol{t}})^T\right]\boldsymbol{x}=\mathbb{E}\left[\boldsymbol{x}^T(\boldsymbol{t}-\bar{\boldsymbol{t}})(\boldsymbol{t}-\bar{\boldsymbol{t}})^T\boldsymbol{x}\right]=\mathbb{E}(s^2)=\sigma_{s}^2$$
where $s=\boldsymbol{x}^T(\boldsymbol{t}-\bar{\boldsymbol{t}})=(\boldsymbol{t}-\bar{\boldsymbol{t}})^T\boldsymbol{x}$ is a scalar random variable with zero mean, so $\mathbb{E}(s^2)$ is its variance $\sigma_s^2$. Since $\sigma_s^2\geq0$, we have $\boldsymbol{x}^TC\boldsymbol{x}\geq0$, and the covariance matrix $C$ is positive semi-definite.
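
The same argument can be replayed on data. The sketch below draws samples (an arbitrary synthetic distribution, assumed for illustration), forms the sample covariance matrix, and confirms that the quadratic form equals the mean of $s^2$ over the samples.

```python
import numpy as np

rng = np.random.default_rng(4)
# 500 samples of a 3-dimensional random vector t with correlated components.
T = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
T_centered = T - T.mean(axis=0)
C = T_centered.T @ T_centered / len(T)   # sample version of E[(t - mean)(t - mean)^T]

x = rng.standard_normal(3)
s = T_centered @ x                       # s = x^T (t - mean), one value per sample
quad_form = x @ C @ x
mean_s2 = np.mean(s ** 2)
```

Because `quad_form` is an average of squares, it cannot be negative for any choice of `x`, which is exactly the positive semi-definite property.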

2. Inverse matrices

Inverse of a block matrix:
$$\begin{pmatrix}A&B\\C&D\end{pmatrix}^{-1}=\begin{pmatrix}M&-MBD^{-1}\\-D^{-1}CM&D^{-1}+D^{-1}CMBD^{-1}\end{pmatrix}$$
where $M=(A-BD^{-1}C)^{-1}$.

If $A$ and $C$ are invertible square matrices, then $(A+BCD)^{-1}=A^{-1}-A^{-1}B(DA^{-1}B+C^{-1})^{-1}DA^{-1}$.
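
Both inverse identities can be checked numerically. The sketch below uses small randomly perturbed, diagonally shifted matrices so that every inverse involved exists; the sizes, scalings, and the names `Bw`, `Cw`, `Dw` used for the second identity are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
inv = np.linalg.inv

# Block inverse: blocks A (3x3), B (3x2), C (2x3), D (2x2).
A = 4 * np.eye(3) + 0.3 * rng.standard_normal((3, 3))
B = 0.3 * rng.standard_normal((3, 2))
C = 0.3 * rng.standard_normal((2, 3))
D = 3 * np.eye(2) + 0.3 * rng.standard_normal((2, 2))

M = inv(A - B @ inv(D) @ C)
block_inv = np.block([
    [M,               -M @ B @ inv(D)],
    [-inv(D) @ C @ M, inv(D) + inv(D) @ C @ M @ B @ inv(D)],
])
direct_inv = inv(np.block([[A, B], [C, D]]))

# Second identity: (A + Bw Cw Dw)^{-1} with Bw (3x2), Cw (2x2), Dw (2x3).
Bw, Dw = B, C
Cw = np.eye(2) + 0.1 * rng.standard_normal((2, 2))
woodbury = inv(A) - inv(A) @ Bw @ inv(Dw @ inv(A) @ Bw + inv(Cw)) @ Dw @ inv(A)
direct_woodbury = inv(A + Bw @ Cw @ Dw)
```

The second identity pays off when $A^{-1}$ is already known and the correction $BCD$ has low rank, since the only new inverses are of the small inner matrices.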


Tools

  • Matrix Calculus: an online tool for computing matrix derivatives

References

矩阵微积分 (Matrix Calculus) | Here4U
