文章目录
- 1. 三个条件
- 2. 二次上界引理
- 3. 证明
1. 三个条件
- f f f 有下界,可微,凸函数
-
∇
f
\nabla f
∇f是
Lipschitz
连续 - 步长
α
∈
(
0
,
1
L
]
\alpha \in (0,\frac{1}{L}]
α∈(0,L1]
则 { f ( x k ) } \{ f(x_k) \} {f(xk)} 以 O ( 1 k ) \mathcal{O}(\frac{1}{k}) O(k1) 收敛于 f ∗ f^{*} f∗
2. 二次上界引理
若
f
f
f 可微,
∇
f
\nabla f
∇f是 Lipschitz
连续,则
f
f
f有二次上界,即
∀
x
,
y
∈
R
n
→
f
(
y
)
≤
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
+
L
2
∣
∣
y
−
x
∣
∣
2
\begin{equation} \forall x,y \in \mathbb{R}^n\rightarrow f(y)\le f(x)+\nabla f(x)^T(y-x)+\frac{L}{2}||y-x||^2 \end{equation}
∀x,y∈Rn→f(y)≤f(x)+∇f(x)T(y−x)+2L∣∣y−x∣∣2
3. 证明
-
我们前提是梯度下降法: x i = x i − 1 − ∇ f T ( x i − 1 ) α x_i=x_{i-1}-\nabla f^T(x_{i-1} ) \alpha xi=xi−1−∇fT(xi−1)α
-
不妨设 y = x i , x = x i − 1 y=x_i,x=x_{i-1} y=xi,x=xi−1,则可得: y − x = x i − x i − 1 = − ∇ f T ( x i − 1 ) α y-x=x_i-x_{i-1}=-\nabla f^T(x_{i-1} ) \alpha y−x=xi−xi−1=−∇fT(xi−1)α
-
代入到二次上界引理可得:
f ( x i ) ≤ f ( x i − 1 ) + ∇ f ( x i − 1 ) T [ − ∇ f T ( x i − 1 ) α ] + L 2 [ − ∇ f T ( x i − 1 ) α ] 2 \begin{equation} f(x_i)\le f(x_{i-1})+\nabla f(x_{i-1})^T[-\nabla f^T(x_{i-1} ) \alpha]+\frac{L}{2}[-\nabla f^T(x_{i-1} ) \alpha]^2 \end{equation} f(xi)≤f(xi−1)+∇f(xi−1)T[−∇fT(xi−1)α]+2L[−∇fT(xi−1)α]2 -
右边整理可得:
= f ( x i − 1 ) − [ ∇ f T ( x i − 1 ) ] 2 α + L α 2 2 [ ∇ f T ( x i − 1 ) ] 2 \begin{equation} = f(x_{i-1})-[\nabla f^T(x_{i-1} )]^2 \alpha+\frac{L\alpha^2}{2}[\nabla f^T(x_{i-1})]^2 \end{equation} =f(xi−1)−[∇fT(xi−1)]2α+2Lα2[∇fT(xi−1)]2 -
因为 α ≤ 1 L \alpha \le \frac{1}{L} α≤L1,则可得: L ≤ 1 α → L α 2 2 ≤ α 2 2 1 α → L α 2 2 ≤ α 2 L\le \frac{1}{\alpha}\rightarrow \frac{L\alpha^2}{2}\le \frac{\alpha^2}{2}\frac{1}{\alpha}\rightarrow \frac{L\alpha^2}{2}\le \frac{\alpha}{2} L≤α1→2Lα2≤2α2α1→2Lα2≤2α
-
右边放缩可得:
≤ f ( x i − 1 ) − [ ∇ f T ( x i − 1 ) ] 2 α + α 2 [ ∇ f T ( x i − 1 ) ] 2 = f ( x i − 1 ) − α 2 [ ∇ f T ( x i − 1 ) ] 2 \begin{equation} \le f(x_{i-1})-[\nabla f^T(x_{i-1} )]^2 \alpha+\frac{\alpha}{2}[\nabla f^T(x_{i-1})]^2= f(x_{i-1})-\frac{\alpha}{2}[\nabla f^T(x_{i-1} )]^2 \end{equation} ≤f(xi−1)−[∇fT(xi−1)]2α+2α[∇fT(xi−1)]2=f(xi−1)−2α[∇fT(xi−1)]2 -
综上所述可得:
f ( x i ) ≤ f ( x i − 1 ) − α 2 [ ∇ f T ( x i − 1 ) ] 2 \begin{equation} f(x_i)\le f(x_{i-1})-\frac{\alpha}{2}[\nabla f^T(x_{i-1} )]^2 \end{equation} f(xi)≤f(xi−1)−2α[∇fT(xi−1)]2
-
如图所示,极值点 f ∗ ≥ y A f^*\ge y_A f∗≥yA,
f ( x i − 1 ) − y A x i − 1 − x ∗ = ∇ f T ( x i − 1 ) \begin{equation} \frac{f(x_{i-1})-y_A}{x_{i-1}-x^*}=\nabla f^T(x_{i-1}) \end{equation} xi−1−x∗f(xi−1)−yA=∇fT(xi−1) -
整理上式可得:
y A = f ( x i − 1 ) − ∇ f T ( x i − 1 ) [ x i − 1 − x ∗ ] \begin{equation} y_A=f(x_{i-1})-\nabla f^T(x_{i-1})[x_{i-1}-x^*] \end{equation} yA=f(xi−1)−∇fT(xi−1)[xi−1−x∗]
f ∗ ≥ f ( x i − 1 ) − ∇ f T ( x i − 1 ) [ x i − 1 − x ∗ ] \begin{equation} f^*\ge f(x_{i-1})-\nabla f^T(x_{i-1})[x_{i-1}-x^*] \end{equation} f∗≥f(xi−1)−∇fT(xi−1)[xi−1−x∗] -
整理上式可得:
f ( x i − 1 ) ≤ f ∗ + ∇ f T ( x i − 1 ) [ x i − 1 − x ∗ ] \begin{equation} f(x_{i-1})\le f^*+\nabla f^T(x_{i-1})[x_{i-1}-x^*] \end{equation} f(xi−1)≤f∗+∇fT(xi−1)[xi−1−x∗] -
将上面得到的公式如下:
f ( x i ) ≤ f ( x i − 1 ) − α 2 [ ∇ f T ( x i − 1 ) ] 2 \begin{equation} f(x_i)\le f(x_{i-1})-\frac{\alpha}{2}[\nabla f^T(x_{i-1} )]^2 \end{equation} f(xi)≤f(xi−1)−2α[∇fT(xi−1)]2 -
放大右边可得:
f ( x i ) ≤ f ∗ + ∇ f T ( x i − 1 ) [ x i − 1 − x ∗ ] − α 2 [ ∇ f T ( x i − 1 ) ] 2 \begin{equation} f(x_i)\le f^*+\nabla f^T(x_{i-1})[x_{i-1}-x^*]-\frac{\alpha}{2}[\nabla f^T(x_{i-1} )]^2 \end{equation} f(xi)≤f∗+∇fT(xi−1)[xi−1−x∗]−2α[∇fT(xi−1)]2 -
配方右边等式可得:
= f ∗ − 1 2 α [ ( α 2 ( ∇ f T ( x i − 1 ) ) 2 − 2 α ∇ f T ( x i − 1 ) ( x i − 1 − x ∗ ) ] \begin{equation} =f^*-\frac{1}{2\alpha}[(\alpha^2 (\nabla f^T(x_{i-1} ))^2-2\alpha \nabla f^T(x_{i-1} )(x_{i-1}-x^*)] \end{equation} =f∗−2α1[(α2(∇fT(xi−1))2−2α∇fT(xi−1)(xi−1−x∗)]
= f ∗ − 1 2 α [ ( α 2 ( ∇ f T ( x i − 1 ) ) 2 − 2 α ∇ f T ( x i − 1 ) ( x i − 1 − x ∗ ) + ( x i − 1 − x ∗ ) 2 − ( x i − 1 − x ∗ ) 2 ] \begin{equation} =f^*-\frac{1}{2\alpha}[(\alpha^2 (\nabla f^T(x_{i-1} ))^2-2\alpha \nabla f^T(x_{i-1} )(x_{i-1}-x^*)+(x_{i-1}-x^*)^2-(x_{i-1}-x^*)^2] \end{equation} =f∗−2α1[(α2(∇fT(xi−1))2−2α∇fT(xi−1)(xi−1−x∗)+(xi−1−x∗)2−(xi−1−x∗)2]
= f ∗ − 1 2 α [ ( α ∇ f T ( x i − 1 ) − x i − 1 + x ∗ ) 2 − ( x i − 1 − x ∗ ) 2 ] \begin{equation} =f^*-\frac{1}{2\alpha}[(\alpha \nabla f^T(x_{i-1} )-x_{i-1}+x^*)^2-(x_{i-1}-x^*)^2] \end{equation} =f∗−2α1[(α∇fT(xi−1)−xi−1+x∗)2−(xi−1−x∗)2] -
将 ( α ∇ f T ( x i − 1 ) − x i − 1 + x ∗ ) 2 = ( x i − 1 − α ∇ f T ( x i − 1 ) − x ∗ ) 2 (\alpha \nabla f^T(x_{i-1} )-x_{i-1}+x^*)^2=(x_{i-1} -\alpha \nabla f^T(x_{i-1} )-x^*)^2 (α∇fT(xi−1)−xi−1+x∗)2=(xi−1−α∇fT(xi−1)−x∗)2,根据梯度公式可得:
x i = x i − 1 − ∇ f T ( x i − 1 ) α → ( α ∇ f T ( x i − 1 ) − x i − 1 + x ∗ ) 2 = ( x i − x ∗ ) 2 \begin{equation} x_i=x_{i-1}-\nabla f^T(x_{i-1} ) \alpha \rightarrow (\alpha \nabla f^T(x_{i-1} )-x_{i-1}+x^*)^2=(x_i-x^*)^2 \end{equation} xi=xi−1−∇fT(xi−1)α→(α∇fT(xi−1)−xi−1+x∗)2=(xi−x∗)2 -
综合整理上式可得:
f ( x i ) ≤ f ∗ − 1 2 α [ ( x i − x ∗ ) 2 − ( x i − 1 − x ∗ ) 2 ] \begin{equation} f(x_i)\le f^*-\frac{1}{2\alpha}[(x_i-x^*)^2-(x_{i-1}-x^*)^2] \end{equation} f(xi)≤f∗−2α1[(xi−x∗)2−(xi−1−x∗)2] -
改成范数形式可得:
f ( x i ) ≤ f ∗ + 1 2 α [ ∣ ∣ x i − 1 − x ∗ ∣ ∣ 2 − ∣ ∣ x i − x ∗ ∣ ∣ 2 ] \begin{equation} f(x_i)\le f^*+\frac{1}{2\alpha}[||x_{i-1}-x^*||^2-||x_{i}-x^*||^2] \end{equation} f(xi)≤f∗+2α1[∣∣xi−1−x∗∣∣2−∣∣xi−x∗∣∣2] -
不断代入可得:
f ( x i ) − f ∗ ≤ 1 2 α [ ∣ ∣ x 0 − x ∗ ∣ ∣ 2 − ∣ ∣ x 1 − x ∗ ∣ ∣ 2 ] \begin{equation} f(x_i) -f^*\le\frac{1}{2\alpha}[||x_{0}-x^*||^2-||x_{1}-x^*||^2] \end{equation} f(xi)−f∗≤2α1[∣∣x0−x∗∣∣2−∣∣x1−x∗∣∣2]
f ( x i ) − f ∗ ≤ 1 2 α [ ∣ ∣ x 1 − x ∗ ∣ ∣ 2 − ∣ ∣ x 2 − x ∗ ∣ ∣ 2 ] \begin{equation} f(x_i) -f^*\le\frac{1}{2\alpha}[||x_{1}-x^*||^2-||x_{2}-x^*||^2] \end{equation} f(xi)−f∗≤2α1[∣∣x1−x∗∣∣2−∣∣x2−x∗∣∣2]
f ( x k ) − f ∗ ≤ 1 2 α [ ∣ ∣ x k − 1 − x ∗ ∣ ∣ 2 − ∣ ∣ x k − x ∗ ∣ ∣ 2 ] \begin{equation} f(x_k) -f^*\le\frac{1}{2\alpha}[||x_{k-1}-x^*||^2-||x_{k}-x^*||^2] \end{equation} f(xk)−f∗≤2α1[∣∣xk−1−x∗∣∣2−∣∣xk−x∗∣∣2] -
左右相加可得:
∑ i = 1 k [ f ( x i ) − f ∗ ] ≤ 1 2 α [ ∣ ∣ x 0 − x ∗ ∣ ∣ 2 − ∣ ∣ x k − x ∗ ∣ ∣ 2 ] ≤ 1 2 α ∣ ∣ x 0 − x ∗ ∣ ∣ 2 \begin{equation} \sum_{i=1}^k[f(x_i) -f^*]\le\frac{1}{2\alpha}[||x_{0}-x^*||^2-||x_{k}-x^*||^2]\le\frac{1}{2\alpha}||x_{0}-x^*||^2 \end{equation} i=1∑k[f(xi)−f∗]≤2α1[∣∣x0−x∗∣∣2−∣∣xk−x∗∣∣2]≤2α1∣∣x0−x∗∣∣2 -
因为:
∑ i = 1 k ( f ( x k ) − f ∗ ) = k [ f ( x k ) − f ∗ ] → f ( x k ) − f ∗ = 1 k ∑ i = 1 k ( f ( x k ) − f ∗ ) \begin{equation} \sum_{i=1}^k(f(x_k)-f^*)=k[f(x_k)-f^*]\rightarrow f(x_k)-f^*=\frac{1}{k}\sum_{i=1}^k(f(x_k)-f^*) \end{equation} i=1∑k(f(xk)−f∗)=k[f(xk)−f∗]→f(xk)−f∗=k1i=1∑k(f(xk)−f∗) -
由于 f ( x i ) f(x_i) f(xi)我们定义单调递减,所以可得: f ( x i ) ≥ f ( x k ) f(x_i)\ge f(x_k) f(xi)≥f(xk)
f ( x k ) − f ∗ = 1 k ∑ i = 1 k ( f ( x k ) − f ∗ ) ≤ 1 k ∑ i = 1 k ( f ( x i ) − f ∗ ) \begin{equation} f(x_k)-f^*=\frac{1}{k}\sum_{i=1}^k(f(x_k)-f^*)\le \frac{1}{k}\sum_{i=1}^k(f(x_i)-f^*) \end{equation} f(xk)−f∗=k1i=1∑k(f(xk)−f∗)≤k1i=1∑k(f(xi)−f∗) -
整理可得:
f ( x k ) − f ∗ = 1 k ∑ i = 1 k ( f ( x i ) − f ∗ ) ≤ 1 k ⋅ 1 2 α ∣ ∣ x 0 − x ∗ ∣ ∣ 2 = O ( 1 k ) \begin{equation} f(x_k)-f^*=\frac{1}{k}\sum_{i=1}^k(f(x_i)-f^*)\le \frac{1}{k}\cdot \frac{1}{2\alpha}||x_0-x^*||^2=\mathcal{O}(\frac{1}{k}) \end{equation} f(xk)−f∗=k1i=1∑k(f(xi)−f∗)≤k1⋅2α1∣∣x0−x∗∣∣2=O(k1) -
综上所述:
f ( x k ) − f ∗ = O ( 1 k ) \begin{equation} f(x_k)-f^*=\mathcal{O}(\frac{1}{k}) \end{equation} f(xk)−f∗=O(k1)
则 { f ( x k ) } \{ f(x_k) \} {f(xk)} 以 O ( 1 k ) \mathcal{O}(\frac{1}{k}) O(k1) 收敛于 f ∗ f^{*} f∗
视频来源于B站大佬,以上为视频学习笔记