Applied Mathematics for Computer Science -- Assignment 3

  • Assignment 3
    • Calculation Problems
    • Programming Problems
    • 1 Machine Learning Based on Dimensionality Reduction
    • 2 Summary of Deep Learning Training Methods

Assignment 3

Calculation Problems

  1. (15 points) For the given matrix $A$ (of size 4×2), find the SVD (singular value decomposition) of $A$, i.e. find $U$, $\Sigma$, and $V^T$ such that $A = U\Sigma V^T$, where $U^TU = I$ and $V^TV = I$. Show the solution process.

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 1 \\ -1 & 0 \end{bmatrix}$$

[Solution]

First, compute $A^TA$:
$$A^TA = \begin{bmatrix} 6 & 2 \\ 2 & 2 \end{bmatrix}$$
Its eigenvalues and corresponding unit eigenvectors are:
$$\lambda_1 = 2(2+\sqrt{2}), \qquad \lambda_2 = 2(2-\sqrt{2})$$

$$V = \begin{bmatrix} \frac{1+\sqrt{2}}{\sqrt{1+(1+\sqrt{2})^2}} & \frac{1-\sqrt{2}}{\sqrt{1+(1-\sqrt{2})^2}} \\ \frac{1}{\sqrt{1+(1+\sqrt{2})^2}} & \frac{1}{\sqrt{1+(1-\sqrt{2})^2}} \end{bmatrix}$$

The singular values are $\sigma_1 = \sqrt{2(2+\sqrt{2})}$ and $\sigma_2 = \sqrt{2(2-\sqrt{2})}$, so:
$$\Sigma = \begin{bmatrix} \sqrt{2(2+\sqrt{2})} & 0 \\ 0 & \sqrt{2(2-\sqrt{2})} \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$
U = ( u 1 , u 2 , u 3 , u 4 ) U = (u_1,u_2,u_3,u_4) U=(u1,u2,u3,u4),根据 A V = U Σ AV=U\Sigma AV=UΣ
U Σ = ( u 1 , u 2 , u 3 , u 4 ) [ 2 ( 2 + 2 ) 0 0 2 ( 2 − 2 ) 0 0 0 0 ] = ( 2 ( 2 + 2 ) u 1 , 2 ( 2 − 2 ) u 2 ) \begin{aligned} U\Sigma &= (u_1,u_2,u_3,u_4) \begin{bmatrix} \sqrt{2 (2 + \sqrt{2})} & 0\\ 0 & \sqrt{2 (2 - \sqrt{2})}\\ 0 & 0\\ 0 & 0\\ \end{bmatrix}\\ &=(\sqrt{2 (2 + \sqrt{2})}u_1,\sqrt{2 (2 - \sqrt{2})}u_2) \end{aligned} UΣ=(u1,u2,u3,u4) 2(2+2 ) 00002(22 ) 00 =(2(2+2 ) u1,2(22 ) u2)

$$AV = \begin{bmatrix} \frac{1+\sqrt{2}}{\sqrt{2(2+\sqrt{2})}} & \frac{1-\sqrt{2}}{\sqrt{2(2-\sqrt{2})}} \\ \frac{1}{\sqrt{2(2+\sqrt{2})}} & \frac{1}{\sqrt{2(2-\sqrt{2})}} \\ \frac{3+2\sqrt{2}}{\sqrt{2(2+\sqrt{2})}} & \frac{3-2\sqrt{2}}{\sqrt{2(2-\sqrt{2})}} \\ -\frac{1+\sqrt{2}}{\sqrt{2(2+\sqrt{2})}} & -\frac{1-\sqrt{2}}{\sqrt{2(2-\sqrt{2})}} \end{bmatrix}$$

Therefore:
$$u_1 = \frac{1}{2(2+\sqrt{2})}\begin{bmatrix} 1+\sqrt{2} \\ 1 \\ 3+2\sqrt{2} \\ -1-\sqrt{2} \end{bmatrix}, \qquad u_2 = \frac{1}{2(2-\sqrt{2})}\begin{bmatrix} 1-\sqrt{2} \\ 1 \\ 3-2\sqrt{2} \\ -1+\sqrt{2} \end{bmatrix}$$
Extending $\{u_1, u_2\}$ to an orthonormal basis of $\mathbb{R}^4$ gives:
$$U = \begin{bmatrix} \frac{1+\sqrt{2}}{2(2+\sqrt{2})} & \frac{1-\sqrt{2}}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & -\frac{1}{2} \\ \frac{1}{2(2+\sqrt{2})} & \frac{1}{2(2-\sqrt{2})} & 0 & -\frac{1}{2} \\ \frac{3+2\sqrt{2}}{2(2+\sqrt{2})} & \frac{3-2\sqrt{2}}{2(2-\sqrt{2})} & 0 & \frac{1}{2} \\ \frac{-1-\sqrt{2}}{2(2+\sqrt{2})} & \frac{\sqrt{2}-1}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & \frac{1}{2} \end{bmatrix}$$
Finally,
$$A = U\Sigma V^T = \begin{bmatrix} \frac{1+\sqrt{2}}{2(2+\sqrt{2})} & \frac{1-\sqrt{2}}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & -\frac{1}{2} \\ \frac{1}{2(2+\sqrt{2})} & \frac{1}{2(2-\sqrt{2})} & 0 & -\frac{1}{2} \\ \frac{3+2\sqrt{2}}{2(2+\sqrt{2})} & \frac{3-2\sqrt{2}}{2(2-\sqrt{2})} & 0 & \frac{1}{2} \\ \frac{-1-\sqrt{2}}{2(2+\sqrt{2})} & \frac{\sqrt{2}-1}{2(2-\sqrt{2})} & \frac{1}{\sqrt{2}} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} \sqrt{2(2+\sqrt{2})} & 0 \\ 0 & \sqrt{2(2-\sqrt{2})} \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{1+\sqrt{2}}{\sqrt{1+(1+\sqrt{2})^2}} & \frac{1}{\sqrt{1+(1+\sqrt{2})^2}} \\ \frac{1-\sqrt{2}}{\sqrt{1+(1-\sqrt{2})^2}} & \frac{1}{\sqrt{1+(1-\sqrt{2})^2}} \end{bmatrix}$$
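As a quick numerical cross-check (not part of the required hand derivation), the factorization can be verified with NumPy. This is a sketch under the assumption that sign flips in the singular vectors are acceptable, since np.linalg.svd may return vectors with opposite signs:

import numpy as np

A = np.array([[1, 0], [0, 1], [2, 1], [-1, 0]], dtype=float)

# Hand-derived factors
s1 = np.sqrt(2 * (2 + np.sqrt(2)))
s2 = np.sqrt(2 * (2 - np.sqrt(2)))
v1 = np.array([1 + np.sqrt(2), 1]); v1 /= np.linalg.norm(v1)
v2 = np.array([1 - np.sqrt(2), 1]); v2 /= np.linalg.norm(v2)
V = np.column_stack([v1, v2])
U12 = A @ V / np.array([s1, s2])      # u_i = A v_i / sigma_i

# Reconstruction check and comparison with NumPy's singular values
print(np.allclose(U12 * np.array([s1, s2]) @ V.T, A))  # True
print(np.linalg.svd(A, compute_uv=False), (s1, s2))    # matching sigma_1, sigma_2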

  2. (15 points) Given the following data (5 samples, 4 dimensions $(A, B, C, D)$), use PCA to reduce the data to 2 dimensions. Show the solution process.
| A | B  | C | D |
|---|----|---|---|
| 1 | 5  | 3 | 1 |
| 4 | -4 | 6 | 6 |
| 1 | 4  | 3 | 2 |
| 4 | 4  | 2 | 2 |
| 5 | 5  | 2 | 4 |

[Solution]

Writing the samples as columns of $X$ (one row per attribute):
$$X = \begin{bmatrix} 1 & 4 & 1 & 4 & 5 \\ 5 & -4 & 4 & 4 & 5 \\ 3 & 6 & 3 & 2 & 2 \\ 1 & 6 & 2 & 2 & 4 \end{bmatrix}$$
After zero-centering each row, $X$ becomes:
$$\begin{bmatrix} -2 & 1 & -2 & 1 & 2 \\ 2.2 & -6.8 & 1.2 & 1.2 & 2.2 \\ -0.2 & 2.8 & -0.2 & -1.2 & -1.2 \\ -2 & 3 & -1 & -1 & 1 \end{bmatrix}$$
The covariance matrix $C = \frac{1}{5}XX^T$ is:
$$C = \begin{bmatrix} 2.8 & -1.6 & 0 & 2 \\ -1.6 & 11.76 & -4.76 & -5 \\ 0 & -4.76 & 2.16 & 1.8 \\ 2 & -5 & 1.8 & 3.2 \end{bmatrix}$$
Its eigenvalues, with the corresponding eigenvectors arranged as rows, are:
$$\begin{bmatrix} 16.27660036 & 3.30704892 & 0.30518298 & 0.03116774 \end{bmatrix}$$

$$\begin{bmatrix} 0.1582177 & -0.84232941 & 0.33404265 & 0.3922548 \\ -0.84205227 & -0.21992632 & 0.30154762 & -0.3894219 \\ -0.37915865 & 0.40046886 & 0.25782506 & 0.79334081 \\ 0.34950516 & 0.28589904 & 0.85499168 & -0.25514134 \end{bmatrix}$$

Taking the first two rows (the eigenvectors of the two largest eigenvalues) gives the projection matrix $P$:
$$P = \begin{bmatrix} 0.1582177 & -0.84232941 & 0.33404265 & 0.3922548 \\ -0.84205227 & -0.21992632 & 0.30154762 & -0.3894219 \end{bmatrix}$$
$Y = PX$ (applied to the centered $X$) is the data reduced to $k = 2$ dimensions (rows are components, columns are samples):
$$Y = \begin{bmatrix} -3.02087823 & 7.99814153 & -1.78629402 & -1.64568358 & -1.5452857 \\ 1.91880091 & 0.32951437 & 1.74930533 & -1.0783991 & -2.9192215 \end{bmatrix}$$
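These numbers can be reproduced with a short NumPy sketch (eigenvector signs, and hence the signs of the rows of $Y$, may come out flipped relative to the hand computation):

import numpy as np

X = np.array([[1, 4, 1, 4, 5],
              [5, -4, 4, 4, 5],
              [3, 6, 3, 2, 2],
              [1, 6, 2, 2, 4]], dtype=float)

Xc = X - X.mean(axis=1, keepdims=True)   # zero-center each attribute (row)
C = Xc @ Xc.T / X.shape[1]               # covariance matrix C = (1/5) X X^T

eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # sort descending
P = eigvecs[:, order[:2]].T              # top-2 eigenvectors as rows
Y = P @ Xc                               # project to 2 dimensions

print(eigvals[order])                    # ~ [16.2766, 3.3070, 0.3052, 0.0312]
print(Y)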

  3. (15 points) Suppose there are six samples in two-dimensional space:

    Positive samples: (1,1), (2,1), (2,3)

    Negative samples: (-1,-1), (0,-3), (-2,-4)

    Use a support vector machine to find the separating hyperplane; give the SVM model and the solution process.

[Solution]

From the data, $y_1 = y_2 = y_3 = 1$ and $y_4 = y_5 = y_6 = -1$.

w = ( w 1 , w 2 ) T w = (w_1,w_2)^T w=(w1,w2)T,则问题可以表示为以下约束最优化问题:
minimize w , b 1 2 ( w 1 2 + w 2 2 ) subject to { w 1 + w 2 + b ≥ 1 2 w 1 + w 2 + b ≥ 1 2 w 1 + 3 w 2 + b ≥ 1 − ( − w 1 − w 2 + b ) ≥ 1 − ( 0 w 1 − 3 w 2 + b ) ≥ 1 − ( − 2 w 1 − 4 w 2 + b ) ≥ 1 \begin{aligned} & \underset{w, b}{\text{minimize}} && \frac{1}{2}(w_1^2 + w_2^2) \\ & \text{subject to} && \begin{cases} w_1 + w_2 + b \geq 1 \\ 2w_1 + w_2 + b \geq 1 \\ 2w_1 + 3w_2 + b \geq 1 \\ -(-w_1 - w_2 + b) \geq 1 \\ -(0w_1 - 3w_2 + b) \geq 1 \\ -(-2w_1 - 4w_2 + b) \geq 1 \\ \end{cases} \end{aligned} w,bminimizesubject to21(w12+w22) w1+w2+b12w1+w2+b12w1+3w2+b1(w1w2+b)1(0w13w2+b)1(2w14w2+b)1
Solving gives:
$$\begin{cases} w_1 = \frac{1}{2} \\ w_2 = \frac{1}{2} \\ b = 0 \end{cases}$$
The maximum-margin separating hyperplane is therefore:
$$\frac{1}{2}x^{(1)} + \frac{1}{2}x^{(2)} = 0$$
where $x_1 = (1,1)^T$ and $x_2 = (-1,-1)^T$ are the support vectors.
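The hand solution can be cross-checked with scikit-learn's SVC; a sketch assuming a large C to approximate the hard-margin problem above:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [2, 3], [-1, -1], [0, -3], [-2, -4]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6)   # large C approximates the hard margin
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # ~ [[0.5, 0.5]] [0.]
print(clf.support_vectors_)         # ~ [[-1., -1.], [1., 1.]]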

  4. (15 points) Given a 4×3×2 neural network with weight matrices:

$$W_1 = \begin{bmatrix} 0.10 & 0.40 & 0.35 \\ 0.15 & 0.20 & 0.25 \\ 0.05 & 0.35 & 0.40 \\ 0.20 & 0.25 & 0.15 \end{bmatrix}$$

$$b_1 = \begin{bmatrix} 0.15, & 0.10, & 0.25 \end{bmatrix}$$

$$W_2 = \begin{bmatrix} 0.20 & 0.40 \\ 0.35 & 0.15 \\ 0.30 & 0.50 \end{bmatrix}$$

$$b_2 = \begin{bmatrix} 0.30, & 0.20 \end{bmatrix}$$

The activation function is the sigmoid function. Given a training sample $x = [0.80, 0.55, 0.20, 0.10]$ with label $y = [1, 0]$, a learning rate of 0.01, and binary cross-entropy loss, compute the result of one update of the parameters $w_{10}$ and $w_{13}$.

(Figure: network structure diagram showing the weight numbering)

[Solution]

(1) Forward propagation

a. Input layer to hidden layer

With weight matrix $W_1$ and bias $b_1$, the input to the hidden layer is:
$$a_1 = W_1^T x + b_1 = \begin{bmatrix} 0.10 & 0.15 & 0.05 & 0.20 \\ 0.40 & 0.20 & 0.35 & 0.25 \\ 0.35 & 0.25 & 0.40 & 0.15 \end{bmatrix} \begin{bmatrix} 0.80 \\ 0.55 \\ 0.20 \\ 0.10 \end{bmatrix} + \begin{bmatrix} 0.15 \\ 0.10 \\ 0.25 \end{bmatrix} = \begin{bmatrix} 0.3425 \\ 0.625 \\ 0.7625 \end{bmatrix}$$
After the sigmoid activation $\sigma(a) = \frac{1}{1+e^{-a}}$ in the hidden layer:
$$h_1 = \sigma(a_1) = \begin{bmatrix} \sigma(0.3425) \\ \sigma(0.625) \\ \sigma(0.7625) \end{bmatrix} \approx \begin{bmatrix} 0.5848 \\ 0.6514 \\ 0.6819 \end{bmatrix}$$
b. Hidden layer to output layer

Similarly, the input to the output layer is:
$$a_2 = W_2^T h_1 + b_2 = \begin{bmatrix} 0.20 & 0.35 & 0.30 \\ 0.40 & 0.15 & 0.50 \end{bmatrix} \begin{bmatrix} 0.5848 \\ 0.6514 \\ 0.6819 \end{bmatrix} + \begin{bmatrix} 0.30 \\ 0.20 \end{bmatrix} = \begin{bmatrix} 0.8495 \\ 0.8726 \end{bmatrix}$$
After the sigmoid activation at the output layer:
$$h_2 = \sigma(a_2) = \begin{bmatrix} \sigma(0.8495) \\ \sigma(0.8726) \end{bmatrix} \approx \begin{bmatrix} 0.7005 \\ 0.7053 \end{bmatrix}$$
(2) Computing the loss

Using the binary cross-entropy loss:
$$\begin{aligned} L(y, h_2) &= -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(h_{2i}) + (1-y_i)\log(1-h_{2i})\right] \\ &= -\frac{1}{2}\left[y_1 \log(h_{21}) + (1-y_1)\log(1-h_{21}) + y_2 \log(h_{22}) + (1-y_2)\log(1-h_{22})\right] \\ &= -\frac{1}{2}\left[1 \cdot \log(0.7005) + (1-1)\log(1-0.7005) + 0 \cdot \log(0.7053) + (1-0)\log(1-0.7053)\right] \\ &\approx 0.7889 \end{aligned}$$
where $y_i$ is the $i$-th component of the true label and $h_{2i}$ is the model's prediction for the $i$-th output.
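A two-line check of this value (a NumPy sketch):

import numpy as np

h2, y = np.array([0.7005, 0.7053]), np.array([1.0, 0.0])
print(-np.mean(y * np.log(h2) + (1 - y) * np.log(1 - h2)))  # ~ 0.7889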

(3) Backpropagation

w 13 w_{13} w13
∂ L ∂ w 13 = ∂ L ∂ h 21 ⋅ ∂ h 21 ∂ a 21 ⋅ ∂ a 21 ∂ w 13 \frac{\partial L}{\partial w_{13}} = \frac{\partial L}{\partial h_{21}} \cdot \frac{\partial h_{21}}{\partial a_{21}} \cdot \frac{\partial a_{21}}{\partial w_{13}} w13L=h21La21h21w13a21
Since $y_1 = 1$ (dropping the constant $\frac{1}{2}$ averaging factor in the gradients):
$$\frac{\partial L}{\partial h_{21}} = -\left(\frac{y_1}{h_{21}} - \frac{1-y_1}{1-h_{21}}\right) = -\frac{1}{h_{21}} \approx -1.4276$$
Since $h_{21} = \sigma(a_{21}) = \frac{1}{1+e^{-a_{21}}}$:
$$\frac{\partial h_{21}}{\partial a_{21}} = h_{21}(1 - h_{21}) = 0.7005 \times 0.2995 \approx 0.2098$$

Since $a_{21} = (w_{13}, w_{15}, w_{17}) \cdot h_1 + b_{21}$, we have:
$$\frac{\partial a_{21}}{\partial w_{13}} = h_{11} = 0.5848$$
The updated value of $w_{13}$ is:
$$w_{13} = w_{13} - \eta \cdot \frac{\partial L}{\partial w_{13}} = 0.2 - 0.01 \times (-1.4276) \times 0.2098 \times 0.5848 \approx 0.2018$$
Similarly, $w_{14} \approx 0.3959$.

w 10 w_{10} w10
∂ L ∂ w 10 = ∂ L ∂ h 11 ⋅ ∂ h 11 ∂ a 11 ⋅ ∂ a 11 ∂ w 10 = ∂ L ∂ h 11 ⋅ [ h 11 ⋅ ( 1 − h 11 ) ] ⋅ x 4 = ∂ L ∂ h 11 ⋅ 0.2428 ⋅ 0.1 \frac{\partial L}{\partial w_{10}} = \frac{\partial L}{\partial h_{11}} \cdot \frac{\partial h_{11}}{\partial a_{11}} \cdot \frac{\partial a_{11}}{\partial w_{10}} = \frac{\partial L}{\partial h_{11}} \cdot [h_{11} \cdot (1-h_{11})]\cdot x_4 = \frac{\partial L}{\partial h_{11}} \cdot 0.2428 \cdot 0.1 w10L=h11La11h11w10a11=h11L[h11(1h11)]x4=h11L0.24280.1
Here $h_{11}$ receives error from both $a_{21}$ and $a_{22}$, and for sigmoid outputs with cross-entropy, $\frac{\partial L}{\partial a_{2i}} = h_{2i} - y_i$, so:
$$\frac{\partial L}{\partial h_{11}} = \frac{\partial L}{\partial a_{21}} \cdot \frac{\partial a_{21}}{\partial h_{11}} + \frac{\partial L}{\partial a_{22}} \cdot \frac{\partial a_{22}}{\partial h_{11}} = (h_{21} - y_1)\,w_{13} + (h_{22} - y_2)\,w_{14} \approx -0.2995 \times 0.20 + 0.7053 \times 0.40 \approx 0.2222$$
The updated value of $w_{10}$ is:
$$w_{10} = w_{10} - \eta \cdot \frac{\partial L}{\partial w_{10}} = 0.2 - 0.01 \times 0.2222 \times 0.2428 \times 0.1 \approx 0.1999$$
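The full update can be verified numerically with a short NumPy sketch (it follows the convention above of dropping the constant $\frac{1}{2}$ factor in the gradients):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W1 = np.array([[0.10, 0.40, 0.35],
               [0.15, 0.20, 0.25],
               [0.05, 0.35, 0.40],
               [0.20, 0.25, 0.15]])
b1 = np.array([0.15, 0.10, 0.25])
W2 = np.array([[0.20, 0.40],
               [0.35, 0.15],
               [0.30, 0.50]])
b2 = np.array([0.30, 0.20])
x = np.array([0.80, 0.55, 0.20, 0.10])
y = np.array([1.0, 0.0])
eta = 0.01

# Forward pass
h1 = sigmoid(W1.T @ x + b1)              # hidden layer
h2 = sigmoid(W2.T @ h1 + b2)             # output layer

# Backward pass: for sigmoid + cross-entropy, dL/da2 = h2 - y
delta2 = h2 - y
grad_w13 = delta2[0] * h1[0]             # w13 connects h11 to output 1
print(0.20 - eta * grad_w13)             # ~ 0.2018

dL_dh11 = W2[0] @ delta2                 # error reaching h11 from both outputs
grad_w10 = dL_dh11 * h1[0] * (1 - h1[0]) * x[3]  # w10 connects x4 to hidden 1
print(0.20 - eta * grad_w10)             # ~ 0.1999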

Programming Problems

1 Machine Learning Based on Dimensionality Reduction

Dataset: kddcup99_train.csv, kddcup99_test.csv

Data description: https://www.kdd.org/kdd-cup/view/kdd-cup-1999/Data. The goal is to use the 42-dimensional data to determine whether a packet is an attack. (Note: the data contains multiple types of attack; we treat them all as a single "attack" class rather than distinguishing them, i.e. we only distinguish normal vs. attack, so all attack labels can be uniformly re-labeled during preprocessing. The list of attack types is at https://kdd.org/cupfiles/KDDCupData/1999/training_attack_types.)

Task: use SVM, combined with the dimensionality-reduction methods PCA and LDA, to implement dimensionality reduction + classification.

Required output: how reducing to different numbers of dimensions with each method affects the classification result (improvement or degradation); plot the result of each reduction method at 3 dimensions (normal and attack samples in different colors).

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection)

# Column names: the dataset has 42 columns (41 feature columns and 1 label column)
column_names = [
    'duration', 'protocol_type', 'service', 'flag', 'src_bytes',
    'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot',
    'num_failed_logins', 'logged_in', 'num_compromised', 'root_shell',
    'su_attempted', 'num_root', 'num_file_creations', 'num_shells',
    'num_access_files', 'num_outbound_cmds', 'is_host_login',
    'is_guest_login', 'count', 'srv_count', 'serror_rate',
    'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate',
    'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count',
    'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate',
    'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate',
    'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
    'dst_host_srv_rerror_rate', 'label'
]

# Load the data, skipping malformed lines
train = pd.read_csv('kddcup99_train.csv', names=column_names, on_bad_lines='skip')
test = pd.read_csv('kddcup99_test.csv', names=column_names, on_bad_lines='skip')

# Subsample the training and test sets to 1% of their original size
train = train.sample(frac=0.01)
test = test.sample(frac=0.01)

print("1.数据加载和预处理完成")

# Encode the categorical text columns
categorical_columns = ['protocol_type', 'service', 'flag']
for column in categorical_columns:
    combined_data = pd.concat([train[column], test[column]], axis=0)
    le = LabelEncoder()
    le.fit(combined_data)
    train[column] = le.transform(train[column])
    test[column] = le.transform(test[column])

X_train = train.drop(columns=['label'])
y_train = train['label']
y_train_relabel = y_train.apply(lambda x: 1 if x != 'normal.' else 0)
X_test = test.drop(columns=['label'])
y_test = test['label']
y_test_relabel = y_test.apply(lambda x: 1 if x != 'normal.' else 0)

print("2.特征处理完成")

# Dimensionality-reduction helper
def reduce_dimension(reduction_method, X_train, y_train, X_test, n_components):
    if reduction_method == 'PCA':
        reducer = PCA(n_components=n_components)
    elif reduction_method == 'LDA':
        reducer = LDA(n_components=n_components)
    
    X_train_reduced = reducer.fit_transform(X_train, y_train)
    X_test_reduced = reducer.transform(X_test)
    
    return X_train_reduced, X_test_reduced

# Classification helper
def classify(X_train, y_train, X_test, y_test):
    clf = SVC()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy, y_pred

# Reduce to each target dimension and classify
results = {'PCA': [], 'LDA': []}

for dim in range(3,11):
    X_train_pca, X_test_pca = reduce_dimension('PCA', X_train, y_train, X_test, dim)
    accuracy_pca, _ = classify(X_train_pca, y_train_relabel, X_test_pca, y_test_relabel)
    results['PCA'].append((dim, accuracy_pca))
    
    X_train_lda, X_test_lda = reduce_dimension('LDA', X_train, y_train, X_test, dim)
    accuracy_lda, _ = classify(X_train_lda, y_train_relabel, X_test_lda, y_test_relabel)
    results['LDA'].append((dim, accuracy_lda))

print("PCA Results:")
print(f"Dimensions\tAccuracy")
for dim, acc in results['PCA']:
    print(dim,"\t", acc)

print("\nLDA Results:")
print(f"Dimensions\tAccuracy")
for dim, acc in results['LDA']:
    print(dim,"\t", acc)

# Visualize the results of reducing to 3 dimensions
X_train_pca_3d, _ = reduce_dimension('PCA', X_train, y_train, X_test, 3)
X_train_lda_3d, _ = reduce_dimension('LDA', X_train, y_train, X_test, 3)

def plot_3d(X, y, title):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.set_title(title)
    
    colors = {0: 'b', 1: 'r'}  # 0: normal, 1: attack
    for label in np.unique(y):
        ax.scatter(X[y == label, 0], X[y == label, 1], X[y == label, 2], c=colors[label], label=label, s=20)
    
    ax.legend()
    plt.show()

plot_3d(X_train_pca_3d, y_train_relabel, "PCA reduced to 3 dimensions")
plot_3d(X_train_lda_3d, y_train_relabel, "LDA reduced to 3 dimensions")

(Figure 1: PCA reduced to 3 dimensions)

(Figure 2: LDA reduced to 3 dimensions)

(Output: accuracy for each method and target dimension)

2 Summary of Deep Learning Training Methods

Dataset: kddcup99_Train.csv, kddcup99_Test.csv. Each row has 42 columns; predict from the first 41 columns whether the 42nd column is an attack.

Task: implement a simple neural network model to determine whether a packet is an attack. The network must have at least 5 layers, e.g. 41->36->24->12->6->1.

Use at least 2 activation functions, at least 2 parameter-initialization methods, and at least 2 training methods (SGD, SGD+Momentum, Adam); a minimal optimizer sketch follows the requirements below.

Train the model and evaluate the training results.

Required output:

1) a model description: number of layers, the parameters of each layer, the choice of activation functions, the loss function, etc.;

2) for different method combinations (at least 2), plots of how the training error and test error change as the epochs grow.
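For reference, the three candidate training methods differ only in how the optimizer is constructed. A minimal PyTorch sketch (the lr and momentum values are illustrative choices, not fixed by the assignment):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(41, 1)  # stand-in for the real network defined below

opt_sgd = optim.SGD(model.parameters(), lr=0.01)                     # plain SGD
opt_momentum = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # SGD + Momentum
opt_adam = optim.Adam(model.parameters(), lr=0.01)                   # Adam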

import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from colorama import Fore, Style

# Load data
train_data = pd.read_csv('kddcup99_train.csv', header=None, on_bad_lines='skip')
test_data = pd.read_csv('kddcup99_test.csv', header=None, on_bad_lines='skip')

# Subsample the training and test sets to 1% of their original size
train_data = train_data.sample(frac=0.01)
test_data = test_data.sample(frac=0.01)

# Encode the categorical text columns
categorical_columns = train_data.columns[1:4].tolist()
for column in categorical_columns:
    combined_data = pd.concat([train_data[column], test_data[column]], axis=0)
    le = LabelEncoder()
    le.fit(combined_data)
    train_data[column] = le.transform(train_data[column])
    test_data[column] = le.transform(test_data[column])


X_train = train_data.iloc[:, :-1]
y_train = train_data.iloc[:, -1]
y_train_relabel = y_train.apply(lambda x: 1 if x != 'normal.' else 0)
X_test = test_data.iloc[:, :-1]
y_test = test_data.iloc[:, -1]
y_test_relabel = y_test.apply(lambda x: 1 if x != 'normal.' else 0)

# Normalize X_train and X_test
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Convert to tensors
X_train = torch.tensor(X_train, dtype=torch.float)
y_train_relabel = torch.tensor(y_train_relabel.values, dtype=torch.float)
X_test = torch.tensor(X_test, dtype=torch.float)
y_test_relabel = torch.tensor(y_test_relabel.values, dtype=torch.float)

# Define the network
class Net(nn.Module):
    def __init__(self, activation):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(41, 36)
        self.fc2 = nn.Linear(36, 24)
        self.fc3 = nn.Linear(24, 12)
        self.fc4 = nn.Linear(12, 6)
        self.fc5 = nn.Linear(6, 1)
        
        if activation == 'relu':
            self.activation = nn.ReLU()
        elif activation == 'sigmoid':
            self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        x = self.activation(self.fc4(x))
        x = torch.sigmoid(self.fc5(x))  # final activation is sigmoid for binary classification
        return x

# Training function
def train(net, X_train, y_train, X_test, y_test, optimizer, criterion, epochs=100):
    train_errors = []
    test_errors = []
    for epoch in range(epochs):
        optimizer.zero_grad()
        output = net(X_train).squeeze()  # squeeze the output to remove the extra dimension
        train_loss = criterion(output, y_train)
        train_loss.backward()
        optimizer.step()

        # Record errors
        train_error = train_loss.item()
        test_output = net(X_test).squeeze()  # squeeze the output here too
        test_loss = criterion(test_output, y_test)
        test_error = test_loss.item()
        train_errors.append(train_error)
        test_errors.append(test_error)
        
        if epoch % 10 == 0:
            print(Fore.GREEN + 'epoch: [{:3d}/{}], '.format(epoch, epochs) + Fore.BLUE + 'train step: {}, '.format(epoch) + Fore.RED + 'train_loss: {:.5f}, '.format(train_error) + Fore.YELLOW + 'test_loss: {:.5f}'.format(test_error))

    print(Style.RESET_ALL)  # Reset the color to default
    return train_errors, test_errors

# Network 1: Sigmoid activation, Xavier initialization, SGD optimizer
net = Net('sigmoid')
nn.init.xavier_uniform_(net.fc1.weight)
nn.init.xavier_uniform_(net.fc2.weight)
nn.init.xavier_uniform_(net.fc3.weight)
nn.init.xavier_uniform_(net.fc4.weight)
nn.init.xavier_uniform_(net.fc5.weight)
optimizer = optim.SGD(net.parameters(), lr=0.01)
criterion = nn.BCELoss()

# Train and plot errors
train_errors, test_errors = train(net, X_train, y_train_relabel, X_test, y_test_relabel, optimizer, criterion)

print("Model structure: ", net, "\n\n")

for name, param in net.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

plt.title('Sigmoid Activation, Xavier Initialization, SGD Optimizer')
plt.plot(train_errors, label='Train Error')
plt.plot(test_errors, label='Test Error')
plt.legend()
plt.show()

# Network 2: ReLU activation, Kaiming initialization, Adam optimizer
net = Net('relu')
nn.init.kaiming_uniform_(net.fc1.weight)
nn.init.kaiming_uniform_(net.fc2.weight)
nn.init.kaiming_uniform_(net.fc3.weight)
nn.init.kaiming_uniform_(net.fc4.weight)
nn.init.kaiming_uniform_(net.fc5.weight)
optimizer = optim.Adam(net.parameters(), lr=0.01)


# Train and plot errors
train_errors, test_errors = train(net, X_train, y_train_relabel, X_test, y_test_relabel, optimizer, criterion)

print("Model structure: ", net, "\n\n")

for name, param in net.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

plt.title('ReLU Activation, Kaiming Initialization, Adam Optimizer')
plt.plot(train_errors, label='Train Error')
plt.plot(test_errors, label='Test Error')
plt.legend()
plt.show()

Output:

(Figure 1: Sigmoid activation, Xavier initialization, SGD optimizer: train/test error vs. epoch)

(Figure 2: ReLU activation, Kaiming initialization, Adam optimizer: train/test error vs. epoch)

epoch: [  0/100], train step: 0, train_loss: 0.83076, test_loss: 0.82476
epoch: [ 10/100], train step: 10, train_loss: 0.77907, test_loss: 0.77409
epoch: [ 20/100], train step: 20, train_loss: 0.73555, test_loss: 0.73146
epoch: [ 30/100], train step: 30, train_loss: 0.69899, test_loss: 0.69566
epoch: [ 40/100], train step: 40, train_loss: 0.66831, test_loss: 0.66563
epoch: [ 50/100], train step: 50, train_loss: 0.64258, test_loss: 0.64046
epoch: [ 60/100], train step: 60, train_loss: 0.62099, test_loss: 0.61933
epoch: [ 70/100], train step: 70, train_loss: 0.60285, test_loss: 0.60160
epoch: [ 80/100], train step: 80, train_loss: 0.58759, test_loss: 0.58668
epoch: [ 90/100], train step: 90, train_loss: 0.57472, test_loss: 0.57411

Model structure:  Net(
  (fc1): Linear(in_features=41, out_features=36, bias=True)
  (fc2): Linear(in_features=36, out_features=24, bias=True)
  (fc3): Linear(in_features=24, out_features=12, bias=True)
  (fc4): Linear(in_features=12, out_features=6, bias=True)
  (fc5): Linear(in_features=6, out_features=1, bias=True)
  (activation): Sigmoid()
)


Layer: fc1.weight | Size: torch.Size([36, 41]) | Values : tensor([[-0.1204,  0.1695, -0.0945, -0.1015, -0.0338, -0.0430, -0.2108, -0.0487,
          0.2591,  0.0264, -0.2337,  0.1409, -0.2239,  0.1533,  0.1117, -0.1207,
          0.1573,  0.2463, -0.1667,  0.2229,  0.2368,  0.0499, -0.0194,  0.1738,
          0.0057,  0.2606, -0.1390, -0.2654, -0.1701,  0.0092, -0.0232, -0.2021,
         -0.1327, -0.0356,  0.2128,  0.0905, -0.2579,  0.0700, -0.2245,  0.1827,
          0.0351],
        [-0.0199, -0.0828,  0.1850,  0.2518,  0.1650, -0.0836, -0.0384,  0.0219,
          0.0745,  0.0179, -0.1767, -0.2702, -0.1961,  0.2151,  0.1896, -0.0962,
         -0.2253,  0.2759, -0.0934, -0.0413, -0.1034,  0.2449,  0.2524, -0.0120,
          0.1045,  0.0078,  0.1084, -0.0260, -0.2767, -0.1596, -0.1660,  0.1970,
          0.2388,  0.0917, -0.1074,  0.1396,  0.2334, -0.2057, -0.0928,  0.0844,
          0.2253]], grad_fn=<SliceBackward0>)

Layer: fc1.bias | Size: torch.Size([36]) | Values : tensor([-0.0125, -0.0647], grad_fn=<SliceBackward0>)       

Layer: fc2.weight | Size: torch.Size([24, 36]) | Values : tensor([[ 0.1901,  0.3076,  0.2780, -0.1221,  0.0865, -0.1958, -0.1591,  0.2710,
         -0.0584,  0.0713,  0.1986,  0.0998,  0.0596,  0.1909,  0.1504,  0.1529,
         -0.0607,  0.0267, -0.2466, -0.0994,  0.3090,  0.0109,  0.1064, -0.2247,
          0.0065, -0.2745, -0.2972,  0.1079, -0.2792, -0.2592, -0.0671,  0.1319,
          0.1867, -0.2757, -0.0122,  0.3061],
        [-0.2875,  0.0513, -0.2721,  0.1229,  0.1980, -0.2237,  0.0273,  0.0966,
         -0.1452, -0.1988, -0.0604, -0.0344, -0.2188,  0.1212, -0.1144, -0.2195,
          0.0199,  0.1941,  0.0436, -0.0758,  0.2529,  0.0526,  0.1667,  0.0497,
          0.1492, -0.0510,  0.1750,  0.1225, -0.1053,  0.1485,  0.2058,  0.2799,
         -0.0877, -0.0516, -0.1451, -0.0730]], grad_fn=<SliceBackward0>)

Layer: fc2.bias | Size: torch.Size([24]) | Values : tensor([ 0.0214, -0.1065], grad_fn=<SliceBackward0>)       

Layer: fc3.weight | Size: torch.Size([12, 24]) | Values : tensor([[ 0.0663, -0.1093,  0.1332,  0.0015,  0.0130, -0.0853, -0.1504,  0.4016,
          0.1938, -0.3305,  0.0805, -0.4046,  0.1531,  0.2350,  0.2465, -0.3746,
          0.1722, -0.0767,  0.2982,  0.3774,  0.3722, -0.0244,  0.3968,  0.0536],
        [-0.3444,  0.3722,  0.1660,  0.1776, -0.3715, -0.3797, -0.3049, -0.3762,
          0.3245,  0.4026, -0.0206,  0.0601,  0.3001, -0.2587,  0.3230,  0.1495,
          0.1577,  0.1810, -0.1501,  0.3237, -0.0502,  0.2333, -0.2059, -0.1671]],
       grad_fn=<SliceBackward0>)

Layer: fc3.bias | Size: torch.Size([12]) | Values : tensor([ 0.0377, -0.0293], grad_fn=<SliceBackward0>)       

Layer: fc4.weight | Size: torch.Size([6, 12]) | Values : tensor([[ 0.4014,  0.2608, -0.0577,  0.4244,  0.1046,  0.4315,  0.3025, -0.1401,
         -0.3695,  0.3276,  0.4545, -0.5888],
        [ 0.5549, -0.1166, -0.2757, -0.0290,  0.2277, -0.4197, -0.3324, -0.4567,
         -0.2027, -0.2449,  0.3914,  0.4241]], grad_fn=<SliceBackward0>)

Layer: fc4.bias | Size: torch.Size([6]) | Values : tensor([0.0023, 0.1753], grad_fn=<SliceBackward0>)

Layer: fc5.weight | Size: torch.Size([1, 6]) | Values : tensor([[-0.5094,  0.8143,  0.8610,  0.4556, -0.2517, -0.0298]],
       grad_fn=<SliceBackward0>)

Layer: fc5.bias | Size: torch.Size([1]) | Values : tensor([-0.0002], grad_fn=<SliceBackward0>)

epoch: [  0/100], train step: 0, train_loss: 0.77802, test_loss: 0.59542
epoch: [ 10/100], train step: 10, train_loss: 0.05965, test_loss: 0.04284
epoch: [ 20/100], train step: 20, train_loss: 0.00630, test_loss: 0.00698
epoch: [ 30/100], train step: 30, train_loss: 0.00557, test_loss: 0.00664
epoch: [ 40/100], train step: 40, train_loss: 0.00429, test_loss: 0.00585
epoch: [ 50/100], train step: 50, train_loss: 0.00323, test_loss: 0.00419
epoch: [ 60/100], train step: 60, train_loss: 0.00241, test_loss: 0.00331
epoch: [ 70/100], train step: 70, train_loss: 0.00185, test_loss: 0.00287
epoch: [ 80/100], train step: 80, train_loss: 0.00154, test_loss: 0.00289
epoch: [ 90/100], train step: 90, train_loss: 0.00137, test_loss: 0.00280

Model structure:  Net(
  (fc1): Linear(in_features=41, out_features=36, bias=True)
  (fc2): Linear(in_features=36, out_features=24, bias=True)
  (fc3): Linear(in_features=24, out_features=12, bias=True)
  (fc4): Linear(in_features=12, out_features=6, bias=True)
  (fc5): Linear(in_features=6, out_features=1, bias=True)
  (activation): ReLU()
)


Layer: fc1.weight | Size: torch.Size([36, 41]) | Values : tensor([[-0.2223,  0.2652,  0.2620, -0.1737,  0.3073, -0.0071, -0.1844, -0.0910,
         -0.3330, -0.3644,  0.1672,  0.1811, -0.1323,  0.1138,  0.2474,  0.1190,
          0.0716,  0.2176, -0.2005, -0.2051,  0.2049, -0.2390,  0.1245, -0.3129,
          0.2679,  0.2085, -0.2465,  0.3374, -0.3995, -0.1678,  0.0991,  0.2571,
         -0.2904, -0.3289,  0.1882, -0.2960, -0.2482,  0.2063,  0.4326,  0.0245,
          0.3398],
        [-0.0233,  0.4540, -0.2002, -0.1890,  0.3571, -0.0085,  0.1579,  0.3280,
          0.0949,  0.1917,  0.2366,  0.2644,  0.1930, -0.4022,  0.2962, -0.1272,
         -0.1952, -0.1286, -0.2101,  0.2916,  0.2925,  0.4614, -0.2036, -0.1995,
          0.0267, -0.1885,  0.1940,  0.2943, -0.0280, -0.0546,  0.3826,  0.0526,
          0.1689,  0.5428, -0.2704, -0.1747, -0.0180, -0.2829,  0.2023,  0.1309,
          0.0611]], grad_fn=<SliceBackward0>)

Layer: fc1.bias | Size: torch.Size([36]) | Values : tensor([-0.0362,  0.0334], grad_fn=<SliceBackward0>)       

Layer: fc2.weight | Size: torch.Size([24, 36]) | Values : tensor([[ 0.2063,  0.1689,  0.1686, -0.0299, -0.1831,  0.0941,  0.2320, -0.3211,
          0.1923,  0.0230,  0.4319, -0.1799,  0.1243, -0.3534, -0.1607,  0.0783,
          0.1387, -0.3061,  0.3025, -0.2749,  0.1129,  0.1131,  0.0264,  0.2124,
          0.0571,  0.1330,  0.2580,  0.1441, -0.2470,  0.1353, -0.2976,  0.1211,
         -0.4715, -0.3144,  0.3093,  0.0863],
        [-0.1541,  0.3662, -0.0299,  0.0588,  0.4050,  0.1201,  0.0695, -0.1545,
          0.3148, -0.0187, -0.0606, -0.2249,  0.2707, -0.3788,  0.3024,  0.1695,
          0.1224,  0.0702, -0.3575,  0.3632, -0.4167,  0.0654, -0.4184, -0.0808,
          0.1539,  0.3601,  0.2400, -0.1469,  0.0635,  0.4880, -0.2289,  0.2739,
         -0.3789,  0.0521,  0.1561,  0.1686]], grad_fn=<SliceBackward0>)

Layer: fc2.bias | Size: torch.Size([24]) | Values : tensor([-0.1722, -0.1973], grad_fn=<SliceBackward0>)       

Layer: fc3.weight | Size: torch.Size([12, 24]) | Values : tensor([[-0.3522,  0.3949, -0.4170, -0.0109, -0.0345, -0.0263,  0.1290,  0.2439,
         -0.4594, -0.0283, -0.3729,  0.4788,  0.2042,  0.3306, -0.3264, -0.2471,
          0.3756, -0.2152, -0.1101,  0.2048, -0.1268, -0.3149, -0.2185,  0.1856],
        [-0.0387,  0.3687, -0.3605,  0.1807, -0.1386,  0.1414,  0.4445,  0.5877,
         -0.1079,  0.2080, -0.3797,  0.3645,  0.1634, -0.2281,  0.4158, -0.5624,
          0.2715,  0.4161,  0.1966, -0.3475, -0.1538,  0.4041,  0.0800,  0.5462]],
       grad_fn=<SliceBackward0>)

Layer: fc3.bias | Size: torch.Size([12]) | Values : tensor([-0.1063,  0.0016], grad_fn=<SliceBackward0>)       

Layer: fc4.weight | Size: torch.Size([6, 12]) | Values : tensor([[ 0.3684,  0.3588,  0.4967,  0.4015, -0.6398, -0.2540,  0.2881, -0.4240,
         -0.2107,  0.1753, -0.0842,  0.5281],
        [ 0.0090, -0.4973,  0.6199, -0.4902,  0.1952,  0.6437, -0.5067,  0.5859,
          0.8092,  0.8759,  0.0526, -0.3403]], grad_fn=<SliceBackward0>)

Layer: fc4.bias | Size: torch.Size([6]) | Values : tensor([-0.2995,  0.0266], grad_fn=<SliceBackward0>)        

Layer: fc5.weight | Size: torch.Size([1, 6]) | Values : tensor([[-0.0259, -0.4790,  0.7276,  0.2180,  0.6982, -0.7128]],
       grad_fn=<SliceBackward0>)

Layer: fc5.bias | Size: torch.Size([1]) | Values : tensor([0.0468], grad_fn=<SliceBackward0>)
