AdaBoost
1. Basic Content
[!note]
Implementation idea: in each training round, record which samples are misclassified by the current model $f(x) = \sum_{m=1}^{i-1}\alpha_m G_m(x)$, and when adding the new weak learner, increase the weights of the misclassified samples (i.e., change the effective proportions of the samples, similar to over- and under-sampling).
- Additive model: a weighted sum of weak classifiers $G_m(x)$ with corresponding weights $\alpha_m$:
$$f(x) = \sum_{m=1}^{M}\alpha_m G_m(x)$$
- Initial sample weights (all samples weighted equally):
$$w_{1,i} = \frac{1}{N}$$
- Loss function for binary classification, the exponential loss:
$$L(y, f(x)) = \exp[-yf(x)] = \exp[-y\,(f_{m-1}(x) + \alpha_m G_m(x))]$$
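As a quick illustration (a minimal sketch, not from the original note; the numbers are made up), the exponential loss is below 1 exactly when a sample is classified correctly, i.e., when the margin $yf(x) > 0$:
```python
import numpy as np

y = np.array([1, 1, -1, -1])           # true labels in {-1, +1}
f = np.array([0.8, -0.3, -1.2, 0.4])   # current additive-model scores f(x)

loss = np.exp(-y * f)
print(loss)  # approx. [0.45, 1.35, 0.30, 1.49]
# Correctly classified samples (y * f > 0) have loss < 1;
# misclassified samples (y * f < 0) have loss > 1.
```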
2. Sample Weight Update
Notice that the exponential loss is less than 1 for correctly classified samples and greater than 1 for misclassified ones, which matches the AdaBoost additive-model idea exactly: when adding the new weak learner, increase the weights of the misclassified samples. So for the $m$-th weak classifier, the weight of the $i$-th sample can be designed as:
$$\omega_{m,i} = \exp[-y_i f_{m-1}(x_i)]$$
In the AdaBoost model, the term newly added to $f_{m-1}$ is $\alpha_{m-1}G_{m-1}(x_i)$, so the weight update can be written as:
$$w_{m,i} = \exp(-\alpha_{m-1}\,y_i\,G_{m-1}(x_i))$$
To strengthen the dependence between successive weak classifiers, each weight update is applied on top of the weights from the previous round:
$$w_{m,i} = w_{m-1,i}\cdot\exp(-\alpha_{m-1}\,y_i\,G_{m-1}(x_i))$$
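A small numerical check (my own sketch, not part of the original note): iterating this multiplicative update reproduces $\exp(-y_i f_{m-1}(x_i))$ computed directly, since $f_{m-1}$ is just the sum of the $\alpha_k G_k$ terms.
```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=5)        # labels
G = rng.choice([-1, 1], size=(3, 5))   # predictions of 3 weak classifiers
alpha = np.array([0.4, 0.3, 0.2])      # their weights (illustrative values)

# Multiplicative update, one round at a time (unnormalized)
w = np.ones(5)
for a, g in zip(alpha, G):
    w = w * np.exp(-a * y * g)

# Direct computation from the additive model f(x) = sum_k alpha_k G_k(x)
f = (alpha[:, None] * G).sum(axis=0)
assert np.allclose(w, np.exp(-y * f))
```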
Finally, introducing the normalization factor $Z_{m-1}$ gives the final weight-update formula:
$$w_{m,i} = \frac{w_{m-1,i}}{Z_{m-1}}\exp(-\alpha_{m-1}\,y_i\,G_{m-1}(x_i))$$
where the normalization factor $Z_{m-1}$ is:
$$Z_{m-1} = \sum_{i=1}^{N}\omega_{m-1,i}\exp(-\alpha_{m-1}\,y_i\,G_{m-1}(x_i))$$
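A minimal one-round update in NumPy (my own sketch; the variable names are illustrative): dividing by $Z_{m-1}$ makes the new weights sum to 1, so they form a probability distribution over the samples.
```python
import numpy as np

def update_weights(w_prev, alpha_prev, y, g_prev):
    """One round of the AdaBoost sample-weight update, with normalization."""
    w = w_prev * np.exp(-alpha_prev * y * g_prev)
    Z = w.sum()        # normalization factor Z_{m-1}
    return w / Z       # new weights sum to 1

w = np.full(4, 0.25)             # uniform initial weights
y = np.array([1, 1, -1, -1])
g = np.array([1, -1, -1, -1])    # sample 1 is misclassified
w = update_weights(w, 0.5, y, g)
print(w, w.sum())  # the misclassified sample gets the largest weight; sum is 1
```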
[!important]
When a sample is classified correctly, $y_i = G_{m-1}(x_i)$, so $\exp(-\alpha_{m-1}y_iG_{m-1}(x_i)) = \exp(-\alpha_{m-1}) < 1$, since $\alpha_{m-1} > 0$ whenever the classifier does better than random guessing; the weight of a correct sample therefore shrinks. Likewise, when a sample is misclassified, $\exp(\alpha_{m-1}) > 1$ and its weight grows. For example, with $\alpha_{m-1} = 0.42$, correct samples are scaled by $\exp(-0.42) \approx 0.66$ and misclassified ones by $\exp(0.42) \approx 1.52$.
3. Weak Classifier Weight Update
Target loss function:
$$L(y, f(x)) = \exp[-yf(x)] = \exp[-y\,(f_{m-1}(x) + \alpha_m G_m(x))]$$
When optimizing the model's weights, AdaBoost does not use gradient descent the way conventional models do: with many weak classifiers there are too many parameters to update at once, which is hard to implement. Instead it uses the forward stagewise algorithm, updating only the parameters of the current weak classifier $G_m$. The optimization objective is:
$$(\alpha_m, G_m(x)) = \arg\min_{\alpha, G}\sum_{i=1}^{N}\exp[-y_i(f_{m-1}(x_i) + \alpha G(x_i))]$$
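The closed form for $\alpha_m$ below follows from splitting this sum by whether $G$ classifies each sample correctly (a brief sketch of the standard derivation, added here for completeness):
$$\sum_{i=1}^{N} w_i\,e^{-\alpha y_i G(x_i)} = e^{-\alpha}\sum_{y_i = G(x_i)} w_i + e^{\alpha}\sum_{y_i \neq G(x_i)} w_i, \qquad w_i = \exp[-y_i f_{m-1}(x_i)]$$
Setting the derivative with respect to $\alpha$ to zero and writing $e_m$ for the weighted error rate gives $e^{2\alpha} = \frac{1-e_m}{e_m}$, which is exactly the formula below.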
Differentiating with respect to $\alpha$ and solving gives the loss-minimizing $\alpha$:
$$\alpha_m = \frac{1}{2}\log\frac{1-e_m}{e_m}$$
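A quick sanity check (illustrative numbers, not from the original note): $\alpha_m$ is positive only when the weak classifier beats random guessing ($e_m < 0.5$) and grows as the error shrinks.
```python
import numpy as np

for e_m in [0.1, 0.3, 0.5]:
    alpha = 0.5 * np.log((1 - e_m) / e_m)
    print(f"e_m={e_m}: alpha={alpha:.3f}")
# e_m=0.1: alpha=1.099  (strong learner, large vote)
# e_m=0.3: alpha=0.424
# e_m=0.5: alpha=0.000  (random guessing, zero vote)
```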
where $e_m$ is the (weighted) error rate:
$$e_m' = \sum_{i=1}^{N} P(G_m(x_i) \neq y_i) = \sum_{i=1}^{N}\omega_{m,i}\,I(G_m(x_i) \neq y_i) = \sum_{G_m(x_i) \neq y_i}\omega_{m,i}$$
Finally, this still needs to be normalized:
$$e_m = \frac{\sum_{G_m(x_i) \neq y_i}\omega_{m,i}}{\sum_{i=1}^{N}\omega_{m,i}}$$
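Putting the two formulas together (a small sketch with made-up weights; in practice the weights already sum to 1 after each round, so the denominator is usually 1):
```python
import numpy as np

w = np.array([0.2, 0.8, 0.4, 0.6])           # current sample weights (unnormalized)
mis = np.array([False, True, False, True])   # which samples G_m misclassified

e_raw = np.sum(w[mis])      # e'_m: sum of misclassified sample weights -> 1.4
e_m = e_raw / np.sum(w)     # normalized error rate in [0, 1]           -> 0.7
print(e_raw, e_m)
```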
4. Code Implementation
- Define the weak classifier, using a decision tree (a depth-1 stump):
```python
model = DecisionTreeClassifier(max_depth=1)
model.fit(X, y, sample_weight=w)
y_pred = model.predict(X)
```
- Weight update
Weight initialization: $w_{1,i} = \frac{1}{N}$
```python
# Initialize uniform sample weights
w = np.ones(n_samples) / n_samples
```
Error rate: $e_m = \frac{\sum_{G_m(x_i) \neq y_i}\omega_{m,i}}{\sum_{i=1}^{N}\omega_{m,i}}$
Weak-classifier weight update: $\alpha_m = \frac{1}{2}\log\frac{1-e_m}{e_m}$
```python
# Weighted error rate and the resulting weak-classifier weight
err = np.sum(w * (y_pred != y)) / np.sum(w)
alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
```
Sample weight update: $w_{m,i} = \frac{w_{m-1,i}}{Z_{m-1}}\exp(-\alpha_{m-1}y_iG_{m-1}(x_i))$
```python
# Update sample weights, then divide by Z so they sum to 1
w = w * np.exp(-alpha * y * y_pred)
w /= np.sum(w)
```
- Additive-model prediction: $f(x) = \sum_{m=1}^{M}\alpha_m G_m(x)$
```python
for alpha, model in zip(self.alphas, self.models):
    pred += alpha * model.predict(X)
return np.sign(pred)
```
- Complete code:
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class AdaBoost:
    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators
        self.alphas = []  # weight of each weak classifier
        self.models = []  # list of weak classifiers

    def fit(self, X, y):
        n_samples, n_features = X.shape
        # Initialize uniform sample weights
        w = np.ones(n_samples) / n_samples
        for _ in range(self.n_estimators):
            # Train a weak classifier with the current sample weights
            model = DecisionTreeClassifier(max_depth=1)
            model.fit(X, y, sample_weight=w)
            y_pred = model.predict(X)
            # Weighted classification error rate
            err = np.sum(w * (y_pred != y)) / np.sum(w)
            if err >= 0.5:
                break  # no better than random guessing; stop early
            # Weight of this weak classifier
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
            self.alphas.append(alpha)
            self.models.append(model)
            # Update sample weights, then normalize (divide by Z_m)
            w = w * np.exp(-alpha * y * y_pred)
            w /= np.sum(w)

    def predict(self, X):
        pred = np.zeros(X.shape[0])
        for alpha, model in zip(self.alphas, self.models):
            pred += alpha * model.predict(X)
        return np.sign(pred)
```
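A quick usage check (my own sketch; `make_classification` and the label conversion to $\{-1,+1\}$ are my assumptions, since `predict` returns `np.sign`):
```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# Synthetic binary data; the AdaBoost class above expects labels in {-1, +1}
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
y = 2 * y - 1

clf = AdaBoost(n_estimators=50)
clf.fit(X, y)
print("train accuracy:", accuracy_score(y, clf.predict(X)))
```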