Motivation
Many previous works have observed that well-generalized solutions lie in flat regions of the loss landscape rather than in sharp ones.
Typically, since every local model is trained on the same task, the loss landscapes of the individual clients remain similar. Intuitively, the flat optimal regions of different clients are more likely to partially overlap than the sharp ones. In other words, once the model converges into such an overlapping region, it can achieve high inference performance on most clients.
How can this effect be achieved?
The FedMut Framework
global model: $\omega_{glb} = \{layer_1, layer_2, \dots, layer_L\}$
gradients of $\omega_{glb}$: $g_{glb} = \{lg_1, lg_2, \dots, lg_L\}$
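To make the notation concrete, here is a minimal sketch of one possible representation, assuming each model is a dict of per-layer tensors; deriving $g_{glb}$ as the change of the global model between rounds is an illustrative assumption, not something the definitions above specify.

```python
import torch

# omega_glb = {layer_1, ..., layer_L}: the global model, represented here
# (by assumption) as a dict of per-layer parameter tensors.
omega_glb = {"layer1": torch.randn(4, 4), "layer2": torch.randn(4)}

# g_glb = {lg_1, ..., lg_L}: one gradient tensor per layer. Computing it as
# the difference between two consecutive global models is an assumption made
# for illustration; the text above only defines the notation.
omega_prev = {name: torch.zeros_like(p) for name, p in omega_glb.items()}
g_glb = {name: omega_glb[name] - omega_prev[name] for name in omega_glb}
```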
mutation weight list $L^{i}_{mut}$: $L^{i}_{mut} = \{v^i_1, v^i_2, \dots, v^i_L\}$
mutated model $\omega^{i}_{mut}$: $\omega^{i}_{mut} = \{layer_1 + \alpha v^i_1 \cdot lg_1, \dots, layer_L + \alpha v^i_L \cdot lg_L\}$
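Putting the pieces together, below is a minimal sketch of this mutation step, following the formula above. The function name `mutate_global_model` is hypothetical, and sampling each $v^i_j$ from $\{-1, +1\}$ is an assumption for illustration; the section only defines the list $L^{i}_{mut}$ without fixing how its values are drawn.

```python
import random
import torch

def mutate_global_model(omega_glb, g_glb, alpha=0.5):
    """Sketch: build one mutated model omega^i_mut from the global model.

    omega_glb: dict of per-layer parameter tensors (layer_1, ..., layer_L)
    g_glb:     dict of per-layer gradient tensors (lg_1, ..., lg_L)
    alpha:     mutation step size (the alpha in the formula above)
    """
    L_mut = {}      # L^i_mut = {v^i_1, ..., v^i_L}
    omega_mut = {}  # omega^i_mut
    for name, layer in omega_glb.items():
        v = random.choice([-1.0, 1.0])  # v^i_j; +/-1 is an assumed choice
        L_mut[name] = v
        # layer_j + alpha * v^i_j * lg_j
        omega_mut[name] = layer + alpha * v * g_glb[name]
    return omega_mut, L_mut

# Toy usage with random tensors standing in for a real model.
omega_glb = {"layer1": torch.randn(4, 4), "layer2": torch.randn(4)}
g_glb = {name: torch.randn_like(p) for name, p in omega_glb.items()}
omega_mut, L_mut = mutate_global_model(omega_glb, g_glb)
```

Note that the mutation weight is sampled per layer rather than per parameter, which matches the per-layer form of the formula above; each client $i$ would receive its own mutated model drawn this way.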