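Earlier we described how artificial intelligence can be viewed as one powerful function. To really put that function to work, we next need to understand core concepts such as loss functions, optimization methods, and regularization.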
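1. What is a network model?
A network model is like a precision assembly-line factory made up of several workshops (layers), each responsible for one processing task. Raw material (the input data) is processed step by step along the line until the finished product (the prediction) comes out.
Basic components
- Input layer: receives the raw data
- Hidden layers: process and transform the data
- Output layer: produces the final result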
import numpy as np

class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize the network parameters
        self.hidden_weights = np.random.randn(input_size, hidden_size)
        self.hidden_bias = np.zeros(hidden_size)
        self.output_weights = np.random.randn(hidden_size, output_size)
        self.output_bias = np.zeros(output_size)

    def relu(self, x):
        """Activation function: negative values become 0, positive values pass through."""
        return np.maximum(0, x)

    def forward(self, x):
        """Forward pass: how data flows through the network."""
        # First layer: linear transform followed by ReLU
        self.hidden = self.relu(np.dot(x, self.hidden_weights) + self.hidden_bias)
        # Second layer: linear transform only
        self.output = np.dot(self.hidden, self.output_weights) + self.output_bias
        return self.output
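To make the data flow concrete, the sketch below pushes a small batch through a two-layer network with the same structure as the class above and prints the array shape at each stage. The sizes (4 samples, 3 input features, 5 hidden units, 2 outputs) and the fixed random seed are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable (assumption)

# Illustrative sizes: 4 samples, 3 input features, 5 hidden units, 2 outputs
batch = rng.normal(size=(4, 3))
hidden_weights = rng.normal(size=(3, 5))
hidden_bias = np.zeros(5)
output_weights = rng.normal(size=(5, 2))
output_bias = np.zeros(2)

# Hidden layer: linear transform followed by ReLU
hidden = np.maximum(0, batch @ hidden_weights + hidden_bias)
# Output layer: linear transform only
output = hidden @ output_weights + output_bias

print(batch.shape, hidden.shape, output.shape)  # (4, 3) (4, 5) (4, 2)
```

Each row of the batch is one sample, so the first dimension stays at 4 throughout while the second dimension follows the layer widths.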
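Common types of network models
1. Feedforward neural network (the most basic model)
class FeedForwardNetwork:
    def __init__(self):
        # A declarative description of the layers, not a trainable implementation
        self.layers = [
            {"neurons": 128, "activation": "relu"},
            {"neurons": 64, "activation": "relu"},
            {"neurons": 10, "activation": "softmax"}
        ]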
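The layer widths above determine how many parameters the network has to learn. As a rough sketch, the hypothetical helper `count_dense_parameters` below counts the weights and biases of a stack of fully connected layers; the input size of 784 (a flattened 28x28 image) is an assumption, since the layer list itself does not state one.

```python
def count_dense_parameters(input_size, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    previous = input_size
    for size in layer_sizes:
        total += previous * size + size  # weight matrix plus bias vector
        previous = size
    return total

# Assumed input: a flattened 28x28 image (784 values); widths taken from the layer list
total_parameters = count_dense_parameters(784, [128, 64, 10])
print(total_parameters)  # 109386
```

Almost all of these parameters sit in the first layer (784 * 128 + 128 = 100480), which is typical when a wide input feeds a dense layer.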
2. Convolutional neural network (for images)
class SimpleCNN:
    def __init__(self):
        # A declarative description of the layers, not a trainable implementation
        self.layers = [
            {"type": "conv2d", "filters": 32, "kernel_size": 3},
            {"type": "maxpool", "size": 2},
            {"type": "conv2d", "filters": 64, "kernel_size": 3},
            {"type": "flatten"},
            {"type": "dense", "neurons": 10}
        ]
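It can help to trace how the image size shrinks as it passes through these layers. The sketch below applies the standard output-size formula, (input + 2 * padding - kernel) // stride + 1, to the SimpleCNN layer list. It assumes a 28x28 input, no padding, stride 1 for the convolutions, and a pooling stride equal to the pooling window; none of these are stated in the layer list, and `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over the input."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

size = 28                                    # assumed 28x28 input image
size = conv_output_size(size, 3)             # conv2d, kernel 3 -> 26
size = conv_output_size(size, 2, stride=2)   # maxpool, window 2 -> 13
size = conv_output_size(size, 3)             # conv2d, kernel 3 -> 11
flattened = size * size * 64                 # 64 filters in the last conv layer
print(size, flattened)  # 11 7744
```

Under these assumptions the flatten layer hands 7744 values to the final dense layer.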
3. Recurrent neural network (for sequences)
class SimpleRNN:
    def __init__(self, input_size, hidden_size):
        self.hidden_size = hidden_size
        # Initialize the weights
        self.Wx = np.random.randn(input_size, hidden_size)   # input weights
        self.Wh = np.random.randn(hidden_size, hidden_size)  # hidden-state weights
        self.b = np.zeros(hidden_size)                       # bias
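The class above only stores the weights. As a minimal sketch of how they would be used, the hypothetical function `rnn_forward` below runs a plain RNN over a sequence with the update h_t = tanh(x_t Wx + h_{t-1} Wh + b). The tanh activation, the all-zero initial state, the small weight scale, and the sizes are assumptions made for illustration.

```python
import numpy as np

def rnn_forward(inputs, Wx, Wh, b):
    """Run a plain RNN over a sequence and return the hidden state at every step."""
    hidden = np.zeros(Wh.shape[0])  # initial hidden state: all zeros (assumption)
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state
        hidden = np.tanh(x_t @ Wx + hidden @ Wh + b)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, sequence_length = 3, 4, 5  # illustrative sizes
Wx = rng.normal(size=(input_size, hidden_size)) * 0.1
Wh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)

sequence = rng.normal(size=(sequence_length, input_size))
states = rnn_forward(sequence, Wx, Wh, b)
print(states.shape)  # (5, 4)
```

Because each state depends on the previous one, the same weights are reused at every time step, which is what lets the model handle sequences of any length.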
Practical application examples
- Image recognition model:
def image_recognition_model():
    model = {
        "conv1": {"filters": 32, "kernel_size": 3},
        "pool1": {"size": 2},
        "conv2": {"filters": 64, "kernel_size": 3},
        "pool2": {"size": 2},
        "flatten": {},
        "dense1": {"units": 128},
        "dense2": {"units": 10}
    }
    return model
- Text processing model:
def text_processing_model():
    model = {
        "embedding": {"vocab_size": 10000, "embed_dim": 100},
        "lstm": {"units": 64, "return_sequences": True},
        "global_pool": {},
        "dense": {"units": 1, "activation": "sigmoid"}
    }
    return model
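Characteristics of a model
- Layered structure
class LayeredNetwork:
    def __init__(self):
        self.architecture = [
            ("input", 784),             # input layer: receives the raw data
            ("hidden", 256, "relu"),    # hidden layer: extracts features
            ("hidden", 128, "relu"),    # hidden layer: combines features
            ("output", 10, "softmax")   # output layer: produces the prediction
        ]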
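- Parameter learning
def train_step(model, inputs, targets):
    # calculate_loss and calculate_gradients are placeholders for
    # task-specific helpers; they are not defined in this article
    # Forward pass
    predictions = model.forward(inputs)
    # Compute the loss
    loss = calculate_loss(predictions, targets)
    # Backward pass: gradients of the loss with respect to the parameters
    gradients = calculate_gradients(model, inputs, targets)
    # Update the parameters
    model.update_parameters(gradients)
    return loss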
- Feature extraction
def extract_features(model, input_data):
    features = []
    # Collect the output of every layer in turn
    for layer in model.layers:
        input_data = layer.process(input_data)
        features.append(input_data)
    return features
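Advice on choosing a model
Choose a model that suits the type of task:
- Image processing: use a CNN
- Text processing: use an RNN or a Transformer
- Tabular data: use a feedforward neural network
def choose_model(task_type):
    # Returns one of the sketch classes defined earlier in this section
    if task_type == "image":
        return SimpleCNN()
    elif task_type == "text":
        # The sizes here are illustrative placeholders
        return SimpleRNN(input_size=100, hidden_size=64)
    elif task_type == "tabular":
        return FeedForwardNetwork()
    raise ValueError(f"Unknown task type: {task_type}")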
Example: a complete model definition
class ComprehensiveModel:
    def __init__(self, input_shape, num_classes):
        self.input_shape = input_shape
        self.num_classes = num_classes

    def build(self):
        model = {
            # Feature extraction part
            "feature_extractor": [
                {"type": "conv2d", "filters": 32, "kernel_size": 3},
                {"type": "maxpool", "size": 2},
                {"type": "conv2d", "filters": 64, "kernel_size": 3},
                {"type": "maxpool", "size": 2}
            ],
            # Classification part
            "classifier": [
                {"type": "flatten"},
                {"type": "dense", "units": 128, "activation": "relu"},
                {"type": "dropout", "rate": 0.5},
                {"type": "dense", "units": self.num_classes, "activation": "softmax"}
            ]
        }
        return model
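This network model works like a smart factory:
- The input layer is where raw materials are received and checked
- The hidden layers are the processing workshops
- The output layer is where finished products are inspected
- The parameters are the workers' skills
- The activation functions are the workers' working methods
- Training is the workers practicing and improving their skills
In this way a network model can learn to handle many kinds of complex tasks, from image recognition to language translation, from game playing to autonomous driving.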
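2. What is learning?
Imagine you are teaching a child to recognize cats:
- At first, the child may call every furry animal a cat
- After seeing more and more examples, the child gradually learns to tell cats from dogs
- In the end, the child can identify cats accurately
In AI, learning means:
- Looking at a large number of examples (data)
- Adjusting the model's parameters
- Improving prediction accuracy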
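# A simple example of the learning process
class SimpleModel:
    def __init__(self):
        self.weight = 1.0  # initial parameter

    def predict(self, x):
        return self.weight * x

    def learn(self, x, true_value, learning_rate):
        prediction = self.predict(x)
        error = true_value - prediction
        # Adjust the parameter: gradient step for half the squared error,
        # which scales the error by the input x
        self.weight += learning_rate * error * x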
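To see the adjust-and-repeat cycle in action, the sketch below runs it for a one-parameter model and records the weight after every step. The function name `fit_weight`, the learning rate of 0.1, and the 20 steps are illustrative assumptions; the update used here is the gradient step for half the squared error, which multiplies the error by the input.

```python
def fit_weight(x, true_value, learning_rate=0.1, steps=20):
    """Repeat the predict / measure error / adjust cycle for a one-parameter model."""
    weight = 1.0  # starting value, as in the example above
    history = [weight]
    for _ in range(steps):
        prediction = weight * x
        error = true_value - prediction
        weight += learning_rate * error * x  # gradient step for half the squared error
        history.append(weight)
    return history

history = fit_weight(x=2, true_value=4)
print([round(w, 4) for w in history[:4]])  # [1.0, 1.4, 1.64, 1.784]
```

With x = 2 and a target of 4 the ideal weight is 2, and each step closes a fixed fraction of the remaining gap, so the weight moves quickly at first and then settles.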
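3. What is the learning rate?
The learning rate is the "step size" of learning:
- Too large: the model easily steps over the best answer (it learns too fast and overshoots)
- Too small: it takes a very long time to reach the answer (it learns too slowly)
# Effect of different learning rates (uses SimpleModel from above)
def train_with_different_learning_rates():
    learning_rates = [0.1, 0.01, 0.001]
    final_weights = {}
    for lr in learning_rates:
        model = SimpleModel()
        for _ in range(100):
            model.learn(x=2, true_value=4, learning_rate=lr)
        # The ideal weight is 2.0, since 2.0 * 2 == 4
        final_weights[lr] = model.weight
    return final_weights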
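The overshooting case is easiest to see on a function whose minimum is known. The sketch below runs gradient descent on f(w) = (w - 3)^2 with three step sizes. The function, the starting point, and the learning rates (including the deliberately oversized 1.1) are assumptions chosen for illustration.

```python
def minimize_quadratic(learning_rate, steps=50, start=0.0):
    """Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)
        w -= learning_rate * gradient
    return w

results = {lr: minimize_quadratic(lr) for lr in (1.1, 0.1, 0.001)}
for lr, w in results.items():
    print(f"learning_rate={lr}: w after 50 steps = {w:.4f}")
```

Each step multiplies the distance to the minimum by (1 - 2 * learning_rate). At 1.1 that factor is -1.2, so every step lands further away on the opposite side and the run diverges; at 0.1 it is 0.8 and w ends very close to 3; at 0.001 it is 0.998 and w has barely left its starting point after 50 steps.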
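4. What is a loss function?
A loss function is like an "exam score" that measures how accurate the model's predictions are, except that lower is better:
- The more accurate the prediction, the lower the score
- The worse the prediction, the higher the score
Common loss functions:
import numpy as np

# Mean squared error (MSE)
def mse_loss(predictions, targets):
    return np.mean((predictions - targets) ** 2)

# Mean absolute error (MAE)
def mae_loss(predictions, targets):
    return np.mean(np.abs(predictions - targets))

# Cross-entropy loss (for classification problems)
def cross_entropy_loss(predictions, targets, eps=1e-12):
    # Clip the probabilities so that log(0) never occurs
    predictions = np.clip(predictions, eps, 1.0)
    return -np.sum(targets * np.log(predictions))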
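A small worked example may make these formulas easier to read. The numbers below are made up for illustration: three regression predictions with their targets, and one three-class prediction with a one-hot target.

```python
import numpy as np

# Regression example: three predictions against three targets
predictions = np.array([2.5, 0.0, 2.0])
targets = np.array([3.0, -0.5, 2.0])
mse = np.mean((predictions - targets) ** 2)    # (0.25 + 0.25 + 0) / 3
mae = np.mean(np.abs(predictions - targets))   # (0.5 + 0.5 + 0) / 3

# Classification example: one-hot target and predicted class probabilities
probabilities = np.array([0.1, 0.7, 0.2])
one_hot = np.array([0.0, 1.0, 0.0])
eps = 1e-12  # keeps log() away from zero
cross_entropy = -np.sum(one_hot * np.log(np.clip(probabilities, eps, 1.0)))  # -ln(0.7)

print(f"MSE={mse:.4f}  MAE={mae:.4f}  CE={cross_entropy:.4f}")
# MSE=0.1667  MAE=0.3333  CE=0.3567
```

MSE squares each error, so it punishes large misses more heavily than MAE does. With a one-hot target, cross-entropy depends only on the probability given to the correct class.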
5. What is an optimizer?
An optimizer is like a "learning strategy": it decides how the model's parameters are adjusted.
Examples of common optimizers:
class SGD:
    def __init__(self, learning_rate=0.01):
        self.lr = learning_rate

    def update(self, parameter, gradient):
        # Step against the gradient
        return parameter - self.lr * gradient

class Momentum:
    def __init__(self, learning_rate=0.01, momentum=0.9):
        self.lr = learning_rate
        self.momentum = momentum
        self.velocity = 0

    def update(self, parameter, gradient):
        # Keep a fraction of the previous step and add the new gradient step
        self.velocity = self.momentum * self.velocity - self.lr * gradient
        return parameter + self.velocity
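To compare the two update rules, the sketch below applies each of them to the same toy problem, minimizing f(w) = w^2 from w = 5 for 100 steps. The toy function, starting point, and step count are assumptions chosen for illustration, and the functions `run_sgd` and `run_momentum` are hypothetical names that inline the two update rules shown above.

```python
def run_sgd(start, learning_rate=0.01, steps=100):
    """Plain gradient descent on f(w) = w^2, whose gradient is 2w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

def run_momentum(start, learning_rate=0.01, momentum=0.9, steps=100):
    """Same problem, using a velocity that accumulates past gradient steps."""
    w, velocity = start, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * 2 * w
        w = w + velocity
    return w

sgd_result = run_sgd(5.0)
momentum_result = run_momentum(5.0)
print(f"SGD: {sgd_result:.4f}  Momentum: {momentum_result:.4f}")
```

With the same learning rate, the momentum run ends much closer to the minimum at 0, because the accumulated velocity makes each step effectively larger. The price is that it can overshoot and oscillate around the minimum before settling.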
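6. What is convergence?
Convergence is the state of having "learned the lesson":
- The model's performance becomes stable
- The loss no longer drops noticeably
- The predictions broadly match expectations
def check_convergence(loss_history, tolerance=1e-5):
    """Check whether training has converged."""
    if len(loss_history) < 2:
        return False
    # Compare the two most recent loss values
    recent_loss_change = abs(loss_history[-1] - loss_history[-2])
    return recent_loss_change < tolerance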
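Comparing only the last two losses can be fooled by a single flat step. One possible refinement, shown here as a sketch rather than a standard recipe, is to require that the loss stays flat over several consecutive steps. The function name `has_converged` and the window size of 5 are assumptions.

```python
def has_converged(loss_history, tolerance=1e-5, window=5):
    """True when every one of the last `window` loss changes is below `tolerance`."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return max(changes) < tolerance

# A loss curve that flattens out: 1/2, 1/4, 1/8, ...
losses = [0.5 ** step for step in range(1, 40)]
print(has_converged(losses[:5]), has_converged(losses))  # False True

# A curve that dropped recently and has only been flat for a few steps
plateau = [1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5]
print(has_converged(plateau))  # False
```

The last example would pass a two-point check, since its final two values are identical, but the window still contains the drop from 1.0 to 0.5, so the windowed check waits longer before declaring convergence.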
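7. What is regularization?
Regularization is like giving the model "extra homework" so that it does not simply "memorize by rote" (overfit):
# L1 regularization (Lasso): penalizes the absolute values of the weights
def l1_regularization(weights, lambda_param):
    return lambda_param * np.sum(np.abs(weights))

# L2 regularization (Ridge): penalizes the squared weights
def l2_regularization(weights, lambda_param):
    return lambda_param * np.sum(weights ** 2)

# Dropout regularization (apply during training only)
def dropout(layer_output, dropout_rate=0.5):
    # Keep each unit with probability 1 - dropout_rate
    mask = np.random.binomial(1, 1-dropout_rate, size=layer_output.shape)
    # Rescale the kept units so the expected output is unchanged
    return layer_output * mask / (1-dropout_rate)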
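The penalty functions above only return a number to add to the loss. To show how the L2 penalty actually affects training, the sketch below folds its derivative, 2 * lambda * weights, into a single gradient step. The function name `l2_penalized_step` and the numeric values are illustrative assumptions.

```python
import numpy as np

def l2_penalized_step(weights, data_gradient, learning_rate=0.1, lambda_param=0.01):
    """One gradient step on: data loss + lambda * sum(weights ** 2)."""
    penalty_gradient = 2 * lambda_param * weights  # derivative of the L2 term
    return weights - learning_rate * (data_gradient + penalty_gradient)

weights = np.array([3.0, -2.0, 0.5])
zero_gradient = np.zeros_like(weights)  # pretend the data loss is flat here

# With no pull from the data, the penalty alone shrinks every weight toward zero
shrunk = l2_penalized_step(weights, zero_gradient)
print(shrunk)
```

Every weight is multiplied by 1 - 0.1 * 2 * 0.01 = 0.998, which is why L2 regularization is often described as weight decay: weights that the data does not actively support slowly fade away.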
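A practical example
Let us put these concepts together:
# Uses mse_loss, Momentum and check_convergence defined above
class SimpleNeuralNetwork:
    def __init__(self):
        self.weights = np.random.randn(10)  # expects inputs with 10 features
        self.optimizer = Momentum()
        self.loss_history = []

    def train(self, x, y, epochs=1000):
        for epoch in range(epochs):
            # Forward pass
            prediction = self.predict(x)
            # Compute the loss
            loss = mse_loss(prediction, y)
            self.loss_history.append(loss)
            # Compute the gradient
            gradient = self.calculate_gradient(x, y)
            # Update the parameters
            self.weights = self.optimizer.update(self.weights, gradient)
            # Check for convergence
            if check_convergence(self.loss_history):
                print(f"Model converged at epoch {epoch}")
                break

    def calculate_gradient(self, x, y):
        # Gradient of the MSE loss with respect to the weights
        return 2 * np.dot(x.T, self.predict(x) - y) / len(y)

    def predict(self, x):
        return np.dot(x, self.weights)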
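To check that such a training loop really learns, one option is to run it on synthetic data where the right answer is known. The self-contained sketch below generates targets from known weights and then trains a linear model with MSE loss, a momentum update, and a simple convergence test. The data, seed, learning rate, tolerance, and epoch limit are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumption): targets come from known weights, so the fit can be checked
true_weights = np.array([2.0, -1.0, 0.5])
x = rng.normal(size=(200, 3))
y = x @ true_weights

weights = np.zeros(3)
velocity = np.zeros(3)
learning_rate, momentum, tolerance = 0.01, 0.9, 1e-10
loss_history = []

for epoch in range(5000):
    prediction = x @ weights
    loss = np.mean((prediction - y) ** 2)            # MSE loss
    loss_history.append(loss)
    gradient = 2 * x.T @ (prediction - y) / len(y)   # gradient of MSE w.r.t. the weights
    velocity = momentum * velocity - learning_rate * gradient
    weights = weights + velocity                     # momentum update
    # Stop once the loss has essentially stopped changing
    if len(loss_history) > 1 and abs(loss_history[-1] - loss_history[-2]) < tolerance:
        break

print(f"stopped after {epoch + 1} epochs, weights = {np.round(weights, 3)}")
```

Because the targets contain no noise, the learned weights should end up very close to the weights that generated the data, which gives a quick sanity check for the loss, gradient, and update code.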
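Summary
These concepts are closely linked:
- The function defines the structure of the model
- Learning lets the model keep improving
- The learning rate sets the step size of each improvement
- The loss function evaluates how the model is doing
- The optimizer guides the parameter updates
- Convergence marks the end of learning
- Regularization prevents over-learning
It is like learning to ride a bicycle:
- The function is the structure of the bicycle
- Learning is the process of practicing
- The learning rate is the size of each adjustment
- The loss function is the number of falls
- The optimizer is the method of practice
- Convergence is being able to ride
- Regularization is practicing on different road conditions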
Further reading
- Optimizers in deep learning explained: from SGD to Adam - https://ruder.io/optimizing-gradient-descent/
- Neural network basics: understanding forward and backward propagation - https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
- Understanding how LSTM networks work - https://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Batch Normalization made simple - https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c
- Understanding regularization techniques in deep learning - https://neptune.ai/blog/fighting-overfitting-with-l1-or-l2-regularization
- Understanding convolutional neural networks through visualization - https://poloclub.github.io/cnn-explainer/
- Tips for setting the learning rate in deep learning - https://www.jeremyjordan.me/nn-learning-rate/
- A guide to loss function optimization - https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
- The Transformer model explained: understanding self-attention - https://jalammar.github.io/illustrated-transformer/
- Comparing activation functions in deep learning - https://mlfromscratch.com/activation-functions-explained/
- A summary of gradient descent optimization algorithms - https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3
- Training deep learning models: a practical guide - https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel
- Cross-validation and model evaluation explained - https://scikit-learn.org/stable/modules/cross_validation.html
- An introduction to neural architecture search - https://lilianweng.github.io/posts/2020-08-06-nas/
- Data augmentation techniques in deep learning - https://neptune.ai/blog/data-augmentation-in-deep-learning