Last time we trained an image classifier on CIFAR-10, walked through the full model-training workflow, and built some intuition for convolutional neural networks. This time we will build a handwritten digit recognition CNN on the GPU to consolidate what we learned.
Steps
- Load the dataset
- Define the neural network
- Define the loss function
- Train the network
- Test the network
A quick look at the MNIST dataset
MNIST is a database of handwritten digits (official site: http://yann.lecun.com/exdb/mnist/). It contains 60,000 training samples and 10,000 test samples, each a 28*28-pixel grayscale image. The dataset is distributed as four files (training/test images and labels), all stored in a binary format.
The full training walkthrough
1. Load the dataset
```python
import torch
import torchvision
from torchvision import transforms

trainset = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
```
Show a few training images
```python
import numpy as np
import matplotlib.pyplot as plt

def imshow(img):
    img = img * 0.3081 + 0.1307  # undo the Normalize transform above
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# Grab one batch of data
dataiter = iter(trainloader)
images, labels = next(dataiter)  # dataiter.next() was removed in newer PyTorch
imshow(torchvision.utils.make_grid(images))
```
2. Define the convolutional neural network
```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # functional versions of common ops such as activations and pooling

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # fully connected: map the 16 * 4 * 4 flattened features down to 120
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # (2, 2) can also be written as just 2
        x = x.view(-1, self.num_flat_features(x))  # reshape to (batch, all remaining dims multiplied)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
net.to(device)
```
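Why is fc1's input 16 * 4 * 4? Working the shapes through: a 28x28 input goes through conv1 (5x5 kernel, no padding) to 24x24, pooling halves it to 12x12, conv2 shrinks it to 8x8, and pooling again gives 4x4 with 16 channels. A quick sanity check with a dummy batch, using the same layer sizes as the Net above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 28, 28)          # one dummy MNIST-sized image
x = F.max_pool2d(F.relu(conv1(x)), 2)  # 28 -> 24 (conv) -> 12 (pool)
print(x.shape)                         # torch.Size([1, 6, 12, 12])
x = F.max_pool2d(F.relu(conv2(x)), 2)  # 12 -> 8 (conv) -> 4 (pool)
print(x.shape)                         # torch.Size([1, 16, 4, 4])
```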
3. Define the loss function and optimizer
```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
```
Setting momentum=0.9 here raised the accuracy after a single training epoch from 90% to 98%.
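PyTorch's SGD with momentum keeps a velocity buffer and updates it as v = momentum * v + grad, then steps p = p - lr * v, so gradients that keep pointing the same way build up speed. A hand-check of that update rule against optim.SGD on a single scalar parameter:

```python
import torch
import torch.optim as optim

# One scalar parameter with a constant gradient of 1, stepped twice
p = torch.tensor([1.0], requires_grad=True)
opt = optim.SGD([p], lr=0.1, momentum=0.9)
for _ in range(2):
    opt.zero_grad()
    p.sum().backward()   # d(sum)/dp = 1
    opt.step()

# Hand computation of v = mu*v + g, p = p - lr*v:
#   step 1: v = 1.0  -> p = 1.0 - 0.1*1.0 = 0.9
#   step 2: v = 1.9  -> p = 0.9 - 0.1*1.9 = 0.71
print(p.item())  # 0.71 up to float rounding
```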
4. Train the network
```python
def train(epochs):
    net.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for i, data in enumerate(trainloader):
            # get the inputs and the labels
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward pass, compute the loss, backward pass, update the weights
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # logging
            running_loss += loss.item()
            if i % 100 == 99:  # print the average loss every 100 batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 100))
                running_loss = 0.0
    torch.save(net, 'mnist.pth')  # pickles the whole model, class definition and all

train(1)
```
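Note that torch.save(net, ...) pickles the whole object, which is exactly why the class definition has to be importable again at load time. Saving only the state_dict avoids that coupling; a sketch using a stand-in nn.Linear model (the same pattern applies to any nn.Module, including the Net above):

```python
import torch
import torch.nn as nn

# Stand-in model for demonstration purposes
model = nn.Linear(4, 2)

# Save only the parameter tensors, not the pickled class
torch.save(model.state_dict(), 'model_state.pth')

# To load: rebuild the architecture first, then restore the weights
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load('model_state.pth'))

x = torch.randn(1, 4)
print(torch.allclose(model(x), restored(x)))  # True: identical weights
```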
net.train(): puts the model into training mode. In training mode, modules such as Dropout and Batch Normalization are active: Dropout randomly zeroes activations to reduce overfitting, and Batch Normalization normalizes with per-batch statistics while updating its running averages, which helps convergence.
net.eval(): puts the model into evaluation mode. In evaluation mode those same modules switch to inference behavior: Dropout is disabled, and Batch Normalization uses its fixed running statistics instead of the current batch's statistics.
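The difference between the two modes is easy to see with Dropout alone: in training mode it zeroes elements at random and rescales the survivors, while in eval mode it is the identity.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()          # training mode: roughly half the entries are zeroed,
out_train = drop(x)   # survivors are scaled by 1 / (1 - p) = 2
print(out_train)

drop.eval()           # evaluation mode: Dropout does nothing
out_eval = drop(x)
print(out_eval)       # all ones, unchanged
```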
5. Test the network
When testing the model from another file, the class definition must be available in the file that loads the model:
```python
from mnist import Net  # importing executes mnist.py, so guard its training code with __main__ if needed

net = torch.load('mnist.pth')
testset = torchvision.datasets.MNIST(
    './data', train=False, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)

correct = 0
total = 0
net.to('cpu')
net.eval()  # inference mode; note torch.no_grad() below only disables gradient tracking, it is not a substitute
print(net)
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
```
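Beyond the overall number, a per-class breakdown (as in the CIFAR-10 walkthrough) often reveals which digits the model confuses. A sketch of the bookkeeping, shown here on hand-made stand-in predictions rather than real model output:

```python
import torch

num_classes = 10
class_correct = [0] * num_classes
class_total = [0] * num_classes

# Stand-ins for `_, predicted = torch.max(outputs, 1)` and the batch labels
predicted = torch.tensor([3, 3, 7, 1])
labels = torch.tensor([3, 5, 7, 1])

for p, l in zip(predicted.tolist(), labels.tolist()):
    class_total[l] += 1
    class_correct[l] += int(p == l)

for c in range(num_classes):
    if class_total[c] > 0:
        print('digit %d: %.0f%%' % (c, 100.0 * class_correct[c] / class_total[c]))
```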
Training time, one epoch:
GPU: 10s
CPU: 10s
Training time, three epochs:
GPU: 24.5s
CPU: 28.6s
Conclusion: when there is little computation, CPU and GPU performance are nearly identical; only once the amount of computation grows large enough does the GPU's advantage become clearly visible.
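To reproduce such timings yourself, keep in mind that CUDA kernels run asynchronously, so the clock should only be read after torch.cuda.synchronize(). A small helper, demonstrated here on a matrix multiply (timing the train() call works the same way):

```python
import time
import torch

def timed(fn):
    """Run fn once and return the elapsed wall-clock time in seconds."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()   # make sure pending GPU work is done first
    start = time.time()
    fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()   # wait for fn's GPU kernels to finish
    return time.time() - start

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
a = torch.randn(512, 512, device=device)
elapsed = timed(lambda: a @ a)
print('%.4f s on %s' % (elapsed, device))
```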
A small experiment:
(1) Load and test a single image, printing True if the prediction is correct
```python
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
import torch.nn.functional as F
import cv2
import numpy as np

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = torch.load('mnist.pth')
net.to('cpu')
net.eval()

with torch.no_grad():
    imgdir = '3.jpeg'
    img = cv2.imread(imgdir, 0)          # read as grayscale
    img = cv2.resize(img, (28, 28))
    trans = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    image = trans(img)
    image = image.unsqueeze(0)           # add a batch dimension: (1, 1, 28, 28)
    label = torch.tensor([int(imgdir.split('.')[0])])  # label parsed from the filename
    outputs = net(image)
    _, predicted = torch.max(outputs, 1)
    print(predicted)
    print((predicted == label).item())
```
I tried the freshly trained model on six digit photos, and only one, a 2, was predicted correctly....
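One likely culprit (an assumption worth checking on your own images): MNIST digits are white strokes on a black background, while a photographed or scanned digit is usually dark on light. Inverting the grayscale image before the transform flips it to MNIST's convention:

```python
import numpy as np

# A tiny stand-in grayscale "image": dark digit pixels (0) on a light background (255)
img = np.array([[255, 0],
                [0, 255]], dtype=np.uint8)

inverted = 255 - img   # equivalent to cv2.bitwise_not(img) for uint8 images
print(inverted)        # [[  0 255]
                       #  [255   0]]
```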
unsqueeze(dim): inserts a new dimension of size 1 at position dim; the integer argument says where the new dimension goes. For a tensor that ends up 3-D, the valid positions are 0, 1, and 2.
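A quick demonstration of where the new size-1 dimension lands:

```python
import torch

x = torch.zeros(28, 28)        # one 2-D image, no batch dimension yet
print(x.unsqueeze(0).shape)    # torch.Size([1, 28, 28])
print(x.unsqueeze(1).shape)    # torch.Size([28, 1, 28])
print(x.unsqueeze(2).shape)    # torch.Size([28, 28, 1])
```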