Object Detection: LeNet

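Demo workflow

model.py: defines the LeNet network model
train.py: loads the datasets and trains the network, computing the loss on the training set and the accuracy on the test set, then saves the trained parameters
predict.py: loads the trained parameters and classifies an image of your own choosing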

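Contents
1. model.py
- 1.1 Convolution: Conv2d
- 1.2 Pooling: MaxPool2d
- 1.3 Flattening a Tensor: view()
- 1.4 Fully connected layer: Linear
2. train.py
- 2.1 Loading the datasets
- Data preprocessing
- About the dataset
- Loading the training set
- Loading the test set
- 2.2 Training
- 2.3 Training on GPU or CPU
3. predict.py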

1. model.py

The code comes first. The model is a lightly modified LeNet; it has only a few layers, so it is easy to follow:

# Use the torch.nn package to build the neural network.
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):                      # inherits from the nn.Module parent class
    def __init__(self):                      # define the network structure
        super(LeNet, self).__init__()        # call the parent class constructor
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):                    # forward pass
        x = F.relu(self.conv1(x))            # input(3, 32, 32) output(16, 28, 28)
        x = self.pool1(x)                    # output(16, 14, 14)
        x = F.relu(self.conv2(x))            # output(32, 10, 10)
        x = self.pool2(x)                    # output(32, 5, 5)
        x = x.view(-1, 32*5*5)               # output(32*5*5)
        x = F.relu(self.fc1(x))              # output(120)
        x = F.relu(self.fc2(x))              # output(84)
        x = self.fc3(x)                      # output(10)
        return x

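Note:

In PyTorch, the dimensions of a tensor (that is, the input and output of each layer) are ordered as [batch, channel, height, width].
The subsections below explain the meaning and order of the parameters used by the convolution, pooling, and input/output layers.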
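To make the [batch, channel, height, width] ordering concrete, here is a minimal sketch. It assumes a hypothetical 32×32 RGB image stored as [height, width, channel], the layout image libraries usually produce; the tensor of zeros is only a placeholder for real pixel data.

```python
import torch

# Hypothetical 32x32 RGB image in [height, width, channel] order (placeholder values).
image_hwc = torch.zeros(32, 32, 3)

# Reorder to [channel, height, width], then add a leading batch dimension.
image_chw = image_hwc.permute(2, 0, 1)
batch = image_chw.unsqueeze(0)

print(batch.shape)  # torch.Size([1, 3, 32, 32])
```

A tensor of this shape is what the network's first convolution layer expects for a single image.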

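1.1 Convolution: Conv2d

The standard 2D convolution (Conv2d) is provided in PyTorch as:

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

In everyday use, only the following parameters need attention:

in_channels: depth of the input feature map. For an RGB color image, in_channels=3.
out_channels: depth of the output feature map. It equals the number of convolution kernels: with n kernels, the output feature map has depth n.
kernel_size: size of the convolution kernel. It can be an int, e.g. 3 means height=width=3, or a tuple, e.g. (3, 5) means height=3 and width=5.
stride: stride of the convolution kernel. The default is 1; like kernel_size, it can be an int or a tuple.
padding: zero padding, 0 by default. It can be an int, e.g. 1 pads one ring of zeros around the input, or a tuple, e.g. (2, 1) pads 2 rows at the top and bottom and 1 column on the left and right.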

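The formula from the official PyTorch documentation (shown for the height; the width is analogous):

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$

With the default dilation of 1, the output size after a convolution reduces to:

$$N = \frac{W - F + 2P}{S} + 1$$

where:

Input image size: W×W (in most cases width = height)
Filter size: F×F
Stride: S
Number of padding pixels: P

What if the result is not an integer? PyTorch rounds it down, as the floor in the official formula shows. For details, see the article "pytorch中的卷积操作详解" (a detailed explanation of the convolution operation in PyTorch).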
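The output-size formula can be checked against the shape comments in model.py with a few lines of plain Python. This is a sketch: the helper name conv_output_size is made up for illustration, it assumes the default dilation of 1, and it uses floor division to mirror PyTorch's rounding down. The same formula applies to the pooling layers.

```python
def conv_output_size(w, f, s=1, p=0):
    """Output width for input width w, filter size f, stride s, padding p (rounded down)."""
    return (w - f + 2 * p) // s + 1


size = 32                                  # CIFAR10 images are 32x32
size = conv_output_size(size, f=5)         # conv1 (5x5 kernel)   -> 28
size = conv_output_size(size, f=2, s=2)    # pool1 (2x2, stride 2) -> 14
size = conv_output_size(size, f=5)         # conv2 (5x5 kernel)   -> 10
size = conv_output_size(size, f=2, s=2)    # pool2 (2x2, stride 2) -> 5
print(size)  # 5
```

The final value 5 is where the 32*5*5 input size of the first fully connected layer comes from.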

1.2 Pooling: MaxPool2d

Max pooling (MaxPool2d) is provided in PyTorch as:

MaxPool2d(kernel_size, stride)
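A small sketch of what MaxPool2d(2, 2) does, using a hypothetical 4×4 single-channel input so the result can be read by eye: each non-overlapping 2×2 window is replaced by its largest value, halving the height and width.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Hypothetical input: values 0..15 laid out as one 4x4 channel, batch size 1.
x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)
y = pool(x)

print(y.shape)               # torch.Size([1, 1, 2, 2])
print(y.flatten().tolist())  # [5.0, 7.0, 13.0, 15.0]
```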

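1.3 Flattening a Tensor: view()

Note that after the second pooling layer, each sample is still a three-dimensional Tensor of shape (32, 5, 5). It has to be flattened to a vector of length 32*5*5 before it is passed to the fully connected layer:

x = self.pool2(x)            # output(32, 5, 5)
x = x.view(-1, 32*5*5)       # output(32*5*5)
x = F.relu(self.fc1(x))      # output(120)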
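The flattening step can be tried on its own. The sketch below assumes a hypothetical batch of 4 random feature maps; the -1 lets view() infer the batch dimension, and torch.flatten with start_dim=1 is shown as an equivalent alternative.

```python
import torch

# Hypothetical batch of 4 samples, each shaped like the output of pool2.
x = torch.randn(4, 32, 5, 5)

flat = x.view(-1, 32 * 5 * 5)            # -1 is inferred as the batch size
same = torch.flatten(x, start_dim=1)     # equivalent: flatten all but the batch dim

print(flat.shape)  # torch.Size([4, 800])
```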

1.4 Fully connected layer: Linear

The fully connected layer (Linear) is provided in PyTorch as:

Linear(in_features, out_features, bias=True)
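As a rough illustration of what Linear computes, the sketch below builds a layer with the same sizes as fc1 and compares its output with the explicit expression x·Wᵀ + b. The random input batch is a placeholder, not data from the demo.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=32 * 5 * 5, out_features=120)

x = torch.randn(4, 32 * 5 * 5)            # hypothetical batch of 4 flattened samples
y = fc(x)

# The weight is stored as [out_features, in_features], so it is transposed here.
manual = x @ fc.weight.t() + fc.bias

print(y.shape, fc.weight.shape)  # torch.Size([4, 120]) torch.Size([120, 800])
```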

2. train.py

2.1 Loading the datasets

Imports

import torch
import torchvision
import torch.nn as nn
from model import LeNet
import torch.optim as optim
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import time

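Data preprocessing

The input images are preprocessed in two steps. ToTensor converts an image of shape (H x W x C) in the range [0, 255] to a tensor of shape (C x H x W) in the range [0.0, 1.0]. Normalize then subtracts the mean 0.5 and divides by the standard deviation 0.5 on each channel, which maps the values to [-1.0, 1.0].

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])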
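The arithmetic behind this transform is simple enough to reproduce in plain Python. The sketch below is not the torchvision implementation; the helper names scale_to_unit and normalize are made up for illustration, and it works on single pixel values rather than whole images.

```python
def scale_to_unit(pixel):
    """What ToTensor does to a pixel value: [0, 255] -> [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """What Normalize does per channel: subtract the mean, divide by the std."""
    return (value - mean) / std


# Darkest, an intermediate, and brightest pixel values.
results = [normalize(scale_to_unit(p)) for p in (0, 51, 255)]
print(results)  # roughly [-1.0, -0.6, 1.0]
```

With mean and std both 0.5, the pixel range [0, 255] ends up as [-1.0, 1.0].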

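About the dataset

The torchvision.datasets module can download the datasets bundled with PyTorch, including common ones such as MNIST.

This demo uses CIFAR10, another classic image classification dataset. It is a small dataset for recognizing everyday objects, compiled by Hinton's students Alex Krizhevsky and Ilya Sutskever, and contains RGB color images in 10 classes.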

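Loading the training set

# load the 50000 training images
train_set = torchvision.datasets.CIFAR10(root='./data',        # directory where the dataset is stored
                                         train=True,           # use the training split
                                         download=True,        # True on the first run to download; set to False afterwards
                                         transform=transform)  # preprocessing

# wrap the training set in a loader; training is done in batches
train_loader = torch.utils.data.DataLoader(train_set,          # the training set loaded above
                                           batch_size=50,      # number of samples per batch
                                           shuffle=False,      # whether to shuffle the training set
                                           num_workers=0)      # number of worker processes; set to 0 on Windows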

Loading the test set

# load the 10000 test images
test_set = torchvision.datasets.CIFAR10(root='./data',         # stored in the data folder
                                        train=False,           # use the test split
                                        download=False,        # already downloaded together with the training set
                                        transform=transform)
# wrap the test set in a loader
test_loader = torch.utils.data.DataLoader(test_set,
                                          batch_size=10000,    # number of samples per validation batch
                                          shuffle=False,
                                          num_workers=0)
# fetch the test images and labels, used to compute the accuracy
test_data_iter = iter(test_loader)
test_image, test_label = next(test_data_iter)

2.2 Training

In this demo the training set has 50000 samples and batch_size=50, so one full pass over the training set takes 1000 iterations (steps), which together make up 1 epoch.
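The relationship between samples, batch size, steps, and epochs is plain arithmetic, sketched below with the numbers from this demo. The variable names are chosen for illustration only.

```python
import math

num_samples = 50000   # size of the CIFAR10 training set
batch_size = 50
epochs = 5            # the number of epochs used in this demo

# One step processes one batch; a partial last batch would still count as a step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 1000 5000
```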

net = LeNet()                                        # the network to train
loss_function = nn.CrossEntropyLoss()                # cross-entropy loss
optimizer = optim.Adam(net.parameters(), lr=0.001)   # optimizer (parameters to train, learning rate)

for epoch in range(5):  # one epoch is one full pass over the training set
    running_loss = 0.0
    time_start = time.perf_counter()

    for step, data in enumerate(train_loader, start=0):  # iterate over the training set; step starts at 0
        inputs, labels = data    # images and labels of the current batch
        optimizer.zero_grad()    # clear the gradients left over from the previous step

        # forward + backward + optimize
        outputs = net(inputs)                   # forward pass
        loss = loss_function(outputs, labels)   # compute the loss
        loss.backward()                         # backward pass
        optimizer.step()                        # update the parameters

        # print elapsed time, loss, and accuracy
        running_loss += loss.item()
        if step % 1000 == 999:    # print every 1000 mini-batches
            with torch.no_grad():  # no gradient tracking during validation, which saves memory
                outputs = net(test_image)                 # whole test set (batch of 10000); output shape is [10000, 10]
                predict_y = torch.max(outputs, dim=1)[1]  # index of the largest output is the predicted label
                accuracy = (predict_y == test_label).sum().item() / test_label.size(0)

                print('[%d, %5d] train_loss: %.3f  test_accuracy: %.3f' %  # epoch, step, loss, accuracy
                      (epoch + 1, step + 1, running_loss / 1000, accuracy))
                print('%f s' % (time.perf_counter() - time_start))         # elapsed time
                running_loss = 0.0

print('Finished Training')

# save the trained parameters
save_path = './Lenet.pth'
torch.save(net.state_dict(), save_path)

The printed output looks like this:

[1,  1000] train_loss: 1.537  test_accuracy: 0.541
35.345407 s
[2,  1000] train_loss: 1.198  test_accuracy: 0.605
40.532376 s
[3,  1000] train_loss: 1.048  test_accuracy: 0.641
44.144097 s
[4,  1000] train_loss: 0.954  test_accuracy: 0.647
41.313228 s
[5,  1000] train_loss: 0.882  test_accuracy: 0.662
41.860646 s
Finished Training
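The accuracy computation used inside the training loop can be tried in isolation. The sketch below uses a made-up 4×3 output tensor and made-up labels instead of the real [10000, 10] test output, so the numbers are only illustrative.

```python
import torch

# Hypothetical network outputs for 4 samples and 3 classes.
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3],
                        [0.0, 0.1, 3.0],
                        [2.2, 0.4, 0.1]])
labels = torch.tensor([1, 0, 2, 1])   # hypothetical ground-truth labels

# torch.max along dim=1 returns (values, indices); the indices are the predictions.
predict_y = torch.max(outputs, dim=1)[1]
accuracy = (predict_y == labels).sum().item() / labels.size(0)

print(predict_y.tolist(), accuracy)  # [1, 0, 2, 0] 0.75
```

Here three of the four predictions match their labels, giving an accuracy of 0.75.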

2.3 Training on GPU or CPU

The following statement selects the GPU when one is available and falls back to the CPU otherwise:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

The device can also be specified directly:

device = torch.device("cuda")
# or
# device = torch.device("cpu")

Accordingly, the to() function is used to move the network and the Tensors between CPU and GPU so that they are computed on the chosen device:

net = LeNet()
net.to(device)  # move the network to the chosen device
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(5):
    running_loss = 0.0
    time_start = time.perf_counter()

    for step, data in enumerate(train_loader, start=0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs.to(device))                  # move inputs to the chosen device
        loss = loss_function(outputs, labels.to(device))  # move labels to the chosen device
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if step % 1000 == 999:
            with torch.no_grad():
                outputs = net(test_image.to(device))      # move test_image to the chosen device
                predict_y = torch.max(outputs, dim=1)[1]
                # move test_label to the chosen device
                accuracy = (predict_y == test_label.to(device)).sum().item() / test_label.size(0)

                print('[%d, %5d] train_loss: %.3f  test_accuracy: %.3f' %
                      (epoch + 1, step + 1, running_loss / 1000, accuracy))
                print('%f s' % (time.perf_counter() - time_start))
                running_loss = 0.0

print('Finished Training')

save_path = './Lenet.pth'
torch.save(net.state_dict(), save_path)

The printed output looks like this:

cuda
[1,  1000] train_loss: 1.569  test_accuracy: 0.527
18.727597 s
[2,  1000] train_loss: 1.235  test_accuracy: 0.595
17.367685 s
[3,  1000] train_loss: 1.076  test_accuracy: 0.623
17.654908 s
[4,  1000] train_loss: 0.984  test_accuracy: 0.639
17.861825 s
[5,  1000] train_loss: 0.917  test_accuracy: 0.649
17.733115 s
Finished Training

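As the logs show, training on the GPU is clearly faster: each epoch takes about 18 s, compared with roughly 35 to 44 s on the CPU.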
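One detail that is easy to miss when moving data between devices: calling to() on a module moves its parameters in place, while calling to() on a Tensor returns a new Tensor that has to be used or reassigned. A minimal sketch, with a small nn.Linear as a hypothetical stand-in for the network; it also runs on a machine without a GPU:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 2)    # hypothetical stand-in for LeNet
layer.to(device)           # a module's parameters are moved in place

x = torch.randn(3, 4)
x = x.to(device)           # a Tensor is not moved in place: keep the returned value

y = layer(x)
print(y.device.type == device.type)  # True
```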

3. predict.py

# imports
import torch
import torchvision.transforms as transforms
from PIL import Image
from model import LeNet

# data preprocessing
transform = transforms.Compose([transforms.Resize((32, 32)),  # first resize to the same size as the training images
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# load the image to classify (found independently, not part of the dataset); it sits next to the source files
im = Image.open('horse.jpg')
im = transform(im)  # [C, H, W]
im = torch.unsqueeze(im, dim=0)  # add a batch dimension, because the network expects [batch, channel, height, width]

# instantiate the network and load the trained parameters
net = LeNet()
net.load_state_dict(torch.load('Lenet.pth'))

# predict
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
with torch.no_grad():
    outputs = net(im)
    predict = torch.max(outputs, dim=1)[1].numpy()
print(classes[int(predict[0])])

The output is the predicted label.

The prediction can also be expressed with softmax, which outputs 10 probabilities:

with torch.no_grad():
    outputs = net(im)
    predict = torch.softmax(outputs, dim=1)
print(predict)

The index of the largest probability in the output is the index of the predicted label.

tensor([[2.2782e-06, 2.1008e-07, 1.0098e-04, 9.5135e-05, 9.3220e-04, 2.1398e-04,
         3.2954e-08, 9.9865e-01, 2.8895e-08, 2.8820e-07]])
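To read a label off these probabilities, take the index of the largest one and look it up in the class tuple. The sketch below copies the ten probabilities printed above into a plain Python list; no network is needed to follow it.

```python
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# The softmax probabilities printed above, copied by hand.
probs = [2.2782e-06, 2.1008e-07, 1.0098e-04, 9.5135e-05, 9.3220e-04,
         2.1398e-04, 3.2954e-08, 9.9865e-01, 2.8895e-08, 2.8820e-07]

# Index of the largest probability.
best = max(range(len(probs)), key=probs.__getitem__)

print(best, classes[best])  # 7 horse
```

Index 7 carries almost all of the probability mass, so the image is classified as 'horse'.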
