Deploying YOLOv5 with LibTorch on Windows (VS2019): A Practical Summary

  • Preface
  • Environment Setup
  • Environment Setup References
  • My LibTorch Configuration
  • GPU Model Export Code
  • Results
  • Wrap-up

Preface

This is my first blog post, and it is simply a summary of my recent work: deploying a PyTorch-trained model with LibTorch. I had previously managed to package the whole Python project into an .exe with PyInstaller, but that did not meet the client's requirements, which is why I turned to LibTorch.

Environment Setup

VS2019 + OpenCV 4.5 + LibTorch 1.7.1:
1. VS2019 download: link
2. OpenCV website: link
3. LibTorch download: link — the Release build is recommended. The LibTorch version must match the PyTorch version the model was trained with, and the CUDA versions must match as well.
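Before building the full project, it is worth compiling a minimal sanity check to confirm that the LibTorch build, the CUDA toolkit, and the driver all agree. A small sketch (it assumes the include/library configuration described below is already in place):

#include <torch/torch.h>
#include <iostream>

int main() {
    // Should print 1 on a working CUDA build; 0 means a CPU-only build
    // or a CUDA/driver version mismatch.
    std::cout << "cuda::is_available(): " << torch::cuda::is_available() << std::endl;
    std::cout << "cudnn_is_available(): " << torch::cuda::cudnn_is_available() << std::endl;
    return 0;
}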

Environment Setup References:

https://blog.csdn.net/weixin_44936889/article/details/111186818
https://blog.csdn.net/zzz_zzz12138/article/details/109138805
https://blog.csdn.net/wenghd22/article/details/112512231
Reference for configuring VS2019 with OpenCV:
https://blog.csdn.net/sophies671207/article/details/89854368

My LibTorch Configuration

My LibTorch configuration process: create an empty project, add a main.cpp file, then set the following project properties.
1. Project -> Properties -> VC++ Directories -> Include Directories

2. Project -> Properties -> VC++ Directories -> Library Directories

3. Project -> Properties -> C/C++ -> General -> Additional Include Directories

4. Project -> Properties -> C/C++ -> General -> SDL checks: No

5. Project -> Properties -> Linker -> Input -> Additional Dependencies: add the following
E:\opencv\build\x64\vc15\lib\opencv_world450.lib
c10.lib
asmjit.lib
c10_cuda.lib
caffe2_detectron_ops_gpu.lib
caffe2_module_test_dynamic.lib
caffe2_nvrtc.lib
clog.lib
cpuinfo.lib
dnnl.lib
fbgemm.lib
libprotobuf.lib
libprotobuf-lite.lib
libprotoc.lib
mkldnn.lib
torch.lib
torch_cuda.lib
torch_cpu.lib
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib

6. Project -> Properties -> Linker -> Command Line: add /INCLUDE:?warp_size@cuda@at@@YAHXZ (see the note after this list)

7. Project -> Properties -> C/C++ -> Language -> Conformance Mode: No
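A note on step 6: nothing in your code references torch_cuda.lib directly, so without that /INCLUDE directive the MSVC linker strips its symbols, and torch::cuda::is_available() returns 0 even though the GPU libraries are linked. If you would rather keep the workaround in source than in the project settings, an equivalent MSVC-only line (my own preference, not something the LibTorch docs mandate) is:

#pragma comment(linker, "/INCLUDE:?warp_size@cuda@at@@YAHXZ")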

With the environment configured, the packaged folder looks as shown below:

Weights: export the official weights at a fixed input size, then simply run main.cpp. The images in the samples folder are used for testing.
main.cpp is listed below. When running on the GPU, note the pred.to(at::kCPU) call near the top of non_max_suppression.


#include <opencv2/opencv.hpp>
#include <torch/script.h>
#include <torch/torch.h>
#include <algorithm>
#include <fstream>   // needed for std::ifstream (missing in the original)
#include <iostream>
#include <memory>
#include <time.h>

std::vector<torch::Tensor> non_max_suppression(torch::Tensor preds, float score_thresh = 0.01, float iou_thresh = 0.35)
{
    std::vector<torch::Tensor> output;
    for (size_t i = 0; i < preds.sizes()[0]; ++i)
    {
        torch::Tensor pred = preds.select(0, i);

        // GPU inference returns CUDA tensors; move them to the CPU before the
        // post-processing below, otherwise the indexing calls will throw.
        pred = pred.to(at::kCPU);

        // Filter by objectness * class score
        torch::Tensor scores = pred.select(1, 4) * std::get<0>(torch::max(pred.slice(1, 5, pred.sizes()[1]), 1));
        pred = torch::index_select(pred, 0, torch::nonzero(scores > score_thresh).select(1, 0));
        if (pred.sizes()[0] == 0) continue;

        // (center_x, center_y, w, h) to (left, top, right, bottom)
        pred.select(1, 0) = pred.select(1, 0) - pred.select(1, 2) / 2;
        pred.select(1, 1) = pred.select(1, 1) - pred.select(1, 3) / 2;
        pred.select(1, 2) = pred.select(1, 0) + pred.select(1, 2);
        pred.select(1, 3) = pred.select(1, 1) + pred.select(1, 3);

        // Computing scores and classes
        std::tuple<torch::Tensor, torch::Tensor> max_tuple = torch::max(pred.slice(1, 5, pred.sizes()[1]), 1);
        pred.select(1, 4) = pred.select(1, 4) * std::get<0>(max_tuple);
        pred.select(1, 5) = std::get<1>(max_tuple);

        torch::Tensor dets = pred.slice(1, 0, 6);
        torch::Tensor keep = torch::empty({ dets.sizes()[0] });
        torch::Tensor areas = (dets.select(1, 3) - dets.select(1, 1)) * (dets.select(1, 2) - dets.select(1, 0));
        std::tuple<torch::Tensor, torch::Tensor> indexes_tuple = torch::sort(dets.select(1, 4), 0, 1);
        torch::Tensor v = std::get<0>(indexes_tuple);
        torch::Tensor indexes = std::get<1>(indexes_tuple);
        int count = 0;
        while (indexes.sizes()[0] > 0)
        {
            keep[count] = (indexes[0].item().toInt());
            count += 1;

            // Computing overlaps
            torch::Tensor lefts = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor tops = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor rights = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor bottoms = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor widths = torch::empty(indexes.sizes()[0] - 1);
            torch::Tensor heights = torch::empty(indexes.sizes()[0] - 1);
            for (size_t i = 0; i < indexes.sizes()[0] - 1; ++i)
            {
                lefts[i] = std::max(dets[indexes[0]][0].item().toFloat(), dets[indexes[i + 1]][0].item().toFloat());
                tops[i] = std::max(dets[indexes[0]][1].item().toFloat(), dets[indexes[i + 1]][1].item().toFloat());
                rights[i] = std::min(dets[indexes[0]][2].item().toFloat(), dets[indexes[i + 1]][2].item().toFloat());
                bottoms[i] = std::min(dets[indexes[0]][3].item().toFloat(), dets[indexes[i + 1]][3].item().toFloat());
                widths[i] = std::max(float(0), rights[i].item().toFloat() - lefts[i].item().toFloat());
                heights[i] = std::max(float(0), bottoms[i].item().toFloat() - tops[i].item().toFloat());
            }
            torch::Tensor overlaps = widths * heights;

            // Filter by IOUs
            torch::Tensor ious = overlaps / (areas.select(0, indexes[0].item().toInt()) + torch::index_select(areas, 0, indexes.slice(0, 1, indexes.sizes()[0])) - overlaps);
            indexes = torch::index_select(indexes, 0, torch::nonzero(ious <= iou_thresh).select(1, 0) + 1);
        }
        keep = keep.toType(torch::kInt64);
        output.push_back(torch::index_select(dets, 0, keep.slice(0, 0, count)));
    }
    return output;
}

int main(int argc, char* argv[])
{
    std::cout << "cuda::is_available():" << torch::cuda::is_available() << std::endl;
    torch::DeviceType device_type = at::kCPU;  // pick the device type
    if (torch::cuda::is_available())
        device_type = at::kCUDA;

    // Loading the TorchScript module (exported at a fixed input size)
    torch::jit::script::Module module = torch::jit::load("yolov5x.torchscript.pt");
    module.to(device_type);  // move the model to the selected device

    std::vector<std::string> classnames;
    std::ifstream f("coco.names");
    std::string name = "";
    while (std::getline(f, name))
    {
        classnames.push_back(name);
    }

    if (argc < 2)
    {
        std::cout << "Please run with test video." << std::endl;
        return -1;
    }
    std::string video = argv[1];
    cv::VideoCapture cap = cv::VideoCapture(video);

    cv::Mat frame, img;
    cap.read(frame);
    int width = frame.size().width;
    int height = frame.size().height;
    int count = 0;
    while (cap.isOpened())
    {
        count++;
        clock_t start = clock();
        cap.read(frame);
        if (frame.empty())
        {
            std::cout << "Read frame failed!" << std::endl;
            break;
        }

        // Preparing input tensor
        cv::resize(frame, img, cv::Size(640, 640));
        cv::cvtColor(img, img, cv::COLOR_BGR2RGB);    // BGR -> RGB
        img.convertTo(img, CV_32FC3, 1.0f / 255.0f);  // normalize to [0, 1]
        auto imgTensor = torch::from_blob(img.data, { 1, img.rows, img.cols, img.channels() }).to(device_type);
        imgTensor = imgTensor.permute({ 0, 3, 1, 2 }).contiguous();  // BHWC -> BCHW (Batch, Channel, Height, Width)

        std::vector<torch::jit::IValue> inputs;
        inputs.emplace_back(imgTensor);

        // preds: [?, 15120, 9]
        torch::jit::IValue output = module.forward(inputs);
        auto preds = output.toTuple()->elements()[0].toTensor();
        std::vector<torch::Tensor> dets = non_max_suppression(preds, 0.35, 0.5);
        if (dets.size() > 0)
        {
            // Visualize result: scale boxes from the 640x640 input back to frame size
            for (size_t i = 0; i < dets[0].sizes()[0]; ++i)
            {
                float left = dets[0][i][0].item().toFloat() * frame.cols / 640;
                float top = dets[0][i][1].item().toFloat() * frame.rows / 640;
                float right = dets[0][i][2].item().toFloat() * frame.cols / 640;
                float bottom = dets[0][i][3].item().toFloat() * frame.rows / 640;
                float score = dets[0][i][4].item().toFloat();
                int classID = dets[0][i][5].item().toInt();
                cv::rectangle(frame, cv::Rect(left, top, (right - left), (bottom - top)), cv::Scalar(0, 255, 0), 2);
                cv::putText(frame,
                    classnames[classID] + ": " + cv::format("%.2f", score),
                    cv::Point(left, top),
                    cv::FONT_HERSHEY_SIMPLEX, (right - left) / 200, cv::Scalar(0, 255, 0), 2);
            }
        }
        // std::cout << "-[INFO] Frame:" << std::to_string(count) << " FPS: " + std::to_string(float(1e7 / (clock() - start))) << std::endl;
        std::cout << "-[INFO] Frame:" << std::to_string(count) << std::endl;
        cv::imshow("", frame);
        cv::resize(frame, frame, cv::Size(width, height));
        if (cv::waitKey(1) == 27) break;
    }
    cap.release();
    return 0;
}
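To run it: build in Release x64, make sure the LibTorch and OpenCV DLLs (e.g. torch_cuda.dll, torch_cpu.dll, c10.dll, opencv_world450.dll) sit next to the executable or on the PATH, and pass a test video on the command line. Assuming the binary is named yolov5_demo.exe (my name for it, not anything fixed):

yolov5_demo.exe test.mp4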

GPU Model Export Code

Remember to adjust the export image size so it matches the input size used at deployment.

"""Exports a YOLOv5 *.pt model to ONNX and TorchScript formatsUsage:$ export PYTHONPATH="$PWD" && python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
"""import argparse
import sys
import timesys.path.append('./')  # to run '$ python *.py' files in subdirectoriesimport torch
import torch.nn as nnimport models
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import set_logging, check_img_sizeif __name__ == '__main__':parser = argparse.ArgumentParser()parser.add_argument('--weights', type=str, default='E:\\yolov5-master\\runs\\train\\exp\weights\\best.pt', help='weights path')  # from yolov5/models/parser.add_argument('--img-size', nargs='+', type=int, default=[352, 640], help='image size')  # height, widthparser.add_argument('--batch-size', type=int, default=1, help='batch size')opt = parser.parse_args()opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expandprint(opt)set_logging()t = time.time()# Load PyTorch modelmodel = attempt_load(opt.weights, map_location=torch.device('cuda'))  # load FP32 modellabels = model.names# Checksgs = int(max(model.stride))  # grid size (max stride)opt.img_size = [check_img_size(x, gs) for x in opt.img_size]  # verify img_size are gs-multiples# Inputimg = torch.zeros(opt.batch_size, 3, *opt.img_size).to(device='cuda')# image size(1,3,320,192) iDetection# Update modelfor k, m in model.named_modules():m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibilityif isinstance(m, models.common.Conv):  # assign export-friendly activationsif isinstance(m.act, nn.Hardswish):m.act = Hardswish()elif isinstance(m.act, nn.SiLU):m.act = SiLU()# elif isinstance(m, models.yolo.Detect):#     m.forward = m.forward_export  # assign forward (optional)#model.model[-1].export = True  # set Detect() layer export=Truemodel.model[-1].export = Falsey = model(img)  # dry run# TorchScript exporttry:print('\nStarting TorchScript export with torch %s...' % torch.__version__)f = opt.weights.replace('.pt', '.torchscript.pt')  # filenamets = torch.jit.trace(model, img)ts.save(f)print('TorchScript export success, saved as %s' % f)except Exception as e:print('TorchScript export failure: %s' % e)# ONNX exporttry:import onnxprint('\nStarting ONNX export with onnx %s...' % onnx.__version__)f = opt.weights.replace('.pt', '.onnx')  # filenametorch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],output_names=['classes', 'boxes'] if y is None else ['output'])# Checksonnx_model = onnx.load(f)  # load onnx modelonnx.checker.check_model(onnx_model)  # check onnx model# print(onnx.helper.printable_graph(onnx_model.graph))  # print a human readable modelprint('ONNX export success, saved as %s' % f)except Exception as e:print('ONNX export failure: %s' % e)# CoreML exporttry:import coremltools as ctprint('\nStarting CoreML export with coremltools %s...' % ct.__version__)# convert model from torchscript and apply pixel scaling as per detect.pymodel = ct.convert(ts, inputs=[ct.ImageType(name='image', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])f = opt.weights.replace('.pt', '.mlmodel')  # filenamemodel.save(f)print('CoreML export success, saved as %s' % f)except Exception as e:print('CoreML export failure: %s' % e)# Finishprint('\nExport complete (%.2fs). Visualize with https://github.com/lutzroeder/netron.' % (time.time() - t))
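As a usage example (the weights path is my own; adjust to yours), exporting at the 640x640 size that the main.cpp above feeds the model:

$ python models/export.py --weights E:\yolov5-master\runs\train\exp\weights\best.pt --img 640 640 --batch 1

The resulting best.torchscript.pt then expects exactly that input size on the C++ side, which is why the script defaults of [352, 640] need changing per deployment.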

Results

Test results:

Video test: libtorch + yolov5 (recorded 2021-01-20 14-31-49).

Another video: link

Wrap-up

I won't show the results of my own trained model here. Thanks for reading — feel free to follow, and let's keep learning together!

One known issue: long forward times. I actually noticed this early on when batch-processing multiple images, as in the figure below — the first two forward calls are very slow, and the rest run at normal speed. I looked into it and also asked on GitHub; the likely cause is a warm-up issue in the LibTorch 1.7.1 build.

GitHub:
Gitee:
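The usual mitigation, as I understand it, is to run a few throwaway forward passes right after loading the model, so CUDA context and kernel initialization is paid before the frames you actually time. A minimal sketch against the main.cpp above (the 640x640 shape matches its input; place this after module.to(device_type)):

// Warm-up: a few dummy forward passes so the first real frame
// is not billed for CUDA initialization.
auto dummy = torch::zeros({ 1, 3, 640, 640 }).to(device_type);
for (int i = 0; i < 3; ++i)
{
    module.forward({ dummy });
}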
