Explicit batch vs. implicit batch

  • implicit batch demo
  • explicit batch demo
  • Comparison summary

The official documentation on explicit_batch and implicit_batch is linked here:
Explicit vs Implicit Batch

Part of the official description, in short: in implicit batch mode, tensors carry no batch-dimension information and all dimensions must be constants. TensorRT keeps implicit batch only for backward compatibility, so it is not recommended for new code.
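To see the difference at a glance before walking through the full demos, the sketch below contrasts the two creation paths (CreateNetwork is a hypothetical helper written for illustration, not a TensorRT API; the real calls appear verbatim in the demos that follow):

#include "NvInfer.h"
using namespace nvinfer1;

// Hypothetical helper contrasting the two creation paths used in the demos below.
INetworkDefinition* CreateNetwork(IBuilder* builder, bool explicit_batch, int32_t max_batch) {
    if (!explicit_batch) {
        // Implicit batch: pass 0 to createNetworkV2; the batch size lives on the
        // builder (setMaxBatchSize) and is chosen again at execute() time.
        builder->setMaxBatchSize(max_batch);
        return builder->createNetworkV2(0);
    }
    // Explicit batch: set the kEXPLICIT_BATCH bit; batch becomes an ordinary
    // tensor dimension and every input shape must include it.
    uint32_t flag = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    return builder->createNetworkV2(flag);
}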

implicit batch demo

Below is a demo showing how implicit batch is used. The network has a single layer, a conv2d. The complete code is as follows:

#include "NvInfer.h"
#include <iostream>
#include <cuda_runtime_api.h>
#include <vector>
#include <sstream>
#include <assert.h>

using namespace nvinfer1;

#define DEFAULT_VALUE 1.0

class Logger : public ILogger
{
public:
    void log(Severity severity, const char* msg) noexcept override
    {
        // suppress info-level messages
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

size_t ProductOfDims(Dims dims) {
    size_t result = 1;
    for (int32_t i = 0; i < dims.nbDims; i++) {
        result *= dims.d[i];
    }
    return result;
}

std::string DimsToStr(Dims dims) {
    std::stringstream ss;
    for (int32_t i = 0; i < dims.nbDims; i++) {
        ss << dims.d[i] << " ";
    }
    return ss.str();
}

int main() {
    Logger logger;

    // Create a Network Definition
    IBuilder* builder = createInferBuilder(logger);
    INetworkDefinition* network = builder->createNetworkV2(0); // implicit batch
    builder->setMaxBatchSize(3);

    Dims3 input_shape{3, 4, 4};   // CHW only, no batch dimension
    Dims4 filter_shape{1, 3, 2, 2};
    DimsHW kernel_size{2, 2};
    DimsHW stride{1, 1};
    const int exec_batch = 3;
    Dims4 output_shape{exec_batch, 1, 3, 3};

    // Add the Input layer to the network
    auto input_data = network->addInput("input", DataType::kFLOAT, input_shape);

    // Add the Convolution layer with strides and weights for filter and bias
    std::vector<float> filter(ProductOfDims(filter_shape), DEFAULT_VALUE);
    Weights filter_w{DataType::kFLOAT, filter.data(), static_cast<int64_t>(filter.size())};
    Weights bias_w{DataType::kFLOAT, nullptr, 0}; // no bias
    int32_t output_channel = filter_shape.d[0];
    auto conv2d = network->addConvolutionNd(*input_data, output_channel, kernel_size, filter_w, bias_w);
    conv2d->setStrideNd(stride);

    // Name the output of the conv2d layer so the tensor can be bound to a memory buffer at inference time
    conv2d->getOutput(0)->setName("output");

    {
        std::cout << "conv2d input tensor dims : [";
        for (int32_t i = 0; i < conv2d->getInput(0)->getDimensions().nbDims; i++) {
            std::cout << conv2d->getInput(0)->getDimensions().d[i] << " ";
        }
        std::cout << "]" << std::endl;
        std::cout << "conv2d output tensor dims : [";
        for (int32_t i = 0; i < conv2d->getOutput(0)->getDimensions().nbDims; i++) {
            std::cout << conv2d->getOutput(0)->getDimensions().d[i] << " ";
        }
        std::cout << "]" << std::endl;
    }

    // Mark it as the output of the entire network
    network->markOutput(*conv2d->getOutput(0));

    // Build an engine (optimize the network)
    IBuilderConfig* config = builder->createBuilderConfig();
    IHostMemory* serializedModel = builder->buildSerializedNetwork(*network, *config);
    IRuntime* runtime = createInferRuntime(logger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(serializedModel->data(), serializedModel->size());

    // Prepare input data
    int32_t inputIndex = engine->getBindingIndex("input");
    int32_t outputIndex = engine->getBindingIndex("output");
    std::vector<float> input(ProductOfDims(input_shape) * exec_batch, DEFAULT_VALUE);
    std::vector<float> output(ProductOfDims(output_shape));

    void* GPU_input_Buffer_ptr;   // a host ptr pointing to a GPU buffer
    void* GPU_output_Buffer_ptr;  // a host ptr pointing to a GPU buffer
    void* buffers[2];
    cudaMalloc(&GPU_input_Buffer_ptr, sizeof(float) * input.size());    // malloc gpu buffer for input
    cudaMalloc(&GPU_output_Buffer_ptr, sizeof(float) * output.size());  // malloc gpu buffer for output
    cudaMemcpy(GPU_input_Buffer_ptr, input.data(), input.size() * sizeof(float), cudaMemcpyHostToDevice); // copy input data from cpu to gpu
    buffers[inputIndex] = static_cast<void*>(GPU_input_Buffer_ptr);
    buffers[outputIndex] = static_cast<void*>(GPU_output_Buffer_ptr);

    // Perform inference
    IExecutionContext* context = engine->createExecutionContext();
    context->execute(3, buffers);

    // Copy result data from gpu to cpu
    cudaMemcpy(output.data(), GPU_output_Buffer_ptr, output.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Display output
    std::cout << "output shape : " << DimsToStr(output_shape) << "\n";
    std::cout << "output data : \n";
    for (auto i : output)
        std::cout << i << " ";
    std::cout << std::endl;
}

Key API usage:

INetworkDefinition* network = builder->createNetworkV2(0); // implicit batch
builder->setMaxBatchSize(3);
context->execute(3, buffers);

With implicit batch, the input shape does not include batch information (Dims3 input_shape{3, 4, 4}); the batch size is specified at execution time via execute(3, buffers).
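Because the binding dimensions carry no batch dimension in this mode, host and device buffers have to be sized as the per-sample volume times the batch passed to execute(), which is exactly what the demo does with ProductOfDims(input_shape) * exec_batch. A minimal sketch of that calculation (ImplicitBufferCount is a hypothetical helper name, not a TensorRT API):

#include "NvInfer.h"
#include <cstddef>
using namespace nvinfer1;

// Hypothetical helper: element count to allocate for one binding of an
// implicit-batch engine, whose binding dims do NOT include the batch.
size_t ImplicitBufferCount(const Dims& binding_dims, int32_t batch) {
    size_t per_sample = 1;
    for (int32_t i = 0; i < binding_dims.nbDims; i++) {
        per_sample *= binding_dims.d[i];
    }
    return per_sample * static_cast<size_t>(batch);
}

// e.g. for the demo's input binding {3, 4, 4} and batch 3 this yields 144 floats.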

explicit batch demo

For comparison, the complete explicit batch demo is as follows:

#include "NvInfer.h"
#include <iostream>
#include <cuda_runtime_api.h>
#include <vector>
#include <sstream>
#include <assert.h>

using namespace nvinfer1;

#define DEFAULT_VALUE 1.0

class Logger : public ILogger
{
public:
    void log(Severity severity, const char* msg) noexcept override
    {
        // suppress info-level messages
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

size_t ProductOfDims(Dims dims) {
    size_t result = 1;
    for (int32_t i = 0; i < dims.nbDims; i++) {
        result *= dims.d[i];
    }
    return result;
}

std::string DimsToStr(Dims dims) {
    std::stringstream ss;
    for (int32_t i = 0; i < dims.nbDims; i++) {
        ss << dims.d[i] << " ";
    }
    return ss.str();
}

int main() {
    Logger logger;

    // Create a Network Definition
    IBuilder* builder = createInferBuilder(logger);
    uint32_t flag = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(flag); // explicit batch

    Dims4 input_shape{3, 3, 4, 4};   // NCHW, batch dimension included
    Dims4 filter_shape{1, 3, 2, 2};
    DimsHW kernel_size{2, 2};
    DimsHW stride{1, 1};
    Dims4 output_shape{3, 1, 3, 3};

    // Add the Input layer to the network
    auto input_data = network->addInput("input", DataType::kFLOAT, input_shape);

    // Add the Convolution layer with strides and weights for filter and bias
    std::vector<float> filter(ProductOfDims(filter_shape), DEFAULT_VALUE);
    Weights filter_w{DataType::kFLOAT, filter.data(), static_cast<int64_t>(filter.size())};
    Weights bias_w{DataType::kFLOAT, nullptr, 0}; // no bias
    int32_t output_channel = filter_shape.d[0];
    auto conv2d = network->addConvolutionNd(*input_data, output_channel, kernel_size, filter_w, bias_w);
    conv2d->setStrideNd(stride);

    // Name the output of the conv2d layer so the tensor can be bound to a memory buffer at inference time
    conv2d->getOutput(0)->setName("output");

    {
        std::cout << "conv2d input tensor dims : [";
        for (int32_t i = 0; i < conv2d->getInput(0)->getDimensions().nbDims; i++) {
            std::cout << conv2d->getInput(0)->getDimensions().d[i] << " ";
        }
        std::cout << "]" << std::endl;
        std::cout << "conv2d output tensor dims : [";
        for (int32_t i = 0; i < conv2d->getOutput(0)->getDimensions().nbDims; i++) {
            std::cout << conv2d->getOutput(0)->getDimensions().d[i] << " ";
        }
        std::cout << "]" << std::endl;
    }

    // Mark it as the output of the entire network
    network->markOutput(*conv2d->getOutput(0));

    // Build an engine (optimize the network)
    IBuilderConfig* config = builder->createBuilderConfig();
    IHostMemory* serializedModel = builder->buildSerializedNetwork(*network, *config);
    IRuntime* runtime = createInferRuntime(logger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(serializedModel->data(), serializedModel->size());

    // Prepare input data
    int32_t inputIndex = engine->getBindingIndex("input");
    int32_t outputIndex = engine->getBindingIndex("output");
    std::vector<float> input(ProductOfDims(input_shape), DEFAULT_VALUE);
    std::vector<float> output(ProductOfDims(output_shape));

    void* GPU_input_Buffer_ptr;   // a host ptr pointing to a GPU buffer
    void* GPU_output_Buffer_ptr;  // a host ptr pointing to a GPU buffer
    void* buffers[2];
    cudaMalloc(&GPU_input_Buffer_ptr, sizeof(float) * input.size());    // malloc gpu buffer for input
    cudaMalloc(&GPU_output_Buffer_ptr, sizeof(float) * output.size());  // malloc gpu buffer for output
    cudaMemcpy(GPU_input_Buffer_ptr, input.data(), input.size() * sizeof(float), cudaMemcpyHostToDevice); // copy input data from cpu to gpu
    buffers[inputIndex] = static_cast<void*>(GPU_input_Buffer_ptr);
    buffers[outputIndex] = static_cast<void*>(GPU_output_Buffer_ptr);

    // Perform inference
    IExecutionContext* context = engine->createExecutionContext();
    context->executeV2(buffers);

    // Copy result data from gpu to cpu
    cudaMemcpy(output.data(), GPU_output_Buffer_ptr, output.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Display output
    std::cout << "output shape : " << DimsToStr(output_shape) << "\n";
    std::cout << "output data : \n";
    for (auto i : output)
        std::cout << i << " ";
    std::cout << std::endl;
}

With explicit batch, the input must be specified with four dimensions (Dims4 input_shape{3, 3, 4, 4}), and inference is run with executeV2(buffers).
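For comparison with the implicit batch section, the key API calls excerpted from the explicit batch demo above are:

uint32_t flag = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
INetworkDefinition* network = builder->createNetworkV2(flag); // explicit batch
context->executeV2(buffers);

There is no setMaxBatchSize call and executeV2 takes no batch argument; the batch size lives inside the tensor shapes instead.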

Comparison summary

                explicit batch                                          implicit batch
Creation        createNetworkV2(flag) with kEXPLICIT_BATCH (value 1)    createNetworkV2(0), plus builder->setMaxBatchSize(3)
Input shape     four dimensions (batch included)                        three dimensions, no batch information
Execution       executeV2(buffers)                                      execute(batch, buffers)
