From Scratch: Building a Semantic Segmentation Model, Deployment and Quantization (Hands-On)

I. Overview:

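This article has the following parts: data acquisition, model construction and training, model export and deployment, and model quantization. The dataset is the Stanford Background Dataset from Kaggle. After downloading and extracting it you get an archive directory.

The parts used below are archive/images, archive/labels_colored, and archive/labels_class_dict.csv. Note: the goal of this article is to quickly put together a working training and deployment pipeline, so engineering quality and completeness of the code are not considered for now. There is plenty that can be changed in every part; adapt it yourself as needed.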

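II. Model Construction and Training:

1. Building the model

net.py: the overall architecture is an encoder-decoder built from ResNet18 + FPN. The main model is SegModel. SegModel2 exists so that part of the post-processing (the argmax over classes) can be folded into the model at export time, which reduces the workload for developers in other teams: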

import torch
from torch import nn, Tensor
from torch.nn import functional as F

class BasicBlock(nn.Module):
    expansion: int = 1

    def __init__(self, c1, c2, s=1, downsample=None) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(c1, c2, 3, s, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(c2)
        self.conv2 = nn.Conv2d(c2, c2, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(c2)
        self.downsample = downsample

    def forward(self, x: Tensor) -> Tensor:
        identity = x
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        return F.relu(out)

class ResNet18(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.depths = [2, 2, 2, 2]
        self.channels = [64, 128, 256, 512]
        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, self.inplanes, 7, 2, 3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.inplanes)
        self.maxpool = nn.MaxPool2d(3, 2, 1)

        self.layer1 = self._make_layer(BasicBlock, 64, 2, s=1)
        self.layer2 = self._make_layer(BasicBlock, 128, 2, s=2)
        self.layer3 = self._make_layer(BasicBlock, 256, 2, s=2)
        self.layer4 = self._make_layer(BasicBlock, 512, 2, s=2)

    def _make_layer(self, block, planes, depth, s=1) -> nn.Sequential:
        downsample = None
        if s != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion, 1, s, bias=False),
                nn.BatchNorm2d(planes * block.expansion)
            )
        layers = nn.Sequential(
            block(self.inplanes, planes, s, downsample),
            *[block(planes * block.expansion, planes) for _ in range(1, depth)]
        )
        self.inplanes = planes * block.expansion
        return layers

    def forward(self, x: Tensor) -> Tensor:
        x = self.maxpool(F.relu(self.bn1(self.conv1(x))))
        x1 = self.layer1(x)
        x2 = self.layer2(x1)
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)
        return x1, x2, x3, x4

class ConvModule(nn.Sequential):
    def __init__(self, c1, c2, k, s=1, p=0, d=1, g=1):
        super().__init__(
            nn.Conv2d(c1, c2, k, s, p, d, g, bias=False),
            nn.BatchNorm2d(c2),
            nn.ReLU(True)
        )

class FPNHead(nn.Module):
    def __init__(self, in_channels, channel=128, num_classes=10):
        super().__init__()
        self.lateral_convs = nn.ModuleList([])
        self.output_convs = nn.ModuleList([])

        for ch in in_channels[::-1]:
            self.lateral_convs.append(ConvModule(ch, channel, 1))
            self.output_convs.append(ConvModule(channel, channel, 3, 1, 1))

        self.conv_seg = nn.Conv2d(channel, num_classes, 1)
        self.dropout = nn.Dropout2d(0.1)

    def forward(self, features) -> Tensor:
        features = features[::-1]
        out = self.lateral_convs[0](features[0])
        for i in range(1, len(features)):
            out = F.interpolate(out, scale_factor=2.0, mode='nearest')
            out = out + self.lateral_convs[i](features[i])
            out = self.output_convs[i](out)
        out = self.conv_seg(self.dropout(out))
        return out


class SegModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.backbone = ResNet18()
        self.head = FPNHead(in_channels=[64, 128, 256, 512], num_classes=9)

    def forward(self, x):
        feature = self.backbone(x)
        out = self.head(feature)
        output = F.interpolate(out, size=x.shape[-2:], mode='bilinear', align_corners=False)
        return output

class SegModel2(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.backbone = ResNet18()
        self.head = FPNHead(in_channels=[64, 128, 256, 512], num_classes=9)

    def forward(self, x):
        feature = self.backbone(x)
        out = self.head(feature)
        output = F.interpolate(out, size=x.shape[-2:], mode='bilinear', align_corners=False)
        # Post-processing folded into the model: per-pixel class index
        y = torch.argmax(output, dim=1)
        y = torch.squeeze(y)
        return y

if __name__ == '__main__':
    model = SegModel()
    model = SegModel2()
    x = torch.randn(1, 3, 256, 320)
    print(model(x).shape)
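The FPN head upsamples each level by exactly 2 and then adds it to the next finer feature map, so the four backbone outputs have to differ by exact factors of two. Below is a minimal sketch of that size arithmetic, using the standard convolution output-size formula for the layers in the ResNet18 above. The helper names `down`, `feature_sizes` and `fpn_compatible` are made up for this illustration and are not part of net.py.

```python
def down(n: int) -> int:
    """Output length of a stride-2 layer whose kernel is 2*p + 1 for padding p
    (7x7/p3 conv, 3x3/p1 conv or max-pool, 1x1/p0 conv): floor((n - 1) / 2) + 1."""
    return (n - 1) // 2 + 1


def feature_sizes(h: int, w: int):
    """Spatial sizes of x1..x4 returned by ResNet18.forward for an h x w input."""
    h, w = down(h), down(w)      # conv1, stride 2
    h, w = down(h), down(w)      # maxpool, stride 2
    sizes = [(h, w)]             # layer1 keeps the size (stride 1) -> x1
    for _ in range(3):           # layer2..layer4, stride 2 each -> x2, x3, x4
        h, w = down(h), down(w)
        sizes.append((h, w))
    return sizes


def fpn_compatible(h: int, w: int) -> bool:
    """True if every level is exactly twice the size of the next coarser one."""
    sizes = feature_sizes(h, w)
    return all(fine == (2 * coarse[0], 2 * coarse[1])
               for fine, coarse in zip(sizes, sizes[1:]))


print(feature_sizes(256, 320))   # [(64, 80), (32, 40), (16, 20), (8, 10)]
print(fpn_compatible(256, 320))  # True
print(fpn_compatible(250, 320))  # False
```

With the 256x320 input used in this article every level lines up. An input such as 250x320 gives heights 63, 32, 16, 8, so one of the additions in FPNHead.forward would hit a shape mismatch; sizes divisible by 32 avoid this.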

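2. Data processing

(1) Processing the raw data (process_data.py)

The original masks are 8-bit pseudo-color images, whereas training this model needs single-channel class-index maps, so the images under archive/labels_colored have to be converted. The core function is colorMask_2_grayMask, which turns a color mask into a grayscale mask that looks almost entirely black (because the class indices are small numbers):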

import cv2
import numpy as np

def colorMask_2_grayMask(color_mask_path, sv_gray_path, all_colors):
    # all_colors[k][0] is the RGB tuple of class k, all_colors[k][-1] its class index
    color_to_num_dic = {all_colors[k][0]: all_colors[k][-1] for k in all_colors.keys()}
    color = cv2.imread(color_mask_path)
    color = cv2.cvtColor(color, cv2.COLOR_BGR2RGB)
    zeros = np.zeros(shape=(color.shape[0], color.shape[1]), dtype=np.uint8)

    for rgb in color_to_num_dic.keys():
        # Pixels whose three channels all match this class color
        mask = np.all(color == np.array(rgb).reshape(1, 1, 3), axis=2)
        zeros[mask] = color_to_num_dic[rgb]

    cv2.imwrite(sv_gray_path, zeros)

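The result before and after conversion is shown below; training uses the one on the right:

After process_data.py finishes, the gray_mask folder contains a grayscale mask for every color mask, and train.txt and test.txt are generated in the project directory. Each line of these files has the form (RGB image path,grayscale mask path,color mask path).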
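How the all_colors argument is built is not shown above. Here is a small sketch of one way to do it, assuming labels_class_dict.csv has the columns name, r, g, b and that class indices follow the row order (both are assumptions; check the actual file). The CSV text and colors below are placeholders, not the dataset's real values, and `load_all_colors` and `color_to_index` are hypothetical helper names.

```python
import csv
import io

import numpy as np

# Placeholder stand-in for labels_class_dict.csv (assumed columns: name,r,g,b).
CSV_TEXT = """name,r,g,b
class_a,255,0,0
class_b,0,255,0
class_c,0,0,255
"""


def load_all_colors(csv_file):
    """Return {name: [(r, g, b), class_index]}, the layout colorMask_2_grayMask reads."""
    all_colors = {}
    for idx, row in enumerate(csv.DictReader(csv_file)):
        rgb = (int(row["r"]), int(row["g"]), int(row["b"]))
        all_colors[row["name"]] = [rgb, idx]
    return all_colors


def color_to_index(color_rgb, all_colors):
    """Same per-color masking as colorMask_2_grayMask, on an in-memory RGB array."""
    out = np.zeros(color_rgb.shape[:2], dtype=np.uint8)
    for rgb, idx in all_colors.values():
        mask = np.all(color_rgb == np.array(rgb).reshape(1, 1, 3), axis=2)
        out[mask] = idx
    return out


all_colors = load_all_colors(io.StringIO(CSV_TEXT))
color = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [0, 255, 0]]], dtype=np.uint8)
gray = color_to_index(color, all_colors)
print(gray)
```

For the real file, pass `open("archive/labels_class_dict.csv", newline="")` instead of the in-memory text.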

(2) Dataset design (dataload.py)

The main class is SegData. The input and the label are both resized in __getitem__ because the images in the dataset do not all have the same size:

import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class SegData(Dataset):
    def __init__(self, train_file_path, val_file_path, data_root, train=True):
        super().__init__()
        if train:
            self.files = open(train_file_path, 'r')
        else:
            self.files = open(val_file_path, 'r')

        self.train_data = []
        self.val_data = []

        self.data = []
        for line in self.files.readlines():
            # Each line: rgb image path,gray mask path,color mask path
            img, label = line.strip().split(',')[0:2]
            img_path = os.path.join(data_root, img)
            label_path = os.path.join(data_root, label)
            self.data.append([img_path, label_path])
        self.files.close()

        self.x_transform = transforms.Compose([
            transforms.ToTensor(),  # -> [0, 1]
            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # -> [-1, 1]
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        x_path, y_path = self.data[index]

        img_x = Image.open(x_path)
        img_y = Image.open(y_path)

        # PIL takes (width, height); all samples end up 256 high and 320 wide
        img_x = img_x.resize((320, 256), Image.NEAREST)
        img_y = img_y.resize((320, 256), Image.NEAREST)

        img_x = self.x_transform(img_x)

        img_y = np.array(img_y)  # PIL -> ndarray
        img_y = torch.from_numpy(img_y).long()
        return img_x, img_y
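For the label, resizing with Image.NEAREST matters because the pixel values are class indices, not intensities. The following small sketch (on a made-up 4x4 label map, not data from the dataset) shows that nearest-neighbor resizing only ever produces class indices that already exist, while an interpolating filter can invent values in between.

```python
import numpy as np
from PIL import Image

# Made-up 4x4 label map that contains only the class indices 0 and 8.
label = np.zeros((4, 4), dtype=np.uint8)
label[:, 2:] = 8
img_y = Image.fromarray(label)

# PIL's resize takes (width, height).
nearest = np.array(img_y.resize((7, 7), Image.NEAREST))
bilinear = np.array(img_y.resize((7, 7), Image.BILINEAR))

print(sorted(np.unique(nearest).tolist()))   # only class ids present in the input
print(sorted(np.unique(bilinear).tolist()))  # also contains blended, meaningless ids
```

For the RGB input an interpolating filter would also be a reasonable choice; SegData above simply uses NEAREST for both.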

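3. Training script

There is nothing special to watch out for in the training script; just run it. See train.py for the details. When training finishes, a weight file is saved.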
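train.py is not reproduced in this article. As a rough idea of what a single training step for this kind of model can look like, here is a minimal sketch. The cross-entropy loss, the Adam optimizer, and the learning rate are my assumptions, not necessarily what train.py uses. A single convolution stands in for SegModel so that the snippet runs on its own.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for SegModel: any module mapping (N, 3, H, W) images to
# (N, num_classes, H, W) logits can be dropped in here.
model = nn.Conv2d(3, 9, kernel_size=3, padding=1)

# Assumed loss: takes (N, C, H, W) logits and (N, H, W) long class maps.
criterion = nn.CrossEntropyLoss()
# Assumed optimizer and learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def train_step(images, labels):
    """One optimization step; returns the scalar loss value."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


# Random batch with the dtypes SegData produces: float images, long class maps.
images = torch.randn(2, 3, 64, 80)
labels = torch.randint(0, 9, (2, 64, 80))
losses = [train_step(images, labels) for _ in range(5)]
print(losses)

# Save the weights; an in-memory buffer is used so no file is left behind.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
```

In a real run, the random batch would be replaced by a DataLoader over SegData and the buffer by a path on disk.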

III. Model Export:

Four export targets are covered: ONNX, Core ML, MNN, and TorchScript.

1. ONNX export

ONNX export can use the torch.onnx.export function directly:

def export_onnx(torch_model, example_input):
    torch.onnx.export(torch_model, example_input,
                      "exported_models/SegModel.onnx",
                      verbose=True,
                      input_names=['input'],
                      output_names=['output'])

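2. Core ML export

When exporting the mlmodel, part of the preprocessing can be folded into the model. Note that preprocessing in coremltools is expressed differently from preprocessing in PyTorch; for details, please consult the references at the end of the article.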
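The difference comes down to simple arithmetic. The training transform computes (x / 255 - mean) / std, while coremltools image inputs apply scale * x + bias to the raw 0..255 pixels (the `scale` and `bias` parameters of `ct.ImageType`). A minimal numpy sketch of the conversion between the two forms, checked on random pixels:

```python
import numpy as np

# Values from transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]).
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.5, 0.5, 0.5])

# (x / 255 - mean) / std  ==  x * 1 / (255 * std)  +  (-mean / std)
scale = 1.0 / (255.0 * std[0])   # one scalar; only valid because all stds are equal
bias = (-mean / std).tolist()    # one bias per channel

pixels = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3)).astype(np.float64)
torch_style = (pixels / 255.0 - mean) / std
coreml_style = scale * pixels + np.array(bias)

print(scale, bias)
print(np.allclose(torch_style, coreml_style))
```

Here scale works out to 1/127.5 and bias to [-1, -1, -1]. These are the values that would go into the image input description when converting, so that the exported model accepts raw images directly.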

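3. MNN export

Here the model is converted from .onnx to .mnn on the command line; the Python and C++ versions of the converter take the same arguments. A brief outline of building the MNN library follows (see the official documentation for details). Build the main library: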

cd /path/to/MNN
./schema/generate.sh
mkdir build
cd build
cmake ..
make -j8

Once the main library is built, the following files are generated:

Build the model conversion tool (using ONNX as the example):

cd build
cmake .. -DMNN_BUILD_CONVERTER=ON -DMNN_BUILD_TORCH=ON
cd tools/converter
make -j8

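Once the executable is built, the model file can be converted (export_mnn.sh), again using ONNX as the example. Here --modelFile is the path to the ONNX file and --MNNModel is the path of the generated MNN file.

C++ version: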

./MNNConvert -f ONNX \
--modelFile exported_models/SegModel.onnx \
--MNNModel exported_models/SegModel.mnn  \
--bizCode biz

Python version (requires pip install MNN):

mnnconvert -f ONNX \
--modelFile exported_models/SegModel.onnx \
--MNNModel exported_models/SegModel.mnn  \
--bizCode biz

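4. TorchScript export

Nothing much to say here, one line of code (traced_model is the traced module, e.g. the result of torch.jit.trace):

torch.jit.save(traced_model, "exported_models/SegModel.pt")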
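The one-liner assumes traced_model already exists. Here is a minimal sketch of producing it with torch.jit.trace and checking the round trip. A tiny stand-in network replaces the trained SegModel so that the snippet is self-contained, and an in-memory buffer replaces the file path.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a trained SegModel; swap in the real model with loaded weights.
model = nn.Sequential(nn.Conv2d(3, 9, 3, padding=1), nn.ReLU())
model.eval()  # put BatchNorm/Dropout layers into inference mode before tracing

example_input = torch.randn(1, 3, 256, 320)
with torch.no_grad():
    traced_model = torch.jit.trace(model, example_input)

# The article saves to "exported_models/SegModel.pt"; a buffer is used here
# so the sketch leaves no files behind.
buffer = io.BytesIO()
torch.jit.save(traced_model, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)

with torch.no_grad():
    same = torch.allclose(model(example_input), reloaded(example_input), atol=1e-6)
print(same)
```

Tracing records the operations executed for the example input, so it is worth comparing outputs like this before handing the file over.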

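IV. Model Inference

After the conversions above we have model files for different platforms, which can be handed to dedicated deployment engineers for mobile or other platforms. One step is still missing: once they receive the converted files, how should they use them? Models are usually trained in Python but deployed from other languages, although for any given deployment framework the workflow differs little across the languages it supports. The model only produces the raw network output, which by itself is not enough for developers to use it correctly. How the input image is preprocessed and how the output is post-processed both need attention. If we provide reference inference scripts for each platform, the developers' workload shrinks considerably. Python inference scripts for the different frameworks follow:

1. PyTorch and ONNX inference

These two are straightforward; see infer_torch.py and infer_onnx.py.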

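2. Core ML inference

Because the preprocessing was already folded into the converted model (except for resizing), only resizing is applied to the input image here; see infer_coreml.py.

3. MNN inference

There are two ways to reproduce the PyTorch preprocessing: one uses external libraries, set up as shown below; the other uses only the MNN library:

(1) Using external libraries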

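import MNN
import numpy as np
from PIL import Image

def get_input(img_path):
    image = Image.open(img_path)
    print(image.size)

    image = crop(image)  # crop() is a project helper that is not shown in this article
    image = np.array(image)
    # Same normalization as in training: [0, 255] -> [0, 1] -> [-1, 1]
    image = image / 255.0
    image = image - (0.5, 0.5, 0.5)
    image = image / (0.5, 0.5, 0.5)

    image = image.transpose((2, 0, 1))  # HWC -> CHW
    image = image.astype(np.float32)

    tmp_input = MNN.Tensor((1, 3, 256, 320), MNN.Halide_Type_Float,
                           image, MNN.Tensor_DimensionType_Caffe)
    return tmp_input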

(2) Using MNN only

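def get_input2(img_path):
    import MNN
    import MNN.cv as cv
    import MNN.numpy as np
    import MNN.expr as expr

    # The second argument of imread is a read flag, not a color conversion,
    # so the BGR -> RGB conversion is done explicitly.
    img = cv.imread(img_path)
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB)

    img = img / 255.0
    img = (img - 0.5) / 0.5

    imgf = img.astype(np.float32)
    imgf_batch = np.expand_dims(imgf, 0)
    input_var = expr.convert(imgf_batch, expr.NCHW)
    input_var = MNN.Tensor(input_var)
    return input_var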

Summary: with most frameworks, inference breaks down into four parts: 1. load the model 2. preprocess the data 3. run inference 4. post-process the result
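Those four steps can be sketched in a framework-neutral way with numpy. In this sketch, `fake_model` is a hypothetical stand-in for steps 1 and 3 (a loaded model and its forward pass) and the palette is random placeholder colors. Only the preprocessing values (0.5 mean and std, NCHW layout) come from the article.

```python
import numpy as np


def preprocess(image_rgb):
    """Step 2: HWC uint8 RGB -> NCHW float32 in [-1, 1], as in training."""
    x = image_rgb.astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5
    return x.transpose(2, 0, 1)[None]


def fake_model(x, num_classes=9):
    """Steps 1 and 3 (hypothetical): returns random logits of shape (N, C, H, W)."""
    rng = np.random.default_rng(0)
    shape = (x.shape[0], num_classes) + x.shape[2:]
    return rng.standard_normal(shape).astype(np.float32)


def postprocess(logits, palette):
    """Step 4: logits -> (H, W) class map and (H, W, 3) color mask."""
    class_map = np.argmax(logits, axis=1)[0]
    return class_map, palette[class_map]


# Placeholder colors, one RGB triple per class.
palette = np.random.default_rng(1).integers(0, 256, size=(9, 3), dtype=np.uint8)
image = np.zeros((256, 320, 3), dtype=np.uint8)

x = preprocess(image)
logits = fake_model(x)
class_map, color_mask = postprocess(logits, palette)
print(x.shape, logits.shape, class_map.shape, color_mask.shape)
```

With SegModel2 the argmax already happens inside the exported model, so step 4 shrinks to the palette lookup.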

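V. Model Compression and Quantization

If a large model is deployed directly to a mobile device, the load on the device becomes too high, so the model has to be compressed. Model compression is a broad topic that I may cover separately when I get the chance. The following focuses on hands-on post-training quantization with MNN and Core ML.

1. Core ML compression:

After quantization the model size is halved. Inference time drops slightly but not by much, probably because the model is already small.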

Quantizing to float 16, which reduces by half the model's disk size, is the safest quantization option since it generally does not affect the model's accuracy

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# load full precision model
model_fp32 = ct.models.MLModel('exported_models/SegModel.mlmodel')


'''Quantizing to float 16, which reduces by half the model's disk size,
is the safest quantization option since it generally
does not affect the model's accuracy:'''
model_fp16 = quantization_utils.quantize_weights(model_fp32, nbits=16)
model_fp16.save('exported_models/SegModel_fp16.mlmodel')


# quantize to 8 bit using linear mode
model_8bit = quantization_utils.quantize_weights(model_fp32, nbits=8)
model_8bit.save("exported_models/SegModel_bit8_linear.mlmodel")
print("linear success")

# quantize to 8 bit using LUT kmeans mode
model_8bit = quantization_utils.quantize_weights(model_fp32, nbits=8,quantization_mode="kmeans")
model_8bit.save("exported_models/SegModel_kmeans.mlmodel")
print("kmeans success")


# quantize to 8 bit using linearsymmetric mode
model_8bit = quantization_utils.quantize_weights(model_fp32, nbits=8,quantization_mode="linear_symmetric")
model_8bit.save("exported_models/SegModel_linear_symmetric.mlmodel")
print("linear-symmetric success")

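After fp16 quantization with the approach above, the Core ML model's storage is halved and the inference time of the current model is essentially unchanged. After 8-bit quantization, storage drops to 1/4 of the original.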
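Where the 1/2 and 1/4 come from is easy to see on a plain array of weights. The sketch below is a numpy illustration of fp16 casting and 8-bit linear quantization in general, not coremltools' internal implementation, and it ignores the small overhead of storing the scale and offset.

```python
import numpy as np

rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal(10_000).astype(np.float32)  # made-up weights

# fp16: cast every weight to half precision (2 bytes instead of 4).
w_fp16 = w_fp32.astype(np.float16)

# 8-bit linear: map [min, max] onto the integers 0..255 with one scale and offset.
w_min, w_max = float(w_fp32.min()), float(w_fp32.max())
scale = (w_max - w_min) / 255.0
q = np.round((w_fp32 - w_min) / scale).astype(np.uint8)

# Dequantize to see how much precision was lost.
w_restored = q.astype(np.float32) * scale + w_min
max_err = float(np.abs(w_restored - w_fp32).max())

print(w_fp16.nbytes / w_fp32.nbytes, q.nbytes / w_fp32.nbytes)  # 0.5 0.25
print(max_err, scale / 2)
```

The worst-case rounding error of the 8-bit version is about half a quantization step, which is why 8-bit models can lose a little accuracy while fp16 usually does not.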

2. MNN model compression

(1) fp16 quantization

Add --fp16 when converting the model:

mnnconvert -f ONNX \
--modelFile exported_models/SegModel.onnx \
--MNNModel exported_models/SegModel_fp16.mnn  \
--bizCode biz \
--fp16

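(2) 8-bit quantization

Add --weightQuantBits 8 when converting the model. As for the fp16 conversion, MNN can quantize the model directly and store the weights as fp16, which halves the weight size compared with the unquantized model. Regarding inference speed, the official site says inference is twice as fast on devices that support float16. In my actual test on a local PC the speed did not change, and I am not sure whether that is due to parameter settings or to the hardware.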
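The 8-bit command is not written out above. One way to keep the fp16 and 8-bit variants consistent is to assemble the mnnconvert arguments in a small Python helper, sketched here. `mnnconvert_command` and the output file name SegModel_bit8.mnn are hypothetical; the flags are the ones used in this article.

```python
import shlex


def mnnconvert_command(onnx_path, mnn_path, fp16=False, weight_quant_bits=None):
    """Build the argument list for the mnnconvert CLI (hypothetical helper)."""
    cmd = ["mnnconvert", "-f", "ONNX",
           "--modelFile", onnx_path,
           "--MNNModel", mnn_path,
           "--bizCode", "biz"]
    if fp16:
        cmd.append("--fp16")
    if weight_quant_bits is not None:
        cmd += ["--weightQuantBits", str(weight_quant_bits)]
    return cmd


cmd_8bit = mnnconvert_command("exported_models/SegModel.onnx",
                              "exported_models/SegModel_bit8.mnn",  # hypothetical name
                              weight_quant_bits=8)
print(shlex.join(cmd_8bit))
# To actually run the conversion: subprocess.run(cmd_8bit, check=True)
```

Passing the arguments as a list also avoids the line-continuation and spacing mistakes that are easy to make when typing the command in a shell.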

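(3) Comparison of the models

The change in model size before and after conversion is shown in the figure:

With both MNN and Core ML, FP16 quantization as done above shrinks storage to 1/2 and 8-bit quantization shrinks it to 1/4. In terms of quality, FP16 is close to the original model, while 8-bit loses a little accuracy. In terms of time, there is little difference on PC (possibly because the model is already small, because some parameter is not set correctly, or because the PC lacks the necessary support). The project for this article is at https://github.com/xbbkok/SegDemo; help yourself if you need it.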

VI. References:

Core ML Tools Overview

MNN-Doc 2.1.1 documentation: 欢迎使用MNN文档 (Welcome to the MNN documentation)

转换PyTorch模型到CoreML (Converting PyTorch models to CoreML) - Zhihu

Welcome to PyTorch Tutorials — PyTorch Tutorials 2.0.0+cu117 documentation

GitHub - sithu31296/semantic-segmentation: SOTA Semantic Segmentation Models in PyTorch