Mask R-CNN is a network model that extends Faster R-CNN into a unified detection-and-segmentation framework. It is used mainly for object detection and instance segmentation: it adds a Mask branch on top of the Faster R-CNN framework to perform pixel-level segmentation. Mask R-CNN can also be applied to human pose estimation.
Since Mask R-CNN builds on Faster R-CNN, it is worth reviewing Faster R-CNN first.
After this, the plan is to study Swin Transformer, and then a Mask R-CNN that uses Swin Transformer as its backbone.
The source code examined here is Facebook's maskrcnn-benchmark, a PyTorch implementation.

1. Mask R-CNN extends Faster R-CNN: for every proposal box produced by Faster R-CNN, an FCN is applied to perform segmentation.
2. It introduces RoI Align to replace Faster R-CNN's RoI Pooling. RoI Pooling is not pixel-to-pixel aligned; this may not matter much for the bbox, but it strongly affects mask accuracy. Using RoI Align improves mask accuracy by a relative 10% to 50%.
3. It introduces a segmentation branch that decouples mask prediction from class prediction: the mask branch only performs segmentation, while class prediction and bbox regression are handled by another branch. This differs from the original FCN, which predicts the class a mask belongs to at the same time as the mask itself.
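Concretely, the decoupling in point 3 means the mask branch outputs one m×m mask per class through a per-pixel sigmoid, and only the channel of the ground-truth class enters the binary cross-entropy loss, so there is no inter-class competition. A small illustrative snippet (shapes are assumptions: 28×28 masks, 81 COCO classes):

import torch
import torch.nn.functional as F

num_rois, num_classes, m = 8, 81, 28
mask_logits = torch.randn(num_rois, num_classes, m, m)    # one mask per class per RoI
gt_labels = torch.randint(1, num_classes, (num_rois,))    # class of each positive RoI
gt_masks = torch.randint(0, 2, (num_rois, m, m)).float()  # binary target masks

# pick only the channel of the ground-truth class -> classes never compete in the mask loss
picked = mask_logits[torch.arange(num_rois), gt_labels]
loss_mask = F.binary_cross_entropy_with_logits(picked, gt_masks)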

1. Backbone + FPN


As shown in the figure above, the leftmost column is the bottom-up pathway, the middle column is the top-down pathway, and the right side fuses sampling layers of different depths into the final multi-scale feature maps; the whole block is an encoder-decoder structure.
The left-side C1~C5 are the five ResNet stages; each stage downsamples by 1/2, so the stages run at [1/2, 1/4, 1/8, 1/16, 1/32] of the original image resolution. The middle P5 is obtained from C5 through a 1×1 convolution, with 256 channels and resolution 32. The middle P4~P1 are obtained by repeatedly 2× upsampling P5, each level fused with the lateral 1×1 convolution of the C layer at the same depth, with resolutions [64, 128, 256, 512] and 256 channels throughout. The right-side P5~P2 are used for the final classification and regression; P6~P2 are used when the RPN computes proposals.
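Before diving into the repository code, here is a minimal, self-contained sketch of this top-down pathway (written for this post; the class and layer names are made up and are not the repo's fpn.py):

import torch
import torch.nn.functional as F
from torch import nn

class TinyFPN(nn.Module):
    """Minimal FPN top-down pathway: 1x1 lateral convs + 2x upsampling + 3x3 smoothing."""
    def __init__(self, in_channels_list=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 convs projecting C2..C5 to a common channel width
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels_list]
        )
        # 3x3 convs that smooth the merged maps into P2..P5
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels_list]
        )

    def forward(self, feats):  # feats = [C2, C3, C4, C5], ordered fine -> coarse
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down: upsample the coarser map by 2x and add the lateral connection
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], scale_factor=2, mode="nearest"
            )
        results = [s(l) for s, l in zip(self.smooth, laterals)]  # P2..P5
        # extra P6 for the RPN: stride-2 max pool on P5
        results.append(F.max_pool2d(results[-1], kernel_size=1, stride=2))
        return results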
The following code from the repository gives the main structure of the backbone; for the details and the helper modules it uses, refer to the source on GitHub:

# Code: maskrcnn-benchmark-main\maskrcnn_benchmark\modeling\backbone\resnet.py
from collections import namedtuple

import torch
import torch.nn.functional as F
from torch import nn

from maskrcnn_benchmark.layers import FrozenBatchNorm2d
from maskrcnn_benchmark.layers import Conv2d
from maskrcnn_benchmark.layers import DFConv2d
from maskrcnn_benchmark.modeling.make_layers import group_norm
from maskrcnn_benchmark.utils.registry import Registry

StageSpec = namedtuple(
    "StageSpec",
    [
        "index",            # Index of the stage, e.g. 1, 2, ..., 5
        "block_count",      # Number of residual blocks in the stage
        "return_features",  # True => return the last feature map from this stage
    ],
)
# Using ResNet-50-FPN as the example, so only this spec is kept from the original code
# ResNet-50-FPN (including all stages)
ResNet50FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 6, True), (4, 3, True))
)
class ResNet(nn.Module):
    def __init__(self, cfg):
        super(ResNet, self).__init__()

        # Translate string names to implementations
        # StemWithFixedBatchNorm is a basic Conv + BN + ReLU + max-pooling module
        stem_module = _STEM_MODULES[cfg.MODEL.RESNETS.STEM_FUNC]
        # R-50-FPN backbone structure: resnet50 + fpn
        stage_specs = _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY]
        # the residual transformation module, i.e. ResNet's Bottleneck
        transformation_module = _TRANSFORMATION_MODULES[cfg.MODEL.RESNETS.TRANS_FUNC]

        # Construct the stem module (the StemWithFixedBatchNorm base module)
        self.stem = stem_module(cfg)

        # Construct the specified ResNet stages
        num_groups = cfg.MODEL.RESNETS.NUM_GROUPS            # default 1
        width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP  # default 64
        in_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS    # stem output channels
        stage2_bottleneck_channels = num_groups * width_per_group  # bottleneck input channels, 1 * 64
        # stage output channels, used by the 1x1 / 3x3 convs in the middle and right of the figure; 256
        stage2_out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS

        self.stages = []           # names of each stage (the rightmost P5-P2 above)
        self.return_features = {}  # which stages return their feature maps (the rightmost P5-P2 above)

        # stage_specs is the ResNet50FPNStagesTo5 structure above
        for stage_spec in stage_specs:
            name = "layer" + str(stage_spec.index)  # name of this stage
            # factor by which channels grow relative to stage2_out_channels
            stage2_relative_factor = 2 ** (stage_spec.index - 1)
            # input channels of the new bottleneck after each iteration
            bottleneck_channels = stage2_bottleneck_channels * stage2_relative_factor
            # output channels of the new bottleneck after each iteration
            out_channels = stage2_out_channels * stage2_relative_factor
            # whether to use deformable convolution, False by default
            stage_with_dcn = cfg.MODEL.RESNETS.STAGE_WITH_DCN[stage_spec.index - 1]
            # see the _make_stage code below
            module = _make_stage(
                transformation_module,  # the Bottleneck module; remaining args explained in _make_stage
                in_channels,
                bottleneck_channels,
                out_channels,
                stage_spec.block_count,
                num_groups,
                cfg.MODEL.RESNETS.STRIDE_IN_1X1,
                first_stride=int(stage_spec.index > 1) + 1,
                dcn_config={
                    "stage_with_dcn": stage_with_dcn,
                    "with_modulated_dcn": cfg.MODEL.RESNETS.WITH_MODULATED_DCN,
                    "deformable_groups": cfg.MODEL.RESNETS.DEFORMABLE_GROUPS,
                },  # DCN is not used here, so not explained for now
            )
            # the previous module's output becomes the next module's input
            in_channels = out_channels
            self.add_module(name, module)  # register with nn.Module for forward/backward
            self.stages.append(name)
            self.return_features[name] = stage_spec.return_features

        # freeze some of the layers, controlled by FREEZE_CONV_BODY_AT
        self._freeze_backbone(cfg.MODEL.BACKBONE.FREEZE_CONV_BODY_AT)

    def forward(self, x):
        outputs = []
        x = self.stem(x)
        for stage_name in self.stages:
            x = getattr(self, stage_name)(x)
            if self.return_features[stage_name]:
                outputs.append(x)
        return outputs


def _make_stage(
    transformation_module,
    in_channels,
    bottleneck_channels,
    out_channels,
    block_count,
    num_groups,
    stride_in_1x1,
    first_stride,
    dilation=1,
    dcn_config={},
):
    blocks = []
    stride = first_stride
    # block_count: number of Bottlenecks in each stage, i.e. 3, 4, 6, 3
    for _ in range(block_count):
        blocks.append(
            # this actually instantiates the Bottleneck; see Bottleneck for details
            transformation_module(
                in_channels,          # input channels
                bottleneck_channels,  # bottleneck channels: the fixed inner width of the bottleneck; see resnet for details
                out_channels,         # the block's final output channels
                num_groups,           # convolution groups
                stride_in_1x1,        # whether the stride goes in the 1x1 conv
                stride,
                dilation=dilation,    # dilation factor, used mostly in segmentation tasks
                dcn_config=dcn_config,
            )
        )
        stride = 1
        in_channels = out_channels
    return nn.Sequential(*blocks)


class Bottleneck(nn.Module):
    def __init__(
        self,
        in_channels,
        bottleneck_channels,
        out_channels,
        num_groups,
        stride_in_1x1,
        stride,
        dilation,
        norm_func,
        dcn_config,
    ):
        super(Bottleneck, self).__init__()

        self.downsample = None
        if in_channels != out_channels:  # if input and output channel counts differ
            down_stride = stride if dilation == 1 else 1
            # when this branch is taken, a shortcut projection is built: the
            # feature-mapping module used by the residual addition
            self.downsample = nn.Sequential(
                Conv2d(
                    in_channels, out_channels,
                    kernel_size=1, stride=down_stride, bias=False
                ),
                norm_func(out_channels),
            )
            # initialize the weights of downsample
            for modules in [self.downsample,]:
                for l in modules.modules():
                    if isinstance(l, Conv2d):
                        nn.init.kaiming_uniform_(l.weight, a=1)

        if dilation > 1:
            stride = 1  # reset to be 1

        # The original MSRA ResNet models have stride in the first 1x1 conv
        # The subsequent fb.torch.resnet and Caffe2 ResNe[X]t implementations have
        # stride in the 3x3 conv
        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)

        # first conv of each ResNet Bottleneck: a 1x1 conv, usually reducing channels
        self.conv1 = Conv2d(
            in_channels,
            bottleneck_channels,
            kernel_size=1,
            stride=stride_1x1,
            bias=False,
        )
        self.bn1 = norm_func(bottleneck_channels)
        # TODO: specify init for the above
        with_dcn = dcn_config.get("stage_with_dcn", False)
        if with_dcn:
            deformable_groups = dcn_config.get("deformable_groups", 1)
            with_modulated_dcn = dcn_config.get("with_modulated_dcn", False)
            self.conv2 = DFConv2d(
                bottleneck_channels,
                bottleneck_channels,
                with_modulated_dcn=with_modulated_dcn,
                kernel_size=3,
                stride=stride_3x3,
                groups=num_groups,
                dilation=dilation,
                deformable_groups=deformable_groups,
                bias=False,
            )
        else:
            # second conv of each ResNet Bottleneck: a 3x3 conv, feature
            # extraction with unchanged width
            self.conv2 = Conv2d(
                bottleneck_channels,
                bottleneck_channels,
                kernel_size=3,
                stride=stride_3x3,
                padding=dilation,
                bias=False,
                groups=num_groups,
                dilation=dilation,
            )
            nn.init.kaiming_uniform_(self.conv2.weight, a=1)

        self.bn2 = norm_func(bottleneck_channels)

        # third conv of each ResNet Bottleneck: a 1x1 conv expanding channels back up
        self.conv3 = Conv2d(
            bottleneck_channels, out_channels, kernel_size=1, bias=False
        )
        self.bn3 = norm_func(out_channels)

        for l in [self.conv1, self.conv3,]:
            nn.init.kaiming_uniform_(l.weight, a=1)

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu_(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = F.relu_(out)

        out = self.conv3(out)
        out = self.bn3(out)

        # if the shortcut was built, project the identity before the residual addition
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = F.relu_(out)
        return out


_TRANSFORMATION_MODULES = Registry({
    "BottleneckWithFixedBatchNorm": BottleneckWithFixedBatchNorm,
    "BottleneckWithGN": BottleneckWithGN,
})

_STEM_MODULES = Registry({
    "StemWithFixedBatchNorm": StemWithFixedBatchNorm,
    "StemWithGN": StemWithGN,
})

_STAGE_SPECS = Registry({
    "R-50-C4": ResNet50StagesTo4,
    "R-50-C5": ResNet50StagesTo5,
    "R-101-C4": ResNet101StagesTo4,
    "R-101-C5": ResNet101StagesTo5,
    "R-50-FPN": ResNet50FPNStagesTo5,
    "R-50-FPN-RETINANET": ResNet50FPNStagesTo5,
    "R-101-FPN": ResNet101FPNStagesTo5,
    "R-101-FPN-RETINANET": ResNet101FPNStagesTo5,
    "R-152-FPN": ResNet152FPNStagesTo5,
})
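In the repo, the ResNet body above and the FPN are then glued into one backbone module (see maskrcnn_benchmark/modeling/backbone/backbone.py). Schematically, and leaving out the real function's config plumbing, it is just (a sketch; the helper name here is hypothetical):

from collections import OrderedDict
from torch import nn

def assemble_backbone(body, fpn, out_channels=256):
    # the repo's build_resnet_fpn_backbone does essentially this two-stage chain
    model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
    model.out_channels = out_channels  # FPN output width, 256 for R-50-FPN
    return model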

2. RPN

The figure below is borrowed from another author's diagram (the original link is not preserved here):

In the figure, the anchors column on the left is the anchor-generation flow, the middle is the construction of the RPN network itself, and the ProposalLayer on the right is the flow that filters the RoIs into proposals.
First, the overall RPN module code:

class RPNModule(torch.nn.Module):
    def __init__(self, cfg, in_channels):
        super(RPNModule, self).__init__()
        self.cfg = cfg.clone()

        # build the anchor generator
        anchor_generator = make_anchor_generator(cfg)

        # the registered module, RPNHead
        rpn_head = registry.RPN_HEADS[cfg.MODEL.RPN.RPN_HEAD]
        head = rpn_head(
            cfg, in_channels, anchor_generator.num_anchors_per_location()[0]
        )

        rpn_box_coder = BoxCoder(weights=(1.0, 1.0, 1.0, 1.0))

        # used in training: post-processes the boxes computed by the RPN
        box_selector_train = make_rpn_postprocessor(cfg, rpn_box_coder, is_train=True)
        # used in testing
        box_selector_test = make_rpn_postprocessor(cfg, rpn_box_coder, is_train=False)
        # used in training: computes the losses of the proposal stage
        loss_evaluator = make_rpn_loss_evaluator(cfg, rpn_box_coder)

        self.anchor_generator = anchor_generator
        self.head = head
        self.box_selector_train = box_selector_train
        self.box_selector_test = box_selector_test
        self.loss_evaluator = loss_evaluator

    def forward(self, images, features, targets=None):
        objectness, rpn_box_regression = self.head(features)
        anchors = self.anchor_generator(images, features)

        if self.training:
            return self._forward_train(anchors, objectness, rpn_box_regression, targets)
        else:
            return self._forward_test(anchors, objectness, rpn_box_regression)

    def _forward_train(self, anchors, objectness, rpn_box_regression, targets):
        if self.cfg.MODEL.RPN_ONLY:
            boxes = anchors
        else:
            # For end-to-end models, anchors must be transformed into boxes and
            # sampled into a training batch.
            with torch.no_grad():
                # the refined boxes left after non-maximum suppression of the RoIs
                boxes = self.box_selector_train(
                    anchors, objectness, rpn_box_regression, targets
                )
        loss_objectness, loss_rpn_box_reg = self.loss_evaluator(
            anchors, objectness, rpn_box_regression, targets
        )
        losses = {
            "loss_objectness": loss_objectness,
            "loss_rpn_box_reg": loss_rpn_box_reg,
        }
        return boxes, losses


class RPNHead(nn.Module):
    """
    Adds a simple RPN Head with classification and regression heads
    """

    def __init__(self, cfg, in_channels, num_anchors):
        """
        Arguments:
            cfg              : config
            in_channels (int): number of channels of the input feature
            num_anchors (int): number of anchors to be predicted
        """
        super(RPNHead, self).__init__()
        # a 3x3 conv for feature extraction
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        # a 1x1 conv for classification, num_anchors outputs: a binary score
        # telling whether each anchor is background or object
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        # a 1x1 conv for regression, 4 * num_anchors outputs: the offsets of each anchor
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

        # initialize the weights
        for l in [self.conv, self.cls_logits, self.bbox_pred]:
            torch.nn.init.normal_(l.weight, std=0.01)
            torch.nn.init.constant_(l.bias, 0)

    def forward(self, x):
        logits = []
        bbox_reg = []
        for feature in x:  # run on each feature level
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg
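As a quick sanity check of RPNHead's output shapes, a throwaway snippet (hypothetical; cfg is unused by the constructor shown, so None can be passed):

import torch

head = RPNHead(cfg=None, in_channels=256, num_anchors=3)
# five FPN levels, e.g. a 1024x1024 input -> P2..P6
feats = [torch.randn(1, 256, s, s) for s in (256, 128, 64, 32, 16)]
logits, bbox_reg = head(feats)
print([t.shape for t in logits])    # [1, 3, H, W] per level: one score per anchor
print([t.shape for t in bbox_reg])  # [1, 12, H, W] per level: 4 offsets per anchor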

Next, anchor generation. The feature maps sampled here are the rpn_feature_maps ultimately produced by the backbone in part 1, five levels in total, [P2, P3, P4, P5, P6]; anchors are collected on each resolution level according to the configuration parameters.

def make_anchor_generator(config):
    anchor_sizes = config.MODEL.RPN.ANCHOR_SIZES    # (32, 64, 128, 256, 512), anchor sizes
    aspect_ratios = config.MODEL.RPN.ASPECT_RATIOS  # (0.5, 1.0, 2.0), anchor aspect ratios
    # (8, 16, 32, 64, 128): downsampling factor of each feature level relative to the
    # input image; equivalently, the spacing of that level's anchor centers in the image
    anchor_stride = config.MODEL.RPN.ANCHOR_STRIDE
    straddle_thresh = config.MODEL.RPN.STRADDLE_THRESH  # 0

    # sanity checks
    if config.MODEL.RPN.USE_FPN:
        assert len(anchor_stride) == len(anchor_sizes), \
            "FPN should have len(ANCHOR_STRIDE) == len(ANCHOR_SIZES)"
    else:
        assert len(anchor_stride) == 1, "Non-FPN should have a single ANCHOR_STRIDE"

    # the anchor-generating class, see below
    anchor_generator = AnchorGenerator(
        anchor_sizes, aspect_ratios, anchor_stride, straddle_thresh
    )
    return anchor_generator


# The anchor-generating class
class AnchorGenerator(nn.Module):
    def __init__(
        self,
        sizes=(128, 256, 512),
        aspect_ratios=(0.5, 1.0, 2.0),
        anchor_strides=(8, 16, 32),
        straddle_thresh=0,
    ):
        super(AnchorGenerator, self).__init__()

        if len(anchor_strides) == 1:
            anchor_stride = anchor_strides[0]
            cell_anchors = [
                # the anchor-generating function
                generate_anchors(anchor_stride, sizes, aspect_ratios).float()
            ]
        else:
            if len(anchor_strides) != len(sizes):
                raise RuntimeError("FPN should have #anchor_strides == #sizes")
            cell_anchors = [
                generate_anchors(
                    anchor_stride,
                    size if isinstance(size, (tuple, list)) else (size,),
                    aspect_ratios
                ).float()
                for anchor_stride, size in zip(anchor_strides, sizes)
            ]
        self.strides = anchor_strides
        self.cell_anchors = BufferList(cell_anchors)
        self.straddle_thresh = straddle_thresh

    def num_anchors_per_location(self):
        return [len(cell_anchors) for cell_anchors in self.cell_anchors]

    def grid_anchors(self, grid_sizes):
        anchors = []
        for size, stride, base_anchors in zip(
            grid_sizes, self.strides, self.cell_anchors
        ):
            grid_height, grid_width = size
            device = base_anchors.device
            shifts_x = torch.arange(
                0, grid_width * stride, step=stride, dtype=torch.float32, device=device
            )
            shifts_y = torch.arange(
                0, grid_height * stride, step=stride, dtype=torch.float32, device=device
            )
            shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
            shift_x = shift_x.reshape(-1)
            shift_y = shift_y.reshape(-1)
            shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)

            anchors.append(
                (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4)
            )

        return anchors

    def add_visibility_to(self, boxlist):
        image_width, image_height = boxlist.size
        anchors = boxlist.bbox
        if self.straddle_thresh >= 0:
            inds_inside = (
                (anchors[..., 0] >= -self.straddle_thresh)
                & (anchors[..., 1] >= -self.straddle_thresh)
                & (anchors[..., 2] < image_width + self.straddle_thresh)
                & (anchors[..., 3] < image_height + self.straddle_thresh)
            )
        else:
            device = anchors.device
            inds_inside = torch.ones(anchors.shape[0], dtype=torch.bool, device=device)
        boxlist.add_field("visibility", inds_inside)

    def forward(self, image_list, feature_maps):
        grid_sizes = [feature_map.shape[-2:] for feature_map in feature_maps]
        anchors_over_all_feature_maps = self.grid_anchors(grid_sizes)
        anchors = []
        for i, (image_height, image_width) in enumerate(image_list.image_sizes):
            anchors_in_image = []
            for anchors_per_feature_map in anchors_over_all_feature_maps:
                boxlist = BoxList(
                    anchors_per_feature_map, (image_width, image_height), mode="xyxy"
                )
                self.add_visibility_to(boxlist)
                anchors_in_image.append(boxlist)
            anchors.append(anchors_in_image)
        return anchors
# from maskrcnn_benchmark/modeling/rpn/anchor_generator.py (also needs: import numpy as np)
def generate_anchors(
    stride=16, sizes=(32, 64, 128, 256, 512), aspect_ratios=(0.5, 1, 2)
):
    """Generates a matrix of anchor boxes in (x1, y1, x2, y2) format. Anchors
    are centered on stride / 2, have (approximate) sqrt areas of the specified
    sizes, and the given aspect ratios.
    """
    return _generate_anchors(
        stride,
        np.array(sizes, dtype=np.float) / stride,
        np.array(aspect_ratios, dtype=np.float),
    )


def _generate_anchors(base_size, scales, aspect_ratios):
    """Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, base_size - 1, base_size - 1) window.
    """
    anchor = np.array([1, 1, base_size, base_size], dtype=np.float) - 1
    # re-derive width/height around the center point for each aspect ratio
    anchors = _ratio_enum(anchor, aspect_ratios)
    anchors = np.vstack(
        [_scale_enum(anchors[i, :], scales) for i in range(anchors.shape[0])]
    )
    return torch.from_numpy(anchors)

RPN_ANCHOR_SCALES are the anchor sizes, (32, 64, 128, 256, 512), corresponding to the rpn_feature_maps [P2, P3, P4, P5, P6] with resolutions [256, 128, 64, 32, 16]; in other words, the high-resolution lower levels detect smaller objects, and the low-resolution top levels detect larger objects. The final anchors have shape [anchor_count, (y1, x1, y2, x2)], where anchor_count = (256×256 + 128×128 + 64×64 + 32×32 + 16×16) × 3 = 261888. The proposal layer that follows will filter them.
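That count is easy to verify (throwaway snippet):

resolutions = [256, 128, 64, 32, 16]  # P2..P6 for a 1024x1024 input
anchors_per_cell = 3                  # 3 aspect ratios, one size per level
anchor_count = sum(r * r for r in resolutions) * anchors_per_cell
print(anchor_count)  # 261888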
The concrete implementation of the non-maximum suppression step above, which produces the proposals, is in the make_rpn_postprocessor function.
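make_rpn_postprocessor builds a fairly long class; per feature level, the gist of what it does can be sketched with torchvision.ops (an illustration, not the repository's code; the top-k defaults here are assumptions):

import torch
from torchvision.ops import nms, clip_boxes_to_image

def select_proposals_sketch(boxes, objectness, image_size,
                            pre_nms_top_n=2000, post_nms_top_n=1000, nms_thresh=0.7):
    # boxes: [N, 4] decoded anchors for one level; objectness: [N] raw scores
    scores = objectness.sigmoid()
    scores, idx = scores.topk(min(pre_nms_top_n, scores.numel()))  # keep top-k before NMS
    boxes = clip_boxes_to_image(boxes[idx], image_size)            # clip to image bounds
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]         # suppress overlaps
    return boxes[keep], scores[keep]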

3. RoI Heads

Borrowing that author's figure once more (link again not preserved):

# The detection (box) head
class ROIBoxHead(torch.nn.Module):
    """
    Generic Box Head class.
    """

    def __init__(self, cfg, in_channels):
        super(ROIBoxHead, self).__init__()
        self.cfg = cfg
        # build an RoI feature extractor
        self.feature_extractor = make_roi_box_feature_extractor(cfg, in_channels)
        # the final predictor
        self.predictor = make_roi_box_predictor(
            cfg, self.feature_extractor.out_channels
        )
        # post-processing: non-maximum suppression and box decoding
        self.post_processor = make_roi_box_post_processor(cfg)
        # the final loss computation
        self.loss_evaluator = make_roi_box_loss_evaluator(cfg)

    def forward(self, features, proposals, targets=None):
        if self.training:
            # Faster R-CNN subsamples during training the proposals with a fixed
            # positive / negative ratio
            with torch.no_grad():
                # sample from the RPN proposals against the targets to get the
                # new training proposals
                proposals = self.loss_evaluator.subsample(proposals, targets)

        # extract features that will be fed to the final classifier. The
        # feature_extractor generally corresponds to the pooler + heads
        x = self.feature_extractor(features, proposals)
        # final classifier that converts the features into predictions
        class_logits, box_regression = self.predictor(x)  # classification and box regression

        if not self.training:
            result = self.post_processor((class_logits, box_regression), proposals)
            return x, result, {}

        loss_classifier, loss_box_reg = self.loss_evaluator(
            [class_logits], [box_regression]
        )
        return (
            x,
            proposals,
            dict(loss_classifier=loss_classifier, loss_box_reg=loss_box_reg),
        )
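The loss_evaluator.subsample call above enforces a fixed positive/negative ratio when picking training proposals (512 per image with 25% positives by default in this codebase). A stripped-down sketch of the idea (illustrative only, not the repo's BalancedPositiveNegativeSampler):

import torch

def balanced_sample_sketch(labels, batch_size=512, positive_fraction=0.25):
    # labels: [N] with 1 = positive proposal, 0 = negative, -1 = ignore
    pos = torch.nonzero(labels == 1).squeeze(1)
    neg = torch.nonzero(labels == 0).squeeze(1)
    num_pos = min(pos.numel(), int(batch_size * positive_fraction))
    num_neg = min(neg.numel(), batch_size - num_pos)
    pos = pos[torch.randperm(pos.numel())[:num_pos]]  # random positive subset
    neg = neg[torch.randperm(neg.numel())[:num_neg]]  # random negative subset
    return torch.cat([pos, neg])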

Below is the segmentation (mask) head:

# This part is largely the same as the detection head above; the main difference
# is that the mask branch classifies every pixel rather than classifying object boxes
class ROIMaskHead(torch.nn.Module):
    def __init__(self, cfg, in_channels):
        super(ROIMaskHead, self).__init__()
        self.cfg = cfg.clone()
        self.feature_extractor = make_roi_mask_feature_extractor(cfg, in_channels)
        self.predictor = make_roi_mask_predictor(
            cfg, self.feature_extractor.out_channels
        )
        self.post_processor = make_roi_mask_post_processor(cfg)
        self.loss_evaluator = make_roi_mask_loss_evaluator(cfg)

    def forward(self, features, proposals, targets=None):
        if self.training:
            # during training, only focus on positive boxes
            all_proposals = proposals
            proposals, positive_inds = keep_only_positive_boxes(proposals)
        if self.training and self.cfg.MODEL.ROI_MASK_HEAD.SHARE_BOX_FEATURE_EXTRACTOR:
            x = features
            x = x[torch.cat(positive_inds, dim=0)]
        else:
            x = self.feature_extractor(features, proposals)
        mask_logits = self.predictor(x)

        if not self.training:
            result = self.post_processor(mask_logits, proposals)
            return x, result, {}

        loss_mask = self.loss_evaluator(proposals, mask_logits, targets)
        return x, all_proposals, dict(loss_mask=loss_mask)

Finally, look at make_roi_box_feature_extractor. This is where roi_align comes in: the Pooler class performs the feature extraction.

class FPN2MLPFeatureExtractor(nn.Module):
    def __init__(self, cfg, in_channels):
        super(FPN2MLPFeatureExtractor, self).__init__()

        resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
        scales = cfg.MODEL.ROI_BOX_HEAD.POOLER_SCALES
        sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
        # see the source for the details of Pooler
        pooler = Pooler(
            output_size=(resolution, resolution),
            scales=scales,
            sampling_ratio=sampling_ratio,
        )
        input_size = in_channels * resolution ** 2
        representation_size = cfg.MODEL.ROI_BOX_HEAD.MLP_HEAD_DIM
        use_gn = cfg.MODEL.ROI_BOX_HEAD.USE_GN
        self.pooler = pooler
        self.fc6 = make_fc(input_size, representation_size, use_gn)
        self.fc7 = make_fc(representation_size, representation_size, use_gn)
        self.out_channels = representation_size

    def forward(self, x, proposals):
        x = self.pooler(x, proposals)  # RoIAlign feature extraction
        x = x.view(x.size(0), -1)      # flatten each RoI to a vector
        x = F.relu(self.fc6(x))        # two shared fully connected layers whose output
        x = F.relu(self.fc7(x))        # feeds both the classification and regression predictors
        return x
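One detail of Pooler worth calling out: with an FPN it must first decide which pyramid level each proposal is pooled from, using the level-assignment rule from the FPN paper, k = floor(k0 + log2(sqrt(w*h) / 224)). A sketch of that mapping, simplified from the repo's LevelMapper (the epsilon and default arguments here are assumptions):

import torch

def map_rois_to_fpn_levels(box_areas, k_min=2, k_max=5,
                           canonical_scale=224, canonical_level=4):
    # FPN paper, Eq. 1: larger RoIs are pooled from coarser pyramid levels
    s = torch.sqrt(box_areas)
    target = torch.floor(canonical_level + torch.log2(s / canonical_scale + 1e-6))
    return torch.clamp(target, min=k_min, max=k_max).to(torch.int64)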

RoIAlign's advantage over RoI Pooling is that it never rounds: the whole computation stays in floating point (a runnable demonstration follows the worked example below). The steps are:

  1. Compute the side lengths of the RoI region, without rounding;
  2. Divide the RoI region evenly into k × k bins, without rounding the bin size;
  3. The value of each bin is obtained by bilinear interpolation from the four nearest feature-map values;
  4. Max pooling or average pooling then yields the fixed-length feature vector.
    For example, feed an 800×800 image through a convolutional network with five downsamplings, producing a 25×25 feature map. If the RoI region is 600×500, its projection onto the feature map is 600/32 × 500/32 = 18.75 × 15.625. Since this does not divide evenly, RoI Pooling floors the values, so the RoI's feature map becomes 18 × 15; this causes the first misalignment.
    RoI Pooling's next step is to split the feature map into bins. Suppose we need 7 × 7 bins; each bin would measure 18/7 × 15/7, and since that does not divide evenly either, RoI Pooling floors again, making each bin 2 × 2, i.e. the RoI's effective feature map is 14 × 14; this produces the second misalignment. Compared with the feature map before RoI Pooling, the two floors introduce errors of 4.75 and 1.625 pixels horizontally and vertically. For classification or box detection, a displacement of a few pixels may matter little (though finer localization does improve box regression), but semantic segmentation must be accurate to the pixel, so RoI Pooling cannot be used in Mask R-CNN.
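Since torchvision ships the same operator (torchvision.ops.roi_align), the non-quantized behaviour is easy to observe directly. A small illustrative snippet reproducing the geometry of the example above (the feature values are fake; only the coordinates matter):

import torch
from torchvision.ops import roi_align

feat = torch.arange(25 * 25, dtype=torch.float32).reshape(1, 1, 25, 25)
# one RoI in (batch_idx, x1, y1, x2, y2) format, in coordinates of an 800x800 image:
# a 600x500 region, as in the worked example
rois = torch.tensor([[0, 100.0, 150.0, 700.0, 650.0]])
# spatial_scale=1/32 projects image coords onto the 25x25 map without rounding;
# sampling_ratio=2 takes 2x2 bilinearly interpolated samples per bin, then averages
out = roi_align(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 32, sampling_ratio=2)
print(out.shape)  # torch.Size([1, 1, 7, 7])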

Reference: Mask R-CNN讲解 — 江南綿雨的博客 (CSDN)

Reference: BINGO Hong — MASK_RCNN代码详解(4): Losses部分
