Disclaimer: the author translated this paper for learning purposes only. If there is any infringement, please contact the author to have the post removed. Thank you!

1. Abstract

In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.

2. Introduction

The introduction reviews prior work: earlier convolutional classification tasks were limited by the size of the networks and of the training image datasets. It then turns to localization in images, i.e. pixel-level segmentation, and to earlier work on pixel-level segmentation of microscopy images. The earlier EM segmentation network was applied to microscopy image segmentation, but it runs very slowly because the network must be run separately for each image patch, and there is a lot of redundancy due to overlapping patches. Moreover, there is a trade-off between localization accuracy and the use of context, i.e. between the receptive field and contextual information. More recent methods fuse features from multiple layers and achieve good localization together with good semantic accuracy.
This paper adopts and adapts the fully convolutional network; the detailed architecture is described below.
The main idea of fully convolution network is to supplement a usual contracting network by successive layers, where pooling operators are replaced by upsampling operators. Hence, these layers increase the resolution of the output. In order to localize, high resolution features from the contracting path are combined with the upsampled output. A successive convolution layer can then learn to assemble a more precise output based on this information.
One important modification in our architecture is that in the upsampling part we have also a large number of feature channels, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting path, and yields a u-shaped architecture.
The network contains no fully connected layers and uses only the valid part of each convolution, which allows the seamless segmentation of arbitrarily large images by an overlap-tile strategy; for large images this is a good solution. To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image.
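A minimal sketch of this border extrapolation with NumPy (the helper name is mine; the tile sizes 388 and 572 are the output/input sizes from the paper's Figure 1):

```python
import numpy as np

def mirror_pad(image: np.ndarray, pad: int) -> np.ndarray:
    # Extrapolate the missing context at the image border by mirroring it.
    return np.pad(image, pad_width=pad, mode="reflect")

# The network predicts a 388x388 segmentation from a 572x572 input tile,
# i.e. it needs 92 pixels of surrounding context on every side; at the image
# border that context is supplied by mirroring.
tile = np.zeros((388, 388), dtype=np.float32)
print(mirror_pad(tile, 92).shape)  # (572, 572)
```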
Prior literature provides a theoretical basis for the invariance obtained through this kind of data augmentation. Many cell segmentation tasks involve separating touching objects of the same class; since medical images are highly repetitive, locally deformed patches can stand in for a larger annotated dataset. The result is that this paper's method outperforms the earlier EM segmentation approach.

3. Network architecture

The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels. Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”) that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.

  1. The network consists of a contracting path (left side) and an expansive path (right side).
  2. The contracting path follows the typical architecture of a convolutional network: repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU), and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step the number of feature channels is doubled.
  3. Each step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution ("up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. Cropping is necessary because border pixels are lost in every convolution.
  4. At the final layer a 1x1 convolution maps each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.

Figure 1: U-Net architecture (example for 32x32 pixels at the lowest resolution). Each blue box corresponds to a multi-channel feature map; the number of channels is denoted on top of the box and the x-y size at the left edge of the box. White boxes represent copied feature maps, and the arrows denote the different operations.
U-Net consists of two main parts, an encoder and a decoder, each with four stages; each encoder stage consists of two convolution layers and one downsampling layer. Deep (bottom) information refers to the low-resolution features obtained after repeated downsampling. It supplies contextual semantic information about the segmentation target within the whole image, i.e. features that capture the relationship between the target and its surroundings, and it helps decide the object's class (which is why classification problems usually need only low-resolution, deep features and no multi-scale fusion). Shallow (high-resolution) information refers to the high-resolution features passed by the concatenate operation directly from the encoder to the decoder stage at the same level. It provides finer cues for segmentation, such as gradients. U-Net thus combines low-resolution information (the basis for recognizing the object class) with high-resolution information (the basis for precise segmentation and localization). U-Net pushes the skip-connection idea to the extreme: the decoder mirrors the encoder and every encoder stage is skip-connected to the corresponding decoder stage, giving a simple structure with excellent performance.
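Below is a compact PyTorch sketch of this architecture (my own illustration, not the authors' Caffe implementation; the class name UNet and the helpers double_conv / center_crop are mine). It follows the description above: unpadded 3x3 convolutions with ReLU, 2x2 max pooling with stride 2, channel doubling per level, 2x2 up-convolutions that halve the channels, cropping and concatenation of the encoder features, and a final 1x1 convolution, giving 23 convolutional layers and the 572-to-388 tile sizes of Figure 1.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two unpadded 3x3 convolutions, each followed by ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3),
        nn.ReLU(inplace=True),
    )

def center_crop(feature, target):
    # Crop the encoder feature map to the spatial size of the decoder feature map.
    _, _, h, w = target.shape
    _, _, H, W = feature.shape
    dy, dx = (H - h) // 2, (W - w) // 2
    return feature[:, :, dy:dy + h, dx:dx + w]

class UNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        chs = [64, 128, 256, 512, 1024]
        self.enc = nn.ModuleList()
        prev = in_channels
        for c in chs:
            self.enc.append(double_conv(prev, c))   # channels double per level
            prev = c
        self.pool = nn.MaxPool2d(2, stride=2)
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(chs[i], chs[i - 1], kernel_size=2, stride=2)
            for i in range(len(chs) - 1, 0, -1)
        )
        self.dec = nn.ModuleList(
            double_conv(chs[i], chs[i - 1]) for i in range(len(chs) - 1, 0, -1)
        )
        self.head = nn.Conv2d(chs[0], num_classes, kernel_size=1)

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)                                 # 2x2 "up-convolution", halves the channels
            x = torch.cat([center_crop(skip, x), x], dim=1)
            x = dec(x)
        return self.head(x)                           # 1x1 conv maps 64 channels to the classes

# 572x572 input tile -> 388x388 output map, as in Figure 1.
print(UNet()(torch.zeros(1, 1, 572, 572)).shape)      # torch.Size([1, 2, 388, 388])
```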

4. Training

The input images and their corresponding segmentation maps are used to train the network with the stochastic gradient descent implementation of Caffe.
We favor large input tiles over a large batch size and hence reduce the batch to a single image. Accordingly we use a high momentum (0.99) such that a large number of the previously seen training samples determine the update in the current optimization step.
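A minimal sketch of this training configuration in PyTorch (assuming the UNet from the architecture section above; the learning rate and the dummy tensors are placeholders, not values from the paper):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: each sample is one large input tile with its label map
# (tile sizes taken from Figure 1); a real dataset would supply image tiles,
# ground-truth maps and the pre-computed pixel weight maps.
dummy = TensorDataset(torch.zeros(4, 1, 572, 572),
                      torch.zeros(4, 388, 388, dtype=torch.long))
loader = DataLoader(dummy, batch_size=1, shuffle=True)   # batch reduced to a single tile

model = UNet()  # the sketch from the architecture section
# High momentum (0.99): many previously seen tiles determine the current update.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.99)
```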
We pre-compute the weight map for each ground truth segmentation to compensate the different frequency of pixels from a certain class in the training data set, and to force the network to learn the small separation borders that we introduce between touching cells (See Figure 3c and d).
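For reference, the weight map defined in the original paper (recalled here for completeness, not quoted in the excerpt above) combines a class-frequency balancing term with a border-emphasis term:

$$ w(\mathbf{x}) = w_c(\mathbf{x}) + w_0 \cdot \exp\!\left( -\frac{\bigl(d_1(\mathbf{x}) + d_2(\mathbf{x})\bigr)^2}{2\sigma^2} \right) $$

where $w_c$ is the class-balancing weight map, $d_1$ and $d_2$ are the distances to the border of the nearest and the second nearest cell, and the paper uses $w_0 = 10$ and $\sigma \approx 5$ pixels.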
In deep networks with many convolutional layers and different paths through the network, a good initialization of the weights is extremely important. Otherwise, parts of the network might give excessive activations, while other parts never contribute. Ideally the initial weights should be adapted such that each feature map in the network has approximately unit variance.
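In the paper this is achieved by drawing the initial weights from a Gaussian distribution with standard deviation sqrt(2/N), where N is the number of incoming nodes of one neuron, which is the He/Kaiming scheme for ReLU networks. A rough PyTorch sketch (the helper name is mine; `model` is assumed to be the UNet instance from above):

```python
import math
import torch.nn as nn

def init_weights(module):
    # Gaussian initialization with std = sqrt(2 / N), where N is the number of
    # incoming connections of one neuron (e.g. N = 3*3*64 = 576 for a 3x3
    # convolution over 64 input channels); keeps feature-map variance near 1.
    if isinstance(module, nn.Conv2d):
        fan_in = module.in_channels * module.kernel_size[0] * module.kernel_size[1]
        nn.init.normal_(module.weight, mean=0.0, std=math.sqrt(2.0 / fan_in))
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model.apply(init_weights)  # `model` is the UNet sketched in the architecture section
```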
Data augmentation is essential to teach the network the desired invariance and robustness properties, when only few training samples are available. We generate smooth deformations using random displacement vectors on a coarse 3 by 3 grid. The displacements are sampled from a Gaussian distribution with 10 pixels standard deviation. Per-pixel displacements are then computed using bicubic interpolation. Drop-out layers at the end of the contracting path perform further implicit data augmentation.
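A possible NumPy/SciPy sketch of that augmentation (the function name and defaults are my own; the 3x3 coarse grid and the 10-pixel standard deviation follow the text, and SciPy's cubic spline interpolation stands in for the bicubic interpolation mentioned above):

```python
import numpy as np
from scipy.ndimage import zoom, map_coordinates

def elastic_deform(image, sigma=10.0, grid=3, seed=None):
    """Smooth elastic deformation: random displacement vectors on a coarse grid
    (Gaussian, ~10 px std), upsampled to a dense per-pixel displacement field
    with cubic interpolation, then applied to the image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    # Coarse grid of random displacement vectors (one dy and one dx field).
    coarse = rng.normal(0.0, sigma, size=(2, grid, grid))
    # Interpolate up to one displacement per pixel (cubic, order=3).
    dy = zoom(coarse[0], (h / grid, w / grid), order=3)
    dx = zoom(coarse[1], (h / grid, w / grid), order=3)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + dy, xs + dx])
    # Sample the deformed image; mirror the border for missing context.
    return map_coordinates(image, coords, order=3, mode="reflect")
```

For the corresponding label map the same displacement field would be applied with order=0 (nearest neighbour) so that class labels are not blended.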

5. Experiments

U-Net is applied to three different segmentation tasks: segmentation of neuronal structures in electron microscopy (EM) recordings, and cell segmentation tasks in light microscopy images. The data for the EM neuron segmentation task comes from the EM segmentation challenge; its training set is available, while the ground truth of the test set is kept secret. Segmentation results can be sent to the organizers for evaluation, and the method achieved very good results on their metrics. The light-microscopy cell segmentation tasks are part of the ISBI cell tracking challenge 2014 and 2015, and both of its data sets were evaluated. On both, the average IoU was far above the second-best algorithm (improvements of about 10% and 30%).
