Contents

  • 6. Results
    • 6.1. Qualitative Evaluations
  • 7. Discussion
  • References

6. Results

Our results on ILSVRC-2010 are summarized in Table 1. Our network achieves top-1 and top-5 test set error rates of 37.5% and 17.0%¹. The best performance achieved during the ILSVRC-2010 competition was 47.1% and 28.2% with an approach that averages the predictions produced from six sparse-coding models trained on different features [2], and since then the best published results are 45.7% and 25.7% with an approach that averages the predictions of two classifiers trained on Fisher Vectors (FVs) computed from two types of densely-sampled features [24].

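To make the metric concrete, here is a minimal NumPy sketch of how top-1 and top-5 error rates are computed; the `scores` matrix and `labels` vector are hypothetical stand-ins for the model's per-class outputs and the ground truth.

```python
import numpy as np

def topk_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores.

    scores: (N, C) array of per-class scores (e.g. softmax outputs)
    labels: (N,) array of ground-truth class indices
    """
    # Indices of the k largest scores in each row (unordered within the top k).
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy usage with random scores over the 1000 ILSVRC classes.
rng = np.random.default_rng(0)
scores = rng.random((8, 1000))
labels = rng.integers(0, 1000, size=8)
print(topk_error(scores, labels, 1), topk_error(scores, labels, 5))
```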

We also entered our model in the ILSVRC-2012 competition and report our results in Table 2. Since the ILSVRC-2012 test set labels are not publicly available, we cannot report test error rates for all the models that we tried. In the remainder of this paragraph, we use validation and test error rates interchangeably because in our experience they do not differ by more than 0.1% (see Table 2).



| Model | Top-1 | Top-5 |
| --- | --- | --- |
| *Sparse coding [2]* | *47.1%* | *28.2%* |
| *SIFT + FVs [24]* | *45.7%* | *25.7%* |
| CNN | 37.5% | 17.0% |

Table 1: Comparison of results on ILSVRC-2010 test set. In italics are best results achieved by others.

The CNN described in this paper achieves a top-5 error rate of 18.2%. Averaging the predictions of five similar CNNs gives an error rate of 16.4%. Training one CNN, with an extra sixth convolutional layer over the last pooling layer, to classify the entire ImageNet Fall 2011 release (15M images, 22K categories), and then “fine-tuning” it on ILSVRC-2012 gives an error rate of 16.6%.

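As a sketch of the pre-train-then-fine-tune recipe just described, the PyTorch snippet below swaps a ~22K-way pre-training head for a 1000-way ILSVRC-2012 classifier and continues training at a reduced learning rate. The tiny network, class counts, and data are placeholders, not the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

# Toy stand-in for a net pre-trained on the Fall 2011 release (~22K categories);
# the paper's model additionally has a sixth convolutional layer over the last pool.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=11, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 22000),                    # ~22K-way pre-training head
)

# "Fine-tuning" on ILSVRC-2012: replace the head with a 1000-way classifier
# and keep updating every layer, typically at a smaller learning rate.
net[-1] = nn.Linear(16, 1000)
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)         # dummy batch
targets = torch.randint(0, 1000, (4,))
optimizer.zero_grad()
loss = criterion(net(images), targets)
loss.backward()
optimizer.step()
```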

Averaging the predictions of two CNNs that were pre-trained on the entire Fall 2011 release with the aforementioned five CNNs gives an error rate of 15.3%. The second-best contest entry achieved an error rate of 26.2% with an approach that averages the predictions of several classifiers trained on FVs computed from different types of densely-sampled features [7].

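All of the ensemble numbers above come from plain prediction averaging. A small NumPy sketch, assuming each model has already produced an (N, C) matrix of class probabilities:

```python
import numpy as np

def ensemble_predict(per_model_probs):
    """Average per-class probabilities across models, then take the argmax."""
    mean_probs = np.mean(per_model_probs, axis=0)   # (N, C)
    return mean_probs.argmax(axis=1)

# Hypothetical outputs from five independently trained models.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(1000), size=8) for _ in range(5)]
print(ensemble_predict(probs))
```

Averaging soft probabilities rather than hard votes lets a confidently correct model outweigh uncertain ones, which is the usual motivation for this form of combination.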

Finally, we also report our error rates on the Fall 2009 version of ImageNet with 10,184 categories and 8.9 million images. On this dataset we follow the convention in the literature of using half of the images for training and half for testing. Since there is no established test set, our split necessarily differs from the splits used by previous authors, but this does not affect the results appreciably.

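A sketch of that 50/50 convention, assuming images are addressed by integer index; the seed is arbitrary, so this split is deterministic but, as noted above, will not coincide with previous authors' splits.

```python
import numpy as np

def half_split(num_images, seed=0):
    """Random permutation split: first half for training, second half for testing."""
    perm = np.random.default_rng(seed).permutation(num_images)
    half = num_images // 2
    return perm[:half], perm[half:]

train_idx, test_idx = half_split(8_900_000)   # the Fall 2009 release has ~8.9M images
```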

Our top-1 and top-5 error rates on this dataset are 67.4% and 40.9%, attained by the net described above but with an additional, sixth convolutional layer over the last pooling layer. The best published results on this dataset are 78.1% and 60.9% [19].



| Model | Top-1 (val) | Top-5 (val) | Top-5 (test) |
| --- | --- | --- | --- |
| *SIFT + FVs [7]* | — | — | *26.2%* |
| 1 CNN | 40.7% | 18.2% | — |
| 5 CNNs | 38.1% | 16.4% | 16.4% |
| 1 CNN* | 39.0% | 16.6% | — |
| 7 CNNs* | 36.7% | 15.4% | 15.3% |

Table 2: Comparison of error rates on ILSVRC-2012 validation and test sets. In italics are best results achieved by others. Models with an asterisk* were “pre-trained” to classify the entire ImageNet 2011 Fall release. See Section 6 for details.

6.1. Qualitative Evaluations


Figure 3: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images. The top 48 kernels were learned on GPU 1 while the bottom 48 kernels were learned on GPU 2. See Section 6.1 for details.

Figure 3 shows the convolutional kernels learned by the network’s two data-connected layers. The network has learned a variety of frequency- and orientation-selective kernels, as well as various colored blobs. Notice the specialization exhibited by the two GPUs, a result of the restricted connectivity described in Section 3.5. The kernels on GPU 1 are largely color-agnostic, while the kernels on GPU 2 are largely color-specific. This kind of specialization occurs during every run and is independent of any particular random weight initialization (modulo a renumbering of the GPUs).



Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by our model. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image.

In the left panel of Figure 4 we qualitatively assess what the network has learned by computing its top-5 predictions on eight test images. Notice that even off-center objects, such as the mite in the top-left, can be recognized by the net. Most of the top-5 labels appear reasonable. For example, only other types of cat are considered plausible labels for the leopard. In some cases (grille, cherry) there is genuine ambiguity about the intended focus of the photograph.


Another way to probe the network’s visual knowledge is to consider the feature activations induced by an image at the last, 4096-dimensional hidden layer. If two images produce feature activation vectors with a small Euclidean separation, we can say that the higher levels of the neural network consider them to be similar.


Figure 4 shows five images from the test set and the six images from the training set that are most similar to each of them according to this measure. Notice that at the pixel level, the retrieved training images are generally not close in L2 to the query images in the first column. For example, the retrieved dogs and elephants appear in a variety of poses. We present the results for many more test images in the supplementary material.

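A NumPy sketch of this retrieval-by-activation idea, with random vectors standing in for the 4096-dimensional last-hidden-layer features:

```python
import numpy as np

def nearest_training_images(query_feats, train_feats, k=6):
    """Indices of the k training images closest in Euclidean distance to each query.

    query_feats: (Q, 4096), train_feats: (T, 4096)
    """
    # Squared distances via the expansion ||q - t||^2 = ||q||^2 - 2 q·t + ||t||^2.
    d2 = ((query_feats ** 2).sum(axis=1, keepdims=True)
          - 2.0 * query_feats @ train_feats.T
          + (train_feats ** 2).sum(axis=1))
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(1)
queries = rng.standard_normal((5, 4096))      # stand-in test-image features
train = rng.standard_normal((1000, 4096))     # stand-in training-image features
print(nearest_training_images(queries, train).shape)   # (5, 6)
```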

Computing similarity by using Euclidean distance between two 4096-dimensional, real-valued vectors is inefficient, but it could be made efficient by training an auto-encoder to compress these vectors to short binary codes. This should produce a much better image retrieval method than applying auto-encoders to the raw pixels [14], which does not make use of image labels and hence has a tendency to retrieve images with similar patterns of edges, whether or not they are semantically similar.

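The authors suggest an auto-encoder for the compression step. As a simpler stand-in (not their proposal), the sketch below uses random-hyperplane hashing, which likewise maps the 4096-d activations to short binary codes compared by Hamming distance:

```python
import numpy as np

rng = np.random.default_rng(2)
feats = rng.standard_normal((1000, 4096))     # hypothetical 4096-d activations

# Random-hyperplane hashing: each bit is the sign of a random projection, so
# Hamming distance between codes roughly tracks the angle between vectors.
projection = rng.standard_normal((4096, 256))
codes = (feats @ projection > 0).astype(np.uint8)   # (1000, 256) binary codes

def hamming(a, b):
    return np.count_nonzero(a != b, axis=-1)

# Retrieval now compares 256 bits per image instead of 4096 floats.
query = codes[0]
print(np.argsort(hamming(codes, query))[:6])
```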

7. Discussion

Our results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. It is notable that our network’s performance degrades if a single convolutional layer is removed. For example, removing any of the middle layers results in a loss of about 2% for the top-1 performance of the network. So the depth really is important for achieving our results.


To simplify our experiments, we did not use any unsupervised pre-training even though we expect that it will help, especially if we obtain enough computational power to significantly increase the size of the network without obtaining a corresponding increase in the amount of labeled data. Thus far, our results have improved as we have made our network larger and trained it longer but we still have many orders of magnitude to go in order to match the infero-temporal pathway of the human visual system.


Ultimately we would like to use very large and deep convolutional nets on video sequences where the temporal structure provides very helpful information that is missing or far less obvious in static images.


References

[1] R.M. Bell and Y. Koren. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter, 9(2):75–79, 2007.
[2] A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge 2010. www.image-net.org/challenges. 2010.
[3] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
[4] D. Cireşan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745, 2012.
[5] D.C. Cireşan, U. Meier, J. Masci, L.M. Gambardella, and J. Schmidhuber. High-performance neural networks for visual object classification. arXiv preprint arXiv:1102.0183, 2011.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
[7] J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and L. Fei-Fei. ILSVRC-2012, 2012. URL http://www.image-net.org/challenges/LSVRC/2012/.
[8] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59–70, 2007.
[9] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report 7694, California Institute of Technology, 2007. URL http://authors.library.caltech.edu/7694.
[10] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
[11] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In International Conference on Computer Vision, pages 2146–2153. IEEE, 2009.
[12] A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
[13] A. Krizhevsky. Convolutional deep belief networks on cifar-10. Unpublished manuscript, 2010.
[14] A. Krizhevsky and G.E. Hinton. Using very deep autoencoders for content-based image retrieval. In ESANN, 2011.
[15] Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, et al. Hand-written digit recognition with a back-propagation network. In Advances in neural information processing systems, 1990.
[16] Y. LeCun, F.J. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II–97. IEEE, 2004.
[17] Y. LeCun, K. Kavukcuoglu, and C. Farabet. Convolutional networks and applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pages 253–256. IEEE, 2010.
[18] H. Lee, R. Grosse, R. Ranganath, and A.Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.
[19] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost. In ECCV - European Conference on Computer Vision, Florence, Italy, October 2012.
[20] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proc. 27th International Conference on Machine Learning, 2010.
[21] N. Pinto, D.D. Cox, and J.J. DiCarlo. Why is real-world visual object recognition hard? PLoS computational biology, 4(1):e27, 2008.
[22] N. Pinto, D. Doukhan, J.J. DiCarlo, and D.D. Cox. A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS computational biology, 5(11):e1000579, 2009.
[23] B.C. Russell, A. Torralba, K.P. Murphy, and W.T. Freeman. Labelme: a database and web-based tool for image annotation. International journal of computer vision, 77(1):157–173, 2008.
[24] J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1665–1672. IEEE, 2011.
[25] P.Y. Simard, D. Steinkraus, and J.C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, volume 2, pages 958–962, 2003.
[26] S.C. Turaga, J.F. Murray, V. Jain, F. Roth, M. Helmstaedter, K. Briggman, W. Denk, and H.S. Seung. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation, 22(2):511–538, 2010.


  1. The error rates without averaging predictions over ten patches as described in Section 4.1 are 39.0% and 18.3%. ↩︎
