# Written before
This blog is the second one in the series of Notes on MatConvNet.

  1. Notes on MatConvNet (I) – Overview

Here I will mainly introduce the core of MatConvNet: vl_simplenn. This function plays a quite important role in forward propagation and backward propagation.

PS. I have to admit that writing blogs in Chinese would bring much more traffic than writing them in English. Yet, if I cared too much about such vain trifles, it would be a sad thing.

# Something that should be known before

I only cover BP here. How do the derivatives propagate backward? I sincerely recommend reading BP Algorithm; that blog is a good introduction to BP. Once you have finished it, the first blog of this series is also recommended. Below I repeat the main computation structure illustrated in Notes (I).

# Computation Structure


I set up a few notational conventions here.

  • y always represents the output of a certain layer; that is, when we are at layer i, y is the output of layer i.
  • z always represents the output of the whole net, or rather, the output of the final layer n.
  • x represents the input of a certain layer.

To make things easier, MatConvNet treats each simple function as a "layer". This means that when the input goes through a computation structure (whether it is a conv structure or just a ReLU structure), the computation looks like the following:
$$\frac{dz}{dx}=f'(x)*\frac{dz}{dy}\qquad(1)$$

$$\frac{dz}{dw}=f'_{w}(x)*\frac{dz}{dy}\qquad(2)\quad\text{(condition)}$$
Note:

  • condition means that formula (2) is only computed when the layer involves weights.
  • $f'(x)$ means the derivative of the output with respect to the input $x$.
  • $f'_{w}(x)$ means the derivative of the output with respect to the weights $w$.
  • This is a little different from the BP Algorithm post, which only treats conv or fully connected layers as computation structures. However, once you also count activations (sigmoid, ReLU, etc.) and other functional structures (dropout, LRN, etc.) as computation structures, you will find it quite easy, because every computation block is only responsible for its own input and output. Each time it receives a dzdy and its input x, it calculates dzdx using (1). If the block is a conv or fully connected layer, or any other layer that involves weights, it has to do the extra computation of formula (2), because you need the weight gradients to update the weights. That is, in fact, exactly our goal. A toy sketch follows this list.
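To make formulas (1) and (2) concrete, here is a toy sketch (my own illustration, not MatConvNet code) using a bare fully connected layer $y = wx$, so that $f'(x)$ corresponds to $w^{T}$ and $f'_{w}(x)$ to $x^{T}$; all the variables below are made up:

```matlab
x    = randn(4, 1) ;   % input of this layer
w    = randn(3, 4) ;   % weights of this layer
y    = w * x ;         % forward pass: output of this layer
dzdy = randn(3, 1) ;   % derivative of the net output z w.r.t. y, handed down by the layer above
dzdx = w' * dzdy ;     % formula (1): becomes the dzdy of the layer below
dzdw = dzdy * x' ;     % formula (2): the gradient used to update w
```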

# Taking a look at vl_simplenn

## The result format

  • res(i+1).x: the output of layer i. Hence res(1).x is the network input.

  • res(i+1).dzdx: the derivative of the network output relative to the output of layer i. In particular, res(1).dzdx is the derivative of the network output with respect to the network input.

  • res(i+1).dzdw: a cell array containing the derivatives of the network output relative to the parameters of layer i. It can be a cell array for multiple parameters.

Note: for layer i, y corresponds to res(i+1).x, since the output of layer i is exactly the input of layer i+1, and that is how it is stored. Likewise, for layer i, res(i+1).dzdx is exactly what I have been calling dzdy (see the indexing sketch below).
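As a quick sanity check on the indexing, here is a sketch; it assumes `net` is any SimpleNN model and `im` a suitably sized and normalized input:

```matlab
n   = numel(net.layers) ;
res = vl_simplenn(net, im) ;      % forward pass only
assert(numel(res) == n + 1) ;     % one entry per layer boundary
in  = res(1).x ;                  % the network input
out = res(n+1).x ;                % the output of layer n, i.e. the network output
```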

## Main types you may use

```matlab
res = vl_simplenn(net, x) ;                      % (1)
res = vl_simplenn(net, x, dzdy) ;                % (2)
res = vl_simplenn(net, x, dzdy, res, opt, val) ; % (3)
```

(1) performs only the forward computation.
(2) is used for back-propagation. It computes the derivatives of the net's output z with respect to the input and the weights.
(3) is used in cnn_train. It adds some options (opts), which I will not introduce here.
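A minimal usage sketch of forms (1) and (2), assuming `net` is a trained SimpleNN model and `im` is an input image that has already been resized and mean-subtracted (for single-precision CPU data):

```matlab
% form (1): forward only
res    = vl_simplenn(net, im) ;
scores = squeeze(res(end).x) ;             % network output z

% form (2): forward + backward
dzdy = ones(size(res(end).x), 'single') ;  % seed derivative w.r.t. the output
res  = vl_simplenn(net, im, dzdy) ;
dzdx = res(1).dzdx ;                       % derivative of z w.r.t. the network input
```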

```matlab
...
% the code before 'Forward pass' is easy and needs no explanation
% -------------------------------------------------------------------------
%                                                              Forward pass
% -------------------------------------------------------------------------
for i=1:n
  if opts.skipForward, break; end;
  l = net.layers{i} ;
  res(i).time = tic ;
  switch l.type
    case 'conv'
      res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...
        'pad', l.pad, ...
        'stride', l.stride, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case 'convt'
      res(i+1).x = vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, ...
        'crop', l.crop, ...
        'upsample', l.upsample, ...
        'numGroups', l.numGroups, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case 'pool'
      res(i+1).x = vl_nnpool(res(i).x, l.pool, ...
        'pad', l.pad, 'stride', l.stride, ...
        'method', l.method, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case {'normalize', 'lrn'}
      res(i+1).x = vl_nnnormalize(res(i).x, l.param) ;
    case 'softmax'
      res(i+1).x = vl_nnsoftmax(res(i).x) ;
    case 'loss'
      res(i+1).x = vl_nnloss(res(i).x, l.class) ;
    case 'softmaxloss'
      res(i+1).x = vl_nnsoftmaxloss(res(i).x, l.class) ;
    case 'relu'
      if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
      res(i+1).x = vl_nnrelu(res(i).x,[],leak{:}) ;
    case 'sigmoid'
      res(i+1).x = vl_nnsigmoid(res(i).x) ;
    ...
    otherwise
      error('Unknown layer type ''%s''.', l.type) ;
  end
```

The code above shows the main idea of forward propagation.

```matlab
  % optionally forget intermediate results
  forget = opts.conserveMemory & ~(doder & n >= backPropLim) ;
  if i > 1
    lp = net.layers{i-1} ;
    % forget RELU input, even for BPROP
    % forget controls whether the intermediate result res{i+1}.x is kept;
    % if net.layers.precious is true, the intermediate result is kept
    forget = forget & (~doder | (strcmp(l.type, 'relu') & ~lp.precious)) ;
    forget = forget & ~(strcmp(lp.type, 'loss') || strcmp(lp.type, 'softmaxloss')) ;
    forget = forget & ~lp.precious ;
  end
  if forget  % if not kept, set this layer's input to empty
    res(i).x = [] ;
  end
  if gpuMode && opts.sync
    wait(gpuDevice) ;
  end
  res(i).time = toc(res(i).time) ;
end
```
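Related to the forget logic above: if you need an intermediate result later (for example, to visualize a feature map) while conserveMemory is on, you can mark that layer as precious. A sketch, assuming `net` has been passed through vl_simplenn_tidy so that every layer has the default fields, and `im` is a suitable input:

```matlab
net = vl_simplenn_tidy(net) ;               % fill in default fields such as 'precious'
net.layers{3}.precious = true ;             % keep res(4).x even when conserving memory
res = vl_simplenn(net, im, [], [], 'conserveMemory', true) ;
feat = res(4).x ;                           % still available; other res(i).x may be []
```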

## Backward pass

It is quite straightforward, so the code mostly speaks for itself.

```matlab
% -------------------------------------------------------------------------
%                                                             Backward pass
% -------------------------------------------------------------------------
if doder
  res(n+1).dzdx = dzdy ;
  for i=n:-1:max(1, n-opts.backPropDepth+1)
    l = net.layers{i} ;
    res(i).backwardTime = tic ;
    switch l.type
      case 'conv'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
          'pad', l.pad, ...
          'stride', l.stride, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case 'convt'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
          'crop', l.crop, ...
          'upsample', l.upsample, ...
          'numGroups', l.numGroups, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case 'pool'
        res(i).dzdx = vl_nnpool(res(i).x, l.pool, res(i+1).dzdx, ...
          'pad', l.pad, 'stride', l.stride, ...
          'method', l.method, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case {'normalize', 'lrn'}
        res(i).dzdx = vl_nnnormalize(res(i).x, l.param, res(i+1).dzdx) ;
      case 'softmax'
        res(i).dzdx = vl_nnsoftmax(res(i).x, res(i+1).dzdx) ;
      case 'loss'
        res(i).dzdx = vl_nnloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'softmaxloss'
        res(i).dzdx = vl_nnsoftmaxloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'relu'
        if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
        if ~isempty(res(i).x)
          res(i).dzdx = vl_nnrelu(res(i).x, res(i+1).dzdx, leak{:}) ;
        else
          % if res(i).x is empty, it has been optimized away, so we use this
          % hack (which works only for ReLU):
          res(i).dzdx = vl_nnrelu(res(i+1).x, res(i+1).dzdx, leak{:}) ;
        end
      case 'sigmoid'
        res(i).dzdx = vl_nnsigmoid(res(i).x, res(i+1).dzdx) ;
      case 'noffset'
        res(i).dzdx = vl_nnnoffset(res(i).x, l.param, res(i+1).dzdx) ;
      case 'spnorm'
        res(i).dzdx = vl_nnspnorm(res(i).x, l.param, res(i+1).dzdx) ;
      case 'dropout'
        if testMode
          res(i).dzdx = res(i+1).dzdx ;
        else
          res(i).dzdx = vl_nndropout(res(i).x, res(i+1).dzdx, ...
            'mask', res(i+1).aux) ;
        end
      case 'bnorm'
        [res(i).dzdx, dzdw{1}, dzdw{2}, dzdw{3}] = ...
          vl_nnbnorm(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx) ;
        % multiply the moments update by the number of images in the batch
        % this is required to make the update additive for subbatches
        % and will eventually be normalized away
        dzdw{3} = dzdw{3} * size(res(i).x,4) ;
      case 'pdist'
        res(i).dzdx = vl_nnpdist(res(i).x, l.class, ...
          l.p, res(i+1).dzdx, ...
          'noRoot', l.noRoot, ...
          'epsilon', l.epsilon, ...
          'aggregate', l.aggregate) ;
      case 'custom'
        res(i) = l.backward(l, res(i), res(i+1)) ;
    end % layers
    switch l.type
      case {'conv', 'convt', 'bnorm'}
        if ~opts.accumulate
          res(i).dzdw = dzdw ;
        else
          for j=1:numel(dzdw)
            res(i).dzdw{j} = res(i).dzdw{j} + dzdw{j} ;
          end
        end
        dzdw = [] ;
    end
    if opts.conserveMemory && ~net.layers{i}.precious && i ~= n
      res(i+1).dzdx = [] ;
      res(i+1).x = [] ;
    end
    if gpuMode && opts.sync
      wait(gpuDevice) ;
    end
    res(i).backwardTime = toc(res(i).backwardTime) ;
  end
end
```
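This is where formula (2) pays off: after the backward pass, res(i).dzdw holds the gradients of the network output z with respect to the weights of layer i. cnn_train uses them together with momentum and weight decay; the plain gradient step below is only a sketch, with a made-up learning rate `lr`:

```matlab
lr = 0.01 ;                                    % made-up learning rate
for i = 1:numel(net.layers)
  if isfield(net.layers{i}, 'weights')
    for j = 1:numel(res(i).dzdw)
      net.layers{i}.weights{j} = net.layers{i}.weights{j} - lr * res(i).dzdw{j} ;
    end
  end
end
```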

# Looking into the functions directly under the matlab folder

Here I mean the functions vl_nnxx. I will just take vl_nnsigmoid and vl_nnrelu as examples.

```matlab
function out = vl_nnsigmoid(x,dzdy)
y = 1 ./ (1 + exp(-x)) ;
if nargin <= 1 || isempty(dzdy)
  out = y ;
else
  out = dzdy .* (y .* (1 - y)) ;
end
```

When back-propagation reaches this layer, formula (1) is bound to be executed as `out = dzdy .* (y .* (1 - y)) ;`. The latter factor `(y .* (1 - y))` is exactly $f'(x)$, since for the sigmoid $f'(x)=\sigma(x)(1-\sigma(x))=y(1-y)$.
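If you want to convince yourself that this is what the function computes, a quick finite-difference check works; this is only a sketch and assumes vl_nnsigmoid is on the MATLAB path:

```matlab
x    = randn(3, 3, 2, 'single') ;
dzdy = randn(3, 3, 2, 'single') ;    % pretend derivative coming from the layer above
dzdx = vl_nnsigmoid(x, dzdy) ;       % backward mode: dzdy .* f'(x)

h  = 1e-2 ;                          % perturbation for the numerical derivative
dx = zeros(size(x), 'single') ;
dx(1) = h ;                          % perturb a single input element
yp = vl_nnsigmoid(x + dx) ;
ym = vl_nnsigmoid(x - dx) ;
numeric = sum(dzdy(:) .* (yp(:) - ym(:))) / (2 * h) ;
disp([dzdx(1) numeric])              % the two values should be close
```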

```matlab
function y = vl_nnrelu(x,dzdy,varargin)
opts.leak = 0 ;
opts = vl_argparse(opts, varargin) ;
if opts.leak == 0
  if nargin <= 1 || isempty(dzdy)
    y = max(x, 0) ;
  else
    y = dzdy .* (x > 0) ;  % here formula (1) is used
  end
else
  if nargin <= 1 || isempty(dzdy)
    y = x .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  else
    y = dzdy .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  end
end
```
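The same calling pattern holds for the ReLU: one argument gives the forward pass, a second argument switches to formula (1), and the 'leak' option gives the leaky variant. A small sketch with made-up data:

```matlab
x    = randn(5, 5, 3, 'single') ;
y    = vl_nnrelu(x) ;                           % forward: y = max(x, 0)
dzdy = ones(size(y), 'single') ;                % pretend upstream derivative
dzdx = vl_nnrelu(x, dzdy) ;                     % backward: dzdy .* (x > 0)
dzdx_leaky = vl_nnrelu(x, dzdy, 'leak', 0.1) ;  % leaky-ReLU version of formula (1)
```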

Why don't I show the code of the structures that contain weight computation? Because they are coded in CUDA C for speed. You can find them in matlab\src. That is why we have to compile MatConvNet at the very beginning: functions like vl_nnconv need to be compiled into MEX files so that they can be called from MATLAB.
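For completeness, this is roughly how the compilation step looks (a sketch; the CUDA path is a made-up example and the GPU flags are only needed for a GPU build):

```matlab
% run from the MatConvNet root directory
run matlab/vl_setupnn ;
vl_compilenn ;                                                     % CPU-only build
% vl_compilenn('enableGpu', true, 'cudaRoot', '/usr/local/cuda') ; % GPU build
```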

The third post of the series will mainly introduce cnn_train, which is quite interesting.
