# Written before
This blog is the second one in the series of Notes on MatConvNet.

  1. Notes on MatConvNet (I) – Overview

Here I will mainly introduce the core of MatConvNet: vl_simplenn. This function plays a quite important role in forward propagation and backward propagation.

PS. I have to admit that writing blogs in Chinese would bring much more traffic than writing them in English. Yet, if I cared too much about such vain trifles, it would be a sad thing.

# Something that should be known before

I only cover BP here. How do the derivatives propagate backward? I sincerely recommend reading BP Algorithm; that blog is a good introduction to BP. Once you have finished it, the first blog of this series is also recommended. Below I repeat the main computation structure illustrated in Notes (I).

# Computation Structure


I set up a few notational conventions here.

  • y always represents the output of a certain layer; that is, when we are at layer i, y is the output of layer i.
  • z always represents the output of the whole net, or rather, the output of the final layer n.
  • x represents the input of a certain layer.

To make things easier, MatConvNet treats each simple function as a "layer". This means that when the input goes through a computation structure (whether it is a conv structure or just a ReLU structure), the computation looks like the following:
$$\frac{dz}{dx}=f'(x)*\frac{dz}{dy}\qquad(1)$$

$$\frac{dz}{dw}=f'_{w}(x)*\frac{dz}{dy}\qquad(2)\quad\text{(condition)}$$
Note:

  • condition means that formula (2) is only computed when the layer involves weights.
  • $f'(x)$ means the derivative of the output with respect to the input $x$.
  • $f'_{w}(x)$ means the derivative of the output with respect to the weights $w$.
  • This is a little different from the BP Algorithm post, which only treats conv or fully connected layers as computation structures. However, once you also count activations (sigmoid, ReLU, etc.) and other functional structures (dropout, LRN, etc.) as computation structures, you will find it quite easy, because every computation block is only responsible for its own input and output. Each time it receives a dzdy and its input x, it calculates dzdx using (1). If the block is a conv or fully connected layer, or any other layer that involves weights, it has to do the extra computation of formula (2), because you need the weight gradients to update the weights. That is, in fact, exactly our goal. A toy sketch follows this list.
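To make formulas (1) and (2) concrete, here is a toy sketch (my own illustration, not MatConvNet code) using a bare fully connected layer $y = wx$, so that $f'(x)$ corresponds to $w^{T}$ and $f'_{w}(x)$ to $x^{T}$; all the variables below are made up:

```matlab
x    = randn(4, 1) ;   % input of this layer
w    = randn(3, 4) ;   % weights of this layer
y    = w * x ;         % forward pass: output of this layer
dzdy = randn(3, 1) ;   % derivative of the net output z w.r.t. y, handed down by the layer above
dzdx = w' * dzdy ;     % formula (1): becomes the dzdy of the layer below
dzdw = dzdy * x' ;     % formula (2): the gradient used to update w
```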

# Taking a look at vl_simplenn

## The result format

  • res(i+1).x: the output of layer i. Hence res(1).x is the network input.

  • res(i+1).dzdx: the derivative of the network output relative to the output of layer i. In particular, res(1).dzdx is the derivative of the network output with respect to the network input.

  • res(i+1).dzdw: a cell array containing the derivatives of the network output relative to the parameters of layer i. It can be a cell array for multiple parameters.

Note: for layer i, y corresponds to res(i+1).x, since the output of layer i is exactly the input of layer i+1, and that is how it is stored. Likewise, for layer i, res(i+1).dzdx is exactly what I have been calling dzdy (see the indexing sketch below).
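As a quick sanity check on the indexing, here is a sketch; it assumes `net` is any SimpleNN model and `im` a suitably sized and normalized input:

```matlab
n   = numel(net.layers) ;
res = vl_simplenn(net, im) ;      % forward pass only
assert(numel(res) == n + 1) ;     % one entry per layer boundary
in  = res(1).x ;                  % the network input
out = res(n+1).x ;                % the output of layer n, i.e. the network output
```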

## Main types you may use

```matlab
res = vl_simplenn(net, x) ;                      % (1)
res = vl_simplenn(net, x, dzdy) ;                % (2)
res = vl_simplenn(net, x, dzdy, res, opt, val) ; % (3)
```

(1) performs only the forward computation.
(2) is used for back-propagation. It computes the derivatives of the net's output z with respect to the input and the weights.
(3) is used in cnn_train. It adds some options (opts), which I will not introduce here.
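A minimal usage sketch of forms (1) and (2), assuming `net` is a trained SimpleNN model and `im` is an input image that has already been resized and mean-subtracted (for single-precision CPU data):

```matlab
% form (1): forward only
res    = vl_simplenn(net, im) ;
scores = squeeze(res(end).x) ;             % network output z

% form (2): forward + backward
dzdy = ones(size(res(end).x), 'single') ;  % seed derivative w.r.t. the output
res  = vl_simplenn(net, im, dzdy) ;
dzdx = res(1).dzdx ;                       % derivative of z w.r.t. the network input
```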

```matlab
...
% the code before 'Forward pass' is easy and needs no explanation
% -------------------------------------------------------------------------
%                                                              Forward pass
% -------------------------------------------------------------------------
for i=1:n
  if opts.skipForward, break; end;
  l = net.layers{i} ;
  res(i).time = tic ;
  switch l.type
    case 'conv'
      res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...
        'pad', l.pad, ...
        'stride', l.stride, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case 'convt'
      res(i+1).x = vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, ...
        'crop', l.crop, ...
        'upsample', l.upsample, ...
        'numGroups', l.numGroups, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case 'pool'
      res(i+1).x = vl_nnpool(res(i).x, l.pool, ...
        'pad', l.pad, 'stride', l.stride, ...
        'method', l.method, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case {'normalize', 'lrn'}
      res(i+1).x = vl_nnnormalize(res(i).x, l.param) ;
    case 'softmax'
      res(i+1).x = vl_nnsoftmax(res(i).x) ;
    case 'loss'
      res(i+1).x = vl_nnloss(res(i).x, l.class) ;
    case 'softmaxloss'
      res(i+1).x = vl_nnsoftmaxloss(res(i).x, l.class) ;
    case 'relu'
      if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
      res(i+1).x = vl_nnrelu(res(i).x,[],leak{:}) ;
    case 'sigmoid'
      res(i+1).x = vl_nnsigmoid(res(i).x) ;
    ...
    otherwise
      error('Unknown layer type ''%s''.', l.type) ;
  end
```

The code above shows the main idea of forward propagation.

```matlab
  % optionally forget intermediate results
  forget = opts.conserveMemory & ~(doder & n >= backPropLim) ;
  if i > 1
    lp = net.layers{i-1} ;
    % forget RELU input, even for BPROP
    % forget controls whether the intermediate result res{i+1}.x is kept;
    % if net.layers.precious is true, the intermediate result is kept
    forget = forget & (~doder | (strcmp(l.type, 'relu') & ~lp.precious)) ;
    forget = forget & ~(strcmp(lp.type, 'loss') || strcmp(lp.type, 'softmaxloss')) ;
    forget = forget & ~lp.precious ;
  end
  if forget  % if not kept, set this layer's input to empty
    res(i).x = [] ;
  end
  if gpuMode && opts.sync
    wait(gpuDevice) ;
  end
  res(i).time = toc(res(i).time) ;
end
```
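Related to the forget logic above: if you need an intermediate result later (for example, to visualize a feature map) while conserveMemory is on, you can mark that layer as precious. A sketch, assuming `net` has been passed through vl_simplenn_tidy so that every layer has the default fields, and `im` is a suitable input:

```matlab
net = vl_simplenn_tidy(net) ;               % fill in default fields such as 'precious'
net.layers{3}.precious = true ;             % keep res(4).x even when conserving memory
res = vl_simplenn(net, im, [], [], 'conserveMemory', true) ;
feat = res(4).x ;                           % still available; other res(i).x may be []
```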

## Backward pass

It is quite straightforward, so the code mostly speaks for itself.

```matlab
% -------------------------------------------------------------------------
%                                                             Backward pass
% -------------------------------------------------------------------------
if doder
  res(n+1).dzdx = dzdy ;
  for i=n:-1:max(1, n-opts.backPropDepth+1)
    l = net.layers{i} ;
    res(i).backwardTime = tic ;
    switch l.type
      case 'conv'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
          'pad', l.pad, ...
          'stride', l.stride, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case 'convt'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
          'crop', l.crop, ...
          'upsample', l.upsample, ...
          'numGroups', l.numGroups, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case 'pool'
        res(i).dzdx = vl_nnpool(res(i).x, l.pool, res(i+1).dzdx, ...
          'pad', l.pad, 'stride', l.stride, ...
          'method', l.method, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case {'normalize', 'lrn'}
        res(i).dzdx = vl_nnnormalize(res(i).x, l.param, res(i+1).dzdx) ;
      case 'softmax'
        res(i).dzdx = vl_nnsoftmax(res(i).x, res(i+1).dzdx) ;
      case 'loss'
        res(i).dzdx = vl_nnloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'softmaxloss'
        res(i).dzdx = vl_nnsoftmaxloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'relu'
        if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
        if ~isempty(res(i).x)
          res(i).dzdx = vl_nnrelu(res(i).x, res(i+1).dzdx, leak{:}) ;
        else
          % if res(i).x is empty, it has been optimized away, so we use this
          % hack (which works only for ReLU):
          res(i).dzdx = vl_nnrelu(res(i+1).x, res(i+1).dzdx, leak{:}) ;
        end
      case 'sigmoid'
        res(i).dzdx = vl_nnsigmoid(res(i).x, res(i+1).dzdx) ;
      case 'noffset'
        res(i).dzdx = vl_nnnoffset(res(i).x, l.param, res(i+1).dzdx) ;
      case 'spnorm'
        res(i).dzdx = vl_nnspnorm(res(i).x, l.param, res(i+1).dzdx) ;
      case 'dropout'
        if testMode
          res(i).dzdx = res(i+1).dzdx ;
        else
          res(i).dzdx = vl_nndropout(res(i).x, res(i+1).dzdx, ...
            'mask', res(i+1).aux) ;
        end
      case 'bnorm'
        [res(i).dzdx, dzdw{1}, dzdw{2}, dzdw{3}] = ...
          vl_nnbnorm(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx) ;
        % multiply the moments update by the number of images in the batch
        % this is required to make the update additive for subbatches
        % and will eventually be normalized away
        dzdw{3} = dzdw{3} * size(res(i).x,4) ;
      case 'pdist'
        res(i).dzdx = vl_nnpdist(res(i).x, l.class, ...
          l.p, res(i+1).dzdx, ...
          'noRoot', l.noRoot, ...
          'epsilon', l.epsilon, ...
          'aggregate', l.aggregate) ;
      case 'custom'
        res(i) = l.backward(l, res(i), res(i+1)) ;
    end % layers
    switch l.type
      case {'conv', 'convt', 'bnorm'}
        if ~opts.accumulate
          res(i).dzdw = dzdw ;
        else
          for j=1:numel(dzdw)
            res(i).dzdw{j} = res(i).dzdw{j} + dzdw{j} ;
          end
        end
        dzdw = [] ;
    end
    if opts.conserveMemory && ~net.layers{i}.precious && i ~= n
      res(i+1).dzdx = [] ;
      res(i+1).x = [] ;
    end
    if gpuMode && opts.sync
      wait(gpuDevice) ;
    end
    res(i).backwardTime = toc(res(i).backwardTime) ;
  end
end
```
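This is where formula (2) pays off: after the backward pass, res(i).dzdw holds the gradients of the network output z with respect to the weights of layer i. cnn_train uses them together with momentum and weight decay; the plain gradient step below is only a sketch, with a made-up learning rate `lr`:

```matlab
lr = 0.01 ;                                    % made-up learning rate
for i = 1:numel(net.layers)
  if isfield(net.layers{i}, 'weights')
    for j = 1:numel(res(i).dzdw)
      net.layers{i}.weights{j} = net.layers{i}.weights{j} - lr * res(i).dzdw{j} ;
    end
  end
end
```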

# Looking into the functions directly under the matlab folder

Here I mean the functions vl_nnxx. I will just take vl_nnsigmoid and vl_nnrelu as examples.

```matlab
function out = vl_nnsigmoid(x,dzdy)
y = 1 ./ (1 + exp(-x)) ;
if nargin <= 1 || isempty(dzdy)
  out = y ;
else
  out = dzdy .* (y .* (1 - y)) ;
end
```

When back-propagation reaches this layer, formula (1) is bound to be executed as `out = dzdy .* (y .* (1 - y)) ;`. The latter factor `(y .* (1 - y))` is exactly $f'(x)$, since for the sigmoid $f'(x)=\sigma(x)(1-\sigma(x))=y(1-y)$.
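If you want to convince yourself that this is what the function computes, a quick finite-difference check works; this is only a sketch and assumes vl_nnsigmoid is on the MATLAB path:

```matlab
x    = randn(3, 3, 2, 'single') ;
dzdy = randn(3, 3, 2, 'single') ;    % pretend derivative coming from the layer above
dzdx = vl_nnsigmoid(x, dzdy) ;       % backward mode: dzdy .* f'(x)

h  = 1e-2 ;                          % perturbation for the numerical derivative
dx = zeros(size(x), 'single') ;
dx(1) = h ;                          % perturb a single input element
yp = vl_nnsigmoid(x + dx) ;
ym = vl_nnsigmoid(x - dx) ;
numeric = sum(dzdy(:) .* (yp(:) - ym(:))) / (2 * h) ;
disp([dzdx(1) numeric])              % the two values should be close
```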

```matlab
function y = vl_nnrelu(x,dzdy,varargin)
opts.leak = 0 ;
opts = vl_argparse(opts, varargin) ;
if opts.leak == 0
  if nargin <= 1 || isempty(dzdy)
    y = max(x, 0) ;
  else
    y = dzdy .* (x > 0) ;  % here formula (1) is used
  end
else
  if nargin <= 1 || isempty(dzdy)
    y = x .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  else
    y = dzdy .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  end
end
```
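The same calling pattern holds for the ReLU: one argument gives the forward pass, a second argument switches to formula (1), and the 'leak' option gives the leaky variant. A small sketch with made-up data:

```matlab
x    = randn(5, 5, 3, 'single') ;
y    = vl_nnrelu(x) ;                           % forward: y = max(x, 0)
dzdy = ones(size(y), 'single') ;                % pretend upstream derivative
dzdx = vl_nnrelu(x, dzdy) ;                     % backward: dzdy .* (x > 0)
dzdx_leaky = vl_nnrelu(x, dzdy, 'leak', 0.1) ;  % leaky-ReLU version of formula (1)
```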

Why don't I show the code of the structures that contain weight computation? Because they are coded in CUDA C for speed. You can find them in matlab\src. That is why we have to compile MatConvNet at the very beginning: functions like vl_nnconv need to be compiled into MEX files so that they can be called from MATLAB.
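For completeness, this is roughly how the compilation step looks (a sketch; the CUDA path is a made-up example and the GPU flags are only needed for a GPU build):

```matlab
% run from the MatConvNet root directory
run matlab/vl_setupnn ;
vl_compilenn ;                                                     % CPU-only build
% vl_compilenn('enableGpu', true, 'cudaRoot', '/usr/local/cuda') ; % GPU build
```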

The third post of the series will mainly introduce cnn_train, which is quite interesting.
