This article presents an experimental analysis of two data preprocessing methods: Principal Components Analysis (PCA) and whitening. Theory reference: http://deeplearning.stanford.edu/wiki/index.php/PCA. Exercise data: http://deeplearning.stanford.edu/wiki/index.php/Exercise:PCA_and_Whitening.

Principal components analysis is a way of reducing the dimensionality of the input data: the input is mapped into another multi-dimensional coordinate system by an orthogonal transformation matrix, and that orthogonal matrix is exactly the set of eigenvectors of the covariance matrix of the input data. The mapping is simply the projection of the data onto each eigenvector. Dimensionality reduction is then achieved by keeping only the first k eigenvectors, which correspond to the first k principal directions of variation in the data.
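
As a quick illustration of this projection, the sketch below runs PCA on a random placeholder matrix; X, k and the other variable names here are assumptions for illustration only and are separate from the exercise code further down.

% Minimal PCA sketch (illustrative placeholder data, not the exercise code below)
X = randn(144, 10000);               % one example per column, assumed zero-mean
sigma = X * X' ./ size(X, 2);        % covariance matrix of the input
[U, S, V] = svd(sigma);              % columns of U: eigenvectors, i.e. the principal directions
xRot = U' * X;                       % projection of the data onto the eigenbasis
k = 50;                              % keep only the first k principal directions
xTilde = U(:, 1:k)' * X;             % k-dimensional (dimension-reduced) representation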

Whitening addresses redundancy in the input data. When the input consists of natural images, each input feature (pixel) is strongly correlated with its neighbouring features; the goal of whitening is to reduce the correlation between features and to give every feature the same variance. Decorrelation is already handled by PCA; to give all features the same variance, each feature can be rescaled as x_PCAwhite,i = x_rot,i / sqrt(λ_i + ε), where λ_i is the i-th eigenvalue of the covariance matrix of the input data, i indexes the feature dimension, and ε is a regularisation term added because very small eigenvalues would otherwise make the division blow up.
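
Continuing the sketch above, PCA whitening is just this rescaling applied to each rotated component (the epsilon value and variable names are assumed for illustration):

% PCA whitening sketch: give every rotated component (approximately) unit variance
epsilon = 1e-5;                                   % small regularisation term (assumed value)
lambda = diag(S);                                 % eigenvalues of the covariance matrix
xPCAWhite = diag(1 ./ sqrt(lambda + epsilon)) * xRot;
% cov(xPCAWhite') should now be close to the identity matrix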

The whitening described above is PCA whitening. ZCA (zero-phase component analysis) whitening is obtained by simply rotating the PCA-whitened result back into the original basis. Data processed this way stays much closer to the original data.
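
In code, this rotation back is one extra line on top of the PCA whitening sketch above:

% ZCA whitening sketch: rotate the PCA-whitened data back with U,
% so the result stays close to the original data
xZCAWhite = U * xPCAWhite;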

For a comparison of ZCA whitening and PCA whitening, see: http://stats.stackexchange.com/questions/117427/what-is-the-difference-between-zca-whitening-and-pca-whitening

The experiment code is as follows:

%%================================================================
clc, clear, close all;
%% Step 0a: Load data
%  Here we provide the code to load natural image data into x.
%  x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
%  the raw image data from the kth 12x12 image patch sampled.
%  You do not need to change the code below.
x = sampleIMAGESRAW();
figure('name','Raw images');
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
%  You can make use of the mean and repmat/bsxfun functions.
% -------------------- YOUR CODE HERE --------------------
avg = mean(x, 1);
x = x - repmat(avg, size(x, 1), 1);

%%================================================================
%% Step 1a: Implement PCA to obtain xRot
%  Implement PCA to obtain xRot, the matrix in which the data is expressed
%  with respect to the eigenbasis of sigma, which is the matrix U.
% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
[U, S, V] = svd(x * x' ./ size(x, 2));
xRot = U' * x;

%%================================================================
%% Step 1b: Check your implementation of PCA
%  The covariance matrix for the data expressed with respect to the basis U
%  should be a diagonal matrix with non-zero entries only along the main
%  diagonal. We will verify this here.
%  Write code to compute the covariance matrix, covar.
%  When visualised as an image, you should see a straight line across the
%  diagonal (non-zero entries) against a blue background (zero entries).
% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(x, 1)); % You need to compute this
covar = cov(xRot');

% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
%  Write code to determine k, the number of components to retain in order
%  to retain at least 99% of the variance.
% -------------------- YOUR CODE HERE --------------------
k = 0; % Set k accordingly
covariance_k = cumsum(diag(S)) ./ sum(diag(S));
k = min(find(covariance_k >= 0.99));

%%================================================================
%% Step 3: Implement PCA with dimension reduction
%  Now that you have found k, you can reduce the dimension of the data by
%  discarding the remaining dimensions. In this way, you can represent the
%  data in k dimensions instead of the original 144, which will save you
%  computational time when running learning algorithms on the reduced
%  representation.
%
%  Following the dimension reduction, invert the PCA transformation to produce
%  the matrix xHat, the dimension-reduced data with respect to the original basis.
%  Visualise the data and compare it to the raw data. You will observe that
%  there is little loss due to throwing away the principal components that
%  correspond to dimensions with low variation.
% -------------------- YOUR CODE HERE --------------------
xHat = zeros(size(x));  % You need to compute this
xHat = U(:, 1 : k) * xRot(1 : k, :);

% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
%  retains only 90% of the variance.
figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1)),'']);
display_network(xHat(:,randsel));
% For comparison, retains only 90% of the variance
k1 = min(find(covariance_k >= 0.90));
xHat1 = zeros(size(x));  % You need to compute this
xHat1 = U(:, 1 : k1) * xRot(1 : k1, :);
figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k1, size(x, 1)),'']);
display_network(xHat1(:,randsel));

figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
%  Implement PCA with whitening and regularisation to produce the matrix
%  xPCAWhite.
epsilon = [0.01, 0.1, 1];
for i = 1 : length(epsilon)
xPCAWhite = zeros(size(x));
% -------------------- YOUR CODE HERE --------------------
xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon(i))) * xRot;

%%================================================================
%% Step 4b: Check your implementation of PCA whitening
%  Check your implementation of PCA whitening with and without regularisation.
%  PCA whitening without regularisation results a covariance matrix
%  that is equal to the identity matrix. PCA whitening with regularisation
%  results in a covariance matrix with diagonal entries starting close to
%  1 and gradually becoming smaller. We will verify these properties here.
%  Write code to compute the covariance matrix, covar.
%
%  Without regularisation (set epsilon to 0 or close to 0),
%  when visualised as an image, you should see a red line across the
%  diagonal (one entries) against a blue background (zero entries).
%  With regularisation, you should see a red line that slowly turns
%  blue across the diagonal, corresponding to the one entries slowly
%  becoming smaller.
% -------------------- YOUR CODE HERE --------------------
covar = cov(xPCAWhite');

% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);
title(['epsilon = ' num2str(epsilon(i))]);

%%================================================================
%% Step 5: Implement ZCA whitening
%  Now implement ZCA whitening to produce the matrix xZCAWhite.
%  Visualise the data and compare it to the raw data. You should observe
%  that whitening results in, among other things, enhanced edges.
xZCAWhite = zeros(size(x));
% -------------------- YOUR CODE HERE --------------------
xZCAWhite = U * xPCAWhite;
% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
title(['epsilon = ' num2str(epsilon(i))]);
end
figure('name','Raw images');
display_network(x(:,randsel));

The results are as follows:

Original image set:

Image set after zero-meaning:

PCA-reconstructed image set after retaining 99% of the variance:

PCA-reconstructed image set after retaining 90% of the variance:

Image set after ZCA whitening (for different epsilon values):

Compared with the original image set:

It can be seen that adding the ε term acts as a low-pass filter (denoising), but ε should not be made too large, otherwise edges (features) will be blurred away.
