sklearn - Dimensionality reduction
References:
1. http://scikit-learn.org/stable/
2. http://scikit-learn.org/stable/modules/decomposition.html#decompositions
2.5. Decomposing signals in components (matrix factorization problems)
2.5.1. Principal component analysis (PCA)
2.5.1.1. Exact PCA and probabilistic interpretation
PCA is used to decompose a multivariate dataset into a set of successive orthogonal components that explain a maximum amount of the variance. In scikit-learn, PCA is implemented as a transformer object that learns n components in its fit method and can be used on new data to project it onto these components.
The optional parameter whiten=True makes it possible to project the data onto the singular space while scaling each component to unit variance. This is often useful when downstream models make strong assumptions about the isotropy of the signal: this is, for example, the case for Support Vector Machines with the RBF kernel and for the K-Means clustering algorithm.
Below is an example on the iris dataset, which consists of 4 features, projected onto the 2 dimensions that explain the most variance:
The PCA object also provides a probabilistic interpretation of PCA that can give a likelihood of the data based on the amount of variance it explains. As such, it implements a score method that can be used in cross-validation:
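As a minimal sketch of this score method (the choice of n_components=2 here is arbitrary), the likelihood-based score plugs directly into cross_val_score:

import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

X = datasets.load_iris().data

# PCA.score returns the average log-likelihood of the samples under the
# fitted probabilistic PCA model, so cross_val_score can use it directly.
pca = PCA(n_components=2)
print(np.mean(cross_val_score(pca, X)))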
Examples:
- Comparison of LDA and PCA 2D projection of Iris dataset
- Model selection with Probabilistic PCA and Factor Analysis (FA)
Comparison of LDA and PCA 2D projection of Iris dataset
Principal Component Analysis (PCA) applied to this data identifies the combination of attributes (principal components, or directions in feature space) that account for the most variance in the data. Here we plot the different samples on the first two principal components.
Linear Discriminant Analysis (LDA) tries to identify attributes that account for the most variance between classes. In particular, LDA, in contrast to PCA, is a supervised method that uses the known class labels.
print(__doc__)

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()

X = iris.data
y = iris.target
target_names = iris.target_names

pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

# Percentage of variance explained for each components
print('explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))

plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of IRIS dataset')

plt.figure()
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], alpha=.8, color=color,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA of IRIS dataset')

plt.show()
Model selection with Probabilistic PCA and Factor Analysis (FA)
# Authors: Alexandre Gramfort
#          Denis A. Engemann
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg

from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.covariance import ShrunkCovariance, LedoitWolf
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

print(__doc__)

# #############################################################################
# Create the data

n_samples, n_features, rank = 1000, 50, 10
sigma = 1.
rng = np.random.RandomState(42)
U, _, _ = linalg.svd(rng.randn(n_features, n_features))
X = np.dot(rng.randn(n_samples, rank), U[:, :rank].T)

# Adding homoscedastic noise
X_homo = X + sigma * rng.randn(n_samples, n_features)

# Adding heteroscedastic noise
sigmas = sigma * rng.rand(n_features) + sigma / 2.
X_hetero = X + rng.randn(n_samples, n_features) * sigmas

# #############################################################################
# Fit the models

n_components = np.arange(0, n_features, 5)  # options for n_components


def compute_scores(X):
    pca = PCA(svd_solver='full')
    fa = FactorAnalysis()

    pca_scores, fa_scores = [], []
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        pca_scores.append(np.mean(cross_val_score(pca, X)))
        fa_scores.append(np.mean(cross_val_score(fa, X)))

    return pca_scores, fa_scores


def shrunk_cov_score(X):
    shrinkages = np.logspace(-2, 0, 30)
    cv = GridSearchCV(ShrunkCovariance(), {'shrinkage': shrinkages})
    return np.mean(cross_val_score(cv.fit(X).best_estimator_, X))


def lw_score(X):
    return np.mean(cross_val_score(LedoitWolf(), X))


for X, title in [(X_homo, 'Homoscedastic Noise'),
                 (X_hetero, 'Heteroscedastic Noise')]:
    pca_scores, fa_scores = compute_scores(X)
    n_components_pca = n_components[np.argmax(pca_scores)]
    n_components_fa = n_components[np.argmax(fa_scores)]

    pca = PCA(svd_solver='full', n_components='mle')
    pca.fit(X)
    n_components_pca_mle = pca.n_components_

    print("best n_components by PCA CV = %d" % n_components_pca)
    print("best n_components by FactorAnalysis CV = %d" % n_components_fa)
    print("best n_components by PCA MLE = %d" % n_components_pca_mle)

    plt.figure()
    plt.plot(n_components, pca_scores, 'b', label='PCA scores')
    plt.plot(n_components, fa_scores, 'r', label='FA scores')
    plt.axvline(rank, color='g', label='TRUTH: %d' % rank, linestyle='-')
    plt.axvline(n_components_pca, color='b',
                label='PCA CV: %d' % n_components_pca, linestyle='--')
    plt.axvline(n_components_fa, color='r',
                label='FactorAnalysis CV: %d' % n_components_fa,
                linestyle='--')
    plt.axvline(n_components_pca_mle, color='k',
                label='PCA MLE: %d' % n_components_pca_mle, linestyle='--')

    # compare with other covariance estimators
    plt.axhline(shrunk_cov_score(X), color='violet',
                label='Shrunk Covariance MLE', linestyle='-.')
    plt.axhline(lw_score(X), color='orange',
                label='LedoitWolf MLE', linestyle='-.')

    plt.xlabel('nb of components')
    plt.ylabel('CV scores')
    plt.legend(loc='lower right')
    plt.title(title)

plt.show()
2.5.1.2. Incremental PCA
The PCA object is very useful, but has certain limitations for large datasets. The biggest limitation is that PCA only supports batch processing, which means all of the data to be processed must fit in main memory. The IncrementalPCA object uses a different form of processing and allows for partial computations which almost exactly match the results of PCA while processing the data in a minibatch fashion. IncrementalPCA makes it possible to implement out-of-core Principal Component Analysis either by:
- Using its partial_fit method on chunks of data fetched sequentially from the local hard drive or a network database.
- Calling its fit method on a memory-mapped file using numpy.memmap.
IncrementalPCA only stores estimates of the component and noise variances, and updates explained_variance_ratio_ incrementally. As a result, memory usage depends on the number of samples per batch, rather than the number of samples to be processed in the dataset.
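A minimal sketch of the partial_fit route follows; the in-memory chunks here stand in for data that would be read from disk, and the chunk count and n_components are arbitrary:

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)  # stands in for a dataset too large for memory

ipca = IncrementalPCA(n_components=5)
for chunk in np.array_split(X, 10):  # each chunk would be fetched from disk
    ipca.partial_fit(chunk)

X_reduced = ipca.transform(X[:100])  # project data onto the components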
Examples:
- Incremental PCA
print(__doc__)

# Authors: Kyle Kastner
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, IncrementalPCA

iris = load_iris()
X = iris.data
y = iris.target

n_components = 2
ipca = IncrementalPCA(n_components=n_components, batch_size=10)
X_ipca = ipca.fit_transform(X)

pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X)

colors = ['navy', 'turquoise', 'darkorange']

for X_transformed, title in [(X_ipca, "Incremental PCA"), (X_pca, "PCA")]:
    plt.figure(figsize=(8, 8))
    for color, i, target_name in zip(colors, [0, 1, 2], iris.target_names):
        plt.scatter(X_transformed[y == i, 0], X_transformed[y == i, 1],
                    color=color, lw=2, label=target_name)

    if "Incremental" in title:
        err = np.abs(np.abs(X_pca) - np.abs(X_ipca)).mean()
        plt.title(title + " of iris dataset\nMean absolute unsigned error "
                  "%.6f" % err)
    else:
        plt.title(title + " of iris dataset")
    plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.axis([-4, 4, -1.5, 1.5])

plt.show()
2.5.1.3. PCA using randomized SVD
It is often interesting to project data to a lower-dimensional space that preserves most of the variance, by dropping the singular vectors of the components associated with the lower singular values.
For instance, if we work with 64x64 pixel gray-level pictures for face recognition, the dimensionality of the data is 4096, and it is slow to train an RBF support vector machine on such wide data. Furthermore, we know that the intrinsic dimensionality of the data is much lower than 4096, since all pictures of human faces look somewhat alike: the samples lie on a manifold of much lower dimension (say around 200, for instance). The PCA algorithm can be used to linearly transform the data while both reducing the dimensionality and preserving most of the explained variance.
In this case the PCA class, used with the optional parameter svd_solver='randomized', is very useful: since we are going to drop most of the singular vectors, it is much more efficient to limit the computation to an approximate estimate of the singular vectors we will keep to actually perform the transform.
Note: the implementation of inverse_transform in PCA with svd_solver='randomized' is not the exact inverse of transform, even when whiten=False (the default).
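A minimal sketch on synthetic data (the shapes and n_components=150 are arbitrary; a fixed random_state makes the randomized solver reproducible):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X = rng.randn(500, 4096)  # e.g. 500 flattened 64x64 gray-level images

# Only an approximate estimate of the 150 kept singular vectors is computed.
pca = PCA(n_components=150, svd_solver='randomized', random_state=42)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (500, 150)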
Examples:
- Faces recognition example using eigenfaces and SVMs
- Faces dataset decompositions
Faces recognition example using eigenfaces and SVMs
from __future__ import print_function

from time import time
import logging
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC

print(__doc__)

# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

# #############################################################################
# Download the data, if not already on disk and load it as numpy arrays

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape

# for machine learning we use the 2 data directly (as relative pixel
# positions info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]

# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)

# #############################################################################
# Split into a training set and a test set using a stratified k fold

# split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# #############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, svd_solver='randomized',
          whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))

# #############################################################################
# Train a SVM classification model

print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)

# #############################################################################
# Quantitative evaluation of the model quality on the test set

print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))

# #############################################################################
# Qualitative evaluation of the predictions using matplotlib

def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())


# plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue: %s' % (pred_name, true_name)


prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# plot the gallery of the most significative eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
Faces dataset decompositions
print(__doc__)

# Authors: Vlad Niculae, Alexandre Gramfort
# License: BSD 3 clause

import logging
from time import time

from numpy.random import RandomState
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_olivetti_faces
from sklearn.cluster import MiniBatchKMeans
from sklearn import decomposition

# Display progress logs on stdout
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
n_row, n_col = 2, 3
n_components = n_row * n_col
image_shape = (64, 64)
rng = RandomState(0)

# #############################################################################
# Load faces data
dataset = fetch_olivetti_faces(shuffle=True, random_state=rng)
faces = dataset.data

n_samples, n_features = faces.shape

# global centering
faces_centered = faces - faces.mean(axis=0)

# local centering
faces_centered -= faces_centered.mean(axis=1).reshape(n_samples, -1)

print("Dataset consists of %d faces" % n_samples)


def plot_gallery(title, images, n_col=n_col, n_row=n_row):
    plt.figure(figsize=(2. * n_col, 2.26 * n_row))
    plt.suptitle(title, size=16)
    for i, comp in enumerate(images):
        plt.subplot(n_row, n_col, i + 1)
        vmax = max(comp.max(), -comp.min())
        plt.imshow(comp.reshape(image_shape), cmap=plt.cm.gray,
                   interpolation='nearest',
                   vmin=-vmax, vmax=vmax)
        plt.xticks(())
        plt.yticks(())
    plt.subplots_adjust(0.01, 0.05, 0.99, 0.93, 0.04, 0.)


# #############################################################################
# List of the different estimators, whether to center and transpose the
# problem, and whether the transformer uses the clustering API.
estimators = [
    ('Eigenfaces - PCA using randomized SVD',
     decomposition.PCA(n_components=n_components, svd_solver='randomized',
                       whiten=True),
     True),

    ('Non-negative components - NMF',
     decomposition.NMF(n_components=n_components, init='nndsvda', tol=5e-3),
     False),

    ('Independent components - FastICA',
     decomposition.FastICA(n_components=n_components, whiten=True),
     True),

    ('Sparse comp. - MiniBatchSparsePCA',
     decomposition.MiniBatchSparsePCA(n_components=n_components, alpha=0.8,
                                      n_iter=100, batch_size=3,
                                      random_state=rng),
     True),

    ('MiniBatchDictionaryLearning',
     decomposition.MiniBatchDictionaryLearning(n_components=15, alpha=0.1,
                                               n_iter=50, batch_size=3,
                                               random_state=rng),
     True),

    ('Cluster centers - MiniBatchKMeans',
     MiniBatchKMeans(n_clusters=n_components, tol=1e-3, batch_size=20,
                     max_iter=50, random_state=rng),
     True),

    ('Factor Analysis components - FA',
     decomposition.FactorAnalysis(n_components=n_components, max_iter=2),
     True),
]

# #############################################################################
# Plot a sample of the input data

plot_gallery("First centered Olivetti faces", faces_centered[:n_components])

# #############################################################################
# Do the estimation and plot it

for name, estimator, center in estimators:
    print("Extracting the top %d %s..." % (n_components, name))
    t0 = time()
    data = faces
    if center:
        data = faces_centered
    estimator.fit(data)
    train_time = (time() - t0)
    print("done in %0.3fs" % train_time)
    if hasattr(estimator, 'cluster_centers_'):
        components_ = estimator.cluster_centers_
    else:
        components_ = estimator.components_

    # Plot an image representing the pixelwise variance provided by the
    # estimator e.g its noise_variance_ attribute. The Eigenfaces estimator,
    # via the PCA decomposition, also provides a scalar noise_variance_
    # (the mean of pixelwise variance) that cannot be displayed as an image
    # so we skip it.
    if (hasattr(estimator, 'noise_variance_') and
            estimator.noise_variance_.ndim > 0):  # Skip the Eigenfaces case
        plot_gallery("Pixelwise variance",
                     estimator.noise_variance_.reshape(1, -1), n_col=1,
                     n_row=1)
    plot_gallery('%s - Train time %.1fs' % (name, train_time),
                 components_[:n_components])

plt.show()
2.5.1.4. Kernel PCA
KernelPCA is an extension of PCA which achieves non-linear dimensionality reduction through the use of kernels (see Pairwise metrics, Affinities and Kernels). It has many applications, including denoising, compression and structured prediction (kernel dependency estimation). KernelPCA supports both transform and inverse_transform.
Examples:
- Kernel PCA
print(__doc__)

# Authors: Mathieu Blondel
#          Andreas Mueller
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt

from sklearn.decomposition import PCA, KernelPCA
from sklearn.datasets import make_circles

np.random.seed(0)

X, y = make_circles(n_samples=400, factor=.3, noise=.05)

kpca = KernelPCA(kernel="rbf", fit_inverse_transform=True, gamma=10)
X_kpca = kpca.fit_transform(X)
X_back = kpca.inverse_transform(X_kpca)
pca = PCA()
X_pca = pca.fit_transform(X)

# Plot results

plt.figure()
plt.subplot(2, 2, 1, aspect='equal')
plt.title("Original space")
reds = y == 0
blues = y == 1

plt.scatter(X[reds, 0], X[reds, 1], c="red", s=20, edgecolor='k')
plt.scatter(X[blues, 0], X[blues, 1], c="blue", s=20, edgecolor='k')
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")

X1, X2 = np.meshgrid(np.linspace(-1.5, 1.5, 50), np.linspace(-1.5, 1.5, 50))
X_grid = np.array([np.ravel(X1), np.ravel(X2)]).T
# projection on the first principal component (in the phi space)
Z_grid = kpca.transform(X_grid)[:, 0].reshape(X1.shape)
plt.contour(X1, X2, Z_grid, colors='grey', linewidths=1, origin='lower')

plt.subplot(2, 2, 2, aspect='equal')
plt.scatter(X_pca[reds, 0], X_pca[reds, 1], c="red", s=20, edgecolor='k')
plt.scatter(X_pca[blues, 0], X_pca[blues, 1], c="blue", s=20, edgecolor='k')
plt.title("Projection by PCA")
plt.xlabel("1st principal component")
plt.ylabel("2nd component")

plt.subplot(2, 2, 3, aspect='equal')
plt.scatter(X_kpca[reds, 0], X_kpca[reds, 1], c="red", s=20, edgecolor='k')
plt.scatter(X_kpca[blues, 0], X_kpca[blues, 1], c="blue", s=20, edgecolor='k')
plt.title("Projection by KPCA")
plt.xlabel(r"1st principal component in space induced by $\phi$")
plt.ylabel("2nd component")

plt.subplot(2, 2, 4, aspect='equal')
plt.scatter(X_back[reds, 0], X_back[reds, 1], c="red", s=20, edgecolor='k')
plt.scatter(X_back[blues, 0], X_back[blues, 1], c="blue", s=20, edgecolor='k')
plt.title("Original space after inverse transform")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")

plt.subplots_adjust(0.02, 0.10, 0.98, 0.94, 0.04, 0.35)

plt.show()
2.5.1.5. Sparse principal components analysis (SparsePCA and MiniBatchSparsePCA)
SparsePCA is a variant of PCA whose goal is to extract the set of sparse components that best reconstruct the data.
Mini-batch sparse PCA (MiniBatchSparsePCA) is a variant of SparsePCA that is faster but less accurate. The increased speed is reached by iterating over small chunks of the set of features, for a given number of iterations.
Principal component analysis (PCA) has the disadvantage that the components extracted by this method have exclusively dense expressions, i.e. they have non-zero coefficients when expressed as linear combinations of the original variables. This can make interpretation difficult. In many cases, the real underlying components can be more naturally imagined as sparse vectors; for example in face recognition, components might naturally map to parts of faces.
Sparse principal components yield a more parsimonious, interpretable representation, clearly emphasizing which of the original features contribute to the differences between samples.
Note: while in the spirit of an online algorithm, the MiniBatchSparsePCA class does not implement partial_fit, because the algorithm is online along the features direction, not the samples direction.
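A minimal sketch on synthetic data (n_components and alpha, which controls the sparsity, are arbitrary choices):

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(100, 30)

# Higher alpha -> sparser components (more exactly-zero coefficients)
spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
X_reduced = spca.fit_transform(X)
print(np.mean(spca.components_ == 0))  # fraction of zero coefficients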
Examples:
- Faces dataset decompositions
2.5.2. Truncated singular value decomposition and latent semantic analysis
TruncatedSVD implements a variant of singular value decomposition (SVD) that only computes the k largest singular values, where k is a user-specified parameter.
When truncated SVD is applied to term-document matrices (as returned by CountVectorizer or TfidfVectorizer), this transformation is known as latent semantic analysis (LSA), because it transforms such matrices to a low-dimensional "semantic" space. In particular, LSA is known to combat the effects of synonymy and polysemy (both of which roughly mean there are multiple meanings per word), which cause term-document matrices to be overly sparse and to exhibit poor similarity under measures such as cosine similarity.
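A minimal LSA sketch on a toy corpus (the documents and the choice of n_components=2 are arbitrary):

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# term-document matrix -> low-dimensional "semantic" space
X_tfidf = TfidfVectorizer().fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0)
X_lsa = svd.fit_transform(X_tfidf)
print(X_lsa.shape)  # (3, 2)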
Examples:
- Clustering text documents using k-means
2.5.3. Dictionary Learning
2.5.3.1. Sparse coding with a precomputed dictionary
The SparseCoder object is an estimator that can be used to transform signals into sparse linear combinations of atoms from a fixed, precomputed dictionary, such as a discrete wavelet basis. This object therefore does not implement a fit method. The transformation amounts to a sparse coding problem: finding a representation of the data as a linear combination of as few dictionary atoms as possible. All variants of dictionary learning implement the following transform methods, controllable via the transform_algorithm initialization parameter (see the sketch after this list):
- Orthogonal matching pursuit (OMP)
- Least-angle regression (LARS)
- Lasso computed by least-angle regression
- Lasso using coordinate descent (Lasso)
- Thresholding
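A minimal sketch with a random dictionary standing in for a precomputed one (a real application would use e.g. wavelet atoms; the shapes and transform_n_nonzero_coefs are arbitrary):

import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.RandomState(0)

# A fixed, precomputed dictionary of 50 atoms in 20 dimensions
# (random here purely for illustration), with unit-norm rows.
D = rng.randn(50, 20)
D /= np.linalg.norm(D, axis=1)[:, np.newaxis]

coder = SparseCoder(dictionary=D, transform_algorithm='omp',
                    transform_n_nonzero_coefs=5)
X = rng.randn(10, 20)       # signals to encode
codes = coder.transform(X)  # at most 5 non-zero coefficients per signal
print((codes != 0).sum(axis=1))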
Examples:
- Sparse coding with a precomputed dictionary
2.5.3.2. Generic dictionary learning
Dictionary learning (DictionaryLearning) is a matrix factorization problem that amounts to finding a (usually overcomplete) dictionary that will perform well at sparsely encoding the fitted data.
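A minimal sketch on synthetic data (n_components, alpha and the small max_iter, used only to keep the sketch fast, are arbitrary):

import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(100, 20)

# Learn 30 atoms: more atoms than features, i.e. an overcomplete dictionary
dico = DictionaryLearning(n_components=30, alpha=1.0, max_iter=10,
                          random_state=0)
codes = dico.fit_transform(X)  # sparse codes for X
print(dico.components_.shape)  # (30, 20): the learned dictionary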
Examples:
- Image denoising using dictionary learning
2.5.3.3. Mini-batch dictionary learning
MiniBatchDictionaryLearning implements a faster, but less accurate, version of the dictionary learning algorithm that is better suited for large datasets.
By default, MiniBatchDictionaryLearning divides the data into mini-batches and optimizes in an online manner by cycling over the mini-batches for the specified number of iterations. However, at the moment it does not implement a stopping condition.
The estimator also implements partial_fit, which updates the dictionary by iterating only once over a mini-batch. This can be used for online learning when the data is not readily available from the start, or when the data does not fit into memory.
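A minimal partial_fit sketch (batch size, number of batches and n_components are arbitrary; each batch stands in for newly arrived data):

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
dico = MiniBatchDictionaryLearning(n_components=15, batch_size=32,
                                   random_state=0)

# Each call updates the dictionary with a single mini-batch, so data can
# be streamed from disk or arrive over time.
for _ in range(20):
    batch = rng.randn(32, 20)
    dico.partial_fit(batch)

print(dico.components_.shape)  # (15, 20)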
Example: Online learning of a dictionary of parts of faces
2.5.4. Factor Analysis
Examples:
- Model selection with Probabilistic PCA and Factor Analysis (FA)
2.5.5. Independent component analysis (ICA)
Independent component analysis separates a multivariate signal into additive subcomponents that are maximally independent. It is implemented in scikit-learn using the FastICA algorithm. Typically, ICA is not used for reducing dimensionality but for separating superimposed signals. Since the ICA model does not include a noise term, whitening must be applied for the model to be correct. This can be done internally using the whiten argument, or manually using one of the PCA variants.
ICA is classically used to separate mixed signals (a problem known as blind source separation), as in the minimal sketch below and the linked examples:
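This sketch uses two synthetic sources and an arbitrary mixing matrix; FastICA whitens internally by default:

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)

# Two independent sources: a sine wave and a square wave
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

A = np.array([[1.0, 0.5], [0.5, 2.0]])  # mixing matrix
X = np.dot(S, A.T)                      # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)  # recovered sources (up to sign/scale/order)
print(S_est.shape)  # (2000, 2)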
Examples:
- Blind source separation using FastICA
- FastICA on 2D point clouds
- Faces dataset decompositions
2.5.6. Non-negative matrix factorization (NMF or NNMF)
2.5.6.1. NMF with the Frobenius norm
2.5.6.2. NMF with a beta-divergence
Examples:
- Faces dataset decompositions
- Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation
- Beta-divergence loss functions
2.5.7. Latent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation is a generative probabilistic model for collections of discrete data such as text corpora. It is also a topic model, used to discover abstract topics in a collection of documents.
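A minimal sketch on a toy corpus (the documents and the choice of n_components=2 topics are arbitrary):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "apple banana fruit juice",
    "banana fruit smoothie apple",
    "goal striker football match",
    "football match referee goal",
]

# LDA works on raw term counts rather than tf-idf features
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic mixture
print(doc_topics.round(2))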
Examples:
- Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation