【原文网址】https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5weights=’uniform’algorithm=’auto’leaf_size=30p=2metric=’minkowski’metric_params=Nonen_jobs=None**kwargs)[source]

Classifier implementing the k-nearest neighbors vote.

Read more in the User Guide.

Parameters:

n_neighbors : int, optional (default = 5)

Number of neighbors to use by default for kneighbors queries.

weights : str or callable, optional (default = ‘uniform’)

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use KDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric : string or callable, default ‘minkowski’

the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.

metric_params : dict, optional (default = None)

Additional keyword arguments for the metric function.

n_jobs : int or None, optional (default=None)

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect fit method.

See also

RadiusNeighborsClassifierKNeighborsRegressorRadiusNeighborsRegressor,NearestNeighbors

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

>>>

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.66666667 0.33333333]]

Methods

fit(X, y) Fit the model using X as training data and y as target values
get_params([deep]) Get parameters for this estimator.
kneighbors([X, n_neighbors, return_distance]) Finds the K-neighbors of a point.
kneighbors_graph([X, n_neighbors, mode]) Computes the (weighted) graph of k-Neighbors for points in X
predict(X) Predict the class labels for the provided data
predict_proba(X) Return probability estimates for the test data X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters of this estimator.

__init__(n_neighbors=5weights=’uniform’algorithm=’auto’leaf_size=30p=2metric=’minkowski’metric_params=Nonen_jobs=None**kwargs)[source]

fit(Xy)[source]

Fit the model using X as training data and y as target values

Parameters:

X : {array-like, sparse matrix, BallTree, KDTree}

Training data. If array or matrix, shape [n_samples, n_features], or [n_samples, n_samples] if metric=’precomputed’.

y : {array-like, sparse matrix}

Target values of shape = [n_samples] or [n_samples, n_outputs]

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

kneighbors(X=Nonen_neighbors=Nonereturn_distance=True)[source]

Finds the K-neighbors of a point. Returns indices of and distances to the neighbors of each point.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors to get (default is the value passed to the constructor).

return_distance : boolean, optional. Defaults to True.

If False, distances will not be returned

Returns:

dist : array

Array representing the lengths to points, only present if return_distance=True

ind : array

Indices of the nearest points in the population matrix.

Examples

In the following example, we construct a NeighborsClassifier class from an array representing our data set and ask who’s the closest point to [1,1,1]

>>>

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples)
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]]))
(array([[0.5]]), array([[2]]))

As you can see, it returns [[0.5]], and [[2]], which means that the element is at distance 0.5 and is the third element of samples (indexes start at 0). You can also query for multiple points:

>>>

>>> X = [[0., 1., 0.], [1., 0., 1.]]
>>> neigh.kneighbors(X, return_distance=False)
array([[1],[2]]...)

kneighbors_graph(X=Nonen_neighbors=Nonemode=’connectivity’)[source]

Computes the (weighted) graph of k-Neighbors for points in X

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors for each sample. (default is value passed to the constructor).

mode : {‘connectivity’, ‘distance’}, optional

Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, in ‘distance’ the edges are Euclidean distance between points.

Returns:

A : sparse matrix in CSR format, shape = [n_samples, n_samples_fit]

n_samples_fit is the number of samples in the fitted data A[i, j] is assigned the weight of edge that connects i to j.

See also

NearestNeighbors.radius_neighbors_graph

Examples

>>>

>>> X = [[0], [3], [1]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=2)
>>> neigh.fit(X)
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> A = neigh.kneighbors_graph(X)
>>> A.toarray()
array([[1., 0., 1.],[0., 1., 1.],[1., 0., 1.]])

predict(X)[source]

Predict the class labels for the provided data

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

Test samples.

Returns:

y : array of shape [n_samples] or [n_samples, n_outputs]

Class labels for each data sample.

predict_proba(X)[source]

Return probability estimates for the test data X.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

Test samples.

Returns:

p : array of shape = [n_samples, n_classes], or a list of n_outputs

of such arrays if n_outputs > 1. The class probabilities of the input samples. Classes are ordered by lexicographic order.

score(Xysample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) wrt. y.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self

sklearn K近邻KNeighborsClassifier参数详解相关推荐

  1. sklearn中的RandomForestClassifier参数详解

    https://blog.csdn.net/qq_16633405/article/details/58596368 相当有思考的文章 sklearn随机森林分类类RandomForestClassi ...

  2. Lesson 8.3Lesson 8.4 ID3、C4.5决策树的建模流程CART回归树的建模流程与sklearn参数详解

    Lesson 8.3 ID3.C4.5决策树的建模流程 ID3和C4.5作为的经典决策树算法,尽管无法通过sklearn来进行建模,但其基本原理仍然值得讨论与学习.接下来我们详细介绍关于ID3和C4. ...

  3. Sklearn参数详解—聚类算法

    总第115篇 前言 聚类是一种非监督学习,是将一份给定数据集划分成k类,这一份数据集可能是某公司的一批用户,也可能是某媒体网站的一系列文章,如果是某公司的一批用户,那么k-means做的就是根据用户的 ...

  4. sklearn 随机分割数据_sklearn.ensemble.RandomForestClassifier 随机深林参数详解

    随机森林是一种元估计量,它适合数据集各个子样本上的许多决策树分类器,并使用平均数来提高预测准确性和控制过度拟合.子样本大小由max_samples参数bootstrap=True (default)控 ...

  5. Lesson 8.1Lesson 8.2 决策树的核心思想与建模流程CART分类树的建模流程与sklearn评估器参数详解

    Lesson 8.1 决策树的核心思想与建模流程 从本节课开始,我们将介绍经典机器学习领域中最重要的一类有监督学习算法--树模型(决策树). 可此前的聚类算法类似,树模型也同样不是一个模型,而是一类模 ...

  6. 系列 《使用sklearn进行集成学习——理论》 《使用sklearn进行集成学习——实践》 目录 1 Random Forest和Gradient Tree Boosting参数详解 2 如何调参?

    系列 <使用sklearn进行集成学习--理论> <使用sklearn进行集成学习--实践> 目录 1 Random Forest和Gradient Tree Boosting ...

  7. SKlearn参数详解—随机森林

    总第114篇 前言 随机森林(RandomForest,简称RF)是集成学习bagging的一种代表模型,随机森林模型正如他表面意思,是由若干颗树随机组成一片森林,这里的树就是决策树. 在GBDT篇我 ...

  8. Sklearn参数详解—GBDT

    总第113篇 前言 这篇介绍Boosting的第二个模型GBDT,GBDT和Adaboost都是Boosting模型的一种,但是略有不同,主要有以下两点不同: GBDT使用的基模型是CART决策树,且 ...

  9. Sklearn参数详解—Adaboost

    总第112篇 前言 今天这篇讲讲集成学习,集成学习就是将多个弱学习器集合成一个强学习器,你可以理解成现在有好多道判断题(判断对错即01),如果让学霸去做这些题,可能没啥问题,几乎全部都能做对,但是现实 ...

最新文章

  1. vue中的时间过滤器
  2. @responseBody java_java-如何使用@ResponseBody从Spring Controller返回JSON数据
  3. 提交客户端证书_MQTT X v1.3.3 正式发布 - 跨平台 MQTT 5.0 桌面测试客户端
  4. 总结了24个C++的大坑,看你能躲过几个?
  5. 【python】SOCK_STREAM和SOCK_DGRAM两种类型的区别【转】
  6. 团伙 并查集_BZOJ 1370 Baltic2003 Gang团伙 并查集
  7. Linux c modbus 线程,Modbus TCP Slave Thread - 设置和获取寄存器值
  8. 期货市场十赌九输,钱都去哪里了?
  9. Python 01:Pyton历史和入门介绍
  10. d3js绘制y坐标轴_如何用D3绘制各类样式的x坐标轴
  11. Python练习猜拳,利用while循环自定义函数,结果数据存入excel表格
  12. 贷款客户资源获取,一文了解贷款行业怎么获取高效精准客户
  13. Android之如何分析手机系统相册图片和视频删除后保存的位置
  14. intel固态硬盘误删文件该如何进行恢复
  15. react-activation缓存React.lazy异步组件问题记录
  16. earlier的意思_earlier和before都有之前的意思?有什么区别吗
  17. Unity加载并展示PPT
  18. 回文子串是什么意思?
  19. 一文读懂cpu cache
  20. Vue介绍以及练手案例——音乐播放器(搜索音乐、听歌、看评论、看mv等)(持续更新)

热门文章

  1. freebsd SSH配置详解
  2. 打开Jupyter报错:EnvironmentLocationNotFound: Not a conda environment
  3. 华为手机备忘录资料备份
  4. 【Rust 笔记】09-特型与泛型
  5. 上班摸鱼看小说的最佳软件
  6. Unity 制作鼠标光标拖尾
  7. 【ElementUI】DateTimePicker 日期时间选择器,设置 disabledDate 禁用今天之后的时间后,今天的日期选择不了的问题
  8. js 日期判断是否是今天
  9. Python 提取音乐频谱并可视化
  10. 圆环显示数据html,圆环图怎么默认显示数据?