Machine Learning: Learning the kNN Classifier
Characteristics of kNN:
- Simple. There is no training phase, which is why it is also called lazy learning: like an open-book exam, you look up the answer in the data you already have.
- Natural. Finding similar cases is exactly how humans recognize things, an instinct wired into humans and other animals. Of course, it can also mislead: someone may mistake a neighbor's Alocasia (滴水观音) for taro and be poisoned after stealing a bite.
- Effective. Never underestimate kNN: on many datasets it is hard to design an algorithm that beats it.
- Adaptable. It handles both classification and regression, and works on many kinds of data.
- Extensible. Designing different distance measures can produce unexpectedly good results.
- The data usually needs to be normalized first.
- High complexity. This is kNN's most important drawback: for each test instance the cost is O((m + k)n), where n is the number of training instances, m is the number of condition attributes, and k is the number of neighbors. See computeNearests() in the code.
Code:
package machinelearning.knn;

import java.io.FileReader;
import java.util.Arrays;
import java.util.Random;

import weka.core.Instances;

public class KnnClassification {

    // Manhattan distance: the sum of |x_i - y_i| over all attributes.
    public static final int MANHATTAN = 0;

    // Euclidean distance.
    public static final int EUCLIDEAN = 1;

    // The distance measure in use.
    public int distanceMeasure = EUCLIDEAN;

    // A random-number generator for shuffling the data.
    public static final Random random = new Random();

    // The number of neighbors.
    int numNeighbors = 7;

    // The whole dataset.
    Instances dataset;

    // The training set, represented by indices into the dataset.
    int[] trainingSet;

    // The testing set, represented by indices into the dataset.
    int[] testingSet;

    // The predicted labels of the testing set.
    int[] predictions;

    /**
     * Read the dataset from an ARFF file.
     *
     * @param paraFilename the ARFF file to read
     */
    public KnnClassification(String paraFilename) {
        try {
            FileReader fileReader = new FileReader(paraFilename);
            dataset = new Instances(fileReader);
            // The last attribute is the class label.
            dataset.setClassIndex(dataset.numAttributes() - 1);
            fileReader.close();
        } catch (Exception e) {
            System.out.println("Error occurred while trying to read '" + paraFilename
                    + "' in KnnClassification constructor.\r\n" + e);
            System.exit(0);
        }
    }

    /**
     * Generate a random permutation of indices for shuffling the data.
     *
     * @param paraLength the number of indices
     * @return an index array in random order
     */
    public static int[] getRandomIndices(int paraLength) {
        int[] resultIndices = new int[paraLength];

        // Step 1. Initialize.
        for (int i = 0; i < paraLength; i++) {
            resultIndices[i] = i;
        }

        // Step 2. Randomly swap.
        int tempFirst, tempSecond, tempValue;
        for (int i = 0; i < paraLength; i++) {
            // Generate two random positions.
            tempFirst = random.nextInt(paraLength);
            tempSecond = random.nextInt(paraLength);

            // Swap them.
            tempValue = resultIndices[tempFirst];
            resultIndices[tempFirst] = resultIndices[tempSecond];
            resultIndices[tempSecond] = tempValue;
        }

        return resultIndices;
    }

    /**
     * Split the data into a training set and a testing set.
     *
     * @param paraTrainingFraction the fraction of the data used for training
     */
    public void splitTrainingTesting(double paraTrainingFraction) {
        int tempSize = dataset.numInstances();
        int[] tempIndices = getRandomIndices(tempSize);
        int tempTrainingSize = (int) (tempSize * paraTrainingFraction);

        trainingSet = new int[tempTrainingSize];
        testingSet = new int[tempSize - tempTrainingSize];

        for (int i = 0; i < tempTrainingSize; i++) {
            trainingSet[i] = tempIndices[i];
        }
        for (int i = 0; i < tempSize - tempTrainingSize; i++) {
            testingSet[i] = tempIndices[tempTrainingSize + i];
        }
    }

    /**
     * Predict the whole testing set. The results are stored in predictions.
     */
    public void predict() {
        predictions = new int[testingSet.length];
        for (int i = 0; i < predictions.length; i++) {
            predictions[i] = predict(testingSet[i]);
        }
    }

    /**
     * Predict the label of a given instance.
     *
     * @param paraIndex the index of the instance
     * @return the predicted label
     */
    private int predict(int paraIndex) {
        int[] tempNeighbors = computeNearests(paraIndex);
        int resultPrediction = simpleVoting(tempNeighbors);
        return resultPrediction;
    }

    /**
     * The distance between two instances.
     *
     * @param paraI the index of the first instance
     * @param paraJ the index of the second instance
     * @return the distance
     */
    public double distance(int paraI, int paraJ) {
        double resultDistance = 0;
        double tempDifference;
        switch (distanceMeasure) {
        case MANHATTAN:
            for (int i = 0; i < dataset.numAttributes() - 1; i++) {
                tempDifference = dataset.instance(paraI).value(i) - dataset.instance(paraJ).value(i);
                if (tempDifference < 0) {
                    resultDistance -= tempDifference;
                } else {
                    resultDistance += tempDifference;
                }
            }
            break;
        case EUCLIDEAN:
            // The square root is omitted: it does not change the neighbor ranking.
            for (int i = 0; i < dataset.numAttributes() - 1; i++) {
                tempDifference = dataset.instance(paraI).value(i) - dataset.instance(paraJ).value(i);
                resultDistance += tempDifference * tempDifference;
            }
            break;
        default:
            System.out.println("Unsupported distance measure: " + distanceMeasure);
        }
        return resultDistance;
    }

    /**
     * Get the accuracy of the classifier on the testing set.
     *
     * @return the accuracy
     */
    public double getAccuracy() {
        double tempCorrect = 0;
        for (int i = 0; i < predictions.length; i++) {
            if (predictions[i] == dataset.instance(testingSet[i]).classValue()) {
                tempCorrect++;
            }
        }
        return tempCorrect / testingSet.length;
    }

    /**
     * Compute the k nearest neighbors of an instance.
     *
     * @param paraCurrent the index of the current instance
     * @return the indices of the nearest instances
     */
    private int[] computeNearests(int paraCurrent) {
        int[] resultNearests = new int[numNeighbors];
        boolean[] tempSelected = new boolean[trainingSet.length];
        double tempMinimalDistance;
        int tempMinimalIndex = 0;

        // Compute every distance once: O(mn).
        double[] tempDistances = new double[trainingSet.length];
        for (int i = 0; i < trainingSet.length; i++) {
            tempDistances[i] = distance(paraCurrent, trainingSet[i]);
        }

        // Select the k nearest indices: O(kn).
        for (int i = 0; i < numNeighbors; i++) {
            tempMinimalDistance = Double.MAX_VALUE;
            for (int j = 0; j < trainingSet.length; j++) {
                if (tempSelected[j]) {
                    continue;
                }
                if (tempDistances[j] < tempMinimalDistance) {
                    tempMinimalDistance = tempDistances[j];
                    tempMinimalIndex = j;
                }
            }
            resultNearests[i] = trainingSet[tempMinimalIndex];
            tempSelected[tempMinimalIndex] = true;
        }

        System.out.println("The nearest of " + paraCurrent + " are: " + Arrays.toString(resultNearests));
        return resultNearests;
    }

    /**
     * Majority voting among the neighbors.
     *
     * @param paraNeighbors the indices of the neighbors
     * @return the class with the most votes
     */
    private int simpleVoting(int[] paraNeighbors) {
        int[] tempVotes = new int[dataset.numClasses()];
        for (int i = 0; i < paraNeighbors.length; i++) {
            tempVotes[(int) dataset.instance(paraNeighbors[i]).classValue()]++;
        }

        int tempMaximalVotingIndex = 0;
        int tempMaximalVoting = 0;
        for (int i = 0; i < dataset.numClasses(); i++) {
            if (tempVotes[i] > tempMaximalVoting) {
                tempMaximalVoting = tempVotes[i];
                tempMaximalVotingIndex = i;
            }
        }
        return tempMaximalVotingIndex;
    }

    /**
     * Set the distance measure.
     */
    public void setDistanceMeasure(int paraType) {
        if (paraType == 0) {
            distanceMeasure = MANHATTAN;
        } else if (paraType == 1) {
            distanceMeasure = EUCLIDEAN;
        } else {
            System.out.println("Wrong distance measure: " + paraType);
        }
    }

    /**
     * Set the number of neighbors.
     */
    public void setNumNeighbors(int paraNumNeighbors) {
        if (paraNumNeighbors > dataset.numInstances()) {
            System.out.println("The number of neighbors is out of range.");
            return;
        }
        this.numNeighbors = paraNumNeighbors;
    }

    public static void main(String[] args) {
        KnnClassification tempClassifier = new KnnClassification("D:\\研究生学习\\iris.arff");
        tempClassifier.splitTrainingTesting(0.8);
        tempClassifier.predict();
        System.out.println("The accuracy of the classifier is: " + tempClassifier.getAccuracy());
    }
}
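The extensibility point above can be made concrete. simpleVoting() gives every one of the k neighbors an equal vote; a common variant weights each vote by the inverse of the neighbor's distance, so that closer neighbors count for more. The sketch below is a hypothetical, self-contained illustration of that idea (the class name, method name, and the epsilon constant are my own; none of them appear in the code above):

```java
// Hypothetical sketch: inverse-distance-weighted voting over k neighbors.
public class WeightedVoting {

    // paraClasses[i] is the class label of neighbor i; paraDistances[i] is its
    // distance to the query. Each neighbor votes with weight 1 / (distance + eps),
    // where eps avoids division by zero when a neighbor coincides with the query.
    public static int weightedVoting(int[] paraClasses, double[] paraDistances, int paraNumClasses) {
        double[] tempVotes = new double[paraNumClasses];
        for (int i = 0; i < paraClasses.length; i++) {
            tempVotes[paraClasses[i]] += 1.0 / (paraDistances[i] + 1e-6);
        }

        int resultIndex = 0;
        for (int i = 1; i < paraNumClasses; i++) {
            if (tempVotes[i] > tempVotes[resultIndex]) {
                resultIndex = i;
            }
        }
        return resultIndex;
    }

    public static void main(String[] args) {
        // Two distant neighbors of class 1, one very close neighbor of class 0:
        // plain majority voting would choose class 1; weighted voting chooses class 0.
        int[] tempClasses = {0, 1, 1};
        double[] tempDistances = {0.1, 5.0, 6.0};
        System.out.println(weightedVoting(tempClasses, tempDistances, 2)); // prints 0
    }
}
```

Because the weighting only reorders votes, such a variant could be dropped in as a replacement for simpleVoting() without touching the neighbor search.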
Output:
The nearest of 120 are: [143, 140, 124, 144, 112, 139, 102]
The nearest of 3 are: [29, 2, 45, 12, 38, 42, 34]
The nearest of 64 are: [82, 79, 88, 99, 59, 92, 89]
The nearest of 37 are: [34, 9, 1, 12, 29, 45, 2]
The nearest of 148 are: [136, 115, 147, 140, 137, 124, 144]
The nearest of 30 are: [29, 34, 9, 45, 12, 1, 11]
The nearest of 126 are: [123, 127, 138, 146, 83, 63, 72]
The nearest of 117 are: [131, 105, 109, 122, 125, 107, 118]
The nearest of 55 are: [66, 96, 94, 78, 95, 99, 84]
The nearest of 47 are: [2, 42, 6, 29, 38, 12, 45]
The nearest of 90 are: [94, 96, 89, 99, 67, 95, 92]
The nearest of 71 are: [97, 82, 92, 61, 99, 74, 67]
The nearest of 132 are: [128, 104, 103, 111, 112, 140, 147]
The nearest of 49 are: [7, 39, 0, 28, 17, 40, 34]
The nearest of 134 are: [103, 83, 111, 137, 119, 72, 108]
The nearest of 35 are: [1, 2, 40, 28, 34, 9, 7]
The nearest of 10 are: [48, 27, 36, 19, 5, 16, 20]
The nearest of 130 are: [107, 102, 125, 129, 105, 122, 108]
The nearest of 15 are: [33, 14, 5, 16, 32, 48, 19]
The nearest of 8 are: [38, 42, 13, 12, 45, 2, 29]
The nearest of 133 are: [83, 72, 123, 127, 63, 111, 77]
The nearest of 18 are: [5, 48, 20, 16, 31, 36, 33]
The nearest of 69 are: [80, 89, 81, 92, 82, 53, 67]
The nearest of 135 are: [105, 102, 107, 122, 125, 109, 118]
The nearest of 25 are: [34, 9, 1, 12, 45, 29, 7]
The nearest of 46 are: [19, 21, 48, 4, 27, 32, 44]
The nearest of 110 are: [147, 115, 77, 137, 141, 139, 127]
The nearest of 116 are: [137, 103, 147, 111, 128, 112, 104]
The nearest of 145 are: [141, 147, 139, 112, 115, 140, 128]
The nearest of 149 are: [127, 138, 142, 101, 70, 83, 121]
The accuracy of the classifier is: 0.9666666666666667
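The feature list above notes that kNN usually requires normalization, yet the code reads attribute values as-is (the iris attributes happen to share similar scales, so the accuracy does not suffer much here). A minimal min-max sketch of the idea follows; the class name and method are hypothetical and not part of the program above:

```java
// Hypothetical sketch: min-max normalization of a numeric data matrix.
public class MinMaxNormalization {

    // Scale each column (attribute) of paraData into [0, 1] independently,
    // so that attributes with large ranges do not dominate the distance.
    public static double[][] normalize(double[][] paraData) {
        int tempRows = paraData.length;
        int tempCols = paraData[0].length;
        double[][] resultData = new double[tempRows][tempCols];

        for (int j = 0; j < tempCols; j++) {
            double tempMin = Double.MAX_VALUE;
            double tempMax = -Double.MAX_VALUE;
            for (int i = 0; i < tempRows; i++) {
                tempMin = Math.min(tempMin, paraData[i][j]);
                tempMax = Math.max(tempMax, paraData[i][j]);
            }
            double tempRange = tempMax - tempMin;
            for (int i = 0; i < tempRows; i++) {
                // A constant column maps to 0 to avoid division by zero.
                resultData[i][j] = (tempRange == 0) ? 0 : (paraData[i][j] - tempMin) / tempRange;
            }
        }
        return resultData;
    }

    public static void main(String[] args) {
        double[][] tempData = { {1.0, 100.0}, {2.0, 300.0}, {3.0, 200.0} };
        // Column 1 becomes {0.0, 0.5, 1.0}; column 2 becomes {0.0, 1.0, 0.5}.
        System.out.println(java.util.Arrays.deepToString(normalize(tempData)));
    }
}
```

In the Weka-based program, the same effect could be obtained by transforming the condition attributes once after loading the dataset, before any distance is computed; note that the min and max must come from the training data only if test data is meant to be unseen.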