Unknown Words

Term | Note
Nearest neighbor algorithm | 近邻算法 (the Chinese name of the algorithm)
K-nearest neighbor algorithm | Euclidean distance \(\sqrt{\sum_{i=1}^{n}(a_i-b_i)^2}\)

Decision Tree

Each node asks a question about one of the features. Based on the answer, the data is split into two groups, and each group is then asked further questions. These questions are learned from existing data. When new data arrives, the questions on the tree route it into the appropriate leaf.
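
The question-asking procedure above can be sketched in Python. This is a minimal sketch with several assumptions: the features are numeric, every question has the form "is feature j <= t?", and the best question is the one with the lowest Gini impurity. That is a common CART-style choice, but the text does not say how the questions are chosen. The toy rows and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_question(rows, labels):
    """Try every 'feature j <= t?' question and return the one whose two groups
    have the lowest weighted Gini impurity, or None if no question helps."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Keep asking questions until a node is pure or max_depth is reached."""
    question = best_question(rows, labels) if depth < max_depth else None
    if question is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    j, t = question
    yes = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    no = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "question": (j, t),
        "yes": build_tree([r for r, _ in yes], [y for _, y in yes], depth + 1, max_depth),
        "no": build_tree([r for r, _ in no], [y for _, y in no], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the questions from the root down to a leaf."""
    while isinstance(tree, dict):
        j, t = tree["question"]
        tree = tree["yes"] if row[j] <= t else tree["no"]
    return tree

rows = [[1, 1], [2, 1], [8, 9], [9, 8]]  # hypothetical toy data
labels = ["A", "A", "B", "B"]
tree = build_tree(rows, labels)
print(predict(tree, [1.5, 2]))
```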

Random Forest

Randomly select data from the source data to form several subsets.

The S matrix is the source data, with rows 1 to N. The columns for A, B, and C hold the features, and the final column, also labeled C, holds the category.

M sub-matrices are randomly generated from S.


Training on these M subsets yields M decision trees.

Feed the new data into the M trees to get M classification results. Count which category is predicted most often, and use that category as the final prediction.
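
The sampling and voting scheme can be sketched as follows. As a simplifying assumption, each "tree" here is a one-question decision stump. The subsets are drawn by bootstrap sampling, meaning rows are sampled with replacement. Full random forests usually grow deeper trees and also pick a random subset of features at each split. The names train_forest and forest_predict and the toy data are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """A one-question decision tree: choose the 'feature j <= t?' split
    that misclassifies the fewest rows of this subset."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = Counter(y for r, y in zip(rows, labels) if r[j] <= t)
            right = Counter(y for r, y in zip(rows, labels) if r[j] > t)
            # each side predicts its own majority class (subset majority if empty)
            yes = left.most_common(1)[0][0] if left else majority
            no = right.most_common(1)[0][0] if right else majority
            errors = sum(y != (yes if r[j] <= t else no) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, j, t, yes, no)
    _, j, t, yes, no = best
    return lambda row: yes if row[j] <= t else no

def train_forest(rows, labels, m=5, seed=0):
    """Draw m random subsets of S (rows sampled with replacement)
    and train one tree on each subset."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees

def forest_predict(trees, row):
    """Collect one vote per tree and return the most common category."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]

rows = [[1], [2], [3], [4], [6], [7], [8], [9]]  # hypothetical data
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = train_forest(rows, labels, m=5, seed=0)
print(forest_predict(forest, [0]), forest_predict(forest, [10]))
```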

Logistic regression

When the prediction target is a probability, its value must be greater than or equal to 0 and less than or equal to 1. A simple linear model cannot be used in this case, because its output is not confined to any range and can fall outside [0, 1].

So a model with an S-shaped curve, which stays between 0 and 1, is better suited here.

So how do we get such a model?
The model needs to satisfy two conditions: its output must be greater than or equal to 0 and less than or equal to 1.
To make it greater than or equal to 0, we could use the absolute value or the square. Here the exponential function is used, which is always greater than 0.
To make it less than or equal to 1, use division. The numerator is the expression itself and the denominator is the expression plus 1, so the ratio is always less than 1.

After one more transformation, we obtain the logistic regression model.
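
Written out explicitly, the two conditions and the final transformation look as follows. The symbols z and β are introduced here for illustration and are not taken from the original figure. The exponential keeps the output positive, and dividing by "itself plus 1" keeps it below 1. Dividing the numerator and the denominator by e^z then gives the familiar form.

```latex
% condition 1: e^{z} > 0 for every z, so the output is positive
% condition 2: e^{z} / (e^{z} + 1) < 1, because the denominator exceeds the numerator
p = \frac{e^{z}}{e^{z} + 1}
  = \frac{1}{1 + e^{-z}},
\qquad z = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n
```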

The corresponding coefficients can be estimated from the source data.
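
One common way to estimate the coefficients is to minimize the log-loss with gradient descent. The text does not say which method the author used, so treat this as a sketch under that assumption. The data, learning rate, and number of epochs are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Estimate b0, b1 in p = sigmoid(b0 + b1 * x) by batch gradient
    descent on the average log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradient of the average log-loss: mean of (p - y) and of (p - y) * x
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]  # hypothetical data
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, -b0 / b1)
```

The toy labels are symmetric around x = 2.5: mapping x to 5 - x and swapping the labels gives back the same data. The fitted decision boundary -b0/b1 should therefore land near 2.5.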

Finally, we obtain the logistic curve.

SVM (Support Vector Machine)

To separate the two classes, we want a hyperplane. The optimal hyperplane is the one that maximizes the margin between the two classes, where the margin is the distance between the hyperplane and the nearest point. In the figure, \(Z_2>Z_1\), so the green hyperplane is better.

Express this hyperplane as a linear equation. Points of the class above the line satisfy \(w^{T}x + W_0 \ge 1\), and points of the other class satisfy \(w^{T}x + W_0 \le -1\).

The distance from a point \(x\) to the hyperplane is calculated with the formula in the figure, \(|w^{T}x + W_0| / \lVert w \rVert\).

So the total margin is \(2 / \lVert w \rVert\). The goal is to maximize this margin, which means minimizing the denominator \(\lVert w \rVert\), so the task becomes an optimization problem.
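
The maximization is usually rewritten as an equivalent minimization. As a notational assumption, y_i = +1 marks the class above the line and y_i = -1 the class below it, so the two inequalities combine into one constraint. Squaring the norm and adding the factor ½ do not change which w is optimal, but they turn the problem into a smooth quadratic program.

```latex
\max_{w,\,W_0}\ \frac{2}{\lVert w\rVert}
\quad\Longleftrightarrow\quad
\min_{w,\,W_0}\ \frac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w^{T}x_i + W_0) \ge 1 \ \text{for every training point } x_i
```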

Given three example points, we find the optimal hyperplane. The weight vector is defined to be parallel to (2,3) - (1,1) = (1,2).

This gives the weight vector as (a, 2a). Substitute the two points into the equation: (2,3) should give 1 and (1,1) should give -1. Solving these two equations yields a and the intercept \(W_0\), which in turn gives the expression of the hyperplane.

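After solving for a, substituting it into (a, 2a) gives the weight vector. The points that lie on the margin, here (2,3) and (1,1), are the support vectors.
Substituting a and \(W_0\) into the hyperplane equation gives the support vector machine.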
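
The arithmetic for the two points in the text can be checked exactly with fractions. This sketch uses only (2,3) and (1,1), as the text does, because the third example point is not given. It also assumes that (2,3) belongs to the +1 class.

```python
from fractions import Fraction

p_pos, p_neg = (2, 3), (1, 1)  # (2,3) must give +1, (1,1) must give -1
# w is parallel to (2,3) - (1,1) = (1,2), so w = a * (1,2) = (a, 2a)
direction = (p_pos[0] - p_neg[0], p_pos[1] - p_neg[1])

def f(point, a, w0):
    """Value of w . x + W_0 with w = a * direction."""
    return a * (direction[0] * point[0] + direction[1] * point[1]) + w0

# f(p_pos) = 1 and f(p_neg) = -1 are linear in (a, W_0):
#   8a + W_0 =  1
#   3a + W_0 = -1
k_pos = direction[0] * p_pos[0] + direction[1] * p_pos[1]  # 8
k_neg = direction[0] * p_neg[0] + direction[1] * p_neg[1]  # 3
a = Fraction(1 - (-1), k_pos - k_neg)  # subtracting the equations: 5a = 2
w0 = Fraction(-1) - a * k_neg          # back-substitute into 3a + W_0 = -1
w = (a * direction[0], a * direction[1])
print("a =", a, " W_0 =", w0, " w =", w)
```

The result is a = 2/5 and W_0 = -11/5, so the hyperplane is (2/5)x_1 + (4/5)x_2 - 11/5 = 0, or equivalently x_1 + 2x_2 = 5.5. The margin 2/||w|| comes out to √5, which equals the distance between the two points, as expected when both lie on the margin.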

Naive Bayes

Here is an application in NLP:
Given a paragraph of text, return its sentiment classification, that is, whether the attitude of the text is positive or negative.

To solve this problem, we only need to look at some of the words.

The text is then represented only by these words and their counts.

The original question is: given a sentence, which category does it belong to? Bayes' rule turns it into a simpler, easier question.

The question becomes: what is the probability of this sentence appearing in this category? Of course, don't forget the other two probabilities in the formula, the probability of the category itself and the probability of the sentence.
For example, the probability that the word "love" appears in positive texts is 0.1, while in negative texts it is 0.001.
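
Bayes' rule gives \(P(c \mid s) = P(s \mid c)\,P(c) / P(s)\). The "naive" assumption treats the words as independent given the class, so \(P(s \mid c)\) becomes a product of per-word probabilities such as P(love | positive). Since P(s) is the same for every class, it can be dropped when comparing classes. Below is a minimal sketch. Whitespace tokenization, add-one smoothing, and the tiny training set are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label). Returns class counts, per-class word
    counts (the 'words and their counts' representation), the vocabulary,
    and the number of documents."""
    priors = Counter(label for _, label in docs)
    counts = {label: Counter() for label in priors}
    for text, label in docs:
        counts[label].update(text.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return priors, counts, vocab, len(docs)

def classify(text, model):
    """Return argmax over c of log P(c) + sum of log P(word | c)."""
    priors, counts, vocab, n_docs = model
    best, best_score = None, -math.inf
    for label in priors:
        total = sum(counts[label].values())
        score = math.log(priors[label] / n_docs)
        for word in text.lower().split():
            # add-one smoothing so an unseen word does not zero out the product
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [  # hypothetical training data
    ("i love this movie", "positive"),
    ("love the acting and the story", "positive"),
    ("i hate this movie", "negative"),
    ("boring story and bad acting", "negative"),
]
model = train_nb(docs)
print(classify("i love it", model))
```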

K nearest neighbors

Given a new data point, look at the k points closest to it. The class that is most common among them is the class the new point belongs to.
For example, to distinguish cats from dogs, we judge by claw shape and sound. The circles and triangles have known classes. Which class does the star belong to?

When k=3, the three points connected to the star by lines are its three nearest neighbors. Circles are in the majority, so the star belongs to the cat class.
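
Below is a minimal k-NN sketch that uses the Euclidean distance \(\sqrt{\sum_i (a_i-b_i)^2}\). The (claw shape, sound) coordinates are hypothetical numbers standing in for the figure's circles (cats) and triangles (dogs).

```python
import math
from collections import Counter

def euclidean(a, b):
    """sqrt(sum_i (a_i - b_i)^2)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, query, k=3):
    """train: list of (features, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# hypothetical (claw shape, sound) scores
train = [
    ((1.0, 1.2), "cat"), ((1.5, 1.8), "cat"), ((2.0, 1.0), "cat"),
    ((5.0, 5.5), "dog"), ((6.0, 5.0), "dog"), ((5.5, 6.5), "dog"),
]
star = (2.5, 2.5)
print(knn_predict(train, star, k=3))
```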

K-means

Suppose we want to divide a set of data into three classes. In the figure, pink marks large values and yellow marks small values.
First, initialize. Here the simplest choice, 3, 2, and 1, is used as the initial value of each class. For each remaining data point, compute its distance to the three initial values and assign it to the class of the nearest one.

After every point is assigned, compute the mean of each class and use it as that class's center point for the next round.

After a few rounds, once the groups no longer change, you can stop.
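
The rounds above can be sketched for one-dimensional data, starting from the initial centers 3, 2, 1 as in the text. The data values are hypothetical. If two centers are equally close to a point, this sketch assigns the point to the first of them.

```python
def kmeans_1d(data, centers, max_rounds=100):
    """Assign each value to its nearest center, move each center to the mean
    of its group, and stop once the groups no longer change."""
    groups = None
    for _ in range(max_rounds):
        new_groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            new_groups[nearest].append(x)
        if new_groups == groups:  # assignments unchanged: converged
            break
        groups = new_groups
        # new center = mean of its group (keep the old center if a group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

data = [1, 2, 3, 8, 9, 10, 25, 27, 30]  # hypothetical values
centers, groups = kmeans_1d(data, centers=[3, 2, 1])
print(centers, groups)
```

On this data the groups settle into [25, 27, 30], [8, 9, 10], and [1, 2, 3], so the class that started at 3 ends up holding the large values.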


Adaboost

Adaboost is one of the boosting methods.
Boosting combines several classifiers that each classify poorly into one better classifier.
In the figure, the left and right decision trees are not very good on their own. If the same data is fed into both and their results are added together, the prediction becomes more reliable.

For example, in Adaboost handwriting recognition, many features can be captured on the drawing board, such as the direction at the starting point and the distance between the starting point and the ending point.

During training, we obtain the weight of each feature. For example, 2 and 3 begin in very similar ways, so this feature contributes little to the classification and its weight is small.

This alpha angle, on the other hand, is very distinctive, so its feature weight is larger. The final prediction takes all of these features into account.
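
Below is a compact AdaBoost sketch that uses one-feature threshold "stumps" as the weak classifiers. Because each stump looks at a single feature, its weight alpha plays the role of the feature weight described above. An accurate stump gets a large alpha, and a near-useless one gets an alpha close to 0. Labels are +1/-1, and the 1-D data is hypothetical. No single threshold can separate this data, but three weighted stumps can.

```python
import math

def train_weighted_stump(xs, ys, weights):
    """Weak learner: 'predict s if x[j] <= t else -s', chosen to minimize
    the weighted error. Returns (error, j, t, s)."""
    best = None
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            for s in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if (s if x[j] <= t else -s) != y)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(xs, ys, rounds=10):
    """Returns a list of (alpha, j, t, s); alpha is each weak classifier's weight."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, s = train_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, s))
        # raise the weight of examples this stump got wrong, lower the rest
        weights = [w * math.exp(-alpha * y * (s if x[j] <= t else -s))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def adaboost_predict(model, x):
    """Sign of the alpha-weighted sum of the weak classifiers' votes."""
    score = sum(alpha * (s if x[j] <= t else -s) for alpha, j, t, s in model)
    return 1 if score >= 0 else -1

xs = [[1], [2], [3], [4], [5], [6]]  # hypothetical 1-D data
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([adaboost_predict(model, x) for x in xs])
```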

Neural Network

Neural networks are suitable when an input may fall into one of at least two categories.
A NN consists of several layers of neurons and the connections between them.
The first layer is the input layer, and the last layer is the output layer.
The hidden layers and the output layer each have their own classifiers.

The input is fed into the network and activated. The computed scores are passed to the next layer, which is activated in turn, and the scores at the output-layer nodes represent the score for each class. In the example, the classification result is class 1.
The same input is passed to different nodes, and they produce different results because each node has its own weights and bias.
This is forward propagation.
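
Below is a forward-propagation sketch that uses plain lists. The layer sizes, weights, and biases are hypothetical, and so is the ReLU activation; none of them come from the figure. The point is that every node applies its own weights and bias to the same inputs.

```python
def relu(values):
    """Activation: negative scores become 0, positive scores pass through."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One layer: every node computes its own weighted sum of the same
    inputs plus its own bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass scores layer by layer. Hidden layers are activated with ReLU;
    the output layer returns raw per-class scores."""
    values = inputs
    for i, (weights, biases) in enumerate(layers):
        values = dense(values, weights, biases)
        if i < len(layers) - 1:
            values = relu(values)
    return values

# hypothetical network: 2 inputs -> 3 hidden nodes -> 2 output classes
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.1, 0.0, 0.2]),
    ([[1.0, -1.0, 0.5], [-0.5, 1.0, 0.3]], [0.0, 0.1]),
]
scores = forward([1.0, 2.0], layers)
predicted = max(range(len(scores)), key=lambda i: scores[i])
print(scores, predicted)
```

Here the second output score is the highest, so the input would be assigned to the class at index 1.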

Markov

A Markov chain consists of states and transitions.
For example, a Markov chain can be built from the sentence "the quick brown fox jumps over the lazy dog".
Steps: first make each word a state, then calculate the transition probabilities between states.
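
The two steps can be sketched on the example sentence. The only assumption is that words are found by splitting on spaces.

```python
from collections import Counter, defaultdict

def build_chain(text):
    """Each word is a state. Count word -> next-word transitions and
    normalize the counts into probabilities."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }

chain = build_chain("the quick brown fox jumps over the lazy dog")
print(chain["the"])
```

"the" is followed once by "quick" and once by "lazy", so each of these transitions gets probability 0.5. Every other word has a single successor with probability 1.0, and "dog" has no outgoing transition.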


In everyday life, the candidate suggestions of a keyboard input method follow the same principle, though the models used there are more advanced.

Reposted from: https://www.cnblogs.com/hugeng007/p/9609679.html
