An easy-to-understand guide to the top ten commonly used machine learning algorithms

Unfamiliar words

Term	Chinese term or formula
Nearest neighbor algorithm	近邻算法 (Chinese for "nearest neighbor algorithm")
K-nearest neighbor algorithm	$\sqrt{\sum_{i=1}^{n}(a_i-b_i)^2}$ (Euclidean distance between points a and b)

Decision Tree

Based on the features, each node asks a question. Depending on the answer, the data is split into two branches, and each branch goes on to ask further questions. These questions are learned from the existing data. When new data arrives, it is routed down the tree by these questions until it reaches the appropriate leaf.
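A minimal sketch of how such questions can be learned, assuming numeric features and yes/no questions of the form "is feature f <= t?" chosen by Gini impurity (the CART criterion; the text does not say which criterion it has in mind). The function names and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_question(rows, labels):
    """Return (impurity, feature, threshold) for the purest split 'row[feature] <= threshold?'."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            yes = [y for r, y in zip(rows, labels) if r[f] <= t]
            no = [y for r, y in zip(rows, labels) if r[f] > t]
            if not yes or not no:
                continue  # this question does not split the data
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Keep asking questions until a node is pure, too deep, or cannot be split."""
    question = None if gini(labels) == 0 or depth == max_depth else best_question(rows, labels)
    if question is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, f, t = question
    yes = [i for i, r in enumerate(rows) if r[f] <= t]
    no = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "yes": build_tree([rows[i] for i in yes], [labels[i] for i in yes], depth + 1, max_depth),
        "no": build_tree([rows[i] for i in no], [labels[i] for i in no], depth + 1, max_depth),
    }

def predict(tree, row):
    """Route a new row down the tree by answering each node's question."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["feature"]] <= tree["threshold"] else tree["no"]
    return tree

# Hypothetical toy data: two numeric features, two classes.
rows = [[2.0, 1.0], [3.0, 1.5], [8.0, 9.0], [9.0, 8.5]]
labels = ["A", "A", "B", "B"]
tree = build_tree(rows, labels)
print(predict(tree, [2.5, 1.2]))  # A
```

Here a single question ("is feature 0 <= 3.0?") already separates the two classes, so both leaves are pure after one split.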

Random Forest

Randomly sample the source data to form several subsets.

The matrix S is the source data, with N rows; A, B and C are the features, and the last column is the category.

M sub-matrices are randomly generated from S.

Training on these M subsets yields M decision trees.

Put the new data into the M trees to get M classification results, count which category receives the most votes, and use that category as the final prediction.
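The three steps can be sketched as follows. To keep the example self-contained, each "tree" is a hypothetical one-question tree (a decision stump) trained on one randomly chosen feature; a full random forest grows deeper trees, but the sampling and voting logic is the same. Drawing each subset as N rows sampled with replacement (a bootstrap sample) is the usual choice, but it is an assumption here. Names and data are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Fit a one-question tree on `feature`: pick the threshold with the fewest errors."""
    best = None  # (errors, threshold, label_if_le, label_if_gt)
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, le, gt = best
    return lambda row: le if row[feature] <= t else gt

def random_forest(rows, labels, m=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(m):
        # Step 1: draw a random subset of S (N rows sampled with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: train one tree on that subset (here: a stump on a random feature).
        feature = rng.randrange(len(rows[0]))
        trees.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    # Step 3: every tree votes; the most common class is the final prediction.
    return lambda row: Counter(tree(row) for tree in trees).most_common(1)[0][0]

# Hypothetical data: class A has small feature values, class B large ones.
rows = [[1, 2], [2, 1], [2, 3], [3, 2], [1.5, 2.5], [2.5, 1.5],
        [8, 9], [9, 8], [9, 10], [10, 9], [8.5, 9.5], [9.5, 8.5]]
labels = ["A"] * 6 + ["B"] * 6
forest = random_forest(rows, labels)
print(forest([0.5, 0.5]), forest([20.0, 20.0]))
```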

Logistic regression

When the prediction target is a probability, its value must satisfy $0 \le p \le 1$. A simple linear model cannot be used here, because its output is unbounded: once the input is not restricted to a certain range, the output leaves this interval.

So what we want here is a model whose output follows an S-shaped curve.

So how do you get such a model?
This model needs to satisfy two conditions: its output must be greater than or equal to 0, and less than or equal to 1.
For "greater than or equal to 0", you could choose the absolute value or the square; here the exponential function is used, which is always greater than 0.
For "less than or equal to 1", use division: the numerator is the exponential itself and the denominator is the exponential plus 1, so the ratio is always less than 1.

After one more transformation, we get the logistic regression model.
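Written out, the construction above looks like this, where $z = w^\top x + b$ is a linear model ($w$ and $b$ are notation introduced here). The "one more transformation" is taken to be the usual log-odds form, which is an assumption about what the original figure showed.

$$
0 < \frac{e^{z}}{e^{z}+1} = \frac{1}{1+e^{-z}} < 1, \qquad z = w^\top x + b
$$

$$
p = \frac{1}{1+e^{-(w^\top x + b)}}
\quad\Longleftrightarrow\quad
\frac{p}{1-p} = e^{w^\top x + b}
\quad\Longleftrightarrow\quad
\ln\frac{p}{1-p} = w^\top x + b
$$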

The model's coefficients are then estimated from the source data.
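One common way to obtain the coefficients is gradient descent on the log-loss (cross-entropy); the text does not say which method it uses, so this is an assumption. The sketch fits a single-feature model $p = \sigma(wx + b)$ on hypothetical data in which the two classes overlap slightly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=10000):
    """Fit p = sigmoid(w*x + b) by batch gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss: mean of (p - y) * x and mean of (p - y).
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: label 1 becomes more likely as x grows.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(w, b, sigmoid(w * 1.0 + b), sigmoid(w * 4.0 + b))
```

The data are deliberately not perfectly separable; on separable data the unregularized log-loss has no finite minimum and $w$ would keep growing.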

The result is the logistic (sigmoid) curve.

SVM (Support Vector Machine)

To separate two classes, we want a hyperplane. The optimal hyperplane is the one that maximizes the margin between the two classes, where the margin is the distance between the hyperplane and the nearest point. As shown in the figure, $Z_2>Z_1$, so the green hyperplane is better.

Express this hyperplane as a linear equation: points of one class satisfy $w^\top x + w_0 \ge 1$, and points of the other class satisfy $w^\top x + w_0 \le -1$.

The distance from a point $x$ to the hyperplane is $|w^\top x + w_0| / \lVert w\rVert$.

So the points on the two boundaries are each $1/\lVert w\rVert$ away from the hyperplane, and the total margin is $2/\lVert w\rVert$. The goal is to maximize this margin, which means minimizing the denominator $\lVert w\rVert$, so the task becomes an optimization problem.
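In the usual hard-margin form, with $y_i \in \{+1, -1\}$ denoting the class of training point $x_i$ (labels introduced here, not in the original text), the two inequalities become one constraint, and minimizing $\lVert w\rVert$ is replaced by the equivalent smooth objective $\frac{1}{2}\lVert w\rVert^2$:

$$
\min_{w,\,w_0}\ \frac{1}{2}\lVert w\rVert^2
\quad\text{subject to}\quad
y_i\,(w^\top x_i + w_0) \ge 1,\quad i = 1,\dots,n
$$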

Given three example points, we look for the optimal hyperplane. The weight vector is defined along the direction $(2,3)-(1,1) = (1,2)$.

So the weight vector has the form $(a, 2a)$. Substituting the two points into the hyperplane equation, $(2,3)$ must give 1 and $(1,1)$ must give $-1$; solving these two equations gives $a$ and the intercept $w_0$, which in turn gives the expression of the hyperplane.

Once $a$ is found, substituting it into $(a, 2a)$ gives the weight vector $w$; the points $(1,1)$ and $(2,3)$, which lie on the margin boundaries, are the support vectors.
Substituting $a$ and $w_0$ into the hyperplane equation gives the decision boundary of the support vector machine.
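The arithmetic of this example can be checked with exact fractions. The sketch assumes, as the text does, that (2, 3) lies on the boundary where $w^\top x + w_0 = 1$ and (1, 1) on the boundary where it equals $-1$; variable names are hypothetical.

```python
import math
from fractions import Fraction

# Support vectors from the example: (2, 3) gives w.x + w0 = 1, (1, 1) gives w.x + w0 = -1.
pos, neg = (2, 3), (1, 1)
direction = (pos[0] - neg[0], pos[1] - neg[1])  # (1, 2), so w = (a, 2a)

# Substituting both points gives two linear equations in a and w0:
#   a * s_pos + w0 = 1    (here 8a + w0 = 1)
#   a * s_neg + w0 = -1   (here 3a + w0 = -1)
s_pos = direction[0] * pos[0] + direction[1] * pos[1]
s_neg = direction[0] * neg[0] + direction[1] * neg[1]
a = Fraction(2, s_pos - s_neg)  # subtracting the equations: (s_pos - s_neg) * a = 2
w0 = 1 - a * s_pos
w = (a * direction[0], a * direction[1])
margin = 2 / math.sqrt(w[0] ** 2 + w[1] ** 2)

print(f"a = {a}, w0 = {w0}, w = ({w[0]}, {w[1]}), margin = {margin:.4f}")
# a = 2/5, w0 = -11/5, w = (2/5, 4/5), margin = 2.2361
```

With these values the hyperplane is $\frac{2}{5}x_1 + \frac{4}{5}x_2 - \frac{11}{5} = 0$, i.e. $x_1 + 2x_2 = 5.5$, and the margin $2/\lVert w\rVert = \sqrt{5}$ equals the distance between (1, 1) and (2, 3).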

Naive Bayes

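Here is an application in NLP:
given a paragraph of text, return its sentiment classification, i.e. whether the attitude of the text is positive or negative.

To solve this problem, it is enough to look at only some of the words.

The text is then represented only by some of its words and their counts.

The original question is: given a sentence, which category does it belong to? Bayes' rule turns it into a simpler question.

The question becomes: what is the probability of this sentence appearing in this category? Of course, don't forget the other two probabilities in the formula, $P(\text{category})$ and $P(\text{sentence})$: $P(\text{category} \mid \text{sentence}) = \frac{P(\text{sentence} \mid \text{category})\,P(\text{category})}{P(\text{sentence})}$. Under the naive assumption that words occur independently, $P(\text{sentence} \mid \text{category})$ is the product of the probabilities of the individual words in that category.
For example, the probability that the word "love" appears in the positive case is 0.1, and in the negative case it is 0.001.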
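A minimal word-count classifier along these lines, assuming add-one (Laplace) smoothing so that a word never seen in a category does not make the whole product zero. $P(\text{sentence})$ is the same for every category, so it is dropped when comparing them, and log-probabilities are summed instead of multiplying many small numbers. The training sentences and function names are hypothetical.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns priors, per-label word counts, vocabulary."""
    priors = Counter(label for _, label in docs)
    counts = {label: Counter() for label in priors}
    for words, label in docs:
        counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return priors, counts, vocab

def classify_nb(words, priors, counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    total_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label in priors:
        total_words = sum(counts[label].values())
        score = math.log(priors[label] / total_docs)
        for w in words:
            # Add-one smoothing: unseen words get a small non-zero probability.
            score += math.log((counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [("i love this movie".split(), "positive"),
        ("love the great acting".split(), "positive"),
        ("i hate this movie".split(), "negative"),
        ("terrible boring plot".split(), "negative")]
model = train_nb(docs)
print(classify_nb("i love it".split(), *model))    # positive
print(classify_nb("boring plot".split(), *model))  # negative
```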

K nearest neighbors

Given a new data point, look at the k points closest to it: the class that is the majority among them is the class the new point belongs to.
For example, to distinguish cats from dogs, we judge by two features: the shape of the claws and the sound. The circles and triangles are points whose classes are already known. Which class does the star belong to?

When k=3, the three points connected to the star by lines are its three nearest neighbors. Circles are the majority among them, so the star belongs to the cat class.
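A small sketch of the k = 3 vote, using the Euclidean distance from the glossary at the top. The two numeric features stand in for claw shape and sound, and the points and labels are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """sqrt(sum_i (a_i - b_i)^2)"""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, query, k=3):
    """Majority class among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical (claw shape, sound) features; "cat" plays the circles, "dog" the triangles.
train = [((1.0, 1.0), "cat"), ((1.5, 2.0), "cat"), ((2.0, 1.5), "cat"),
         ((5.0, 5.0), "dog"), ((6.0, 5.5), "dog"), ((2.5, 2.5), "dog")]
star = (1.8, 1.8)
print(knn_predict(train, star, k=3))  # cat (two cats and one dog among the 3 nearest)
```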

K-means

Suppose we want to divide a set of data into three clusters; in the figure, pink marks large values and yellow marks small values.
First initialize: here the simplest choice, 3, 2 and 1, is used as the initial value of each cluster. Each of the remaining data points computes its distance to the three initial values and is assigned to the cluster whose initial value is closest.

After assigning the points, compute the mean of each cluster and use it as that cluster's center for the next round.

After a few rounds, once the clusters no longer change, you can stop.
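A one-dimensional sketch of this procedure, starting from the initial values 3, 2 and 1 as in the text; the data values and the function name are hypothetical.

```python
def kmeans_1d(data, centers, max_rounds=100):
    """Alternate assignment and mean-update steps until the groups stop changing."""
    centers = list(centers)
    assignment = None
    for _ in range(max_rounds):
        # Assign each value to the cluster with the nearest current center.
        new_assignment = [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
                          for x in data]
        if new_assignment == assignment:
            break  # the groups no longer change, so stop
        assignment = new_assignment
        # Move each center to the mean of the values assigned to it.
        for j in range(len(centers)):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, assignment

data = [1, 2, 3, 5, 6, 7, 10, 11, 12]
centers, groups = kmeans_1d(data, [3, 2, 1])
print(centers, groups)  # [11.0, 6.0, 2.0] [2, 2, 2, 1, 1, 1, 0, 0, 0]
```

Even though all three initial values are small, the centers drift apart over six rounds: the cluster started at 3 ends up holding the large values and the one started at 1 the small values.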

Adaboost

AdaBoost is one of the boosting methods.
Boosting combines several classifiers that perform poorly on their own into a better classifier.
In the figure, the left and right decision trees each perform poorly on their own, but if the same data is fed into both and the two results are added together, the prediction becomes more reliable.

For example, in AdaBoost-based handwriting recognition, many features can be captured on the drawing board, such as the direction of the starting point and the distance between the starting point and the end point.

During training, a weight is learned for each feature. For example, the beginnings of 2 and 3 are very similar, so this feature contributes little to the classification and its weight is small.

The angle alpha, by contrast, is very distinctive, so this feature gets a larger weight. The final prediction is made by taking all these features into account.
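A compact sketch of AdaBoost with one-question decision trees (stumps) as the weak classifiers, on a small hypothetical one-dimensional data set with labels +1/-1. Each round fits the stump with the lowest weighted error, gives it a vote weight alpha (larger when its error is smaller, in the same spirit as the feature weights described above), and raises the weights of the points it got wrong so the next stump focuses on them. Function names are hypothetical.

```python
import math

def best_stump(xs, ys, weights):
    """Search thresholds v and signs s for h(x) = s if x < v else -s with least weighted error."""
    values = sorted(set(xs))
    thresholds = ([values[0] - 0.5] + [(p + q) / 2 for p, q in zip(values, values[1:])]
                  + [values[-1] + 0.5])
    best = None  # (weighted error, threshold, sign)
    for v in thresholds:
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if (s if x < v else -s) != y)
            if best is None or err < best[0]:
                best = (err, v, s)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        err, v, s = best_stump(xs, ys, weights)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # lower error -> larger say in the vote
        model.append((alpha, v, s))
        # Misclassified points gain weight, correctly classified points lose weight.
        weights = [w * math.exp(-alpha * y * (s if x < v else -s))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Sign of the alpha-weighted sum of the stumps' votes."""
    score = sum(alpha * (s if x < v else -s) for alpha, v, s in model)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs] == ys)  # True
```

No single stump classifies this data correctly (the best one misses 3 of 10 points), but the weighted vote of three stumps does.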

Neural Network

A neural network is suitable when an input may fall into one of at least two categories.
A neural network (NN) consists of several layers of neurons and the connections between them.
The first layer is the input layer, and the last layer is the output layer.
The hidden layers and the output layer each have their own classifier.

The input is fed into the network and activates the first layer; the computed scores are passed on to the next layer, activating the subsequent layers in turn, and the scores on the output-layer nodes represent the scores for each class. In the example in the figure, the classification result is class 1.
The same input is passed to different nodes, and different results are obtained because each node has its own weights and biases.
This is forward propagation.
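Forward propagation for a tiny fully connected network can be sketched as below, assuming sigmoid activations and taking the output node with the largest score as the class. The 2-3-2 layer sizes and all weights and biases are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(inputs, weights, biases):
    """One layer: node i outputs sigmoid(sum_j weights[i][j] * inputs[j] + biases[i])."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass the input through every (weights, biases) layer in turn."""
    for weights, biases in layers:
        x = dense_forward(x, weights, biases)
    return x

# Hypothetical 2-3-2 network: each node has its own weights and bias.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[-1.0, 1.0, -0.5], [1.0, -1.0, 0.5]], [0.0, 0.0]),         # output layer
]
scores = forward([1.0, 2.0], layers)
predicted_class = scores.index(max(scores)) + 1  # classes numbered from 1
print(scores, predicted_class)  # class 1 with these made-up weights
```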

Markov

A Markov chain consists of states and transitions.
For example, build a Markov chain from the phrase 'the quick brown fox jumps over the lazy dog'.
Steps: first make each word a state, then calculate the transition probabilities between states.
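These two steps can be sketched as follows: each word becomes a state, and the probability of moving from one word to the next is the fraction of times that transition occurs in the phrase. The function name is hypothetical.

```python
from collections import Counter, defaultdict

def build_markov_chain(text):
    """Each word is a state; P(next | current) = count(current -> next) / count(current -> any)."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return {state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
            for state, followers in counts.items()}

chain = build_markov_chain("the quick brown fox jumps over the lazy dog")
print(chain["the"])  # {'quick': 0.5, 'lazy': 0.5}
```

In this phrase "the" is followed once by "quick" and once by "lazy", so each transition gets probability 0.5, while every other word has a single successor with probability 1.0; "dog" ends the phrase and has no outgoing transition.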

In everyday life, the candidate suggestions offered by a keyboard input method work on the same principle, although the model there is more advanced.

Reposted from: https://www.cnblogs.com/hugeng007/p/9609679.html
