Deep Learning Algorithms and Machine Learning Algorithms: An Introduction to 63 Machine Learning Algorithms

Data science and analytics are transforming businesses. They have penetrated every department, be it Finance, Marketing, Operations, HR or Design. It is becoming increasingly important for B-school students to have analytical skills and be well versed in machine learning and statistics. Data is being called the new gold. The fastest-growing companies in the coming years will be the ones that can make the most sense of the data they collect, because with that data a business can do targeted marketing and transform the way it converts sales and satisfies demand.

But there is a catch: machine learning is complex. For those starting out in this field and learning it for the first time in B-school, these concepts are tough to grasp alongside a hectic schedule. For a B-school student with no prior coding experience, it is easy to get lost among all the different algorithms and the branches of supervised versus unsupervised learning. The mathematics behind them is hard to understand and has a steep learning curve. For a start, Python or R can itself seem like a rough sea that requires dedicated practice. Still, it is critically important for a business manager to know these things. The new generation of MBAs is learning them, and the older generation should too.

My blog series aims to explain these algorithms in a simple, easy-to-understand manner, so that someone with a basic knowledge of Python can implement them and benefit from them in their lives and businesses.

So I decided to ditch the mathematics and dive right into how each algorithm works, why it is different from the others, and why I, as a businessman, should bother about it. In this article I will cover 11 branches of machine learning and briefly introduce each of them. In the upcoming articles we will look at each of them in detail, the differences among them, and the use cases of each.

What is Machine Learning?

Machine Learning is the sub-field of computer science that gives “computers the ability to learn without being explicitly programmed.” ~ Arthur Samuel

It is Netflix telling you which movie to watch next, Spotify playing good songs without you touching your phone, the predictive keyboard on your phone, and the way companies predict next year's sales. Machine learning in its simplest form is learning from data and then either predicting from it or dividing it into meaningful parts, so that it makes sense in an easier and more usable fashion.

Your computer can learn from data using algorithms built on mathematics and statistics. An algorithm looks for patterns in the data, tries to minimize the loss of accuracy in its predictions while applying a given pattern, and then gives us back the best pattern it could learn from the data.

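As a minimal sketch of what "minimizing the loss" can look like, assume the pattern is a straight line and the loss is the mean squared error. Both are illustrative choices, and the ad-spend and sales figures are made up for the example.

```python
def fit_line(xs, ys):
    """Return the slope and intercept of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the line's predictions and the real values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: advertising spend versus sales, in arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(ad_spend, sales)
best_loss = mean_squared_error(ad_spend, sales, slope, intercept)
guess_loss = mean_squared_error(ad_spend, sales, 2.0, 0.0)  # a hand-picked line

print(f"learned line: sales = {slope:.2f} * ad_spend + {intercept:.2f}")
print(f"loss of learned line: {best_loss:.4f}, loss of hand-picked line: {guess_loss:.4f}")
```

The learned line is the "best pattern" in the sense that no other straight line gives a smaller mean squared error on this data.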

If you tell your algorithm what each data point means (that is, you give it labels), it is called a supervised learning algorithm; if you do not give any labels, the algorithm tries to find patterns by itself and it is called unsupervised machine learning.

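The difference between the two can be sketched on the same small data set. This is only an illustration: the customer spend values and the "low"/"high" labels are hypothetical, the supervised part learns a simple cut-off from the labels, and the unsupervised part is a one-dimensional k-means with two clusters.

```python
def train_threshold(values, labels):
    """Supervised: use the labels to learn a cut-off between the two classes."""
    low = [v for v, label in zip(values, labels) if label == "low"]
    high = [v for v, label in zip(values, labels) if label == "high"]
    # Place the cut-off halfway between the two class averages.
    return (sum(low) / len(low) + sum(high) / len(high)) / 2


def predict_label(value, threshold):
    """Label a new value using the learned cut-off."""
    return "high" if value > threshold else "low"


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 80, 95, 88]

# Supervised: we also know which customers are "low" or "high" spenders.
labels = ["low", "low", "low", "high", "high", "high"]
threshold = train_threshold(spend, labels)
print(predict_label(60, threshold), predict_label(20, threshold))

# Unsupervised: no labels at all, the algorithm finds the two groups itself.
clusters = two_means(spend)
print(clusters)
```

In the supervised case the algorithm can name the groups because we told it the names; in the unsupervised case it only discovers that two groups exist, and it is up to us to interpret them.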

The 11 Branches

Machine learning algorithms can be divided into 11 branches, based on the underlying mathematical model:

Bayesian: Bayesian machine learning models are based on Bayes' theorem, which is nothing but the calculation of the probability of something happening given that something else has happened, e.g. the probability that Yuvraj (the cricketer) will hit six sixes given that he ate curry-rice today. We use machine learning to apply Bayesian statistics to our data, and some of these algorithms (Naive Bayes in particular) assume that the input variables are independent of one another given the class. These models start with some belief about the data and then update that belief based on the data. Bayesian statistics has various applications in classification, as in my Twitter project using a Naive Bayes classifier. In business, it can be used to calculate the probability of success of a marketing plan based on data points and historical parameters of other marketing strategies.

Decision Tree: A decision tree, as the name suggests, is used to arrive at a decision using a tree. It uses estimates and probabilities to calculate the likely outcomes. The tree has a root node, which splits into internal nodes and finally into leaves. Each root or internal node holds a variable on which the data is split. The model learns from our labelled data and finds the best variables to split on so as to minimize the classification error. It can either classify our data points or predict a value for them, based on what it learned from the training data. Decision trees are used in finance for option pricing, and in marketing and business planning to find the best plan or the overall business impact of various possibilities.

Dimensionality Reduction: Imagine you have data with 1000 features, or you conducted a survey with 25 questions and are having a hard time making sense of which question is answering what. That is where the family of dimensionality reduction algorithms comes into the picture. As the name suggests, they reduce the number of dimensions in our data, which in turn can reduce over-fitting and high variance in a model trained on it, so that we can make better predictions on our test set. In market research surveys they are often used to group questions into topics that are easier to make sense of.

Instance Based: These supervised machine learning algorithms make predictions by comparing a new instance with training instances that are stored in memory. They are called instance based because they work directly with the instances from the training data rather than building a general model. k-nearest neighbors is one example: for a new data point it finds the k stored training points closest to it (k being the number of neighbors we choose) and predicts from their labels. Websites recommend new products or movies to us using these instance-based algorithms mixed with other, crazier algorithms.

Clustering: Making bunches of similar things is called clustering. The difference here is that we group points based purely on the data we have. These are unsupervised machine learning algorithms, where the algorithm itself makes sense of whatever gibberish we give it. The algorithm clusters the data based on those inputs, and we can then see which items or points fit together best. Business applications include bundling products based on customer purchase data, and clustering consumers into different categories on the basis of their reviews of a service or product. These insights help in business decisions.

Regression: In statistics we often come across problems that require us to find a relationship between two variables in our data, exploring how a change in one variable affects the other. That is where we use regression. In these algorithms the machine tries to find the line that best fits our data, which comes down to choosing its slope and intercept so as to minimize the error on our data. We can then use this line to make predictions, either as values (as in linear regression) or as probabilities (as in logistic regression).

Rule System: Rule-based machine learning algorithms work on a set of rules that are either predefined by us or developed by the algorithms themselves. These algorithms are less flexible when creating a model or making predictions from it, but because of that rigidity they can be faster at what they are set to do. They are used to analyze huge chunks of data, or data that is constantly growing. They are also used for classification and can work faster than other classification algorithms. Accuracy might take a hit here, but in machine learning there is often a trade-off between accuracy and speed.

Regularization: These techniques are used together with regression or classification algorithms to reduce over-fitting to the training data. Tuning them lets us find the right balance between fitting the training data well and predicting well on new data. Many times we have too many variables, or a few variables have an outsized effect on the model; in those cases regularization works to reduce the high variance in our model.

Ensemble: This method of machine learning combines several models to produce one stronger predictive model. Ensembles are usually better than single models because they combine different models to achieve higher accuracy. More like a perfect life partner. The only drawback is that they can be slow to run. When speed matters more than accuracy, we can switch to rule-based algorithms or regression.

Neural Networks: Loosely based on how neurons work in the brain, neural networks are complex algorithms that work in layers. Each layer takes its input from the previous layer and processes it. More layers can increase accuracy but make the algorithm slower. They often work better than other algorithms, but because they are computationally expensive they did not gain popularity in the past. Now they are back in business as processors have improved. They are being used for sales forecasting, financial predictions, anomaly detection in data, and language processing.

Deep Learning: Deep learning algorithms use neural networks with many layers and keep improving the model as they are trained on new data. They learn and better themselves, much as a human being would. Self-driving cars are based on these algorithms. I know what you are thinking: this is what AI is based on. The real Terminator would be based on these algorithms, but we are a long way from that. There are entire businesses running on deep learning. New delivery systems that use these algorithms are under development, and Google's AlphaGo is another example. Deep learning structures algorithms in layers and uses them to make decisions.

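To make one of the branches above concrete, here is a minimal k-nearest-neighbors sketch of the instance-based idea: the "model" is just the stored training points, and a prediction is a majority vote among the closest ones. The tweet features (followers in thousands, number of hashtags) and the labels are hypothetical and are not taken from the Twitter project mentioned in this article.

```python
import math
from collections import Counter


def knn_predict(training_points, training_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Distance from the query to every stored training instance, smallest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(training_points, training_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical tweets: (followers in thousands, number of hashtags).
tweets = [(1.0, 1), (2.0, 0), (1.5, 2), (50.0, 3), (60.0, 5), (55.0, 4)]
labels = ["not viral", "not viral", "not viral", "viral", "viral", "viral"]

print(knn_predict(tweets, labels, (52.0, 4)))
print(knn_predict(tweets, labels, (2.0, 1)))
```

There is no training step beyond storing the data, which is why these methods are called instance based. In practice the features would usually be rescaled first so that one large-valued feature does not dominate the distance.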

Read more: I will soon write about each algorithm and its sub-categories in detail. Meanwhile, you can explore these interesting projects:

Twitter Project- Viral Tweets using K-Nearest Neighbor

Twitter Project- Classification using Naive Bayes Classifier

Translated from: https://medium.com/swlh/63-machine-learning-algorithms-introduction-5e8ea4129644
