Introduction to Machine Learning (ML)

Here you will get an introduction to machine learning.

Hello there. Many of you are probably familiar with this term, but some might be wondering what on earth it is. Just another piece of technical jargon? Let’s make it simple: Machine Learning is made up of two words, “Machine” and “Learning”, and literally means “making machines learn”. How is that possible? We will get to that later in this post. Stay tuned.

If you are eager to learn some interesting points about Machine Learning (ML), we’ve got you covered. Let’s dive deeper.

ML is a vast field that is closely related to AI (Artificial Intelligence), and some people use the two terms interchangeably. According to data scientists, however, the two are distinct in many respects: ML is a subset of AI.

Real Life Machine Learning (ML) Examples

Example 1:

We all use email services such as Gmail on an almost daily basis, but have you ever wondered why there is a folder named ‘Spam’ with some mails in it? This is where ML comes into action: Gmail applies ML so that its product can differentiate between legitimate and spam mails. Sounds interesting? Let’s see some more examples.
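
To make the spam idea a little more concrete, here is a minimal sketch of a word-counting (naive Bayes) classifier in plain Python. It is not how Gmail actually works; the training messages, the `train` and `classify` helpers, and the two labels are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical, hand-made training mails; a real filter learns from far more data.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "legit"),
    ("project report attached", "legit"),
    ("lunch tomorrow with the team", "legit"),
]


def train(examples):
    """Count how often each word appears in each class."""
    word_counts = {"spam": Counter(), "legit": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the class with the higher naive Bayes log-probability."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["legit"])
    total_mails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: how common this class was in the training data.
        score = math.log(class_counts[label] / total_mails)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, class_counts = train(TRAINING)
print(classify("free prize inside", word_counts, class_counts))
print(classify("team meeting tomorrow", word_counts, class_counts))
```

The machine is never told a rule such as “the word free means spam”; it only counts words in the examples it was given, which is the “learning from data” part.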

Example 2:

Have you ever noticed that after you browse a shopping site, you start seeing very similar advertisements across the web? Suppose you browsed a clothing site: from that moment on, you will start noticing ads very similar to the products you searched for. Big companies accomplish this through the application of ML.

Beyond these, ML is at work almost everywhere, from Facebook to astronomy to predicting your credit score. Although ML practices have not yet fully matured, ML is definitely among the hottest topics of the decade. Given the current scenario, choosing a career in this domain also looks like a wise decision.

Image Source

As we can infer from the image above, the machine is made to learn from ‘experience’: we feed the machine a bulk of data related to whatever function or task we expect it to perform. The machine first tries to recognize the patterns in the input data and learns them. Later, when the machine comes across a similar pattern, it delivers the intended result.

Let’s understand this with an example. Suppose we want our machine to tell us the breed of a dog when we take a picture of it with a camera. First we need to train the machine with an abundance of dog-related data, e.g. what a breed looks like, what it eats, its height, its friendliness with humans, etc. The machine tries to form patterns from this data and trains itself from these previous experiences. The next time your machine comes across a dog, it will be able to tell you the breed (though not with 100 percent accuracy).
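
As a rough sketch of the “learn from experience, then recognize a similar pattern” idea, the snippet below guesses a breed from two numbers instead of a picture. This is a deliberate simplification: the height and weight figures, the breed list, and the `predict_breed` helper are all assumptions made up for this illustration, and the technique used is a simple nearest-neighbour lookup.

```python
import math

# Hypothetical "experience": (height in cm, weight in kg) -> breed.
TRAINING = [
    ((22.0, 2.5), "Chihuahua"),
    ((20.0, 2.0), "Chihuahua"),
    ((58.0, 30.0), "Labrador"),
    ((60.0, 34.0), "Labrador"),
    ((75.0, 60.0), "Great Dane"),
    ((80.0, 65.0), "Great Dane"),
]


def predict_breed(features, training=TRAINING):
    """1-nearest-neighbour: return the breed of the most similar known dog."""
    nearest = min(training, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# Dogs the machine has never seen before.
print(predict_breed((59.0, 31.0)))
print(predict_breed((21.0, 2.2)))
```

A new dog that falls between two breeds may well be labelled wrongly, which is one reason the result is never 100 percent accurate.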

Getting Started with Machine Learning (ML)

As we already know, ML is a subset of AI. The term Artificial Intelligence is not new to us: research on AI is old, and scientists have been trying to develop an artificial brain since the 1940s and 50s, which laid the foundation of AI. Coming back to ML, it is an advancement within the AI domain that opens up the possibility of products like humanoid robots, driverless cars, etc.

Let’s have a look at the main topics in ML:

Finding the Dataset
Which Language to Opt for ML
Development Environment (IDE) for ML
Important Packages & Libraries
Supervised Learning
Classification
Regression
Unsupervised Learning
Clustering
ML Models
Data Mining
Natural Language Processing
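
To give a first feel for two of the topics in this list, here is a small sketch that contrasts supervised learning (regression, where the answers are given with the data) with unsupervised learning (clustering, where no answers are given). The helper names `fit_line` and `two_means` and all the numbers are made up for illustration, and `two_means` assumes the input contains at least two distinct values.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised learning (regression): least-squares fit of y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def two_means(values, iterations=10):
    """Unsupervised learning (clustering): split 1-D values into two groups."""
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        # Assign every value to the nearer centre, then move the centres.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return sorted(group_low), sorted(group_high)


# Regression: labelled examples (x, y) are given; these follow y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

# Clustering: no labels at all; the two groups are discovered from the data.
print(two_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.1]))
```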

Note: Apart from these, we have some bonus tips and suggestions for our readers, which will be provided along the way.

The topics mentioned above cover most of machine learning and are too vast to fit into one, two, or three blog posts. So we will be publishing posts at regular intervals to help our readers get a grasp of ML. We hope you enjoy learning with us. Let’s dive in together.

Finding a Dataset

Image Source

The very first step in the ML process is finding a relevant dataset for your machine, followed by data cleaning and pre-processing. Datasets contain an abundance of data, as you can see in the example above; this data serves as experience for the machine, which tries to derive patterns from it.

You can find a dataset to suit your needs very easily and, most of the time, for free. Here are some open repositories where our readers can look for the datasets they need.

Data World Repository
UCI Repository
Kaggle Repository

Above we have mentioned a few free online repositories where you can find datasets. You just need to visit the websites and download the required dataset in .csv format.

Once the dataset has been downloaded, the data cleaning and pre-processing steps come into play, which we will study in later posts.
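
As a small preview, here is a sketch of reading a .csv dataset and doing one very basic cleaning step (dropping rows with missing values) using only Python’s standard library. The column names, the inline sample data, and the `load_clean_rows` helper are assumptions made up for this example; a real dataset will have its own columns and will usually need more careful cleaning.

```python
import csv
import io

# Stand-in for a downloaded file. With a real dataset you would instead use
# open("your_dataset.csv", newline="") (the file name here is hypothetical).
RAW_CSV = """height_cm,weight_kg,breed
22,2.5,Chihuahua
58,,Labrador
60,34,Labrador
,65,Great Dane
80,65,Great Dane
"""


def load_clean_rows(csv_file):
    """Read CSV rows, drop rows with missing values, and convert the numbers."""
    rows = []
    for row in csv.DictReader(csv_file):
        if any(value is None or value.strip() == "" for value in row.values()):
            continue  # skip incomplete rows
        rows.append({
            "height_cm": float(row["height_cm"]),
            "weight_kg": float(row["weight_kg"]),
            "breed": row["breed"].strip(),
        })
    return rows


rows = load_clean_rows(io.StringIO(RAW_CSV))
print(len(rows), "complete rows kept")
for row in rows:
    print(row)
```

Dropping incomplete rows is only one possible choice; depending on the dataset, filling in missing values may be a better option.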

Which Language for ML?

We can use any language, such as R, Python, Java, Scala, etc. In this course, however, we will focus on two of them: R, which is often used in a procedural style, and Python, which supports object-oriented programming. These two are also among the most beloved and preferred languages among data scientists.

Let’s compare R and Python and find out what each is good at and what it is not:

R Language

Why Good?

In-Depth Statistical Analysis: Since R is a language designed for statisticians, there is no denying that it is very well suited to statistical analysis. It adds value whether you are working with data derived from the sensors of an IoT device or building predictions in financial models. Another reason data scientists love R is the CRAN repository, home to thousands of packages that allow for more elaborate analysis and visualization tasks.

High-Quality Graphics: R is well known for producing high-quality graphs and charts. Important packages that add to this capability include ggplot2, googleVis, and rCharts. There is also the Shiny framework, which can be used to turn visuals into interactive web applications.

Why not Good?

Learning Curve: It is said that a programmer with a background in mathematics or statistics will find R fairly easy to grasp; otherwise the language is likely to appear counterintuitive.

Processing Large Data: R is not well regarded when it comes to processing large-scale data and building large-scale data products. Data scientists instead prefer languages like Python or Java when an actual product is to be built.

Performance: Compared with other languages, R’s performance is not up to the mark, because R was designed with data scientists in mind, not computers. R is often observed to be slower than Java or Python.

Python Language

Why Good?

Smooth Workflow: Python offers good workflow integration and is thus popular among developers and data scientists when statistical techniques need to be applied or integrated with web apps or production environments. Data scientists often make Python their first choice for managing their entire data-related workflow.

Beneficial in ML: The various libraries available for Python, such as Scikit-learn, TensorFlow, Pandas, NumPy, PyBrain, etc., together with Python’s flexibility, make the language suitable for applying ML techniques and developing sophisticated models and prediction engines.

Why not Good?

Not suitable for specialized data tasks: Though Python is well known for its flexibility, there are still hundreds of R packages that have no equivalent Python substitutes. If very specific tasks have to be done, R is preferred over Python.

So, in this blog post we have covered a few topics pertaining to ML, and we will cover the remaining topics in later posts of this course. We hope you are enjoying learning with us. Stay tuned for more blog posts like this.

Comment below if you have any queries related to the above introduction to machine learning.

Translated from: https://www.thecrazyprogrammer.com/2017/11/introduction-machine-learning.html
