Machine Learning Terminologies

To this day, my favorite definition of a machine is: something that makes work easier. At its simplest, a machine is an invention that does a job better, faster, and more powerfully than a human being. With regard to machine learning, this is the why: there is a need to perform a task more efficiently and at a faster rate. What is the task? To make decisions. So what, then, is machine learning?

Before I answer that, a quick introduction. On my journey to becoming a data scientist, I found myself having to learn a lot of new terminology. Even certain terms that already existed in my vocabulary took on new meanings. A lot of these terms can be wordy and somewhat intimidating. My aim in this write-up is to provide, as far as possible, layman's definitions for the basic machine learning terms that I have come across.

Data science, in its essence, is the skill of using available information to gain insight and improve processes. It does this using a blend of machine learning algorithms, statistics, business intelligence, and programming. It aims to discover patterns in raw data, which in turn provide insight into the underlying processes.

Now back to the question: what is machine learning?

Machine learning is a field of technology that allows machines to learn from data and improve themselves. Machine learning algorithms use statistics and other mathematical tools to find patterns in data.

Machine learning can be separated into three groups:

Supervised learning is a type of machine learning in which the data is labeled to tell the machine exactly what patterns it should look for. Under the umbrella of supervised learning:

  • Classification: In classification tasks, the machine learning program must draw a conclusion from observed values and determine which category new observations belong to.

  • Regression: In regression tasks, the machine learning program must estimate and understand the relationships among variables. Regression analysis focuses on one dependent variable and a series of other changing variables.

  • Forecasting: Forecasting is the process of making predictions about the future based on past and present data.
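
To make the regression idea concrete, here is a minimal sketch in plain Python, assuming a made-up dataset of hours studied (the feature) and exam scores (the label). The function name fit_line and the numbers are my own illustration, not something from a particular library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: the labels (scores) tell the machine
# what relationship it should learn from the feature (hours).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Use the learned relationship to predict the score for 6 hours of study.
predicted = slope * 6 + intercept
print(slope, intercept, predicted)
```

The labels are what make this supervised: the program is told the right answer for every training example and only has to find the relationship.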

Unsupervised learning: here the data has no labels. The machine just looks for whatever patterns it can find. Under the umbrella of unsupervised learning:

  • Clustering: Clustering involves grouping sets of similar data (based on defined criteria), after which you can analyze the groups and find patterns.

  • Dimension reduction: Dimension reduction reduces the number of variables being considered, keeping only those needed to find the required information.
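
As a rough illustration of clustering, the sketch below groups one-dimensional points around a fixed number of centers without using any labels. It is a simplified, k-means-style loop of my own; the function name k_means_1d, the starting centers, and the data are all assumptions made for the example.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around len(centers) centers (no labels needed)."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical unlabeled data with two obvious clumps.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = k_means_1d(points, centers=[0.0, 5.0])
print(centers, groups)
```

Nobody told the program which points belong together; the grouping comes purely from how close the values are to each other.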

In reinforcement learning, the machine learns by trial and error to achieve a clear objective. It tries out lots of different things and is rewarded or penalized depending on whether its behaviors help or hinder it in reaching that objective.
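
Trial and error with rewards can be sketched with a very small "bandit" example: the program repeatedly picks one of a few actions, sees a reward, and gradually favors the action that pays best. This is a toy under my own assumptions (fixed rewards per action, an exploration rate called epsilon, a function named run_bandit), not a full reinforcement learning system.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn by trial and error which action gives the highest reward."""
    rng = random.Random(seed)
    n = len(rewards)
    values = [0.0] * n  # current estimate of each action's reward
    counts = [0] * n    # how many times each action was tried
    for step in range(steps):
        if step < n:
            action = step                      # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n)          # explore: pick a random action
        else:
            action = max(range(n), key=lambda a: values[a])  # exploit the best
        reward = rewards[action]
        counts[action] += 1
        # Running average of the rewards seen for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


# Hypothetical environment: action 1 is rewarded the most.
values, counts = run_bandit(rewards=[0.0, 1.0, 0.2])
best_action = max(range(len(values)), key=lambda a: values[a])
print(values, counts, best_action)
```

The small amount of random exploration is the "trying out lots of different things" part; the running averages are how rewards shape future behavior.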

Machine Learning Algorithm

An ‘algorithm’ is a series of steps to complete a task.


An algorithm in machine learning is a procedure that is run on data to create a machine learning “model.”

Machine learning algorithms perform “pattern recognition.” Algorithms “learn” from data, or are “fit” to a dataset.

A “model” in machine learning is the output of a machine learning algorithm run on data.

A model represents what was learned by a machine learning algorithm.

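
The difference between an algorithm and a model is easier to see in code. In this minimal sketch (my own toy, with hypothetical temperature readings), fit_threshold is the algorithm: a series of steps run on data. The number it returns is the model: what was learned.

```python
def fit_threshold(values, labels):
    """Algorithm: try each midpoint between neighboring values and keep the
    threshold that classifies the most training examples correctly."""
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def correct(threshold):
        return sum((v > threshold) == bool(y) for v, y in zip(values, labels))

    return max(candidates, key=correct)


def predict(threshold, value):
    """Apply the learned model (a single threshold) to a new value."""
    return 1 if value > threshold else 0


# Hypothetical labeled data: body temperature and whether it counts as a fever.
temps = [35.0, 36.5, 37.0, 38.5, 39.0, 40.0]
fever = [0, 0, 0, 1, 1, 1]

model = fit_threshold(temps, fever)  # running the algorithm produces the model
print(model, predict(model, 39.2), predict(model, 36.0))
```

Run the same algorithm on different data and you get a different model, which is why the two words are not interchangeable.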

Popular Machine Learning Algorithms

  • Linear regression (Supervised Learning - Regression): Linear regression is the most basic type of regression. Simple linear regression allows us to understand the relationship between two continuous variables.

  • Logistic regression (Supervised Learning - Classification): Logistic regression focuses on estimating the probability of an event occurring based on the previous data provided. It is used to model a binary dependent variable, that is, one where only two values, 0 and 1, represent the outcomes.

  • Naive Bayes (Supervised Learning - Classification): The Naive Bayes classifier is based on Bayes’ theorem and treats every feature as independent of every other feature. It allows us to predict a class/category, based on a given set of features, using probability.

  • K-nearest neighbor algorithm (Supervised Learning): The k-nearest neighbor algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the data points around a single data point to determine which group that point belongs to.

  • Decision trees (Supervised Learning - Classification/Regression): A decision tree is a flowchart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each node within the tree represents a test on a specific variable and each branch is the outcome of that test.

  • Random forests (Supervised Learning - Classification/Regression): A random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model’s prediction.

  • Support vector machines (Supervised Learning - Classification): Support vector machine algorithms are supervised learning models that analyze data for classification and regression analysis. They essentially sort data into categories, which is achieved by providing a set of training examples, each example marked as belonging to one or the other of two categories. The algorithm then works to build a model that assigns new examples to one category or the other.

  • K-means clustering algorithm (Unsupervised Learning - Clustering): The algorithm works by finding groups within the data, with the number of groups represented by the variable K. It then works iteratively to assign each data point to one of the K groups based on the features provided.

  • Artificial neural networks (Supervised, Unsupervised, or Reinforcement Learning): An artificial neural network (ANN) comprises ‘units’ arranged in a series of layers, each of which connects to the layers on either side. ANNs are inspired by biological systems, such as the brain, and how they process information. ANNs are essentially a large number of interconnected processing elements, working in unison to solve specific problems.
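
To show how one of the algorithms above works in practice, here is a minimal sketch of k-nearest neighbors in plain Python. The two-dimensional points, the group names "A" and "B", and the function name knn_predict are all hypothetical and only serve the illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled points forming two groups.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

first = knn_predict(train, (1.2, 1.4))
second = knn_predict(train, (6.4, 6.2))
print(first, second)
```

There is no real training step here: the "model" is simply the stored examples, and all the work happens when a new point needs a label.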

Other useful terms when talking about machine learning include:

Ensemble learning methods combine multiple algorithms to generate better results for classification, regression, and other tasks. Each individual classifier may be weak, but when combined with others it can produce excellent results.
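
The "weak alone, strong together" idea can be sketched with a simple majority vote. The three rules and the four example emails below are invented for the illustration; each rule gets one email wrong, yet the vote gets all of them right.

```python
from collections import Counter


# Three hypothetical weak classifiers for "is this email spam?" (1 = spam).
def rule_links(email):
    return 1 if email["links"] > 3 else 0


def rule_caps(email):
    return 1 if email["caps_ratio"] > 0.5 else 0


def rule_words(email):
    return 1 if email["spam_words"] > 2 else 0


def ensemble_predict(classifiers, example):
    """Combine the individual predictions by majority vote."""
    votes = [clf(example) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


# Made-up labeled emails: (features, true label).
emails = [
    ({"links": 5, "caps_ratio": 0.2, "spam_words": 4}, 1),
    ({"links": 4, "caps_ratio": 0.1, "spam_words": 0}, 0),
    ({"links": 6, "caps_ratio": 0.8, "spam_words": 1}, 1),
    ({"links": 0, "caps_ratio": 0.3, "spam_words": 1}, 0),
]

rules = [rule_links, rule_caps, rule_words]
individual = [accuracy(rule, emails) for rule in rules]
combined = accuracy(lambda e: ensemble_predict(rules, e), emails)
print(individual, combined)
```

The vote helps here because the rules make their mistakes on different emails; if they all failed on the same examples, combining them would gain nothing.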

Artificial Intelligence (AI) refers to machines that can learn, reason, and act for themselves. They can make their own decisions when faced with new situations, in the same way that humans and animals can.


Data are characteristics or information collected through observation.

Data cleaning refers to the steps you need to take to prepare your data for use. Here you detect incomplete, incorrect, inaccurate, or irrelevant data in your dataset and then choose to replace, modify, or delete the dirty or coarse data as needed.
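
A minimal sketch of those cleaning steps, assuming a tiny made-up table of names and ages. The rules used here (drop rows without a name, drop duplicates, treat ages outside 0 to 120 as missing, fill missing ages with the mean) are my own choices for the example, not a standard recipe.

```python
def clean_records(records):
    """Delete unusable rows, fix invalid values, and fill in missing ones."""
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip().title()  # modify: normalize text
        age = rec.get("age")
        if not name or name in seen:
            continue                                    # delete: incomplete or duplicate
        if age is not None and not (0 <= age <= 120):
            age = None                                  # inaccurate value becomes missing
        seen.add(name)
        cleaned.append({"name": name, "age": age})

    known = [r["age"] for r in cleaned if r["age"] is not None]
    mean_age = sum(known) / len(known) if known else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = mean_age                         # replace: fill with the mean
    return cleaned


# Hypothetical raw data with typical problems.
raw = [
    {"name": " alice ", "age": 30},
    {"name": "Bob", "age": None},    # missing age
    {"name": "alice", "age": 30},    # duplicate
    {"name": "", "age": 41},         # incomplete row
    {"name": "carol", "age": 400},   # impossible age
    {"name": "dave", "age": 50},
]

cleaned = clean_records(raw)
names = [r["name"] for r in cleaned]
ages = [r["age"] for r in cleaned]
print(names, ages)
```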

Exploratory data analysis (EDA): This refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
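
Here is a small sketch of the summary-statistics side of EDA using Python's built-in statistics module. The sales figures are invented, and flagging values more than two standard deviations from the mean is just one simple rule of thumb I picked for spotting anomalies.

```python
import statistics


def summarize(values):
    """Basic summary statistics for a first look at the data."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def find_outliers(values, z=2.0):
    """Flag values more than z standard deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > z * stdev]


# Hypothetical daily sales; one day looks suspicious.
daily_sales = [20, 22, 19, 21, 23, 20, 95]

summary = summarize(daily_sales)
outliers = find_outliers(daily_sales)
print(summary, outliers)
```

Notice how far the mean sits from the median in this data; a gap like that is often the first hint that something unusual is hiding in the values.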

Training data is the main and most important data; it helps machines learn and make predictions. Machine learning engineers use this dataset to develop the algorithm, and it usually accounts for about 70% or more of the total data used in a project.

Validation data is the second type of dataset, used to validate the machine learning model before final delivery of the project. Model validation is important for ensuring that the model's predictions are accurate enough to build the right application. Using this type of data helps you know whether the model can correctly identify new examples or not.

Testing data is the final type of data; it helps to check the prediction quality of the machine learning or AI model.
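
The three kinds of data usually come from splitting one dataset. Below is a minimal sketch, assuming a 70/15/15 split and a fixed random seed so the result is repeatable. The proportions and the function name split_dataset are my own choices; other splits are common too.

```python
import random


def split_dataset(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    rows = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    train_set = rows[:n_train]
    validation_set = rows[n_train:n_train + n_val]
    test_set = rows[n_train + n_val:]  # whatever remains
    return train_set, validation_set, test_set


# Hypothetical dataset of 100 examples, represented by their ids.
data = list(range(100))
train_set, validation_set, test_set = split_dataset(data)
print(len(train_set), len(validation_set), len(test_set))
```

Shuffling before cutting matters: if the data is sorted in some way, an unshuffled split could leave the test set looking nothing like the training set.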

The world of machine learning and data science is vast and ever growing. It is easy to view it as an insurmountable endeavor. I'd like to encourage anyone wishing to go down this road not to be intimidated. A lot of these terms only sound incomprehensible; once you discover their essence, everything becomes clear. Again, good things take time and great ones take even more time, so do not grow weary, and keep pushing forward.

Translated from: https://medium.com/@chibuzo.ugonabo/machine-learning-terminologies-demystified-6aa1aa81a57b
