FAU Lecture Notes on Deep Learning

These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video and matching slides. We hope you enjoy these as much as the videos. Of course, this transcript was created with deep learning techniques largely automatically and only minor manual modifications were performed. If you spot mistakes, please let us know!

Navigation

Previous Lecture / Watch this Video / Top Level / Next Lecture

Image under CC BY 4.0 from the Deep Learning Lecture.

Welcome everybody to the next part of deep learning! Today, we want to finish talking about common practices, and in particular, we want to have a look at the evaluation. Of course, we need to evaluate the performance of the models that we’ve trained so far. We have set up the training, set the hyperparameters, and configured all of this. Now, we want to evaluate the generalization performance on previously unseen data. This means the test data, and it’s time to open the vault.

Remember “Of all things the measure is man”. So, data is annotated and labeled by humans, and during training, all labels are assumed to be correct. But of course, to err is human. This means we may have ambiguous data. The ideal situation you actually want for your data is that it has been annotated by multiple human raters. Then you can take the mean or a majority vote. There’s also a very nice paper by Stefan Steidl from 2005. It introduces an entropy-based measure that takes into account the confusions of human reference labelers. This is very useful in situations where you have unclear labels. In emotion recognition in particular, this is a problem, as humans themselves sometimes confuse classes like “angry” versus “annoyed”, while they are not very likely to confuse “angry” versus “happy”, as this is a very clear distinction. There are different degrees of happiness. Sometimes you’re just a little bit happy. In these cases, it is really difficult to differentiate happy from neutral. This is also hard for humans. With prototypes, for example actors acting out emotions, you get emotion recognition rates way over 90%. With real data, emotions as they occur in daily life, prediction is much harder. This can also be seen in the labels and in their distribution. If you have a prototype, all of the raters will agree that the observation clearly belongs to this particular class. If you have nuances and less clear emotions, you will see that the raters produce a less peaked or even a uniform distribution over the labels, because they also cannot assess the specific sample. So, mistakes by the classifier are obviously less severe if the same classes are also confused by humans. Exactly this is considered in Steidl’s entropy-based measure.
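
The ambiguity of a sample can be quantified directly from the raters’ votes: the entropy of the label distribution is zero for a prototypical sample where everyone agrees and grows as the raters disagree. This is only a minimal sketch of that idea, not Steidl’s exact measure, and the vote lists are made up for illustration:

```python
from collections import Counter
from math import log2

def label_entropy(votes):
    """Shannon entropy (in bits) of a list of human label votes.
    Zero means all raters agree; higher values mean more ambiguity."""
    counts = Counter(votes)
    n = len(votes)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

# Prototypical sample: all five raters agree
print(label_entropy(["angry"] * 5))                    # 0.0
# Ambiguous sample: raters split between "angry" and "annoyed"
print(label_entropy(["angry"] * 2 + ["annoyed"] * 3))  # about 0.97 bits
```

A classifier error on the second sample would count as less severe, since the human reference is itself uncertain.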

Now, if we look into performance measures, you want to take into account the typical classification measures. They are typically built around the false negatives, the true negatives, the true positives, and the false positives. From these, for binary classification problems, you can compute true and false positive rates. This typically leads to numbers like the accuracy, which is the number of true positives plus true negatives over the total number of positives and negatives. Then, there is the precision or positive predictive value, which is computed as the number of true positives over the number of true positives plus false positives. The so-called recall is defined as the true positives over the true positives plus the false negatives. The specificity or true negative rate is given as the true negatives over the true negatives plus the false positives. The F1 score is a way of mixing these measures: it is twice the product of precision and recall divided by their sum. I typically recommend receiver operating characteristic (ROC) curves, because all of the measures above depend on a threshold. With ROC curves, you essentially evaluate your classifier at all different thresholds. This gives you an analysis of how well it performs in different scenarios.
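
As a quick sketch, all of these measures can be computed directly from the four confusion-matrix counts; the counts in the example call are made up:

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from the four counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)   # positive predictive value
    recall      = tp / (tp + fn)   # true positive rate / sensitivity
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Sweeping the decision threshold and plotting the true positive rate against the false positive rate at each setting then yields the ROC curve.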

Furthermore, there are performance measures for multi-class classification. These are adapted versions of the measures above. The top-K error is the probability of the true class label not being among the K estimates with the highest prediction scores. Common realizations are the top-1 and top-5 error. ImageNet, for example, usually uses the top-5 error. If you really want to understand what’s going on in multi-class classification, I recommend looking at confusion matrices. Confusion matrices are useful for, say, 10 to 15 classes. If you have a thousand classes, confusion matrices don’t make sense anymore. Still, you can gain a lot of understanding of what’s happening if you look at confusion matrices in cases with fewer classes.
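
Both measures are straightforward to sketch in plain Python; the score and label arrays below are made up for illustration:

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k
    highest-scoring class predictions."""
    errors = 0
    for row, true in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        errors += true not in top_k
    return errors / len(labels)

def confusion_matrix(true, pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        m[t][p] += 1
    return m

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 2]
print(top_k_error(scores, labels, k=1))  # one of three samples misclassified
print(confusion_matrix([0, 1, 2, 2], [0, 1, 1, 2], n_classes=3))
```

Off-diagonal entries of the confusion matrix show exactly which classes get mixed up, which is the insight a single accuracy number hides.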

Now, sometimes you have very little data. In these cases, you may want to choose cross-validation. In k-fold cross-validation, you split your data into k folds, use k-1 folds as training data, and test on fold k. Then, you repeat this k times. This way, you have seen all of the data in the evaluation, but you always trained on independent data because the test fold was held out at training time. Cross-validation is rather uncommon in deep learning because it implies very long training times: you have to repeat the entire training k times, which is really hard if a single training takes, say, seven days. With sevenfold cross-validation, you can do the math; it will take really long.

If you use cross-validation for hyperparameter estimation, you have to nest it. Do not perform cross-validation on all of your data, select the hyperparameters, and then test on the same data. This will give you optimistic results. You should always make sure that the data you select parameters on is held out from the data you test on. There are techniques for nesting cross-validation inside cross-validation, but they become computationally very expensive, which is even worse. One thing to keep in mind is that the variance of the results is typically underestimated, because the training sets overlap and are therefore not independent. Also pay attention that you may introduce additional bias by incorporating architecture selection and hyperparameter selection, so these should be done on different data, which is very difficult when working with cross-validation. Even without cross-validation, training is a highly stochastic process. Therefore, you may want to retrain your network multiple times, for example with different random initializations, and report the standard deviation, just to make sure how well your training actually performs.
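
The k-fold procedure can be sketched as follows. `train_fn` and `eval_fn` are placeholders standing in for your actual (possibly days-long) training and evaluation; the dummy functions in the example exist only to make the sketch runnable:

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split the sample indices into k roughly equal, shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(train_fn, eval_fn, data, k=5):
    """Train on k-1 folds, evaluate on the held-out fold, repeat k times."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for test_idx in folds:
        train_idx = [j for f in folds if f is not test_idx for j in f]
        model = train_fn([data[j] for j in train_idx])
        scores.append(eval_fn(model, [data[j] for j in test_idx]))
    return scores  # report the mean and standard deviation of these

# Dummy "training" (model = mean of the training values) for illustration
scores = cross_validate(lambda train: sum(train) / len(train),
                        lambda model, test: model,
                        data=list(range(100)), k=5)
print(len(scores))  # 5, one score per fold
```

For nested cross-validation, an inner loop of this form would run over the k-1 training folds to pick hyperparameters before the outer test fold is ever touched.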

Now, you want to compare different classifiers. The question is: “Is my new method with 91.5% accuracy better than the state of the art with 90.9%?” Of course, training a system is a stochastic process. So, just comparing those two numbers will yield biased results. The actual question you have to ask is: “Is there a significant difference between the classifiers?” This means that you need to run the training for each method multiple times. Only then can you, for example, use a t-test to see whether the distributions of the results are significantly different (see the Links section). The t-test compares two normally distributed data sets with equal variance. Then, you can determine whether the means differ significantly with respect to a significance level α, i.e., the accepted probability that the observed difference is due to chance. Quite frequently, you find significance levels like 5% or 1% in the literature. So you have a significant difference if the chance of the observation being random is less than 5% or 1%.
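
A minimal, dependency-free sketch of this comparison: the accuracy lists are hypothetical results of ten training runs per method, and 2.101 is the tabulated two-sided critical t value for α = 0.05 with 18 degrees of freedom:

```python
from math import sqrt

def students_t(a, b):
    """Two-sample Student's t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp = sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))  # pooled std
    return (ma - mb) / (sp * sqrt(1 / na + 1 / nb))

# Hypothetical accuracies from 10 training runs per method
ours = [91.5, 91.2, 91.8, 91.4, 91.6, 91.3, 91.7, 91.5, 91.4, 91.6]
sota = [90.9, 91.0, 90.8, 91.1, 90.7, 90.9, 91.0, 90.8, 91.1, 90.9]
t = students_t(ours, sota)
# |t| above the critical value 2.101 means the difference is
# significant at the 5% level for df = 18.
print(abs(t) > 2.101)  # True for these made-up runs
```

With low run-to-run variance, even a 0.6-point accuracy gap becomes significant here; with noisier runs, the same gap might not be.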

Now be careful if you train multiple models on the same data. If you query the same data a couple of times, you actually have to correct your significance computation. This is called the Bonferroni correction. If we compare multiple classifiers, this introduces multiple comparisons, and you have to correct for them. If you run n tests with significance level α, then the total risk is n times α. So, to reach a total significance level of α, the adjusted α′ for each individual test is α over n. The more tests you run on the same data, the more you have to divide by. Of course, this assumes independence between the tests, and it is a kind of pessimistic estimate of significance. But you want to be pessimistic in this case, just to make sure that you are not reporting something that was produced by chance. Just because you test often enough, and testing is a random process, a very good result may show up purely by chance. Permutation tests would be more accurate but incredibly time-consuming, and believe me, you probably want to go with the Bonferroni correction instead. Permuting everything will take even longer than the cross-validation approach that we’ve seen previously.
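
The correction itself is a one-liner; as a sketch:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level so the overall risk stays at alpha,
    assuming the n tests are independent (a pessimistic assumption)."""
    return alpha / n_tests

# Comparing 5 classifiers on the same test data at an overall 5% level:
# each individual comparison must now reach p < 0.01 to count as significant.
print(bonferroni_alpha(0.05, 5))
```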

Okay, so let’s summarize what we’ve seen: check your implementation before training, including the gradients and the initialization. Monitor the training process continuously: the training and validation losses, the weights, and the activations. Stick to established architectures before reinventing the wheel. Experiment with little data and keep your evaluation data safe until the evaluation. Decay the learning rate over time. Do a random search, not a grid search, for hyperparameters. Perform model ensembling for better performance, and when you report a comparison, of course, go for significance tests to make sure that you are not reporting a random observation.

So next time in deep learning, we want to look at the evolution of neural network architectures: from deep networks to even deeper networks. We want to have a look at sparse and dense connections, and we’ll introduce a lot of common names you hear all over the place: LeNet, GoogLeNet, ResNet, and so on. We will learn about many interesting state-of-the-art approaches in the next series of lecture videos. So, thank you very much for listening and see you in the next video!

If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced.

Links

  • Online t-test

  • Online test for comparing recognition rates

  • Online test to compare correlations

Translated from: https://towardsdatascience.com/common-practices-part-4-70c08fce3588
