The English text below is taken from the course's accompanying notes.

Lesson 9 Cost Function II

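This lesson describes the cost function in terms of two parameters. The cost function is then bowl-shaped, with its minimum at the bottom of the bowl. If the bowl is drawn as a contour plot, the center of the contours is the minimum of the cost function. So a point (θ0, θ1) that lies closer to the center of the contours gives a more accurate fit to the original data.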
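A minimal sketch of the bowl idea, assuming a hypothetical toy dataset where y = 2x exactly and the squared-error cost J(θ0, θ1) = 1/(2m) Σ(θ0 + θ1x − y)². Evaluating J on a coarse grid finds the bottom of the bowl, and J shrinks as (θ0, θ1) moves toward it:

```python
# Hypothetical toy training set (not from the course): y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def grid_minimum(xs, ys, step=0.5):
    """Evaluate J on a coarse (theta0, theta1) grid and return the lowest point."""
    best = None
    for i in range(-8, 9):          # theta0 from -4.0 to 4.0
        for k in range(-8, 9):      # theta1 from -4.0 to 4.0
            t0, t1 = i * step, k * step
            j = cost(t0, t1, xs, ys)
            if best is None or j < best[2]:
                best = (t0, t1, j)
    return best

print(grid_minimum(xs, ys))         # (theta0, theta1, J) at the bottom of the bowl
for t0 in (3.0, 1.0, 0.0):          # walk toward the contour center along theta1 = 2
    print(t0, cost(t0, 2.0, xs, ys))
```

Here the grid minimum is (0.0, 2.0) with J = 0, and the cost drops from 4.5 to 0.5 to 0 as θ0 moves from 3 to 1 to 0.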

Cost Function - Intuition II
A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ0, θ1) and, as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1 = -0.15.

Taking another h(x) and plotting its contour plot, one gets the following graphs:

When θ0 = 360 and θ1 = 0, the value of J(θ0, θ1) in the contour plot gets closer to the center, thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible and, consequently, the values of θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values on our graph to the right seems to put our point in the center of the innermost 'circle'.

Lesson 10 Gradient Descent

Gradient Descent

Gradient descent is used to find the parameters of the hypothesis function.

So we have our hypothesis function and we have a way of measuring how well it fits the data. Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

Imagine that we graph our hypothesis function based on its parameters θ0 and θ1 (actually we are graphing the cost function as a function of the parameter estimates). We are not graphing x and y themselves, but the parameter range of our hypothesis function and the cost resulting from selecting a particular set of parameters.

We put θ0 on the x axis and θ1 on the y axis, with the cost function on the vertical z axis. The points on our graph will be the result of the cost function using our hypothesis with those specific theta parameters. The graph below depicts such a setup.

We will know that we have succeeded when our cost function is at the very bottom of the pits
in our graph, i.e. when its value is the minimum. The red arrows show the minimum points in
the graph.
The way we do this is by taking the derivative (the slope of the tangent line to a function) of our cost function. The slope of the tangent is the derivative at that point, and it gives us a direction to move towards. We make steps down the cost function in the direction with the steepest descent. The size of each step is determined by the parameter α, which is called the learning rate.
For example, the distance between each 'star' in the graph above represents a step determined by our parameter α. A smaller α results in a smaller step and a larger α results in a larger step. The direction in which the step is taken is determined by the partial derivative of J(θ0, θ1). Depending on where one starts on the graph, one could end up at different points. The image above shows us two different starting points that end up in two different places.
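Ending up in different places from different starting points only happens when the cost surface has more than one local minimum. A minimal sketch, assuming a hypothetical non-convex function f(θ) = θ⁴ − 3θ² + θ (not the linear regression cost, which is convex), runs the same descent from two starting points:

```python
def f_prime(theta):
    """Derivative of the illustrative non-convex f(theta) = theta^4 - 3*theta^2 + theta."""
    return 4 * theta ** 3 - 6 * theta + 1

def descend(theta, alpha=0.01, iters=1000):
    """Plain gradient descent: repeatedly step against the derivative."""
    for _ in range(iters):
        theta = theta - alpha * f_prime(theta)
    return theta

left = descend(-2.0)    # start on the left slope
right = descend(2.0)    # start on the right slope
print(round(left, 2), round(right, 2))
```

With α = 0.01 the start at −2 settles near θ ≈ −1.30 and the start at 2 settles near θ ≈ 1.13, which are the two local minima of f.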
The gradient descent algorithm is:
repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

where j = 0, 1 represents the feature index number.
At each iteration, one should simultaneously update all the parameters θ0, θ1, ..., θn. Updating a specific parameter before calculating the update for another one within the same iteration would lead to a wrong implementation.

Note: update both parameters simultaneously.
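A minimal sketch of the difference, assuming the squared-error cost for h(x) = θ0 + θ1x and a hypothetical four-point dataset. The helper names `gradients`, `step_simultaneous` and `step_sequential` are illustrative only:

```python
def gradients(theta0, theta1, xs, ys):
    """Partial derivatives of J = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d0 = sum(errors) / m
    d1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d0, d1

def step_simultaneous(theta0, theta1, xs, ys, alpha):
    """Correct: both derivatives are evaluated at the old (theta0, theta1)."""
    d0, d1 = gradients(theta0, theta1, xs, ys)
    temp0 = theta0 - alpha * d0
    temp1 = theta1 - alpha * d1
    return temp0, temp1

def step_sequential(theta0, theta1, xs, ys, alpha):
    """Incorrect: theta1's derivative sees the already-updated theta0."""
    d0, _ = gradients(theta0, theta1, xs, ys)
    theta0 = theta0 - alpha * d0
    _, d1 = gradients(theta0, theta1, xs, ys)  # uses the new theta0
    theta1 = theta1 - alpha * d1
    return theta0, theta1

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]  # hypothetical data
print(step_simultaneous(0.0, 0.0, xs, ys, 0.1))
print(step_sequential(0.0, 0.0, xs, ys, 0.1))
```

Starting from (0, 0) with α = 0.1, the simultaneous version gives (0.5, 1.5), while the sequential version gives (0.5, 1.375), because its θ1 derivative was computed at a point that mixes the old θ1 with the new θ0.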

Lesson 11 Gradient Descent: Key Points

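With the problem simplified to a single parameter, the partial derivative becomes an ordinary derivative. The lesson shows mathematically how θ1 approaches the minimum from either side.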

Gradient Descent Intuition
In this video we explored the scenario where we used one parameter θ1 and plotted its cost function to implement gradient descent. Our formula for a single parameter was:

Repeat until convergence:

$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$

Regardless of the sign of the slope d/dθ1 J(θ1), θ1 eventually converges to the value that minimizes J. The following graph shows that when the slope is negative, the value of θ1 increases, and when it is positive, the value of θ1 decreases.
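A minimal sketch of why the sign of the slope does not matter, assuming a hypothetical single-parameter cost J(θ1) = (θ1 − 3)², whose minimum is at θ1 = 3:

```python
def dJ(theta1):
    """Derivative of the illustrative cost J(theta1) = (theta1 - 3)^2."""
    return 2 * (theta1 - 3)

def update(theta1, alpha=0.1):
    """One step of theta1 := theta1 - alpha * dJ/dtheta1."""
    return theta1 - alpha * dJ(theta1)

print(update(0.0))  # slope is negative at 0, so theta1 increases toward 3
print(update(6.0))  # slope is positive at 6, so theta1 decreases toward 3
```

Subtracting α times a negative slope adds to θ1, and subtracting α times a positive slope takes away from it, so both sides move toward the minimum.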

α controls the size of each descent "step".

On a side note, we should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time. If it fails to converge, or takes too long to reach the minimum, our step size is probably wrong.
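To see the effect of α, here is a sketch on a hypothetical cost J(θ1) = (θ1 − 3)², running 50 steps from θ1 = 0 with three learning rates. For this particular J the update works out to (θ1 − 3) ← (1 − 2α)(θ1 − 3), so it converges only for 0 < α < 1:

```python
def dJ(theta1):
    """Derivative of the illustrative cost J(theta1) = (theta1 - 3)^2."""
    return 2 * (theta1 - 3)

def run(alpha, theta1=0.0, iters=50):
    """Run a fixed number of gradient descent steps and return the final theta1."""
    for _ in range(iters):
        theta1 = theta1 - alpha * dJ(theta1)
    return theta1

for alpha in (0.001, 0.1, 1.1):
    print(alpha, run(alpha))
```

With α = 0.001, θ1 is still far from 3 after 50 steps (too slow). With α = 0.1 it is essentially at 3. With α = 1.1 each step overshoots further than the last and θ1 diverges.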

How does gradient descent converge with a fixed step size α?
The intuition behind the convergence is that d/dθ1 J(θ1) approaches 0 as we approach the bottom of our convex function. At the minimum, the derivative will always be 0 and thus we get:

$$\theta_1 := \theta_1 - \alpha \cdot 0$$

Once we are at the minimum, the value of θ1 no longer changes.

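As θ1 approaches the minimum, the steps automatically get smaller even though α stays fixed, because the derivative gradually tends to 0.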
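A sketch of the shrinking steps with a fixed α, assuming the hypothetical cost J(θ1) = (θ1 − 3)² and α = 0.1. The size of each step is recorded on every iteration:

```python
def dJ(theta1):
    """Derivative of the illustrative cost J(theta1) = (theta1 - 3)^2."""
    return 2 * (theta1 - 3)

theta1, alpha = 0.0, 0.1
step_sizes = []
for _ in range(5):
    step = alpha * dJ(theta1)       # alpha never changes
    step_sizes.append(abs(step))    # record how far theta1 moves this iteration
    theta1 -= step
print([round(s, 4) for s in step_sizes])
```

Here each step is 0.8 times the previous one, since the derivative 2(θ1 − 3) shrinks by that factor on every iteration while α stays the same.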

Lesson 12 Gradient Descent for Linear Regression

Combining gradient descent with the cost function gives the gradient descent algorithm for linear regression.

Gradient Descent For Linear Regression

When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived. We can substitute our actual cost function and our actual hypothesis function and modify the equation to:
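For reference, the update rules in their standard form for this setting, assuming h_θ(x) = θ0 + θ1x, m training examples (x^(i), y^(i)), and the squared-error cost J(θ0, θ1) = 1/(2m) Σ(h_θ(x^(i)) − y^(i))². Both parameters are updated simultaneously on each repetition until convergence:

$$
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}
\end{aligned}
$$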

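As for the x with subscript j (x_j) that appears in these notes, I think it can mean one of two things: for j = 1 it is the input value (the x-coordinate) of each example, and for j = 0 it is the constant 1.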
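One way to see this reading in code: a minimal pure-Python sketch of batch gradient descent in which each example becomes the feature vector [x0, x1] = [1, x], so the same formula serves θ0 (j = 0, x0 = 1) and θ1 (j = 1, x1 = x). The dataset, α and iteration count are hypothetical choices:

```python
def gradient_descent(xs, ys, alpha=0.05, iters=5000):
    """Batch gradient descent for h(x) = theta0*x0 + theta1*x1 with x0 = 1, x1 = x."""
    m = len(xs)
    # Each example becomes [x0, x1] = [1, x]; x0 = 1 pairs with theta0.
    features = [[1.0, x] for x in xs]
    theta = [0.0, 0.0]
    for _ in range(iters):
        errors = [sum(t * f for t, f in zip(theta, fv)) - y
                  for fv, y in zip(features, ys)]
        # dJ/dtheta_j = (1/m) * sum_i error_i * x_j^(i), for j = 0 and j = 1
        grads = [sum(e * fv[j] for e, fv in zip(errors, features)) / m
                 for j in range(2)]
        # Simultaneous update: every gradient is computed before theta changes.
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data generated from y = 1 + 2x
print(gradient_descent(xs, ys))
```

On this data the result should come out very close to θ0 = 1, θ1 = 2.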

The formula above can be derived from the cost function.
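A sketch of that derivation, assuming the squared-error cost J(θ0, θ1) = 1/(2m) Σ(θ0 + θ1x^(i) − y^(i))². Applying the chain rule, the factor 2 from the square cancels the 1/2:

$$
\begin{aligned}
\frac{\partial J}{\partial \theta_0} &= \frac{1}{2m}\sum_{i=1}^{m} 2\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)\cdot 1 = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) \\
\frac{\partial J}{\partial \theta_1} &= \frac{1}{2m}\sum_{i=1}^{m} 2\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)\cdot x^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}
\end{aligned}
$$

Writing x_0^(i) = 1 and x_1^(i) = x^(i), both cases collapse into ∂J/∂θj = (1/m) Σ(h_θ(x^(i)) − y^(i)) x_j^(i), which matches the two readings of x_j described above.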
