Contents

  • Least-Squares Lines
  • The General Linear Model
  • Least-Squares Fitting of Other Curves
  • Multiple Regression
  • References

Notation

  • For easy application of the discussion to real problems that you may encounter later in your career, we choose notation that is commonly used in the statistical analysis of scientific and engineering data:

    • Instead of $A\mathbf{x} = \mathbf{b}$, we write $X\boldsymbol{\beta} = \mathbf{y}$ and refer to $X$ as the design matrix, $\boldsymbol{\beta}$ as the parameter vector, and $\mathbf{y}$ as the observation vector.

Least-Squares Lines

  • The simplest relation between two variables $x$ and $y$ is the linear equation $y = \beta_0 + \beta_1 x$. Experimental data often produce points $(x_1, y_1), \dots, (x_n, y_n)$ that, when graphed, seem to lie close to a line. We want to determine the parameters $\beta_0$ and $\beta_1$ that make the line as “close” to the points as possible.
  • Suppose $\beta_0$ and $\beta_1$ are fixed, and consider the line $y = \beta_0 + \beta_1 x$ in Figure 1. Corresponding to each data point $(x_j, y_j)$ there is a point $(x_j, \beta_0 + \beta_1 x_j)$ on the line with the same $x$-coordinate. We call $y_j$ the *observed* value of $y$ and $\beta_0 + \beta_1 x_j$ the *predicted* $y$-value. The difference between an observed $y$-value and a predicted $y$-value is called a *residual*.
  • There are several ways to measure how “close” the line is to the data. The usual choice (primarily because the mathematical calculations are simple) is to add the squares of the residuals, $\sum_{j=1}^{n} \left(y_j - \beta_0 - \beta_1 x_j\right)^2$:
    • The least-squares line is the line $y = \beta_0 + \beta_1 x$ that minimizes the sum of the squares of the residuals. This line is also called a line of regression of $y$ on $x$, because any errors in the data are assumed to be only in the $y$-coordinates. The coefficients $\beta_0, \beta_1$ of the line are called (linear) regression coefficients.
  • If the data points were on the line, the parameters $\beta_0$ and $\beta_1$ would satisfy the equations
    $$\beta_0 + \beta_1 x_1 = y_1,\quad \beta_0 + \beta_1 x_2 = y_2,\quad \dots,\quad \beta_0 + \beta_1 x_n = y_n$$
    We can write this system as $X\boldsymbol{\beta} = \mathbf{y}$, where
    $$X = \begin{bmatrix}1 & x_1\\ \vdots & \vdots\\ 1 & x_n\end{bmatrix},\qquad \boldsymbol{\beta} = \begin{bmatrix}\beta_0\\ \beta_1\end{bmatrix},\qquad \mathbf{y} = \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}$$
    This is a least-squares problem. The square of the distance between the vectors $X\boldsymbol{\beta}$ and $\mathbf{y}$ is precisely the sum of the squares of the residuals. Computing the least-squares solution of $X\boldsymbol{\beta} = \mathbf{y}$ is equivalent to finding the $\boldsymbol{\beta}$ that determines the least-squares line in Figure 1.
  • A common practice before computing a least-squares line is to compute the average $\overline{x}$ of the original $x$-values and form a new variable $x^* = x - \overline{x}$. The new $x$-data are said to be in mean-deviation form. In this case, the two columns of the design matrix will be orthogonal (see the sketch below).
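To make this concrete, here is a minimal sketch in Python (assuming NumPy is available; the data values are hypothetical, not taken from the text). It builds the design matrix $X = [\mathbf{1}\ \ \mathbf{x}]$, computes the least-squares line, and checks that the columns become orthogonal in mean-deviation form:

```python
import numpy as np

# Hypothetical data points (x_j, y_j)
x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])

# Design matrix with columns 1 and x
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X * beta = y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1 =", beta)

# Mean-deviation form: x* = x - mean(x) makes the two columns orthogonal
x_star = x - x.mean()
X_star = np.column_stack([np.ones_like(x_star), x_star])
print("column inner product:", X_star[:, 0] @ X_star[:, 1])  # 0 (up to rounding)
```

With orthogonal columns, $X^T X$ is diagonal and the normal equations decouple, which is the computational payoff of mean-deviation form.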

EXERCISES 14

Show that the least-squares line for the data $(x_1, y_1), \dots, (x_n, y_n)$ must pass through $(\overline{x}, \overline{y})$. That is, show that $\overline{x}$ and $\overline{y}$ satisfy the linear equation $\overline{y} = \hat\beta_0 + \hat\beta_1 \overline{x}$.

SOLUTION

  • Derive this equation from the vector equation $\mathbf{y} = X\hat{\boldsymbol{\beta}} + \boldsymbol{\epsilon}$. Denote the first column of $X$ by $\mathbf{1}$. Use the fact that the residual vector $\boldsymbol{\epsilon}$ is orthogonal to the column space of $X$ and hence is orthogonal to $\mathbf{1}$. Thus $\sum_{i=1}^{n} \epsilon_i = 0$.
    $$\begin{aligned}\because\ y_i &= \hat\beta_0 + x_i\hat\beta_1 + \epsilon_i\\ \therefore\ \sum_{i=1}^{n} y_i &= n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i\\ \therefore\ \overline{y} &= \hat\beta_0 + \hat\beta_1\overline{x}\end{aligned}$$

  • Given data for a least-squares problem, $(x_1, y_1), \dots, (x_n, y_n)$, the following abbreviations are helpful:
    $$\sum x = \sum_{i=1}^{n} x_i,\qquad \sum x^2 = \sum_{i=1}^{n} x_i^2,\qquad \sum y = \sum_{i=1}^{n} y_i,\qquad \sum xy = \sum_{i=1}^{n} x_i y_i$$
  • The normal equations for a least-squares line $y = \hat\beta_0 + \hat\beta_1 x$ are $X^T X\boldsymbol{\beta} = X^T\mathbf{y}$. Since
    $$X^T X = \begin{bmatrix}\mathbf{1}^T\\ \mathbf{x}^T\end{bmatrix}\begin{bmatrix}\mathbf{1} & \mathbf{x}\end{bmatrix} = \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}$$
    the normal equations may be written in the form
    $$\begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}\hat{\boldsymbol{\beta}} = \begin{bmatrix}\mathbf{1}^T\\ \mathbf{x}^T\end{bmatrix}\mathbf{y} = \begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}$$
    that is,
    $$n\hat\beta_0 + \hat\beta_1\sum x = \sum y,\qquad \hat\beta_0\sum x + \hat\beta_1\sum x^2 = \sum xy$$
  • If $X$ has two linearly independent columns, then $X^T X$ is invertible and
    $$\begin{aligned}\hat{\boldsymbol{\beta}} &= \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}^{-1}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\\ &= \frac{1}{n\sum x^2 - \left(\sum x\right)^2}\begin{bmatrix}\sum x^2 & -\sum x\\ -\sum x & n\end{bmatrix}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\end{aligned}$$
    so
    $$\hat\beta_0 = \frac{\sum x^2\sum y - \sum x\sum xy}{n\sum x^2 - \left(\sum x\right)^2},\qquad \hat\beta_1 = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
    These closed-form expressions are evaluated in the sketch below.
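The closed-form expressions for $\hat\beta_0$ and $\hat\beta_1$ can be checked numerically; a sketch under the same assumptions as before (NumPy, hypothetical data) follows:

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])
n = len(x)

# The four standard abbreviations
Sx, Sx2 = x.sum(), (x ** 2).sum()
Sy, Sxy = y.sum(), (x * y).sum()

# Determinant of X^T X; nonzero when at least two x-values are distinct
d = n * Sx2 - Sx ** 2

beta0 = (Sx2 * Sy - Sx * Sxy) / d
beta1 = (n * Sxy - Sx * Sy) / d
print(beta0, beta1)  # agrees with np.polyfit(x, y, 1)[::-1]
```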

Consider the following numbers.

Every statistics text that discusses regression and the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ introduces these numbers.

  • (i) $\left\|X\hat{\boldsymbol{\beta}}\right\|^2$, the sum of the squares of the “regression term.” Denote this number by $SS(R)$.
  • (ii) $\left\|\mathbf{y} - X\hat{\boldsymbol{\beta}}\right\|^2$, the sum of the squares for the error term. Denote this number by $SS(E)$.
  • (iii) $\left\|\mathbf{y}\right\|^2$, the “total” sum of the squares of the $y$-values. Denote this number by $SS(T)$.

EXERCISES 19

Justify the equation $SS(T) = SS(R) + SS(E)$. This equation is extremely important in statistics, both in regression theory and in the analysis of variance.

SOLUTION

  • This follows from the Pythagorean Theorem (in Section 6.1), since $X\hat{\boldsymbol{\beta}}$ lies in $\operatorname{Col} X$ and the residual $\boldsymbol{\epsilon} = \mathbf{y} - X\hat{\boldsymbol{\beta}}$ is orthogonal to $\operatorname{Col} X$ (so $X^T\boldsymbol{\epsilon} = \mathbf{0}$). Then
    $$\begin{aligned}SS(E) &= SS(T) - SS(R)\\ &= \left\|\mathbf{y}\right\|^2 - \left\|X\hat{\boldsymbol{\beta}}\right\|^2\\ &= \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T X\hat{\boldsymbol{\beta}}\\ &= \mathbf{y}^T\mathbf{y} - \left(\hat{\boldsymbol{\beta}}^T X^T X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^T X^T\boldsymbol{\epsilon}\right)\\ &= \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T\left(X\hat{\boldsymbol{\beta}} + \boldsymbol{\epsilon}\right)\\ &= \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T\mathbf{y}\end{aligned}$$
    This is the standard formula for $SS(E)$. A numerical check appears below.
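A quick numerical check of $SS(T) = SS(R) + SS(E)$ and of the standard formula for $SS(E)$, again as a sketch with NumPy and hypothetical data:

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta        # regression term X * beta_hat
eps = y - y_hat         # residual vector, orthogonal to Col X

SS_T = y @ y
SS_R = y_hat @ y_hat
SS_E = eps @ eps
print(np.isclose(SS_T, SS_R + SS_E))               # True (Pythagorean Theorem)
print(np.isclose(SS_E, y @ y - beta @ (X.T @ y)))  # True (standard formula)
```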

The General Linear Model

  • In some applications, it is necessary to fit data points with something other than a straight line.

    • In the examples that follow, the matrix equation is still $X\boldsymbol{\beta} = \mathbf{y}$, but the specific form of $X$ changes from one problem to the next.
    • Statisticians usually introduce a residual vector $\boldsymbol{\epsilon}$, defined by $\boldsymbol{\epsilon} = \mathbf{y} - X\boldsymbol{\beta}$, and write
      $$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
      Any equation of this form is referred to as a linear model. Once $X$ and $\mathbf{y}$ are determined, the goal is to minimize the length of $\boldsymbol{\epsilon}$, which amounts to finding a least-squares solution of $X\boldsymbol{\beta} = \mathbf{y}$. In each case, the least-squares solution $\hat{\boldsymbol{\beta}}$ is a solution of the normal equations
      $$X^T X\boldsymbol{\beta} = X^T\mathbf{y}$$
      A generic solver sketch follows.
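Since every linear model reduces to the same normal equations, one generic routine covers all the examples that follow. The sketch below (NumPy assumed; `least_squares_fit` is a name chosen here, not a library function) solves $X^T X\boldsymbol{\beta} = X^T\mathbf{y}$ directly:

```python
import numpy as np

def least_squares_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve the normal equations X^T X beta = X^T y.

    Assumes X has linearly independent columns, so X^T X is invertible.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Note that forming $X^T X$ squares the condition number of $X$; library routines such as `np.linalg.lstsq` use orthogonal factorizations instead, which is numerically safer, but the direct form above matches the theory in this section.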

Least-Squares Fitting of Other Curves

  • The next example shows how to fit data by curves that have the general form
    $$y = \beta_0 f_0(x) + \beta_1 f_1(x) + \dots + \beta_k f_k(x) \tag{2}$$
    where $f_0, \dots, f_k$ are known functions and $\beta_0, \dots, \beta_k$ are parameters that must be determined.
  • As we will see, equation (2) describes a linear model because it is linear in the unknown parameters.

EXAMPLE 2

Suppose we wish to approximate the data $(x_1, y_1), \dots, (x_n, y_n)$ by an equation of the form
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 \tag{3}$$
Describe the linear model that produces a “least-squares fit” of the data by equation (3).

SOLUTION

  • Equation (3) gives $y_j = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \epsilon_j$ for each data point, so the linear model is $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ with design matrix
    $$X = \begin{bmatrix}1 & x_1 & x_1^2\\ \vdots & \vdots & \vdots\\ 1 & x_n & x_n^2\end{bmatrix},\qquad \boldsymbol{\beta} = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix}$$
    This design matrix is a Vandermonde matrix. Example 4 in Section 2.1 and Theorem 14 in Section 6.5 show that if at least 3 of the values $x_1, \dots, x_n$ are distinct, then the least-squares solution $\hat{\boldsymbol{\beta}}$ will be unique. A sketch of this fit follows.
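A sketch of the quadratic fit of equation (3) (NumPy assumed; the data are hypothetical, with at least 3 distinct $x$-values so that $\hat{\boldsymbol{\beta}}$ is unique):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 4.2, 8.8, 16.5])

# Vandermonde design matrix with columns 1, x, x^2
X = np.vander(x, 3, increasing=True)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1, beta2 =", beta)  # coefficients of equation (3)
```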

Multiple Regression


  • Suppose an experiment involves two independent variables, say $u$ and $v$, and one dependent variable, $y$. A simple equation for predicting $y$ from $u$ and $v$ has the form
    $$y = \beta_0 + \beta_1 u + \beta_2 v \tag{4}$$
    A more general prediction equation might have the form
    $$y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 u^2 + \beta_4 uv + \beta_5 v^2 \tag{5}$$
  • Equations (4) and (5) both lead to a linear model because they are linear in the unknown parameters (even though $u$ and $v$ are multiplied). In general, a linear model will arise whenever $y$ is to be predicted by an equation of the form
    $$y = \beta_0 f_0(u, v) + \beta_1 f_1(u, v) + \dots + \beta_k f_k(u, v)$$
    A sketch of such a fit follows.
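To illustrate, here is a sketch of a multiple-regression fit with the quadratic model (5) (NumPy assumed; the $u$, $v$, $y$ values are hypothetical). Each column of the design matrix is one of the known functions $f_j(u, v)$ evaluated at the data:

```python
import numpy as np

# Hypothetical observations of (u, v, y)
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
v = np.array([1.0, 1.5, 1.0, 2.0, 2.5, 3.0, 2.0, 3.5])
y = np.array([2.1, 4.9, 6.2, 11.0, 15.8, 22.3, 24.1, 35.6])

# Columns: f_0 = 1, f_1 = u, f_2 = v, f_3 = u^2, f_4 = uv, f_5 = v^2
X = np.column_stack([np.ones_like(u), u, v, u ** 2, u * v, v ** 2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates of beta_0 ... beta_5 in equation (5)
```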

References

  • *Linear Algebra and Its Applications*
