Contents

  • Least-Squares Lines
  • The General Linear Model
  • Least-Squares Fitting of Other Curves
  • Multiple Regression
  • References

Notation

  • For easy application of the discussion to real problems that you may encounter later in your career, we choose notation that is commonly used in the statistical analysis of scientific and engineering data:

    • Instead of $A\mathbf{x} = \mathbf{b}$, we write $X\boldsymbol{\beta} = \mathbf{y}$ and refer to $X$ as the design matrix, $\boldsymbol{\beta}$ as the parameter vector, and $\mathbf{y}$ as the observation vector.

Least-Squares Lines

  • The simplest relation between two variables $x$ and $y$ is the linear equation $y = \beta_0 + \beta_1 x$. Experimental data often produce points $(x_1, y_1), \dots, (x_n, y_n)$ that, when graphed, seem to lie close to a line. We want to determine the parameters $\beta_0$ and $\beta_1$ that make the line as “close” to the points as possible.
  • Suppose $\beta_0$ and $\beta_1$ are fixed, and consider the line $y = \beta_0 + \beta_1 x$ in Figure 1. Corresponding to each data point $(x_j, y_j)$ there is a point $(x_j, \beta_0 + \beta_1 x_j)$ on the line with the same $x$-coordinate. We call $y_j$ the *observed* value of $y$ and $\beta_0 + \beta_1 x_j$ the *predicted* $y$-value. The difference between an observed $y$-value and a predicted $y$-value is called a *residual*.
  • There are several ways to measure how “close” the line is to the data. The usual choice (primarily because the mathematical calculations are simple) is to add the squares of the residuals, $\sum_{j=1}^{n} \left(y_j - \beta_0 - \beta_1 x_j\right)^2$:
    • The least-squares line is the line $y = \beta_0 + \beta_1 x$ that minimizes the sum of the squares of the residuals. This line is also called a line of regression of $y$ on $x$, because any errors in the data are assumed to be only in the $y$-coordinates. The coefficients $\beta_0, \beta_1$ of the line are called (linear) regression coefficients.
  • If the data points were on the line, the parameters $\beta_0$ and $\beta_1$ would satisfy the equations
    $$\beta_0 + \beta_1 x_1 = y_1,\quad \beta_0 + \beta_1 x_2 = y_2,\quad \dots,\quad \beta_0 + \beta_1 x_n = y_n$$
    We can write this system as $X\boldsymbol{\beta} = \mathbf{y}$, where
    $$X = \begin{bmatrix}1 & x_1\\ \vdots & \vdots\\ 1 & x_n\end{bmatrix},\qquad \boldsymbol{\beta} = \begin{bmatrix}\beta_0\\ \beta_1\end{bmatrix},\qquad \mathbf{y} = \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}$$
    This is a least-squares problem. The square of the distance between the vectors $X\boldsymbol{\beta}$ and $\mathbf{y}$ is precisely the sum of the squares of the residuals. Computing the least-squares solution of $X\boldsymbol{\beta} = \mathbf{y}$ is equivalent to finding the $\boldsymbol{\beta}$ that determines the least-squares line in Figure 1.
  • A common practice before computing a least-squares line is to compute the average $\overline{x}$ of the original $x$-values and form a new variable $x^* = x - \overline{x}$. The new $x$-data are said to be in mean-deviation form. In this case, the two columns of the design matrix will be orthogonal (see the sketch below).
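To make this concrete, here is a minimal sketch in Python (assuming NumPy is available; the data values are hypothetical, not taken from the text). It builds the design matrix $X = [\mathbf{1}\ \ \mathbf{x}]$, computes the least-squares line, and checks that the columns become orthogonal in mean-deviation form:

```python
import numpy as np

# Hypothetical data points (x_j, y_j)
x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])

# Design matrix with columns 1 and x
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X * beta = y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1 =", beta)

# Mean-deviation form: x* = x - mean(x) makes the two columns orthogonal
x_star = x - x.mean()
X_star = np.column_stack([np.ones_like(x_star), x_star])
print("column inner product:", X_star[:, 0] @ X_star[:, 1])  # 0 (up to rounding)
```

With orthogonal columns, $X^T X$ is diagonal and the normal equations decouple, which is the computational payoff of mean-deviation form.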

EXERCISES 14

Show that the least-squares line for the data $(x_1, y_1), \dots, (x_n, y_n)$ must pass through $(\overline{x}, \overline{y})$. That is, show that $\overline{x}$ and $\overline{y}$ satisfy the linear equation $\overline{y} = \hat\beta_0 + \hat\beta_1 \overline{x}$.

SOLUTION

  • Derive this equation from the vector equation $\mathbf{y} = X\hat{\boldsymbol{\beta}} + \boldsymbol{\epsilon}$. Denote the first column of $X$ by $\mathbf{1}$. Use the fact that the residual vector $\boldsymbol{\epsilon}$ is orthogonal to the column space of $X$ and hence is orthogonal to $\mathbf{1}$. Thus $\sum_{i=1}^{n} \epsilon_i = 0$.
    $$\begin{aligned}\because\ y_i &= \hat\beta_0 + x_i\hat\beta_1 + \epsilon_i\\ \therefore\ \sum_{i=1}^{n} y_i &= n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i\\ \therefore\ \overline{y} &= \hat\beta_0 + \hat\beta_1\overline{x}\end{aligned}$$

  • Given data for a least-squares problem, $(x_1, y_1), \dots, (x_n, y_n)$, the following abbreviations are helpful:
    $$\sum x = \sum_{i=1}^{n} x_i,\qquad \sum x^2 = \sum_{i=1}^{n} x_i^2,\qquad \sum y = \sum_{i=1}^{n} y_i,\qquad \sum xy = \sum_{i=1}^{n} x_i y_i$$
  • The normal equations for a least-squares line $y = \hat\beta_0 + \hat\beta_1 x$ are $X^T X\boldsymbol{\beta} = X^T\mathbf{y}$. Since
    $$X^T X = \begin{bmatrix}\mathbf{1}^T\\ \mathbf{x}^T\end{bmatrix}\begin{bmatrix}\mathbf{1} & \mathbf{x}\end{bmatrix} = \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}$$
    the normal equations may be written in the form
    $$\begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}\hat{\boldsymbol{\beta}} = \begin{bmatrix}\mathbf{1}^T\\ \mathbf{x}^T\end{bmatrix}\mathbf{y} = \begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}$$
    that is,
    $$n\hat\beta_0 + \hat\beta_1\sum x = \sum y,\qquad \hat\beta_0\sum x + \hat\beta_1\sum x^2 = \sum xy$$
  • If $X$ has two linearly independent columns, then $X^T X$ is invertible and
    $$\begin{aligned}\hat{\boldsymbol{\beta}} &= \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}^{-1}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\\ &= \frac{1}{n\sum x^2 - \left(\sum x\right)^2}\begin{bmatrix}\sum x^2 & -\sum x\\ -\sum x & n\end{bmatrix}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\end{aligned}$$
    so
    $$\hat\beta_0 = \frac{\sum x^2\sum y - \sum x\sum xy}{n\sum x^2 - \left(\sum x\right)^2},\qquad \hat\beta_1 = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
    These closed-form expressions are evaluated in the sketch below.
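The closed-form expressions for $\hat\beta_0$ and $\hat\beta_1$ can be checked numerically; a sketch under the same assumptions as before (NumPy, hypothetical data) follows:

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])
n = len(x)

# The four standard abbreviations
Sx, Sx2 = x.sum(), (x ** 2).sum()
Sy, Sxy = y.sum(), (x * y).sum()

# Determinant of X^T X; nonzero when at least two x-values are distinct
d = n * Sx2 - Sx ** 2

beta0 = (Sx2 * Sy - Sx * Sxy) / d
beta1 = (n * Sxy - Sx * Sy) / d
print(beta0, beta1)  # agrees with np.polyfit(x, y, 1)[::-1]
```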

Consider the following numbers.

Every statistics text that discusses regression and the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ introduces these numbers.

  • (i) $\left\|X\hat{\boldsymbol{\beta}}\right\|^2$, the sum of the squares of the “regression term.” Denote this number by $SS(R)$.
  • (ii) $\left\|\mathbf{y} - X\hat{\boldsymbol{\beta}}\right\|^2$, the sum of the squares for the error term. Denote this number by $SS(E)$.
  • (iii) $\left\|\mathbf{y}\right\|^2$, the “total” sum of the squares of the $y$-values. Denote this number by $SS(T)$.

EXERCISES 19

Justify the equation $SS(T) = SS(R) + SS(E)$. This equation is extremely important in statistics, both in regression theory and in the analysis of variance.

SOLUTION

  • This follows from the Pythagorean Theorem (in Section 6.1), since $X\hat{\boldsymbol{\beta}}$ lies in $\operatorname{Col} X$ and the residual $\boldsymbol{\epsilon} = \mathbf{y} - X\hat{\boldsymbol{\beta}}$ is orthogonal to $\operatorname{Col} X$ (so $X^T\boldsymbol{\epsilon} = \mathbf{0}$). Then
    $$\begin{aligned}SS(E) &= SS(T) - SS(R)\\ &= \left\|\mathbf{y}\right\|^2 - \left\|X\hat{\boldsymbol{\beta}}\right\|^2\\ &= \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T X\hat{\boldsymbol{\beta}}\\ &= \mathbf{y}^T\mathbf{y} - \left(\hat{\boldsymbol{\beta}}^T X^T X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^T X^T\boldsymbol{\epsilon}\right)\\ &= \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T\left(X\hat{\boldsymbol{\beta}} + \boldsymbol{\epsilon}\right)\\ &= \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T\mathbf{y}\end{aligned}$$
    This is the standard formula for $SS(E)$. A numerical check appears below.
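A quick numerical check of $SS(T) = SS(R) + SS(E)$ and of the standard formula for $SS(E)$, again as a sketch with NumPy and hypothetical data:

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])
X = np.column_stack([np.ones_like(x), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta        # regression term X * beta_hat
eps = y - y_hat         # residual vector, orthogonal to Col X

SS_T = y @ y
SS_R = y_hat @ y_hat
SS_E = eps @ eps
print(np.isclose(SS_T, SS_R + SS_E))               # True (Pythagorean Theorem)
print(np.isclose(SS_E, y @ y - beta @ (X.T @ y)))  # True (standard formula)
```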

The General Linear Model

  • In some applications, it is necessary to fit data points with something other than a straight line.

    • In the examples that follow, the matrix equation is still $X\boldsymbol{\beta} = \mathbf{y}$, but the specific form of $X$ changes from one problem to the next.
    • Statisticians usually introduce a residual vector $\boldsymbol{\epsilon}$, defined by $\boldsymbol{\epsilon} = \mathbf{y} - X\boldsymbol{\beta}$, and write
      $$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
      Any equation of this form is referred to as a linear model. Once $X$ and $\mathbf{y}$ are determined, the goal is to minimize the length of $\boldsymbol{\epsilon}$, which amounts to finding a least-squares solution of $X\boldsymbol{\beta} = \mathbf{y}$. In each case, the least-squares solution $\hat{\boldsymbol{\beta}}$ is a solution of the normal equations
      $$X^T X\boldsymbol{\beta} = X^T\mathbf{y}$$
      A generic solver sketch follows.
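Since every linear model reduces to the same normal equations, one generic routine covers all the examples that follow. The sketch below (NumPy assumed; `least_squares_fit` is a name chosen here, not a library function) solves $X^T X\boldsymbol{\beta} = X^T\mathbf{y}$ directly:

```python
import numpy as np

def least_squares_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve the normal equations X^T X beta = X^T y.

    Assumes X has linearly independent columns, so X^T X is invertible.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Note that forming $X^T X$ squares the condition number of $X$; library routines such as `np.linalg.lstsq` use orthogonal factorizations instead, which is numerically safer, but the direct form above matches the theory in this section.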

Least-Squares Fitting of Other Curves

  • The next example shows how to fit data by curves that have the general form
    $$y = \beta_0 f_0(x) + \beta_1 f_1(x) + \dots + \beta_k f_k(x) \tag{2}$$
    where $f_0, \dots, f_k$ are known functions and $\beta_0, \dots, \beta_k$ are parameters that must be determined.
  • As we will see, equation (2) describes a linear model because it is linear in the unknown parameters.

EXAMPLE 2

Suppose we wish to approximate the data $(x_1, y_1), \dots, (x_n, y_n)$ by an equation of the form
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 \tag{3}$$
Describe the linear model that produces a “least-squares fit” of the data by equation (3).

SOLUTION

  • Equation (3) gives $y_j = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \epsilon_j$ for each data point, so the linear model is $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ with design matrix
    $$X = \begin{bmatrix}1 & x_1 & x_1^2\\ \vdots & \vdots & \vdots\\ 1 & x_n & x_n^2\end{bmatrix},\qquad \boldsymbol{\beta} = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix}$$
    This design matrix is a Vandermonde matrix. Example 4 in Section 2.1 and Theorem 14 in Section 6.5 show that if at least 3 of the values $x_1, \dots, x_n$ are distinct, then the least-squares solution $\hat{\boldsymbol{\beta}}$ will be unique. A sketch of this fit follows.
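A sketch of the quadratic fit of equation (3) (NumPy assumed; the data are hypothetical, with at least 3 distinct $x$-values so that $\hat{\boldsymbol{\beta}}$ is unique):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 4.2, 8.8, 16.5])

# Vandermonde design matrix with columns 1, x, x^2
X = np.vander(x, 3, increasing=True)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1, beta2 =", beta)  # coefficients of equation (3)
```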

Multiple Regression


  • Suppose an experiment involves two independent variables, say $u$ and $v$, and one dependent variable, $y$. A simple equation for predicting $y$ from $u$ and $v$ has the form
    $$y = \beta_0 + \beta_1 u + \beta_2 v \tag{4}$$
    A more general prediction equation might have the form
    $$y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 u^2 + \beta_4 uv + \beta_5 v^2 \tag{5}$$
  • Equations (4) and (5) both lead to a linear model because they are linear in the unknown parameters (even though $u$ and $v$ are multiplied). In general, a linear model will arise whenever $y$ is to be predicted by an equation of the form
    $$y = \beta_0 f_0(u, v) + \beta_1 f_1(u, v) + \dots + \beta_k f_k(u, v)$$
    A sketch of such a fit follows.
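To illustrate, here is a sketch of a multiple-regression fit with the quadratic model (5) (NumPy assumed; the $u$, $v$, $y$ values are hypothetical). Each column of the design matrix is one of the known functions $f_j(u, v)$ evaluated at the data:

```python
import numpy as np

# Hypothetical observations of (u, v, y)
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
v = np.array([1.0, 1.5, 1.0, 2.0, 2.5, 3.0, 2.0, 3.5])
y = np.array([2.1, 4.9, 6.2, 11.0, 15.8, 22.3, 24.1, 35.6])

# Columns: f_0 = 1, f_1 = u, f_2 = v, f_3 = u^2, f_4 = uv, f_5 = v^2
X = np.column_stack([np.ones_like(u), u, v, u ** 2, u * v, v ** 2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates of beta_0 ... beta_5 in equation (5)
```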

References

  • *Linear Algebra and Its Applications*
