How were these line fitting equations derived?

In summary, the formulas for the slope and y-intercept of the line that best fits a set of (x, y) data pairs can be derived by setting up an overdetermined matrix equation and using the pseudoinverse to find the least-squares solution. Geometrically, the solution vector is found by dropping a perpendicular from the data vector onto the two-dimensional column space of the matrix. The same process yields the best-fit parabola, cubic, etc. by using a matrix with additional columns.
  • #1
samh
I've run into these formulas many times but I've never seen how they were derived.

Say you want to fit a line to some data. Your data is a bunch of (x_i, y_i) pairs.

Through some kind of magic, the slope m and the y-intercept b are given by these formulas:

[tex]m = \frac{n\sum x_iy_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}[/tex]
[tex]b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_iy_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}[/tex]

My attempt at deriving them:

I set up a matrix equation Ax=b and use the pseudoinverse to find the least-squares solution: x = inv(A'*A)*A'*b where A' is A transpose.

Annnnnd now I'm stuck. :) How would you go from that to those two formulas? Or maybe I'm headed in the wrong direction...
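
For concreteness, here is a quick numerical sketch of that pseudoinverse computation in Python/NumPy; the data values are invented purely for illustration:

[code]
import numpy as np

# Invented sample (x_i, y_i) pairs, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with columns [x_i, 1].
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution via the pseudoinverse: inv(A'*A)*A'*y.
m, b = np.linalg.inv(A.T @ A) @ A.T @ y

# The closed-form formulas quoted above should give the same answer.
n = len(x)
denom = n * np.sum(x**2) - np.sum(x)**2
m_formula = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom
b_formula = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom

print(m, b)                  # from the pseudoinverse
print(m_formula, b_formula)  # from the formulas (matches)
[/code]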
 
  • #2
Finding a, b so that y= ax+ b gives the "least squares" best fit to the set of points [itex](x_i, y_i)[/itex] means "solving" the equations [itex]y_i= ax_i+ b[/itex] for all i. That is equivalent to the matrix equation Ax= y or
[tex]\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}\begin{bmatrix} a \\ b\end{bmatrix}= \begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}[/tex]

Now that is a linear transformation from [itex]R^2[/itex] to [itex]R^n[/itex] and so its image is a 2 dimensional subspace of [itex]R^n[/itex]. If the vector
[tex]\begin{bmatrix} y_1 \\ y_2 \\ \cdot \\ \cdot \\ \cdot \\ y_n\end{bmatrix}[/tex]
happens to lie in that subspace, that is, if the points happen to lie on a straight line, then there is an exact solution. If not, then you are looking for the <a, b> that comes closest. Geometrically, you can find that solution vector by dropping a perpendicular from y to that two dimensional subspace. If [itex]\overline{x}[/itex] gives that "optimal" solution, that is, if [itex]A\overline{x}[/itex] is closest to y, then [itex]A\overline{x}- y[/itex] is perpendicular to that two dimensional subspace, the space of all [itex]Ax[/itex]. That means that, for any x, the inner product [itex]<Ax, A\overline{x}- y>= 0[/itex].

Now the "adjoint" of A, [itex]A^+[/itex], has the property that [itex]<Ax, y>= <x, A^+y>[/itex] so here we have [itex]<Ax, A\overline{x}- y>= <x, A^+(A\overline{x}- y)>[/itex][itex]=<x, A^+A\overline{x}- A^+y>= 0[/itex]. But since that is true for all x in R2, we must have [itex]A^+A\overline{x}- A^+y= 0[/itex] or [itex]A^+A\overline{x}= A^+y[/itex].

Now we can solve for [itex]\overline{x}[/itex] by multiplying both sides by [itex](A^+A)^{-1}[/itex]:
[itex]\overline{x}= (A^+A)^{-1}A^+y[/itex]. That is the "pseudoinverse" you refer to.

Here, as before,
[tex]A= \begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}[/tex]
and
[tex]A^+= \begin{bmatrix}x_1 & x_2 & \cdot\cdot\cdot & x_n \\ 1 & 1 & \cdot\cdot\cdot & 1\end{bmatrix}[/tex]

So
[tex]A^+A= \begin{bmatrix}\sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}[/tex]
and
[tex]A^+ y= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}[/tex]

So you want to solve the matrix equation
[tex]\begin{bmatrix}\sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix}= \begin{bmatrix}\sum_{i=1}^n x_iy_i \\ \sum_{i=1}^n y_i\end{bmatrix}[/tex]

That should be easy: it's just a 2 by 2 matrix, and the inverse of
[tex]\begin{bmatrix}A & B \\ C & D\end{bmatrix}[/tex]
is
[tex]\frac{1}{AD- BC}\begin{bmatrix}D & -B \\ -C & A\end{bmatrix}[/tex]
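
Carrying out that inversion (here [itex]A= \sum x_i^2[/itex], [itex]B= C= \sum x_i[/itex], [itex]D= n[/itex]) recovers exactly the formulas quoted in post #1, with [itex]a[/itex] playing the role of the slope [itex]m[/itex]:
[tex]a= \frac{n\sum x_iy_i- \sum x_i\sum y_i}{n\sum x_i^2- \left(\sum x_i\right)^2}, \qquad b= \frac{\sum x_i^2\sum y_i- \sum x_i\sum x_iy_i}{n\sum x_i^2- \left(\sum x_i\right)^2}[/tex]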
 
  • #3
What an amazing reply! Thanks so much. It's so simple now that I see it, haha.
 
  • #4
I seem to remember it can be derived just by finding the minimum of

[tex]\sum(y_i - a x_i - b)^2[/tex]

that is, solving

[tex]\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial a } = 0[/tex]

and

[tex]\frac { \partial \sum(y_i - a x_i - b)^2 } { \partial b } = 0[/tex]
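
Setting both partial derivatives to zero reproduces the same "normal equations" as the matrix derivation in post #2:

[tex]a\sum_{i=1}^n x_i^2+ b\sum_{i=1}^n x_i= \sum_{i=1}^n x_iy_i \qquad \text{and} \qquad a\sum_{i=1}^n x_i+ bn= \sum_{i=1}^n y_i[/tex]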
 
  • #5
Yeah, but my explanation was much cooler! :)

More seriously, it also has the advantage that the same argument can be applied directly to "least squares parabola", "least squares cubic", etc.

For a least squares parabola, use
[tex]\begin{bmatrix}x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ x_n^2 & x_n & 1\end{bmatrix}[/tex]
instead of
[tex]\begin{bmatrix}x_1 & 1 \\ x_2 & 1 \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ x_n & 1\end{bmatrix}[/tex]
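
The corresponding NumPy sketch, again with invented data, only changes the design matrix:

[code]
import numpy as np

# Invented sample data, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.2, 1.1, 4.3, 9.2])

# Design matrix for a least squares parabola: columns [x_i^2, x_i, 1].
A = np.column_stack([x**2, x, np.ones_like(x)])

# Solve the normal equations A'A c = A'y, exactly as in the line case.
a, b, c = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b, c)  # coefficients of y = a x^2 + b x + c
[/code]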
 
  • #6
[tex]
\sum(y_i - a x_i^2 - b x_i - c)^2
[/tex]

Still doable :-p
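
Setting the three partial derivatives (with respect to a, b, c) to zero gives the 3 by 3 system of normal equations:

[tex]\begin{bmatrix}\sum x_i^4 & \sum x_i^3 & \sum x_i^2 \\ \sum x_i^3 & \sum x_i^2 & \sum x_i \\ \sum x_i^2 & \sum x_i & n\end{bmatrix}\begin{bmatrix}a \\ b \\ c\end{bmatrix}= \begin{bmatrix}\sum x_i^2y_i \\ \sum x_iy_i \\ \sum y_i\end{bmatrix}[/tex]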
 

FAQ: How were these line fitting equations derived?

1. How were the line fitting equations derived?

The line fitting equations were derived through a process called linear regression. This involves finding the best-fit line that minimizes the sum of squared vertical distances between the data points and the line, using techniques such as least squares or maximum likelihood estimation.

2. Can the line fitting equations be applied to any type of data?

The line fitting equations can be applied to a variety of data sets, as long as the data follows a linear pattern. However, it is important to note that the equations may not accurately represent the relationship between variables in all cases, and should be interpreted with caution.

3. What assumptions are made when using line fitting equations?

When using line fitting equations, it is assumed that the relationship between the variables is linear, that the observations are independent of each other, and that the x values are measured without error. For statistical inference (confidence intervals, significance tests), it is further assumed that the residuals are normally distributed with constant variance; the least-squares formulas themselves do not require normality.

4. How accurate are the results obtained from line fitting equations?

The accuracy of the results obtained from line fitting equations depends on the quality of the data and how well it fits the assumptions made. If the data follows a linear pattern and the assumptions are met, the results can be highly accurate. However, if the data is non-linear or does not meet the assumptions, the results may be less accurate.

5. Can line fitting equations be used for prediction?

Yes, line fitting equations can be used for prediction by plugging a known x value into the fitted line y = mx + b. However, the accuracy of the prediction depends on the accuracy of the data and on how well the assumptions behind the fit hold.
