Understanding Least Square Solution for Random Matrices: A Comprehensive Guide

  • Thread starter Hiche
In summary: the least squares solution of Ax = b is the vector x for which Ax is the point in the column space of A closest to b, that is, the x minimizing the length of Ax - b. It is found by solving the normal equations A^TAx = A^Tb.
  • #1
Hiche
Assume we have this random matrix:[tex]
\begin{pmatrix}
1 & 0 & 2\\
2 & 0 & 4\\
0 & 1 & 2
\end{pmatrix}
[/tex]
The least squares solution can be found through [itex]A^TAx = A^Tb[/itex]. What exactly is meant by "least squares solution", and is there a proof of some sort of how this formula is derived?
 
  • #2
I'll answer the first part of your question: what is meant by the least squares solution? My explanation will lead right up to the point where we want to replace b (in Ax=b) by the projection of b onto the column space of A (since b may not lie in the column space already). The projection of b onto the column space of A is Ax, where x is the solution of [itex]A^TAx= A^Tb[/itex]. The latter always has a solution. I guess we could say that this x makes Ax-b have the shortest possible length. By the way, for this x, Ax will be orthogonal to b-Ax; when people prove this, they often include diagrams displaying it.

Suppose we have data sets of two measurements for each event: (x,u), (y,v), (z,w). We expect a linear relation with slope m and intercept b, so u=mx+b, v=my+b, w=mz+b. But there may be no such line for any m and b, so we want a best fit: some m and b which minimize the sum of squares (u-(mx+b))^2+(v-(my+b))^2+(w-(mz+b))^2.

Let X be the matrix with columns (x,y,z) and (1,1,1), let M be the vector (m,b), and let U be the vector (u,v,w). If there had been a line going through all the data points, we would have had a solution to XM=U. Now suppose there is possibly no line through all the data points; then the column space of X does not contain U. Finding the closest point to U in the column space corresponds both to finding the projection of U onto the column space of X and to minimizing the sum of squared errors. So the M we want is the one for which XM is the projection of U onto the column space of X. Apparently (and this is the part I can remember), that projection is given by XM, where M is the solution of [itex]X^TXM= X^TU[/itex].
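
Here is a minimal numerical sketch of that line-fitting recipe in NumPy. The data values and variable names are made up purely for illustration; the point is just the normal equations [itex]X^TXM = X^TU[/itex].

[code]
import numpy as np

# Hypothetical measurements: three events, each giving a pair (input, output),
# playing the role of (x,u), (y,v), (z,w) above.
xs = np.array([0.0, 1.0, 2.0])
us = np.array([1.1, 2.9, 5.2])

# X has columns (x, y, z) and (1, 1, 1), so X @ M approximates U for M = (m, b).
X = np.column_stack([xs, np.ones_like(xs)])

# Normal equations: X^T X M = X^T U.
M = np.linalg.solve(X.T @ X, X.T @ us)
m, b = M
print(f"best-fit line: u = {m:.3f} x + {b:.3f}")

# Sanity check: the residual U - XM is orthogonal to the column space of X.
residual = us - X @ M
print("X^T (U - XM) is (numerically) zero:", np.allclose(X.T @ residual, 0.0))
[/code]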
 
  • #3
You have left out one important part of your question! "[itex]A^TAx= A^Tb[/itex]" is a "least squares solution" to what problem?

The "problem" is, of course, to solve Ax= b. If this were a basic algebra course, we would say "divide both sides by A". In matrix algebra, that would be "multiply both sides by the inverse of A". Here, we cannot do that because A does NOT have an inverse- its determinant is 0.

In terms of more general linear transformations from a vector space U to a vector space V, Ax= b has a unique solution for every b if and only if A is both "one to one" and "onto". One can show that a linear transformation from U to V always maps U onto a subspace of V, its image. If A is not "onto", that image is a proper subspace of V. In that case, Ax= b will have a solution (perhaps an infinite number of solutions) if and only if b happens to lie in that subspace.

If A is not invertible, so that it maps U onto some proper subspace of V, and b is not in that subspace, there is no solution to Ax= b. In that case, we have to ask "what is the best we can do?"- can we find an x so that Ax is as close to b as possible?

To talk about "closest", we have to have a notion of distance, of course, so we have to have a "metric", perhaps a metric that can be derived from a "norm", and perhaps a norm that can be derived from an "inner product". Now, let's go back from "linear transformation" to "n by n matrix", because an n by n matrix is a linear transformation from [itex]\mathbb{R}^n[/itex] to [itex]\mathbb{R}^n[/itex]- and there is a "natural" inner product on [itex]\mathbb{R}^n[/itex], the "dot product".

Given inner products on two vector spaces, U and V, and a linear transformation, A, from U to V, the "adjoint" of A, [itex]A^T[/itex], is defined as the linear transformation from V back to U such that for any u in U and any v in V, [itex]<Au, v>= <u, A^Tv>[/itex] (notice that since Au is in V and [itex]A^Tv[/itex] is in U, the first of those inner products is taken in V and the second in U). When U and V are [itex]\mathbb{R}^n[/itex] with the dot product, the "adjoint" of matrix A is its transpose, which is why I use the notation "[itex]A^T[/itex]".

Now, using the norm defined by that inner product ([itex]|u|= \sqrt{<u, u>}[/itex]) and the metric defined by that norm (distance from u to v is [itex]|u- v|= \sqrt{<u-v, u-v>}[/itex]), we get the usual "geometric" fact that the shortest distance from a point p to a subspace is along the line through p perpendicular to the subspace. That is, if Ax is the vector in the subspace closest to b, we must have Ax- b perpendicular to the subspace, which in turn means that if v is any vector in the subspace, that is v= Au for some u, we must have [itex]<v, Ax- b>= <Au, Ax- b>= 0[/itex]. That inner product is, of course, taken in the "range" of A, but we can use the "adjoint" (transpose) to cast it back to the domain: [itex]<Au, Ax- b>= <u, A^T(Ax- b)>= <u, A^TAx- A^Tb>= 0[/itex].

But u could be any member of the domain, so we must have [itex]A^TAx- A^Tb= 0[/itex], so that [itex]A^TAx= A^Tb[/itex].

Of course, this is called the "least squares" solution because we are minimizing the "distance" [itex]|Ax- b|= \sqrt{<Ax- b, Ax- b>}[/itex], which is a square root of a sum of squares.
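
As a quick numerical check of this, here is a sketch using the 3x3 matrix from post #1 (the right-hand side b is my own made-up choice, not from the thread). Since that matrix is singular, so is [itex]A^TA[/itex], so I use NumPy's least squares routine rather than inverting anything:

[code]
import numpy as np

# The matrix from post #1; its third column is 2*(column 1) + 2*(column 2), so det(A) = 0.
A = np.array([[1.0, 0.0, 2.0],
              [2.0, 0.0, 4.0],
              [0.0, 1.0, 2.0]])

# A right-hand side chosen for illustration; it does not lie in the column space of A.
b = np.array([1.0, 0.0, 1.0])

# np.linalg.lstsq returns an x minimizing |Ax - b| (the minimum-norm such x here).
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Any least squares solution satisfies the normal equations A^T A x = A^T b ...
print("normal equations hold:", np.allclose(A.T @ A @ x, A.T @ b))

# ... equivalently, the error b - Ax is orthogonal to the column space of A.
print("b - Ax is orthogonal to the columns of A:", np.allclose(A.T @ (b - A @ x), 0.0))
[/code]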
 

FAQ: Understanding Least Square Solution for Random Matrices: A Comprehensive Guide

What is the least squares solution?

The least squares solution is a method used in statistics and mathematics to find the best-fitting line or curve for a set of data points. It minimizes the sum of the squared differences between the actual data points and the values predicted by the mathematical model.

When is the least squares solution used?

The least squares solution is commonly used in regression analysis, where the goal is to find the relationship between variables. It is also used in other fields such as engineering, physics, and economics to analyze data and make predictions.

How is the least squares solution calculated?

The least squares solution is calculated by minimizing the sum of the squared differences between the observed data points and the values predicted by the model. In practice this is typically done by solving the "normal equations" [itex]A^TAx = A^Tb[/itex] or through equivalent matrix operations.
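
As an illustrative sketch (with made-up data, fitting a quadratic curve), here are both routes in NumPy: solving the normal equations directly, and calling the library's least squares solver.

[code]
import numpy as np

# Made-up data points; we fit a quadratic y ≈ a*t^2 + b*t + c.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 4.8, 9.6, 17.1])

# Each column of the design matrix corresponds to one term of the model.
A = np.column_stack([t**2, t, np.ones_like(t)])

# Route 1: solve the normal equations A^T A x = A^T y.
coeffs_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Route 2: minimize |Ax - y| directly (usually preferred numerically).
coeffs_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coeffs_normal)
print(coeffs_lstsq)  # both routes give essentially the same coefficients
[/code]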

What are the advantages of using the least squares solution?

The least squares solution is a simple and efficient method for finding the best fit to a set of data points. It is also widely used and accepted in various fields, making it easier to compare results across studies or experiments.

Are there any limitations to the least squares solution?

While the least squares solution is a powerful tool for data analysis, it does have some limitations. Because it minimizes squared errors, outliers can significantly affect the results, and the standard statistical interpretation assumes the errors are roughly normally distributed. It also only captures the relationships built into the chosen model, so a linear model will not account for genuinely nonlinear relationships between the variables.
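
A tiny demonstration of the outlier point, again with made-up numbers: corrupting a single measurement noticeably drags the fitted slope away from the true value.

[code]
import numpy as np

# Points exactly on the line y = 2t + 1, then the same data with one corrupted value.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_clean = 2.0 * t + 1.0
y_outlier = y_clean.copy()
y_outlier[-1] += 20.0                 # a single bad measurement

A = np.column_stack([t, np.ones_like(t)])

slope_clean = np.linalg.lstsq(A, y_clean, rcond=None)[0][0]
slope_outlier = np.linalg.lstsq(A, y_outlier, rcond=None)[0][0]

print(slope_clean)     # 2.0
print(slope_outlier)   # noticeably larger than 2 because of the one outlier
[/code]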
