Similar Matrices and Change of Basis

In summary: the matrix representing a linear transformation depends on the choice of basis. If $\mathcal{B}$ and $\mathcal{E}$ are two bases and $P$ is the change of basis matrix taking $\mathcal{E}$-coordinates to $\mathcal{B}$-coordinates, then $P^{-1} [T]_{\mathcal{B}}^{\mathcal{B}} P = [T]_{\mathcal{E}}^{\mathcal{E}}$, which is exactly what it means for the two matrices to be similar.
  • #1
Math Amateur
I am spending time revising vector spaces. I am using Dummit and Foote: Abstract Algebra (Chapter 11) and also the book Linear Algebra by Stephen Friedberg, Arnold Insel and Lawrence Spence.

On page 419, D&F define similar matrices as follows: View attachment 3047

They then state the following: View attachment 3048

BUT? ... how exactly does it follow that \(\displaystyle P^{-1} M_{\mathcal{B}}^{\mathcal{B}} (\phi) P = M_{\mathcal{E}}^{\mathcal{E}} (\phi)\)

Can anyone show me how this result is obtained?

I must be missing something obvious because no indication is given of how this result is obtained ... ... ?

Peter

***EDIT***

(1) Reflecting ... ... I am beginning to think that \(\displaystyle M_{\mathcal{B}}^{\mathcal{B}} (\phi)\) and \(\displaystyle M_{\mathcal{E}}^{\mathcal{E}} (\phi)\) are equal to the identity matrix \(\displaystyle I \) ... ... ? ... ... but then, what is the point of essentially writing \(\displaystyle P^{-1} I P = I\)?

(2) Further reflecting ... ... It may be that the above formula makes more sense in the overall context of what D&F say about the change of basis or transition matrix ... ?

In the light of (2), I am providing the relevant text on similar matrices and the change of basis (transition) matrix for MHB members interested in the post ... ... see below ... ... View attachment 3049
 
  • #2
Here is the thing I want you to remember, and take to heart:

A similarity transform, and a change-of-basis, are the same thing (essentially).

We have two ways of looking at this: the linear transformation view, and the matrix view. One is "abstract", and one is "concrete".

Suppose $\rho \in \text{Hom}_{\ F}(V,V)$ is an isomorphism (or linear automorphism).

This means it is invertible, that is, there exists $\rho^{-1} \in \text{Hom}_{\ F}(V,V)$ such that:

$\rho \circ \rho^{-1} = \rho^{-1} \circ \rho = 1_V$

Now suppose $\phi \in \text{Hom}_{\ F}(V,V)$ is any linear endomorphism. Clearly:

$\rho^{-1}\phi\rho$ is also a linear endomorphism. What might this do?

Some things you will have to prove, before you are fully prepared to really comprehend this:

1) $\phi \in \text{Hom}_{\ F}(V,V)$ is injective if and only if for every linearly independent subset $S \subseteq V,\ \phi(S)$ is linearly independent.

2) $\phi \in \text{Hom}_{\ F}(V,V)$ is surjective if and only if for every set $T$ with $\text{span}(T) = V$, we have that $\text{span}(\phi(T)) = V$ as well.

Taken together, these two statements imply:

3) $\phi \in \text{Hom}_{\ F}(V,V)$ is an isomorphism (of vector spaces) if and only if $\phi$ maps a basis to a basis.

Note that these conditions reduce "total behavior" of $\phi$ (on all of $V$) to behavior on certain kinds of subsets (which in most of the simpler cases, are FINITE). So what we have is "labor-saving criteria". We can test for injectivity, surjectivity, or bijectivity on certain "well-chosen" subsets.
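To make criterion (3) concrete for a finite-dimensional space like $\Bbb R^2$: here is a tiny numpy sketch (my own illustration, assuming numpy is available, not something from D&F). In matrix terms, the test "does $\phi$ map a basis to a basis?" becomes "is the determinant of the matrix nonzero?".

```python
import numpy as np

# Hypothetical example: suppose phi is represented in the standard basis of R^2
# by the matrix A below (the same matrix used in the worked example later on).
A = np.array([[2.0, 4.0],
              [3.0, 3.0]])

# The images of the standard basis vectors e1, e2 are the columns of A.
# They form a basis of R^2 iff they are linearly independent, i.e. iff
# det(A) != 0 -- this is criterion (3) in matrix form.
print(np.linalg.det(A))   # about -6.0: nonzero, so phi is an isomorphism
```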

It's condition (3) that matters, here. Essentially, $\rho$ replaces one basis with another. Then $\phi$ "does its thing" (whatever linear transform it does), on the "new basis". Finally, $\rho^{-1}$ "returns us to our original basis". So, in a sense, $\phi$ and $\rho^{-1}\phi\rho$ represent "the same transformation" (hence the name "similar"), just "different bases" (which can be thought of, naively, as "coordinate systems" or "terminologies", since an isomorphism is essentially a "re-naming scheme").

Now let's look at the concrete side of things. Realize that, abstractly, "vectors are vectors", they don't care how we label them. When we attach NUMBERS (that is, field element entries) to a vector, what those numbers MEAN is "up to us". Imagine the Euclidean plane as a blank piece of paper, with just the one dot marking the 0-vector (0,0). The coordinate axes we draw, and the unit lengths we assign on them, are OUR CHOICES, they don't come with the space. We USUALLY draw them perpendicular, and "scaled the same", but this is a bit arbitrary, on our part.

As a pair, (2,3) is just a pair of numbers. As a VECTOR, we usually mean:

$(2,3) = 2v_1 + 3v_2$, where $\{v_1,v_2\}$ is a basis. WE HAVE TO SAY what $v_1,v_2$ ARE.

For example, in the polynomial space $P_1(\Bbb R) = \{a_0 + a_1t: a_0,a_1 \in \Bbb R\}$, if our basis is $\{1,t\}$, then:

$(2,3)$ means $2 + 3t$.

If our basis is $\{1-t,1+t\}$, then $(2,3) = 2(1-t) + 3(1+t) = 2 - 2t + 3 + 3t = 5 + t$, which is a different polynomial.
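If it helps to see this with actual numbers, here is a short Python sketch (just my illustration, not from any text): polynomials are stored as coefficient vectors with the constant term first, and the same coordinate pair $(2,3)$ names different polynomials in the two bases.

```python
import numpy as np

coords = np.array([2.0, 3.0])            # the coordinate pair (2, 3)

# Each basis is a list of polynomials, stored as coefficient vectors [a0, a1]
# meaning a0 + a1*t.
basis_standard = [np.array([1.0, 0.0]),  # 1
                  np.array([0.0, 1.0])]  # t
basis_other    = [np.array([1.0, -1.0]), # 1 - t
                  np.array([1.0,  1.0])] # 1 + t

def polynomial_from_coords(coords, basis):
    """Return the coefficient vector of c1*b1 + c2*b2."""
    return coords[0] * basis[0] + coords[1] * basis[1]

print(polynomial_from_coords(coords, basis_standard))  # [2. 3.]  i.e. 2 + 3t
print(polynomial_from_coords(coords, basis_other))     # [5. 1.]  i.e. 5 + t
```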

A matrix in one basis, may have a totally different "appearance" in another basis. For example, it may be upper-triangular in one basis (everything below the main diagonal is 0), and not so in a different basis. Some bases may be easier to work with than others, depending on what kinds of calculations we are doing.

I am going to give you an example of how this works. Study it well.

Suppose we have the linear transformation $T:\Bbb R^2 \to \Bbb R^2$ given by:

$T(x,y) = (2x+4y,3x+3y)$

In the basis $\mathcal{B} = \{(1,0),(0,1)\} = \{e_1,e_2\}$ (this is called the standard basis), we have the matrix:

$[T]_{\mathcal{B}}^{\mathcal{B}} = \begin{bmatrix}2&4\\3&3 \end{bmatrix}$ (verify this!).

Note that the first column of this matrix is $[T(e_1)]_{\mathcal{B}}$, and the second column is $[T(e_2)]_{\mathcal{B}}$. This is no accident, the way the standard basis vectors (expressed IN that basis) "pick out columns" is a function of how matrix multiplication works (if we "hit them on the other side", as row-vectors, they "pick out rows").
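If you would rather verify the matrix above numerically than by hand, a few lines of numpy will do it (a quick sketch of mine, assuming numpy is available): apply $T$ to $e_1$ and $e_2$ and stack the results as columns.

```python
import numpy as np

def T(v):
    """The transformation T(x, y) = (2x + 4y, 3x + 3y)."""
    x, y = v
    return np.array([2*x + 4*y, 3*x + 3*y])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# The columns of [T]_B^B are T(e1) and T(e2), expressed in the standard basis.
M_B = np.column_stack([T(e1), T(e2)])
print(M_B)          # [[2. 4.]
                    #  [3. 3.]]
```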

Now $\mathcal{E} = \{(-4,3),(1,1)\} = \{v_1,v_2\}$ is ALSO a basis for $\Bbb R^2$:

It is linearly independent:

If $c_1v_1 + c_2 v_2 = 0$, that is, if $c_1(-4,3) + c_2(1,1) = (0,0)$, then:

$-4c_1 + c_2 = 0$
$3c_1 + c_2 = 0$

Subtracting the first equation from the second gives $7c_1 = 0\implies c_1 = 0$, and then clearly $c_2 = 0$ as well, so $\{v_1,v_2\}$ is linearly independent. What this means is that neither $v_1$ nor $v_2$ (the possible non-empty proper subsets of $\mathcal{E}$) is expressible in terms of the other: we need them BOTH to describe linear combinations of the two.

It spans $\Bbb R^2$:

Given $(a,b) \in \Bbb R^2$ we have:

$(a,b) = \frac{1}{7}(7a,7b) = \frac{1}{7}[(4a-4b,3b-3a) + (3a+4b,3a+4b)]$

$= \dfrac{b-a}{7}(-4,3) + \dfrac{3a+4b}{7}(1,1)$

so any point in $\Bbb R^2$ is expressible as a linear combination of $v_1,v_2$.
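As a sanity check of the span computation (my own sketch, using an arbitrary test point $(a,b) = (5,-2)$), the closed-form coefficients above agree with what a direct linear solve gives:

```python
import numpy as np

v1, v2 = np.array([-4.0, 3.0]), np.array([1.0, 1.0])
a, b = 5.0, -2.0                      # an arbitrary test point

# Closed-form coefficients from the derivation above.
c1, c2 = (b - a) / 7, (3*a + 4*b) / 7

# Direct solve: find (c1, c2) with c1*v1 + c2*v2 = (a, b).
solved = np.linalg.solve(np.column_stack([v1, v2]), np.array([a, b]))

print(c1, c2)            # -1.0 1.0
print(solved)            # [-1.  1.]
print(c1*v1 + c2*v2)     # [ 5. -2.]  -- recovers (a, b)
```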

Why would we choose to use such an unusual basis?

Let us calculate the matrix $[T]_{\mathcal{E}}^{\mathcal{E}}$. We will do this 2 ways.

First, we calculate $T(v_1),T(v_2)$.

$T(v_1) = T((-4,3)) = (2(-4) + 4(3),3(-4) + 3(3)) = (4,-3) = -v_1$.

In the basis $\mathcal{E}$ this is the linear combination:

$(-1)v_1 + 0v_2$ so $[T(v_1)]_{\mathcal{E}} = (-1,0)$.

$T(v_2) = T((1,1)) = (2(1) + 4(1),3(1) + 3(1)) = (6,6) = 6v_2$. So $[T(v_2)]_{\mathcal{E}} = (0,6)$ and by our definition of $[T]_{\mathcal{E}}^{\mathcal{E}}$ we have:

$[T]_{\mathcal{E}}^{\mathcal{E}} = \begin{bmatrix}-1&0\\0&6\end{bmatrix}$.
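A quick numerical check of this direct computation (my own sketch, not from D&F): applying the standard-basis matrix to $v_1$ and $v_2$ shows they are eigenvectors with eigenvalues $-1$ and $6$, which is exactly what the diagonal entries of $[T]_{\mathcal{E}}^{\mathcal{E}}$ record.

```python
import numpy as np

M_B = np.array([[2.0, 4.0],
                [3.0, 3.0]])
v1, v2 = np.array([-4.0, 3.0]), np.array([1.0, 1.0])

print(M_B @ v1)    # [ 4. -3.]  which is -v1, so T(v1) = -1 * v1
print(M_B @ v2)    # [ 6.  6.]  which is 6*v2, so T(v2) =  6 * v2
```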

Next, we take "the long way around". First, we need to find a matrix $P$ that takes $\mathcal{E}$-coordinates to $\mathcal{B}$-coordinates. Such a matrix would take $[v_1]_{\mathcal{E}}$ to $[v_1]_{\mathcal{B}}$, that is, when multiplied by $(1,0)^T$, it would yield $(-4,3)^T$, and similarly, for $v_2$, it would take $(0,1)^T$ to $(1,1)^T$.

It doesn't take much thought to see that this matrix is:

$P = \begin{bmatrix}-4&1\\3&1 \end{bmatrix}$.

Having applied $P$ to our $\mathcal{E}$-coordinates, we are now in $\mathcal{B}$-coordinates, and may just multiply by our "old matrix" for $T$, to get $T(v)$ in $\mathcal{B}$-coordinates:

$[T]_{\mathcal{B}}^{\mathcal{B}}P([v]_{\mathcal{E}}) = [T]_{\mathcal{B}}^{\mathcal{B}}([v]_{\mathcal{B}}) = [T(v)]_{\mathcal{B}}$

Now the inverse coordinate transformation matrix is just going to be the inverse matrix $P^{-1}$ (why?). This is:

$P^{-1} = \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}$, and we have:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P([v]_\mathcal{E}) = P^{-1}([T(v)]_{\mathcal{B}}) = [T(v)]_{\mathcal{E}}$

that is:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P = [T]_{\mathcal{E}}^{\mathcal{E}}$

since that IS the matrix which takes $[v]_{\mathcal{E}}$ to $[T(v)]_{\mathcal{E}}$.
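In case the inversion step above is not obvious, $P^{-1}$ comes from the usual $2\times 2$ inverse formula (I am just filling in the arithmetic here):

$\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1} = \dfrac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$, and here $ad - bc = (-4)(1) - (1)(3) = -7$, so:

$P^{-1} = \dfrac{1}{-7}\begin{bmatrix}1&-1\\-3&-4\end{bmatrix} = \dfrac{-1}{7}\begin{bmatrix}1&-1\\-3&-4\end{bmatrix}$, as used above.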

Seeing is believing:

$P^{-1}[T]_{\mathcal{B}}^{\mathcal{B}}P = \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}\begin{bmatrix}2&4\\3&3 \end{bmatrix}\begin{bmatrix}-4&1\\3&1 \end{bmatrix}$

$= \frac{-1}{7}\begin{bmatrix}1&-1\\-3&-4 \end{bmatrix}\begin{bmatrix}4&6\\-3&6 \end{bmatrix}$

$=\frac{-1}{7}\begin{bmatrix}7&0\\0&-42\end{bmatrix} = \begin{bmatrix}-1&0\\0&6\end{bmatrix}$

In this "unusual basis" we see that what $T$ does, is change the sign of the $v_1$ coordinate, and magnify the $v_2$ coordinate by a factor of $6$, that is, it is the composition of an axis flip, and an axis stretch, something that is not at all apparent when using the "standard axes".
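The whole "long way around" can also be checked by machine; here is a minimal numpy sketch (mine, not from D&F) that conjugates $[T]_{\mathcal{B}}^{\mathcal{B}}$ by $P$ and confirms the diagonal result:

```python
import numpy as np

M_B = np.array([[2.0, 4.0],
                [3.0, 3.0]])                 # [T]_B^B, the standard-basis matrix
P = np.array([[-4.0, 1.0],
              [ 3.0, 1.0]])                  # columns are v1, v2 in B-coordinates

M_E = np.linalg.inv(P) @ M_B @ P             # P^{-1} [T]_B^B P
print(np.round(M_E, 10))                     # [[-1.  0.]
                                             #  [ 0.  6.]]

# The diagonal entries are the eigenvalues of M_B (in some order), as expected.
print(np.linalg.eigvals(M_B))
```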
 
  • #3
Really helpful post ... So much more guidance and clarity than Dummit and Foote's explanation ... Thank you ...

Peter
 

FAQ: Similar Matrices and Change of Basis

1. What are similar matrices and how are they related to change of basis?

Similar matrices are matrices that represent the same linear transformation with respect to different bases. They therefore share the same eigenvalues, characteristic polynomial, determinant, trace and rank; their eigenvectors correspond to one another under the change of basis, but are not literally the same vectors. A change of basis expresses the same vectors and transformations in terms of a different basis, and two matrices are similar exactly when one can be obtained from the other by such a change of basis: $B = P^{-1}AP$ for some invertible matrix $P$.
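For a concrete illustration (my own sketch, reusing the matrices from the thread above): $A$ and $P^{-1}AP$ have the same eigenvalues, while their eigenvectors correspond to one another under $P$.

```python
import numpy as np

A = np.array([[2.0, 4.0],
              [3.0, 3.0]])
P = np.array([[-4.0, 1.0],
              [ 3.0, 1.0]])
B = np.linalg.inv(P) @ A @ P            # B is similar to A

print(np.linalg.eigvals(A))             # eigenvalues -1 and 6 (in some order)
print(np.linalg.eigvals(B))             # the same eigenvalues

# If w is an eigenvector of B, then P @ w is an eigenvector of A for the same
# eigenvalue: the eigenvectors themselves are different vectors.
vals, vecs = np.linalg.eig(B)
w = vecs[:, 0]
print(A @ (P @ w))                      # agrees with the next line
print(vals[0] * (P @ w))
```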

2. How do you determine if two matrices are similar?

By definition, two $n \times n$ matrices $A$ and $B$ are similar if there is an invertible matrix $P$ with $B = P^{-1}AP$. Quick necessary checks are that they have the same determinant, trace, rank, eigenvalues and characteristic polynomial; if any of these differ, the matrices cannot be similar. These checks are not sufficient on their own, so a complete test compares canonical forms (for example, the Jordan or rational canonical form): two matrices are similar if and only if they have the same canonical form.
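As a hedged, practical sketch (mine, using numpy): the function below only performs the necessary characteristic-polynomial comparison. Passing it does not prove similarity; for example, $\begin{bmatrix}1&1\\0&1\end{bmatrix}$ and the identity share a characteristic polynomial but are not similar, so a complete test would compare canonical forms instead.

```python
import numpy as np

def might_be_similar(A, B, tol=1e-9):
    """Necessary check only: similar matrices have the same characteristic
    polynomial (hence the same trace, determinant and eigenvalues).
    Passing this check does NOT prove similarity."""
    if A.shape != B.shape or A.shape[0] != A.shape[1]:
        return False
    # np.poly of a square matrix returns its characteristic polynomial coefficients.
    return bool(np.allclose(np.poly(A), np.poly(B), atol=tol))

A = np.array([[2.0, 4.0], [3.0, 3.0]])
P = np.array([[-4.0, 1.0], [3.0, 1.0]])
B = np.linalg.inv(P) @ A @ P

print(might_be_similar(A, B))            # True  (B really is similar to A)
print(might_be_similar(A, np.eye(2)))    # False (different eigenvalues)
```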

3. Can two matrices be similar but have different dimensions?

No. Similar matrices must be square matrices of the same size: the relation $B = P^{-1}AP$ only makes sense when $A$, $B$ and $P$ are all $n \times n$, and similar matrices represent the same transformation of the same $n$-dimensional space.

4. How do you find the change of basis matrix between two bases?

To find the change of basis matrix between two bases, first express each vector of the new basis in terms of the old basis and use these coordinate vectors as the columns of a matrix $P$. Then $P$ converts new-basis coordinates into old-basis coordinates, and its inverse $P^{-1}$ converts old-basis coordinates into new-basis coordinates.
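A minimal sketch (mine) for $\Bbb R^2$, assuming the "old" basis is the standard one: the matrix whose columns are the new basis vectors converts new coordinates to old, and its inverse goes the other way.

```python
import numpy as np

# New basis vectors, written in old (standard) coordinates.
v1, v2 = np.array([-4.0, 3.0]), np.array([1.0, 1.0])

P = np.column_stack([v1, v2])       # new-coords -> old-coords
P_inv = np.linalg.inv(P)            # old-coords -> new-coords

x_old = np.array([5.0, -2.0])       # a vector in standard coordinates
x_new = P_inv @ x_old               # its coordinates relative to {v1, v2}

print(x_new)                        # [-1.  1.]
print(P @ x_new)                    # [ 5. -2.]  back to where we started
```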

5. Why is understanding similar matrices and change of basis important in linear algebra?

Similar matrices and change of basis are important in linear algebra because they allow us to understand how a linear transformation changes under different bases. This is useful in many applications, such as in solving systems of linear equations, diagonalizing matrices, and analyzing the behavior of systems in physics and engineering.
