Learning Advanced Math Notation for Self-Studying Physics

  • Thread starter Xilor
  • Tags
    Notation
In summary, when self-studying physics, books that transition into more abstract math can be difficult to follow, because the explanations rely on notation that is unfamiliar to the reader. The discussion below works through concrete examples of this (index notation in SR/GR) and suggests resources; one recommendation for a first pass at GR that avoids the heavier geometric machinery is Weinberg's "Gravitation and Cosmology".
  • #1
Xilor
Hi, while self-studying physics, I keep bumping into books that transition into more abstract math, using notation systems unfamiliar to me. This is often accompanied by explanations consisting of 'we can write this as', 'it is easy to show that', 'therefore', which are not particularly helpful. And as books usually build on earlier sections, this is usually the point where I have to abandon a work. And of course, the next book re-explains all the sections I did understand, until again suddenly descending into some rune-language.

Are there any good methods/places to become familiar with these notation systems? Unfortunately I don't have access to professors who can explain the tricky bits when questions arise.
My focus would mostly be on SR/GR, and the problem usually starts around the point where more abstract objects containing multiple elements, such as matrices, begin appearing in formulas with millions of indices.
 
  • #2
To be honest, the best place to learn mathematical notation from is mathematics itself. Personally, the first time I encountered material that emphasized the importance of proof was in a YouTube series on analysis.
If you start practising proofs that use sets, the symbols ##\in##, ##\subset##, ##\cup##, ##\cap##, and logical notation such as ##\Rightarrow##, ##\forall##, ##\exists##, you should start getting the hang of it at some point. I'm not sure this is the kind of notation you were talking about, though.
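As a small illustration (using an arbitrary function ##f:\mathbb{R}\to\mathbb{R}## and a point ##a##, purely as an example), the usual ε-δ statement "##f## is continuous at ##a##" packs several of these symbols into a single line:
$$
\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in \mathbb{R}: \ |x - a| < \delta \Rightarrow |f(x) - f(a)| < \varepsilon .
$$
Being able to read and unpack a line like that is essentially the skill those proof-based courses train.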

The main mathematics of GR is differential geometry, which can be very technical. However, for a first look at GR you do not need to know many technicalities. A book that teaches GR without going into the geometric details, or describing spacetime as a manifold, is Weinberg's "Gravitation and Cosmology". This was actually the first book I learned GR from, and it allowed me to learn it really early on. Later I needed to learn manifolds anyway, though.

If you have examples of notations you are struggling with feel free to post. Perhaps make a list. Look up the LaTeX symbols you need, for example at http://web.ift.uib.no/Teori/KURS/WRK/TeX/symALL.html or in a post on PF (the FAQ explains how to write equations in LaTeX).
 
  • #3
Lucas SV said:
The main mathematics of GR is differential geometry, which can be very technical. However, for a first look at GR you do not need to know many technicalities. A book that teaches GR without going into the geometric details, or describing spacetime as a manifold, is Weinberg's "Gravitation and Cosmology". This was actually the first book I learned GR from, and it allowed me to learn it really early on. Later I needed to learn manifolds anyway, though.

Unfortunately it's more the technical side that I'm interested in now. Conceptually most of GR seems clear, but I'm trying to work myself towards the point where I could understand and perhaps work with the field equations etc.

Lucas SV said:
If you have examples of notations you are struggling with feel free to post.

For example, I was pointed to the Sean Carroll lecture notes and quickly got stuck on some of the earliest sections.
https://preposterousuniverse.com/wp-content/uploads/grnotes-one.pdf
Up to (1.8) I don't have any problems following. At (1.9) I start having trouble. It took me some time here to figure out that the ##x^\mu## and ##x^\nu## actually refer to vectors using all of the dimensions, rather than just meaning one value of one dimension. But ok, I got through it.
At (1.10), I have no clue why the prime goes on top of the index, and I don't know why the delta is missing now, but conceptually still no problems here.
From (1.12) I can tell what (1.11) is supposed to be, but would not have figured it out otherwise. What is happening with these indices? Why is the '##\mu##' flying over to the matrix when he just referred to multiplying by ##x^\mu##? What is its meaning up there at the matrix?
(1.13) makes me confident that I have no idea what is going on. What are these 'T' indices, why is this what we 'would like', how can I tell that whatever he's doing here relates to things being invariant? Where did the delta-x's suddenly come from, and why do we have two of them? Are the T's even behind the numbers they belong to, or are they in front? What happened to all those other indices we were using earlier? Does this no longer apply to all dimensions? What?
Then he transforms whatever that was, using a 'therefore' and an 'or', into whatever (1.15) is, and I know I should just abandon this.

Another example, in this book by Schutz:
http://202.38.64.11/~jmy/documents/ebooks/Schutz%20A%20First%20Course%20in%20General%20Relativity(Second%20Edition).pdf
It starts out innocently enough, until (1.2) happens: a sudden transformation into some notation I don't understand. Okay, alpha and beta here are just 0,1,2,3 apparently, and I know what is supposed to come out of it, so I have some vague idea. But when he then gets to (1.3), there is no way I can still follow, and again I might as well abandon it.
 
  • #4
Ok, I may have come across these notes when I started learning tensorial notation. I think I had similar struggles. Anyway I will try to answer each problem.

Xilor said:
At (1.9) I start having trouble. It took me some time here to figure out that the ##x^\mu## and ##x^\nu## actually refer to vectors using all of the dimensions, rather than just meaning one value of one dimension.
This is the first example of a tensor equation written in components. The key to understanding this is the Einstein summation convention. As explained in the notes, (1.3) has the same content. Indeed if you expand (1.9) by summing over both ##\mu## and ##\nu## indices, you will see this (Do it!). It is useful to note that although you are summing over 16 numbers, since ##\mu## and ##\nu## both range over four numbers, most of the terms in the sum are ##0## because ##\eta## is diagonal.
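A sketch of that expansion, assuming the ##\eta = \mathrm{diag}(-1,+1,+1,+1)## signature used in those notes and writing ##(\Delta x^0, \Delta x^1, \Delta x^2, \Delta x^3) = (\Delta t, \Delta x, \Delta y, \Delta z)##:
$$
\eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu = \sum_{\mu=0}^{3}\sum_{\nu=0}^{3}\eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu = -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2 ,
$$
where the twelve off-diagonal terms drop out precisely because ##\eta## is diagonal.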

Xilor said:
At (1.10), I have no clue why the prime goes on top of the index, and I don't know why the delta is missing now, but conceptually still no problems here.
This is just a convention. Some authors put the prime on the index to indicate that ##x## is written in the new coordinate system, while others write ##x'^\mu##. In the convention used by Carroll, ##\mu## and ##\mu'## should be considered as different indices, in the same way ##\mu## and ##\nu## are considered as different indices.

Xilor said:
What is happening with these indices? Why is the '##\mu##' flying over to the matrix when he just referred to multiplying by ##x^\mu##? What is its meaning up there at the matrix?
Again Einstein summation convention. Do the following exercise: expand (1.11) using the summation convention. Then expand (1.12) using the componentwise definition of a matrix acting on a vector (##\Lambda## is a matrix with components ##\Lambda^{\alpha}_{\ \beta}##, while ##x## is a vector). Compare your results.
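In case it helps, by 'the componentwise definition of a matrix acting on a vector' I just mean the standard rule, written here for a 4×4 matrix with indices running over ##0,\dots,3##:
$$
(\Lambda x)^{\mu} = \sum_{\nu=0}^{3} \Lambda^{\mu}_{\ \nu}\, x^{\nu},
$$
which is exactly the sum that the summation convention abbreviates as ##\Lambda^{\mu}_{\ \nu} x^{\nu}##.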

Xilor said:
What are these 'T' indices, why is this what we 'would like', how can I tell that whatever he's doing here relates to things being invariant?
##T## means the matrix transpose (look it up if you don't know it!), so it is not an index. You can take the transpose of a column vector and it becomes a row vector (vectors are also matrices).
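For example, with an arbitrary column vector,
$$
\begin{pmatrix} a \\ b \\ c \end{pmatrix}^T = \begin{pmatrix} a & b & c \end{pmatrix},
$$
and for a matrix the transpose just swaps rows and columns, ##(A^T)_{ij} = A_{ji}##.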

Xilor said:
What happened to all those other indices we were using earlier?
Carroll switched from component notation back to matrix notation in (1.13). The first line of (1.13) (unprimed) is the same as (1.9) when written in components (prove this!). He equates it to the primed version because ##s^2##, being a scalar, is meant to be invariant under ##\Lambda## coordinate transformations. Then he uses the transformation law for the change in ##x##, as given in (1.12), in order to find the orthogonality condition (1.15).
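Spelled out in matrix notation, the chain of steps is (a sketch, treating ##\Delta x## as a column vector and using ##\Delta x' = \Lambda\,\Delta x##):
$$
\Delta s^2 = \Delta x^T \eta\, \Delta x = \Delta x'^T \eta\, \Delta x' = (\Lambda \Delta x)^T \eta\, (\Lambda \Delta x) = \Delta x^T \big(\Lambda^T \eta\, \Lambda\big)\, \Delta x ,
$$
and demanding that this hold for every ##\Delta x## forces ##\Lambda^T \eta\, \Lambda = \eta##, which is the orthogonality condition.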
 
  • #5
Xilor said:
Unfortunately it's more the technical side that I'm interested in now. Conceptually most of GR seems clear, but I'm trying to work myself towards the point where I could understand and perhaps work with the field equations etc.

Well, what I meant by technical is harder than tensor notation. The approach by Weinberg is the same as chapter one of Carroll, except that instead of going to manifolds in chapter 2, Weinberg keeps using components and equations like those in chapter 1 (of Carroll) throughout the whole book. Although this may be a shortcut (you will see what I mean once you get past chapter 1), it is pretty old-fashioned, and nowadays people would expect you to know about manifolds and index-free notation, which appear in chapter 2. But still, Weinberg's book is a great book, with lots of physical insight, and it will certainly teach you how to do calculations in GR and apply them to different circumstances.

A lecture series that may help with understanding the notation (it certainly did help me) is Susskind's.
 
  • #6
Xilor said:
It starts out innocently enough, until (1.2) happens: a sudden transformation into some notation I don't understand. Okay, alpha and beta here are just 0,1,2,3 apparently, and I know what is supposed to come out of it, so I have some vague idea. But when he then gets to (1.3), there is no way I can still follow, and again I might as well abandon it.
This actually helps with understanding Carroll's (1.13), and is related to what I said about the 'componentwise definition of a matrix acting on a vector'. So Schutz's equation (1.2) is very important. To convince yourself of its truthfulness, do as many matrix multiplication exercises of the following form as you need:
Compute the number ##u^T\cdot A \cdot v##, where ##A## is an ##n\times n## matrix, and both ##u## and ##v## are ##n\times 1## matrices (a.k.a. column vectors). ##T## is the transpose I already described.

Pick any matrix you like and any two vectors you like and do the computation. Play around with this. After you are comfortable with some examples prove that (1.2) is true for the case of ##n=2##. Then move on to ##n=3##. Soon enough you will understand the pattern.
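If you also want a numerical cross-check of your hand computations, here is a minimal Python/NumPy sketch (not from the book; the numbers are arbitrary) comparing the matrix product ##u^T\cdot A\cdot v## with the explicit double sum over components:

```python
import numpy as np

# Arbitrary example data: any 2x2 matrix and any two vectors will do.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
u = np.array([1.0, -1.0])
v = np.array([2.0, 5.0])

# Bilinear form via matrix multiplication: u^T A v.
matrix_version = u @ A @ v

# The same quantity as an explicit double sum over components.
component_version = sum(u[i] * A[i, j] * v[j]
                        for i in range(2) for j in range(2))

print(matrix_version, component_version)  # the two numbers agree
```

The agreement of the two printed numbers is exactly the point of the exercise: the matrix expression and the component sum are the same object written in two ways.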
 
  • #7
Lucas SV said:
Ok, I may have come across these notes when I started learning tensorial notation. I think I had similar struggles. Anyway I will try to answer each problem.

Thank you for your help! After reading your comments this is what I still struggled with:

Again Einstein summation convention. Do the following exercise: expand (1.11) using the summation convention. Then expand (1.12) using the componentwise definition of a matrix acting on a vector (##\Lambda## is a matrix with components ##\Lambda^{\alpha}_{\ \beta}##, while ##x## is a vector). Compare your results.

So here at (1.11) I'm mostly confused why it is written as: ##x^{\mu'} = \Lambda^{\mu}_{\ \nu}x^{\nu}## rather than ##x' = \Lambda_{\nu}x^{\nu}##. The latter seems the same to me as (1.12), so what does this other information indicate?

##T## means the matrix transpose (look it up if you don't know it!), so it is not an index. You can take the transpose of a column vector and it becomes a row vector (vectors are also matrices).

Ah alright, that makes the T make sense. So then should I read these sections of (1.13) as:
a. Take the vector with the 4 values (x,y,z,t), transform that vector using a transposed form of the matrix we were using (wait, isn't that the same, since it was 4x4 and diagonal?), then multiply it by the vector again.
b. Same thing as the last one, but now with the coords found earlier (which we want to result in the same interval if the interval is invariant).
c. Take the same coords as in a. Transform them with the matrix we had used to find x' (##\Lambda##), transform with the transposed matrix again like in a. and b., then transform with the transposed form of ##\Lambda##. So basically the first transform would provide us with x' again, then the second step would give the same result as what we had after the first transform in step 2. Then I guess the third step is supposed to take us back to the place we were right before multiplying by the last vector, so that the result is the same as in step 1 in the end.
So we need a kind of matrix for ##\Lambda## that would make that possible. So ##\Lambda## needs to be a matrix that when transposed will basically undo its transformation from x to x', but that needs to take into account that we transformed with the other matrix in between. Is that a correct interpretation of this step? (And basically what (1.14) says?)

Then on (1.15): if that just means the same again but using the summation notation, then that mostly makes sense. I still can't really read it, because the way this convention works still eludes me. I, for example, have no clue how to figure out the order in which these operations are supposed to be done in this system. The whole transposing of ##\eta## is still confusing too. Or does the T mean a transpose of everything that happened before, rather than a transposed form of whatever matrix it is in front of?

But still, Weinberg's book is a great book, with lots of physical insight, and it will certainly teach you how to do calculations in GR and apply them to different circumstances.
I'll have a look at his book if I end up struggling throughout Carroll's, for sure. The lecture series sounds good too, thanks for the suggestions!

To convince yourself of its truthfulness, do as many matrix multiplication exercises of the following form as you need:
So my problem here with (1.2) was not that I don't believe they're the same or that I struggle with vector/matrix transformations. It's more that I couldn't really parse it to mean anything, let alone the right thing. But thanks to your previous comments, I'm guessing the meaning of it is that it's going to output (with both alpha and beta being 0,1,2,3, using t=0, x=1, y=2, z=3) 16 vectors M, each dealing with one of the possible combinations of dimensions. Is that correct? And for each we need to do a multiplication using only the values of those dimensions. Let's take alpha = 1 and beta = 2: we're going to have a vector M which is (0,x,0,0) * (0,0,y,0) = (0,0,0,0). And for alpha = 0 and beta = 0 we have (t,0,0,0) * (t,0,0,0) = (t^2,0,0,0). Is that correct?
On second thought, that doesn't seem right: after adding everything together we'd have a vector (t^2,x^2,y^2,z^2), rather than a single number, which is presumably what we want. And we also have t^2 instead of -t^2. How is it even possible to get the negative sign in here when nothing in (1.2) makes a reference to a negative sign?
If M is a number, then why the notation, and wouldn't we get a different result?
If M is a matrix, then how exactly does everything even work? It doesn't seem to be defined as anything, so wouldn't it just be 4x4 zeroes?
Seems I'm still confused.

Perhaps the lectures will help. The question was initially more about figuring out how to learn these kinds of things on my own anyway. It's amazing having someone knowledgeable help out, but if that's the solution for every roadblock, it's pretty hard to get further.
 
  • #8
Yes you certainly need to struggle to learn those things.

Xilor said:
So here at (1.11) I'm mostly confused why it is written as: ##x^{\mu'} = \Lambda^{\mu}_{\ \nu}x^{\nu}## rather than ##x' = \Lambda_{\nu}x^{\nu}##. The latter seems the same to me as (1.12), so what does this other information indicate?
It is ##x^{\mu'}=\Lambda^{\mu'}_\nu x^\nu##. Basically this actually means four equations, each equation for a specific value of ##\mu'=0,1,2,3##. It is four equations that only involve components, so numbers, each equation's RHS has four terms. The index ##\mu'## which appears once in each side is called a free index. The indices ##\nu## which appear on the RHS are called dummy indices. You sum over dummy indices but not free indices. If you recall your maths class on analytic geometry you would have learned that a system of linear equations can be written in matrix form. Well you can think of ##x^{\mu'}=\Lambda^{\mu'}_\nu x^\nu## as the system of four linear equations with four unknowns ##x^\nu## and ##x'=\Lambda\cdot x## as the corresponding matrix form.
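Written out in full, the ##\mu'=0## equation, for instance, reads
$$
x^{0'} = \Lambda^{0'}_{\ 0}\,x^{0} + \Lambda^{0'}_{\ 1}\,x^{1} + \Lambda^{0'}_{\ 2}\,x^{2} + \Lambda^{0'}_{\ 3}\,x^{3},
$$
and similarly for ##\mu'=1,2,3##; stacking the four equations on top of each other gives the single matrix equation ##x'=\Lambda\cdot x##.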

Xilor said:
So ##\Lambda## needs to be a matrix that when transposed will basically undo its transformation from x to x', but that needs to take into account that we transformed with the other matrix in between. Is that a correct interpretation of this step? (And basically what (1.14) says?)
I'm not sure I follow, but it is true that (1.14) is a condition on the matrix ##\Lambda## that must hold in order for ##s^2## to be invariant under ##\Lambda## transformations. This condition is extremely important in SR; it is called the orthogonality condition. Any matrix ##\Lambda## satisfying this condition is called a Lorentz transformation.
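As a concrete example (not taken from the notes, just the standard boost along the ##x##-axis with speed ##v##, ##\gamma = 1/\sqrt{1-v^2}## and ##c=1##), restricted to the ##t,x## block:
$$
\Lambda = \begin{pmatrix} \gamma & -\gamma v \\ -\gamma v & \gamma \end{pmatrix},
\qquad
\Lambda^T \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \Lambda = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix},
$$
which you can check by direct matrix multiplication, using ##\gamma^2(1-v^2)=1##.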
Xilor said:
Let's take alpha = 1 and beta = 2: we're going to have a vector M which is (0,x,0,0) * (0,0,y,0) = (0,0,0,0). And for alpha = 0 and beta = 0 we have (t,0,0,0) * (t,0,0,0) = (t^2,0,0,0). Is that correct?

Are you trying to show (1.2) from Schutz? The short answer is: it depends on ##M##. If so, what we really want to compute is the product (which in mathematical terms is called a bilinear form):
$$
\langle u,v \rangle = u^T\cdot A \cdot v
$$
where ##u## and ##v## are elements of ##\mathbb{R}^n## and ##A## is a fixed linear operator acting on the space ##\mathbb{R}^n##. If you are not too familiar with vector spaces and linear operators, just think of ##A## as a matrix.
I will write this equation for the case of ##\mathbb{R}^2##
$$
\langle \begin{pmatrix}
u_1 \\ u_2 \end{pmatrix}
,
\begin{pmatrix}
v_1 \\ v_2 \end{pmatrix}
\rangle =
\begin{pmatrix}
u_1 & u_2 \end{pmatrix}
\begin{pmatrix}
A_{11} & A_{12} \\
A_{21} & A_{22} \\
\end{pmatrix}
\begin{pmatrix}
v_1 \\ v_2 \end{pmatrix}
$$
Please compute this expression. Then you will find an equation just like (1.2) (which, by the way, does not use the Einstein summation convention), except that the indices range over two values instead of four. This is why I gave you the exercises, which I still suggest doing regardless of whether you already know how to compute matrix products. The point is to show the relationship between matrix multiplication, in the special case of a bilinear form, and the expression for the bilinear form in components.
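For reference, the relationship the exercise is driving at is the standard identity
$$
u^T\! A\, v = \sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}\, u_i\, v_j,
$$
which for ##n=4##, ##A=M## and ##u=v=\Delta x## has exactly the shape of Schutz's (1.2).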

##s^2## appearing in SR is just a bilinear form in which ##M## happens to be ##\eta##. Schutz was trying to explain why this is the case, by starting with an arbitrary ##M## and deriving what the components of ##M## should be in order for ##s^2## to satisfy (1.1).

So, generically speaking, whenever you see an object with two indices it is the components of a matrix, and whenever you see an object with one index, it is the components of a vector. You will learn all this properly once you move on to tensors and transformation laws. If you really can't follow even after some exercises, go directly to the material on tensors before coming back to the physics. And again, Susskind's lectures help.

I think if you really understand (1.2) you will find all the rest much easier to understand also.
 
  • #10
Lucas SV said:
It is ##x^{\mu'}=\Lambda^{\mu'}_\nu x^\nu##. Basically this actually means four equations, each equation for a specific value of ##\mu'=0,1,2,3##. It is four equations that only involve components, so numbers, each equation's RHS has four terms. The index ##\mu'## which appears once in each side is called a free index. The indices ##\nu## which appear on the RHS are called dummy indices. You sum over dummy indices but not free indices. If you recall your maths class on analytic geometry you would have learned that a system of linear equations can be written in matrix form. Well you can think of ##x^{\mu'}=\Lambda^{\mu'}_\nu x^\nu## as the system of four linear equations with four unknowns ##x^\nu## and ##x'=\Lambda\cdot x## as the corresponding matrix form.

Aha. So all the ##\mu'## is saying is that after calculating the vector (or 4 equal vectors?), we take the value of that vector at index i and plug it into the same index of x? So read it kind of like this: ##x^{\mu'}=(\Lambda_\nu x^\nu)^{\mu'}##. Is that it? Unfortunately I've not taken this class you mention, so maybe this is a bit of learning to run before walking.

Are you trying to show (1.2) from Schutz? The short answer is: it depends on ##M##. If so, what we really want to compute is the product (which in mathematical terms is called a bilinear form):
$$
\langle u,v \rangle = u^T\cdot A \cdot v
$$
where ##u## and ##v## are elements of ##\mathbb{R}^n## and ##A## is a fixed linear operator acting on the space ##\mathbb{R}^n##. If you are not too familiar with vector spaces and linear operators, just think of ##A## as a matrix.
I will write this equation for the case of ##\mathbb{R}^2##
$$
\langle \begin{pmatrix}
u_1 \\ u_2 \end{pmatrix}
,
\begin{pmatrix}
v_1 \\ v_2 \end{pmatrix}
\rangle =
\begin{pmatrix}
u_1 & u_2 \end{pmatrix}
\begin{pmatrix}
A_{11} & A_{12} \\
A_{21} & A_{22} \\
\end{pmatrix}
\begin{pmatrix}
v_1 \\ v_2 \end{pmatrix}
$$
Please compute this expression. Then you will find an equation just like (1.2) (which, by the way, does not use the Einstein summation convention), except that the indices range over two values instead of four.

So I did your thing, and I got:

##u_1(A_{11}v_1 + A_{12}v_2) + [u_1(A_{21}v_1 + A_{22}v_2) + u_2(A_{11}v_1 + A_{12}v_2)] + u_2(A_{21}v_1 + A_{22}v_2)##

So I'm assuming you mean that we end up with something that is functionally similar to (1.2), with A representing M, and the dimensional values being like the values of u and v. It still doesn't tell me anything about A or M though, so how could it claim these are equal? I suppose M would have to be the standard SR matrix

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{pmatrix}
$$

But it doesn't say that we should use that anywhere. So how is one supposed to deduce that?
In your example, using just the first two columns/rows, we'd get
##u_1(-1\cdot v_1 + 0\cdot v_2) + [u_1(0\cdot v_1 + 1\cdot v_2) + u_2(1\cdot v_1 + 0\cdot v_2)] + u_2(0\cdot v_1 + 1\cdot v_2)##
##= -v_1 u_1 + u_1 v_2 - v_1 u_2 + u_2 v_2##

Wait, that doesn't check out anyway...
 
  • #11
Xilor said:
Is that it? Unfortunately I've not taken this class you mention, so maybe this is a bit of learning to run before walking.
Yes, maybe it is.

Xilor said:
So I did your thing, and I got:

##u_1(A_{11}v_1 + A_{12}v_2) + [u_1(A_{21}v_1 + A_{22}v_2) + u_2(A_{11}v_1 + A_{12}v_2)] + u_2(A_{21}v_1 + A_{22}v_2)##

I don't see where you get the two middle terms from. You should get
$$
\langle u, v \rangle = u_1(A_{11}v_1+A_{12}v_2)+u_2(A_{21}v_1+A_{22}v_2)
$$
The RHS of (1.2) is not too different from this, except that Schutz gave the name ##M## to ##A##, and the underlying vector space has four dimensions instead of two.

Xilor said:
It still doesn't tell me anything about A or M though, so how could it claim these are equal?
The ##A## in the definition of the bilinear product is meant to be arbitrary. Actually, I should be more careful with the wording and say that this is the definition of a bilinear product with respect to ##A##. So for any matrix ##A## you pick (you are free to choose), the bilinear product is defined as in post #8. Notice, from the equation I just wrote, that the bilinear product is a linear combination of products of components of the vectors ##u## and ##v##. Any such linear combination of products will be a bilinear product for some matrix ##A##.

Now let us return to Schutz. The argument goes as follows. Assume (1.1) holds, and assume we make a linear change of coordinates. ##\Delta \bar{s}^2## is, by definition, a linear combination of products of components of ##\Delta \bar{x}## (by which I mean the four-vector) with components of ##\Delta \bar{x}##, and each component of ##\Delta \bar{x}## is in turn a linear combination of the components of ##\Delta x##. Therefore ##\Delta \bar{s}^2## must be a bilinear form, with respect to some matrix ##M##, in the original coordinate differences. This is the same as saying that there exists a matrix ##M## such that (1.2) is true. Then this is used to figure out the components of ##M## up to the point where (1.5) is proved.
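If it helps to see that statement as a formula: as I read Schutz's (1.2), it says there are sixteen numbers ##M_{\alpha\beta}## such that
$$
\Delta \bar{s}^2 = \sum_{\alpha=0}^{3}\sum_{\beta=0}^{3} M_{\alpha\beta}\,\Delta x^{\alpha}\,\Delta x^{\beta},
$$
i.e. the barred interval is a double sum over products of the original coordinate differences, weighted by the entries of ##M##.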

By the way, I understand what you mean by the notation ##\Lambda_\nu x^\nu##. But this notation is a little awkward and people don't really use it. Either they write things fully in component form or in index-free notation. What you are suggesting is a mix of the two, which again is non-standard (there are exceptions to this statement, in gauge theory, but you still have a way to go before you get there).

Also you had a good idea to only take the first two components to get your last equation (which would have been correct if you had gotten the bilinear form right). This is actually called a 1+1 spacetime.
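In that 1+1 case, keeping only ##t## and ##x##, the metric and interval reduce to
$$
\eta = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad
\Delta s^2 = -(\Delta t)^2 + (\Delta x)^2 ,
$$
which is often a good toy setting to practise these manipulations in before going back to the full four dimensions.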
 
  • #12
Lucas SV said:
I don't see where you get the two middle terms from. You should get
$$
\langle u, v \rangle = u_1(A_{11}v_1+A_{12}v_2)+u_2(A_{21}v_1+A_{22}v_2)
$$

It was because I took the outer product instead of the dot product. It makes sense now.

This is the same as saying that there exists a matrix ##M## such that (1.2) is true. Then this is used to figure out the components of ##M## up to the point where (1.5) is proved.

Thanks for the explanation, I finally get what (1.2) is saying. I think I should probably take a few steps backwards before approaching the rest of this, though; it's clearly still above my level.
 
  • #13
Xilor said:
At (1.10), I have no clue why the prime goes on top of the index
There is no particular meaning to it. It is just a common convention in the SR literature to denote that a particular value is in a different reference frame (the primed frame) than the other frame (the unprimed frame).
 

Related to Learning Advanced Math Notation for Self-Studying Physics

1. What is advanced math notation?

Advanced math notation refers to the use of symbols, equations, and mathematical structures to represent complex mathematical concepts and relationships. It is commonly used in fields such as physics, engineering, and mathematics to express ideas and solve problems.

2. Why is it important to learn advanced math notation for self-studying physics?

Learning advanced math notation is essential for self-studying physics because it allows you to understand and manipulate complex equations and concepts. It also helps you communicate and solve problems effectively in the field of physics.

3. What are some common advanced math notations used in physics?

Some common advanced math notations used in physics include vector notation, tensor notation, differential equations, and complex numbers. These notations are used to represent quantities, equations, and relationships in the physical world.

4. How can I improve my understanding of advanced math notation?

Improving your understanding of advanced math notation requires practice and familiarity with the concepts and symbols. It is helpful to work through problems and examples, consult textbooks and online resources, and seek guidance from a mentor or tutor.

5. Can I learn advanced math notation without a formal math background?

While having a strong math background can make learning advanced math notation easier, it is possible to learn it without formal education in math. With dedication, practice, and resources such as textbooks and online tutorials, anyone can acquire a solid understanding of advanced math notation for self-studying physics.
