Showing that Lorentz transformations are the only ones possible

In summary, the book "The Special Theory of Relativity" by David Bohm states that if coordinates in frame A are (x,y,z,t) and coordinates in frame B moving with velocity v are (x',y',z',t'), and the relationship c^2t^2 - x^2 - y^2 - z^2 = 0 holds in both frames, then the only possible transformations that keep this relationship invariant are the Lorentz transformations. The author also mentions the need for rotations and reflections in order to preserve the interval. This is possible to show using physical assumptions and linear functions, but it is difficult to prove for a general Lorentz-Herglotz transformation.
  • #71
Fredrik said:
Don't you mean L(λ) and L'(λ') (with x=L(λ) and x'=L'(λ')), [...]
More or less. I was trying to find a notation that made it more obvious that the ##L'## stuff was in a different space. I need to think about the notation a bit more to come up with something better.
[...]Note that I do understand that the partial derivatives do not depend on v. I made that explicit by putting them into matrices M and m that are treated as constants.
OK, then let's dispose of the easy part, assuming that the partial derivatives do not depend on v, and using an example that's easy to relate to your M,m notation.

First off, suppose I give you this equation:
$$
az^2 ~=~ b z f(z) ~,
$$where ##z## is a real variable and ##a,b## are real constants (i.e., independent of ##z##). Then I ask you to determine the most general form of the function ##f## (assuming it's analytic).

We express it as a Taylor series: ##f(z) = f_0 + z f_1 + z^2 f_2 + \dots## where the ##f_i## coefficients are real constants. Substituting this into the main equation, we get
$$
az^2 ~=~ b z (f_0 + z f_1 + z^2 f_2 + \dots)
$$ Then, since ##z## is a variable, we may equate coefficients of like powers of ##z## on both sides. This implies ##f_1 = a/b##, and that all the other ##f_i## are zero. Hence ##f(z) \propto z## is the most general form of ##f## allowed.
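For anyone who wants to see the coefficient matching done mechanically, here's a minimal sympy sketch of the one-variable example above (the truncation order and symbol names are my own choices):
[code=python]
import sympy as sp

z, a, b = sp.symbols('z a b')
N = 6  # truncation order; assumed high enough to show the pattern
fs = sp.symbols(f'f0:{N}')                      # Taylor coefficients f_0 ... f_5
f = sum(fi * z**i for i, fi in enumerate(fs))   # truncated Taylor series of f

# a z^2 - b z f(z) must vanish identically, so every coefficient of z^k is 0
eqs = sp.Poly(sp.expand(a*z**2 - b*z*f), z).all_coeffs()
print(sp.solve(eqs, fs))                        # {f1: a/b}, every other f_i = 0
[/code]
The same pattern persists at any truncation order: only the ##f_1## coefficient survives.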

Now extend this example to 2 independent variables ##z_1, z_2## and suppose we are given an equation like
$$
A^{ij} z_i z_j ~=~ b^k z_k f(z_1,z_2) ~,
$$ (in a hopefully-obvious index notation), where ##A,b## are independent of ##z##. Now we're asked to find the most general (analytic) form of ##f##. Since ##z_1, z_2## are independent variables, we may expand ##f## as a 2D Taylor series, substitute it into the above equation, and equate coefficients for like powers of the independent variables. We get an infinite set of equations for the coefficients of ##1, z_1, z_2, z_1^2, z_1 z_2, z_2^2, \dots~## but only the terms from the expansion of ##f## corresponding to ##f^j z_j## can possibly match up with a nonzero coefficient on the LHS.
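The two-variable case can be checked the same way. A sketch under the same caveats (my own symbol names; the expansion is truncated at total degree 3, and sympy is also allowed to solve for ##A^{12}##, since the left-hand side must be compatible with the right):
[code=python]
import sympy as sp

z1, z2 = sp.symbols('z1 z2')
A11, A12, A22, b1, b2 = sp.symbols('A11 A12 A22 b1 b2')

# truncated 2D Taylor expansion of f(z1, z2) up to total degree 3
cs = {(i, j): sp.Symbol(f'c{i}{j}')
      for i in range(4) for j in range(4) if i + j <= 3}
f = sum(c * z1**i * z2**j for (i, j), c in cs.items())

lhs = A11*z1**2 + 2*A12*z1*z2 + A22*z2**2   # A^{ij} z_i z_j
rhs = (b1*z1 + b2*z2) * f                   # b^k z_k f(z1, z2)

# equate coefficients of every monomial z1^i z2^j
eqs = sp.Poly(sp.expand(lhs - rhs), z1, z2).coeffs()
sol = sp.solve(eqs, list(cs.values()) + [A12])
print(sol)   # only the degree-1 coefficients c10, c01 survive; the rest vanish
[/code]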

[Erland: Does that explain it enough? All the ##v^i## are independent variables, because we're trying to find a mapping whose input constraint involves a set of arbitrary lines.]
 
Last edited:
  • #72
strangerep said:
First off, suppose I give you this equation:
$$
az^2 ~=~ b z f(z) ~,$$ [...] Then I ask you to determine the most general form of the function ##f##, (assuming it's analytic).
If it's analytic, then I agree that what you're doing proves the linearity. But I don't think it's obvious that our f(x,v) is analytic in v.

Erland said:
True, but in this case, the equation above holds for all ##i##. And, since the matrix ##\Lambda^i{}_{,\,j}(x)## is assumed to be invertible for all ##x##, not all its rows can be orthogonal to ##v##.
Hm, that would solve one of our problems at least. What I wrote as ##v^TMv=g(v)m^Tv## is n equalities, not just one. I should have kept the i index around to make that explicit. I'll put it downstairs: ##v^T M_i v =g(v)m_i^T v##. What you're saying is that when v≠0, there's always an i such that ##m_i^Tv\neq 0##. So if you're right, we can do this:

Let v be non-zero, but otherwise arbitrary. Let a be an arbitrary real number. For all i, we have
$$g(av)m_i^Tv=\frac{g(av)m_i^T(av)}{a} =\frac{(av)^TM_i(av)}{a} =a v^T M_i v =ag(v)m_i^Tv.$$ So now we just choose i such that ##m_i^T v\neq 0## and cancel that factor from both sides to get g(av)=ag(v).
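As a quick numerical sanity check of that scaling argument (this just instantiates the algebra above with one random ##M_i## and ##m_i## of my own choosing):
[code=python]
import numpy as np

rng = np.random.default_rng(0)
n = 3
M = rng.standard_normal((n, n))   # plays the role of one constant matrix M_i
m = rng.standard_normal(n)        # plays the role of the corresponding row m_i

def g(v):
    # g defined (wherever m.v != 0) by  v^T M v = g(v) m^T v
    return (v @ M @ v) / (m @ v)

v = rng.standard_normal(n)
for a in (2.0, -0.7, 13.5):
    print(np.isclose(g(a * v), a * g(v)))   # True: g(av) = a g(v)
[/code]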

Unfortunately, I still don't see how to prove that g(u+v)=g(u)+g(v) for all u,v.

You may have to remind me of some calculus. The square matrix that has the ##m_i^T## as its rows is the Jacobian matrix of ##\Lambda##. We need those rows to be linearly independent, so we need the Jacobian determinant of ##\Lambda## to be non-zero. But what's the problem with a function whose Jacobian determinant is zero? I haven't thought about these things in a while.
 
  • #73
Fredrik said:
If it's analytic, then I agree that what you're doing proves the linearity. But I don't think it's obvious that our f(x,v) is analytic in v.
Well, that needs more care. I think one only needs the assumption that the desired function be analytic in a neighborhood of the origin, but that's a subject for another post.

Unfortunately, I still don't see how to prove that g(u+v)=g(u)+g(v) for all u,v.
Having shown that ##f(x,v)## is of the form ##f_k v^k##, isn't that enough to continue to Guo's eq(165) and beyond?
You may have to remind me of some calculus. The square matrix that has the ##m_i^T## as its rows is the Jacobian matrix of ##\Lambda##. We need those rows to be linearly independent, so we need the Jacobian determinant of ##\Lambda## to be non-zero. But what's the problem with a function whose Jacobian determinant is zero? I haven't thought about these things in a while.
Since we're talking about transformations between inertial observers, we must be trying to find a group of transformations, hence they must be invertible. This should probably be inserted in the statement of the theorem.
 
  • #74
strangerep said:
Having shown that ##f(x,v)## is of the form ##f_k v^k##, isn't that enough to continue to Guo's eq(165) and beyond?
I suppose we can move on, but I don't think we have shown that.

strangerep said:
Since we're talking about transformations between inertial observers, we must be try to find a group of transformations, hence they must be invertible. This should probably be inserted in the statement of the theorem.
Right, but for ##\Lambda## to be invertible, isn't it sufficient that its Jacobian matrix at x is ≠0 for all x? The condition on ##\Lambda## that we need to be able to prove that ##f(x,av)=af(x,v)## for all x,v and all real numbers a, is that its Jacobian determinant at x is non-zero for all x. To put it another way, it's sufficient to know that the rows of the Jacobian matrix are linearly independent.
 
Last edited:
  • #75
Fredrik said:
I suppose we can move on, but I don't think we have shown that.
Wait -- if you don't follow that, then we can't move on. Are you able to do the 2-variable example in my earlier post #71 explicitly, and show that the ##f(z)## there is indeed of the form ##f_j z^j## ?
 
  • #76
strangerep said:
Wait -- if you don't follow that, then we can't move on. Are you able to do the 2-variable example in my earlier post #71 explicitly, and show that the ##f(z)## there is indeed of the form ##f_j z^j## ?
Yes, if f is analytic, but we don't even know if it's differentiable.
 
  • #77
Fredrik said:
Yes, if f is analytic, but we don't know even know if it's differentiable.
I think this follows from continuity of the mapping from ##\lambda## to ##\lambda'## (in terms of which ##f## was defined).

Edit: Adding a bit more detail... It's also physically reasonable to require that inertial observers with velocities ##v## and ##v+\epsilon## should not map to pathologically different inertial observers in the target space, else small error margins in one frame do not remain "small" in any sense under the mapping. Expressing this principle in a mathematically precise way, we say that open sets in ##v## space must map to open sets in ##v'## space, and vice versa. IOW, the mapping must be continuous wrt ##v##, in standard topology.
 
Last edited:
  • #78
Of course it is so that a square matrix is invertible iff its rows are linearly independent iff its determinant is ≠0. If we assume that ##\Lambda## is an invertible transformation such that both itself and its inverse are ##C^1## everywhere, then the Jacobian matrix of ##\Lambda## is invertible everywhere.

strangerep, I agree that you have proved that f(x,v) is linear in v if it is analytic, as a function of v, in a neighbourhood of the origin, but I agree with Fredrik that this is not obvious. Analyticity is a quite strong condition and I can't see any physical reason for it.
 
  • #79
Erland said:
strangerep, I agree that you have proved that f(x,v) is linear in v if it is analytic, as a function of v, in a neighbourhood of the origin, but I agree with Fredrik that this is not obvious. Analyticity is a quite strong condition and I can't see any physical reason for it.
Are you ok with the physical motivation that the mapping of the original projective space (of lines) to the target projective space (of lines) should be continuous?

Except for the point about analyticity, are you ok with the rest of the proof now?
 
Last edited:
  • #80
strangerep said:
Are you ok with the physical motivation that the mapping of the original projective space (of lines) to the target projective space (of lines) should be continuous?
Yes, this is a reasonable assumption. So, analyticity follows from this?
strangerep said:
Except for the point about analyticity, are you ok with the rest of the proof now?
Up to the point we have discussed hitherto, yes. I have to read the rest of the proof.

Btw. It is indeed sufficient to prove analyticity in a neighbourhood of v=0. For then, strangerep's argument shows linearity for "small" vectors, and then Fredrik's argument showing homogeneity shows linearity also for "large" vectors.
 
  • #81
By the way, if anybody is interested: the theorem also holds without any smoothness or continuity assumptions. So if [itex]U[/itex] and [itex]V[/itex] are open in [itex]\mathbb{R}^n[/itex] and if [itex]\varphi:U\rightarrow V[/itex] is a bijection that sends lines to lines, then it is of the form described in the paper (which is called a projectivity).

This result is known as the local form of the fundamental theorem of projective geometry.
A general proof can be found here: rupertmccallum.com/thesis11.pdf

In my opinion, that proof is much easier than Guo's "proof" and more general. Sadly, I don't think the paper is very readable. If anybody is interested, then I'll write up a complete proof.
 
  • #82
I'm definitely interested in some of it, but I'm not sure if I will need the most general theorem. I'm mainly interested in proving this:
Suppose that X is a vector space over ℝ such that 2 ≤ dim X < ∞. If T:X→X is a bijection that takes straight lines to straight lines, then there's a y in X, and a linear L:X→X such that T(x)=Lx+y for all x in X.​
I have started looking at the approach based on affine spaces. (Link). I had to refresh my memory about group actions and what an affine space is, but I think I've made it to the point where I can at least understand the statement of the theorem ("the fundamental theorem of affine geometry"). Translated to vector space language, it says the following:
Suppose that X is a vector space over K, and that X' is a vector space over K'. Suppose that 2 ≤ dim X = dim X' < ∞. If T:X→X' is a bijection that takes straight lines to straight lines, then there's a y in X', an isomorphism σ:K→K', and a σ-linear L:X→X' such that T(x)=Lx+y for all x in X.​
Immediately after stating the theorem, the author suggests that it can be used to prove that the only automorphism of ℝ is the identity, and that the only continuous automorphisms of ℂ are the identity and complex conjugation. That's another result that I've been curious about for a while, so if it actually follows from the fundamental theorem of affine geometry, then I think I want to study that instead of the special case I've been thinking about.

But now you're mentioning the fundamental theorem of projective geometry, so I have to ask: why do we need to go to projective spaces?

Also, if you (or anyone) can tell me how that statement about automorphisms of ℝ and ℂ follows from the fundamental theorem of affine geometry, I would appreciate it.
 
Last edited:
  • #83
micromass said:
By the way, if anybody is interested [...]
YES! YES! YES! (Thank God someone who knows more math than me has taken pity on us and decided to participate in this thread... :-)
the theorem also holds without any smoothness or continuity assumptions. So if [itex]U[/itex] and [itex]V[/itex] are open in [itex]\mathbb{R}^n[/itex] and if [itex]\varphi:U\rightarrow V[/itex] is a bijection that sends lines to lines, then it is of the form described in the paper (which is called a projectivity).
Hmmm. On Wiki, "projectivity" redirects to "collineation", but there's not enough useful detail on projective linear transformations and "automorphic collineations". :-(
This result is known as the local form of the fundamental theorem of projective geometry.
A general proof can be found here: rupertmccallum.com/thesis11.pdf
Coincidentally, I downloaded McCallum's thesis yesterday after doing a Google search for fundamental theorems in projective geometry. But I quickly realized it's not an easy read, hence not something I can digest easily.
In my opinion, that proof is much more easier than Guo's "proof" and more general. Sadly, I don't think the paper is very readable. If anybody is interested, then I'll write up a complete proof.
YES, PLEASE! If you can derive those fractional-linear transformations in a way that physicists can understand, I'd certainly appreciate it -- I haven't been able to find such a proof at that level, despite searching quite hard. :-(

[Edit: I'm certainly interested in the more general projective case, although Fredrik is not.]
 
Last edited:
  • #84
I've just realized there's a simple geometric proof, for Fredrik's special case, for the case of the whole of [itex]\mathbb{R}^2[/itex], which I suspect would easily extend to higher dimensions.

Let [itex]T : \mathbb{R}^2 \rightarrow \mathbb{R}^2[/itex] be a bijection that maps straight lines to straight lines. It must map parallel lines to parallel lines, otherwise two points on different parallel lines would both be mapped to the intersection of the non-parallel image lines, contradicting bijectivity. So it maps parallelograms to parallelograms. But, if you think about it, that's pretty much the defining property of linearity (assuming T(0)=0).

There are a few I's to dot and T's to cross to turn the above into a rigorous proof, but I think I'm pretty much there -- or have I omitted too many steps in my thinking? (I think you may have to assume T is continuous to extend the additive property of linearity to the scalar multiplication property.)
 
  • #85
DrGreg said:
I've just realized there's a simple geometric proof, for Fredrik's special case, for the case of the whole of [itex]\mathbb{R}^2[/itex], which I suspect would easily extend to higher dimensions.

Let [itex]T : \mathbb{R}^2 \rightarrow \mathbb{R}^2[/itex] be a bijection that maps straight lines to straight lines. It must map parallel lines to parallel lines, otherwise two points on different parallel lines would both be mapped to the intersection of the non-parallel image lines, contradicting bijectivity. So it maps parallelograms to parallelograms. But, if you think about it, that's pretty much the defining property of linearity (assuming T(0)=0).

There are a few I's to dot and T's to cross to turn the above into a rigorous proof, but I think I'm pretty much there -- or have I omitted too many steps in my thinking? (I think you may have to assume T is continuous to extend the additive property of linearity to the scalar multiplication property.)
This idea is similar to the proof of the fundamental theorem of affine geometry in the book I linked to. The author is breaking it up into five steps. I think these are the steps, in vector space language:

Step 1: Show that T takes linearly independent sets to linearly independent sets.
Step 2: Show that T takes parallel lines to parallel lines.
Step 3: Show that T(x+y)=T(x)+T(y) for all x,y in X.
Step 4: Define an isomorphism σ:K→K'.
Step 5: Show that T(ax)=σ(a)T(x) for all a in K.

For my special case, we can skip step 4 and simplify step 5 to "Show that T(ax)=aT(x) for all a in K". I've been thinking that I should just try to prove these statements myself, using the book for hints, but I haven't had time to do a serious attempt yet.
 
  • #86
Fredrik said:
I'm definitely interested in some of it, but I'm not sure if I will need the most general theorem. I'm mainly interested in proving this:
If X is a finite-dimensional vector space over ℝ, and T:X→X is a bijection that takes straight lines to straight lines, then there's a y in X, and a linear L:X→X such that T(x)=Lx+y for all x in X.​

OK, I'll try to type out the proof for you in this special case.

I have started looking at the approach based on affine spaces. (Link). I had to refresh my memory about group actions and what an affine space is, but I think I've made it to the point where I can at least understand the statement of the theorem ("the fundamental theorem of affine geometry"). Translated to vector space language, it says the following:
Suppose that X is a vector space over K, and that X' is a vector space over K'. Suppose that dim X = dim X' ≥ 2. If T:X→X' is a bijection that takes straight lines to straight lines, then there's a y in X', an isomorphism σ:K→K', and a σ-linear L:X→X' such that T(x)=Lx+y for all x in X.​
(I don't know if these vector spaces need to be finite-dimensional).

Ah, but this is far more general since it deals with arbitrary fields and stuff. The proof will probably be significantly harder than the [itex]\mathbb{R}[/itex] case.

Immediately after stating the theorem, the author suggests that it can be used to prove that the only automorphism of ℝ is the identity, and that the only continuous automorphisms of ℂ are the identity and complex conjugation. That's another result that I've been curious about for a while, so if it actually follows from the fundamental theorem of affine geometry, then I think I want to study that instead of the special case I've been thinking about.

I don't think you can use the fundamental theorem to prove that [itex]\mathbb{R}[/itex] has only one automorphism. I agree the author makes you think that. But what he actually wants to do is prove that the only line preserving maps [itex]\mathbb{R}^n\rightarrow\mathbb{R}^n[/itex] are the affine maps. The fundamental theorem deals with semi-affine maps: so there is an automorphism of the field. So in order to prove the case of [itex]\mathbb{R}^n[/itex] he needs a lemma that states that there is only one automorphism on [itex]\mathbb{R}[/itex]. It is not a result that (I think) follows from the fundamental theorem.

That said, the proof that [itex]\mathbb{R}[/itex] has only one automorphism is not very hard. Let [itex]\sigma:\mathbb{R}\rightarrow \mathbb{R}[/itex] be an automorphism. So:

  • [itex]\sigma[/itex] is bijective
  • [itex]\sigma(x+y)=\sigma(x)+\sigma(y)[/itex]
  • [itex]\sigma(xy)=\sigma(x)\sigma(y)[/itex]

So [itex]\sigma(0)=\sigma(0+0)=\sigma(0)+\sigma(0)[/itex], so [itex]\sigma(0)=0[/itex].
Likewise, [itex]\sigma(1)=\sigma(1\cdot 1)=\sigma(1)\sigma(1)[/itex], so [itex]\sigma(1)=1[/itex] (unless [itex]\sigma(1)=0[/itex], which is impossible because of injectivity).

Take [itex]n\in \mathbb{N}[/itex]. Then we can write [itex]n=\sum_{k=1}^n 1[/itex]. So
[tex]\sigma(n)=\sigma\left(\sum_{k=1}^n 1\right)=\sum_{k=1}^n \sigma(1)=\sum_{k=1}^n 1=n[/tex]

Now, we know that [itex]0=\sigma(0)=\sigma(n+(-n))=\sigma(n)+\sigma(-n)[/itex]. It follows that [itex]\sigma(-n)=-\sigma(n)=-n[/itex].

So we have proven that [itex]\sigma[/itex] is fixed on [itex]\mathbb{Z}[/itex].

Take a nonzero integer [itex]p[/itex]. Then [itex]1=\sigma(1)=\sigma(p\frac{1}{p})= \sigma(p)\sigma(\frac{1}{p})=p\sigma(\frac{1}{p})[/itex]. So [itex]\sigma(1/p)=1/p[/itex].
So, for [itex]p,q\in \mathbb{Z}[/itex] with [itex]q\neq 0[/itex]: [itex]\sigma(p/q)=\sigma(p)\sigma(1/q)=p/q[/itex]. So this proves that [itex]\sigma[/itex] is fixed on [itex]\mathbb{Q}[/itex].

Take [itex]x>0[/itex] in [itex]\mathbb{R}[/itex]. Then there exists a [itex]y\neq 0[/itex] in [itex]\mathbb{R}[/itex] with [itex]y^2=x[/itex]. But then [itex]\sigma(y)^2=\sigma(x)[/itex], and [itex]\sigma(y)\neq 0[/itex] by injectivity. It follows that [itex]\sigma(x)>0[/itex].
Take [itex]x<y[/itex] in [itex]\mathbb{R}[/itex]. Then [itex]y-x>0[/itex]. So [itex]\sigma(y)-\sigma(x)=\sigma(y-x)>0[/itex]. Thus [itex]\sigma(x)<\sigma(y)[/itex]. So [itex]\sigma[/itex] preserves the ordering.

Assume that there exists an [itex]x\in \mathbb{R}[/itex] such that [itex]\sigma(x)\neq x[/itex]. Assume (for example), that [itex]\sigma(x)<x[/itex]. Then there exists a [itex]q\in \mathbb{Q}[/itex] such that [itex]\sigma(x)<q<x[/itex]. But since [itex]\sigma[/itex] preserves orderings and rationals, it follows that [itex]\sigma(x)>q[/itex], which is a contradiction. So [itex]\sigma(x)=x[/itex].

This proves that the identity is the only automorphism on [itex]\mathbb{R}[/itex].

Now, for automorphisms on [itex]\mathbb{C}[/itex]. Let [itex]\tau[/itex] be a continuous automorphism on [itex]\mathbb{C}[/itex]. Completely analogously, we prove that [itex]\tau[/itex] is fixed on [itex]\mathbb{Q}[/itex]. Since [itex]\tau[/itex] is continuous and since [itex]\mathbb{Q}[/itex] is dense in [itex]\mathbb{R}[/itex], it follows that [itex]\tau[/itex] is fixed on [itex]\mathbb{R}[/itex].

Now, since [itex]i^2=-1[/itex], it follows that [itex]\tau(i)^2=-1[/itex]. So [itex]\tau(i)=i[/itex] or [itex]\tau(i)=-i[/itex]. In the first case [itex]\tau(a+ib)=\tau(a)+\tau(i)\tau(b)=a+ib[/itex]. In the second case: [itex]\tau(a+ib)=a-ib[/itex].
So there are only two automorphisms on [itex]\mathbb{C}[/itex].
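(Not that it needs it, but here's a trivial sympy check, my own sketch, that conjugation really satisfies the automorphism properties used above:)
[code=python]
import sympy as sp

a, b, c, d = sp.symbols('a b c d', real=True)
z, w = a + sp.I*b, c + sp.I*d
tau = sp.conjugate                 # candidate automorphism: complex conjugation

print(sp.simplify(tau(z + w) - (tau(z) + tau(w))))   # 0: additive
print(sp.simplify(tau(z * w) - tau(z) * tau(w)))     # 0: multiplicative
print(sp.simplify(tau(sp.I)**2 + 1))                 # 0: tau(i)^2 = -1, as above
[/code]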

But now you're mentioning the fundamental theorem of projective geometry, so I have to ask: why do we need to go to projective spaces?

We don't really need projective spaces. We can prove the result without referring to them. But the result is often stated in this form because it is more general.
Also, one of the advantages of projective spaces is that [itex]\varphi(\mathbf{x})=\frac{A\mathbf{x}+B}{C\mathbf{x}+D}[/itex] is everywhere defined, even if the denominator is 0 (in that case, the result will be a point at infinity).
 
Last edited:
  • #87
Fredrik said:
This idea is similar to the proof of the fundamental theorem of affine geometry in the book I linked to. The author is breaking it up into five steps. I think these are the steps, in vector space language:

Step 1: Show that T takes linearly independent sets to linearly independent sets.
Step 2: Show that T takes parallel lines to parallel lines.
Step 3: Show that T(x+y)=T(x)+T(y) for all x,y in X.
Step 4: Define an isomorphism σ:K→K'.
Step 5: Show that T(ax)=σ(a)T(x) for all a in K.

For my special case, we can skip step 4 and simplify step 5 to "Show that T(ax)=aT(x) for all a in K". I've been thinking that I should just try to prove these statements myself, using the book for hints, but I haven't had time to do a serious attempt yet.
Maybe I need to spell this bit out. I think if T is continuous and your Step 3 is true and [itex]K = \mathbb{R}[/itex] then you can prove [itex]T(a\mathbf{x})=aT(\mathbf{x})[/itex] as follows.

It's clearly true for a = 2 (put x=y in step 3).

By induction it's true for any integer a (y = (a-1)x).

By rescaling it's true for any rational a.

By continuity of T and density of [itex]\mathbb{Q}[/itex] in [itex]\mathbb{R}[/itex] it's true for all real a.
 
Last edited:
  • #88
micromass said:
But what he actually wants to do is prove that the only line preserving maps [itex]\mathbb{R}^n\rightarrow\mathbb{R}^n[/itex] are the affine maps. The fundamental theorem deals with semi-affine maps: so there is an automorphism of the field. So in order to prove the case of [itex]\mathbb{R}^n[/itex] he needs a lemma that states that there is only one automorphism on [itex]\mathbb{R}[/itex]. It is not a result that (I think) follows from the fundamental theorem.

That said, the proof that [itex]\mathbb{R}[/itex] has only one automorphism is not very hard.
...
Now, for automorphisms on [itex]\mathbb{C}[/itex].
...
Thank you micromass. That was exceptionally clear. I didn't even have to grab a pen. :smile: This saved me a lot of time.

DrGreg said:
Maybe I need to spell this bit out. I think if T is continuous and your Step 3 is true and [itex]K = \mathbb{R}[/itex] then you can prove [itex]T(a\mathbf{x})=aT(\mathbf{x})[/itex] as follows.

It's clearly true for a = 2 (put x=y in step 3).

By induction it's true for any integer a (y = (a-1)x).

By rescaling it's true for any rational a.

By continuity of T and density of [itex]\mathbb{Q}[/itex] in [itex]\mathbb{R}[/itex] it's true for all real a.
Interesting idea. Thanks for posting it. I will however still be interested in a proof that doesn't rely on the assumption that T is continuous.
 
Last edited:
  • #89
Here is a proof for the plane. I think the same method of proof directly generalizes to higher dimensions, but it might get annoying to write down.

DEFINITION: A projectivity is a function [itex]\varphi[/itex] on [itex]\mathbb{R}^2[/itex] such that


[tex]\varphi(x,y)=\left(\frac{Ax+By+C}{Gx+Hy+I},\frac{Dx+Ey+F}{Gx+Hy+I}\right)[/tex]

where [itex]A,B,C,D,E,F,G,H,I[/itex] are real numbers such that the matrix

[tex]\left(\begin{array}{ccc} A & B & C\\ D & E & F\\ G & H & I\end{array}\right)[/tex]

is invertible. This invertibility condition tells us exactly that [itex]\varphi[/itex] is invertible. The inverse is again a perspectivity and its matrix is given by the inverse of the above matrix.

We can see this easily as follows:
Recall that a homogeneous coordinate is defined as a triple [x:y:z] with not all x, y and z zero. Furthermore, if [itex]\alpha\neq 0[/itex], then we define [itex][\alpha x: \alpha y : \alpha z]=[x:y:z][/itex].

There exists a bijection between [itex]\mathbb{R}^2[/itex] and the homogeneous coordinates [x:y:z] with nonzero z. Indeed, with (x,y) in [itex]\mathbb{R}^2[/itex], we can associate [x:y:1]. And with [x:y:z] with nonzero z, we can associate (x/z,y/z).

We can now look at [itex]\varphi[/itex] on homogeneous coordinates. We define [itex]\varphi [x:y:z] = \varphi(x/z,y/z)[/itex]. Clearly, if [itex]\alpha\neq 0[/itex], then [itex]\varphi [\alpha x:\alpha y:\alpha z]=\varphi [x:y:z][/itex]. So the map is well defined.

Our [itex]\varphi[/itex] is actually just matrix multiplication:

[tex]\varphi[x:y:z] = \left(\begin{array}{ccc} A & B & C\\ D & E & F\\ G & H & I\end{array}\right)\left(\begin{array}{c} x\\ y \\ z\end{array}\right)[/tex]

Now we see clearly that [itex]\varphi[/itex] has an inverse given by

[tex]\varphi^{-1} [x:y:z] = \left(\begin{array}{ccc} A & B & C\\ D & E & F\\ G & H & I\end{array}\right)^{-1}\left(\begin{array}{c} x\\ y \\ z\end{array}\right)[/tex]
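Here's what that homogeneous-coordinate picture looks like in code, a minimal numpy sketch with an arbitrary invertible matrix of my own choosing, which also checks that collinear points stay collinear:
[code=python]
import numpy as np

P = np.array([[2., 1., 0.],
              [0., 1., 3.],
              [1., 0., 1.]])   # any invertible 3x3 matrix defines a projectivity

def phi(p):
    # apply the projectivity to (x, y) via homogeneous coordinates [x : y : 1]
    q = P @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]        # assumes the last entry (the denominator) is nonzero

def collinear(p, q, r):
    # three points are collinear iff their homogeneous lifts are linearly dependent
    H = np.array([[*p, 1.0], [*q, 1.0], [*r, 1.0]])
    return np.isclose(np.linalg.det(H), 0.0)

pts = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]          # three points on the line y = 2x
print(collinear(*pts), collinear(*map(phi, pts)))   # True True
[/code]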




LEMMA: Let x,y,z and t in [itex]\mathbb{R}^2[/itex] be four distinct points such that no three of them lie on the same line. Let x',y',z',t' in [itex]\mathbb{R}^2[/itex] also be four points such that no three of them lie on the same line. There exists a projectivity [itex]\varphi[/itex] such that [itex]\varphi(x)=x^\prime[/itex], [itex]\varphi(y)=y^\prime[/itex], [itex]\varphi(z)=z^\prime[/itex], [itex]\varphi(t)=t^\prime[/itex].

We write in homogeneous coordinates:
[tex]x=[x_1:x_2:x_3],~y=[y_1:y_2:y_3],~z=[z_1:z_2:z_3],~t=[t_1:t_2:t_3][/tex]

Since [itex]\mathbb{R}^3[/itex] has dimension 3, we can find [itex]\alpha,\beta,\gamma[/itex] in [itex]\mathbb{R}[/itex] such that

[tex](t_1,t_2,t_3)=(\alpha x_1,\alpha x_2,\alpha x_3)+(\beta y_1,\beta y_2,\beta y_3)+ (\gamma z_1, \gamma z_2,\gamma z_3)[/tex].

The vectors [itex](\alpha x_1,\alpha x_2,\alpha x_3), (\beta y_1,\beta y_2,\beta y_3), (\gamma z_1, \gamma z_2,\gamma z_3)[/itex] form a basis for [itex]\mathbb{R}^3[/itex] (because of the condition that no three of x, y, z, t lie on one line).

We can do the same for the x',y',z',t' and we again obtain a basis [itex](\alpha^\prime x_1^\prime,\alpha^\prime x_2^\prime,\alpha^\prime x_3^\prime), (\beta^\prime y_1^\prime,\beta^\prime y_2^\prime,\beta^\prime y_3^\prime), (\gamma^\prime z_1^\prime, \gamma^\prime z_2^\prime,\gamma^\prime z_3^\prime)[/itex] such that

[tex](t_1^\prime, t_2^\prime,t_3^\prime)=(\alpha^\prime x_1^\prime,\alpha^\prime x_2^\prime,\alpha^\prime x_3^\prime)+(\beta^\prime y_1^\prime,\beta^\prime y_2^\prime,\beta^\prime y_3^\prime)+(\gamma^\prime z_1^\prime, \gamma^\prime z_2^\prime,\gamma^\prime z_3^\prime)[/tex]


By linear algebra, we know that there exists an invertible matrix T that sends one basis to the other. This implies directly that the associated projectivity sends x to x', y to y' and z to z'.
Since
[tex](t_1,t_2,t_3)=(\alpha x_1,\alpha x_2,\alpha x_3)+(\beta y_1,\beta y_2,\beta y_3)+ (\gamma z_1, \gamma z_2,\gamma z_3)[/tex]
we get after applying T that

[tex]T(t_1,t_2,t_3)=(\alpha^\prime x_1^\prime,\alpha^\prime x_2^\prime,\alpha^\prime x_3^\prime)+(\beta^\prime y_1^\prime,\beta^\prime y_2^\prime,\beta^\prime y_3^\prime)+(\gamma^\prime z_1^\prime, \gamma^\prime z_2^\prime,\gamma^\prime z_3^\prime)[/tex]

and thus [itex]T(t_1,t_2,t_3)=(t_1^\prime,t_2^\prime, t_3^\prime)[/itex]. Thus the projectivity also sends t to t'.
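This lemma is completely constructive, so it translates directly into code. Here's a numpy sketch (function names and test points are mine) that builds the 3×3 matrix from the two scaled bases exactly as above:
[code=python]
import numpy as np

def lift(p):
    return np.array([p[0], p[1], 1.0])   # homogeneous coordinates [x : y : 1]

def projectivity_through(src, dst):
    # src, dst: four 2D points each, with no three of either set collinear
    X = np.column_stack([lift(p) for p in src[:3]])
    coef = np.linalg.solve(X, lift(src[3]))     # t = alpha x + beta y + gamma z
    Xp = np.column_stack([lift(p) for p in dst[:3]])
    coefp = np.linalg.solve(Xp, lift(dst[3]))
    # T maps the basis (alpha x, beta y, gamma z) onto the primed basis, so t -> t'
    return (Xp * coefp) @ np.linalg.inv(X * coef)

def apply(T, p):
    q = T @ lift(p)
    return q[:2] / q[2]

src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(0, 0), (2, 1), (-1, 1), (3, 4)]
T = projectivity_through(src, dst)
print([np.allclose(apply(T, s), d) for s, d in zip(src, dst)])   # [True, True, True, True]
[/code]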



THEOREM: Let [itex]U\subseteq \mathbb{R}^2[/itex] be open and let [itex]\varphi:U\rightarrow \mathbb{R}^2[/itex] be injective. Assume that [itex]\varphi[/itex] sends lines to lines; then it is a projectivity.

We can of course assume that U contains an equilateral triangle ABC. Let P be the centroid of ABC.
By the previous lemma, there exists a projectivity [itex]\psi[/itex] such that [itex]\psi(\varphi(A))=A, ~\psi(\varphi(B))=B, ~\psi(\varphi(C))=C, ~\psi(\varphi(P))=P[/itex]. So we see that [itex]\sigma:=\psi\circ\varphi[/itex] sends lines to lines and that [itex]\sigma(A)=A,~\sigma(B)=B,~\sigma(C)=C,~\sigma(P)=P[/itex]. We will prove that [itex]\sigma[/itex] is the identity.

HINT: look at Figure 2.1, p.19 of the McCallum paper.

Define E as the midpoint of AC. Then E is the intersection of AC and PB. But these lines are fixed by [itex]\sigma[/itex]. Thus [itex]\sigma(E)=E[/itex]. Let D be the midpoint of BC and F the midpoint of AB. Likewise, it follows that [itex]\sigma(D)=D[/itex] and [itex]\sigma(F)=F[/itex].

Thus [itex]\sigma[/itex] preserves the verticles of the equilateral triangles AFE, FBD, DEF and EDC. Since [itex]\sigma[/itex] preserves parallelism, we see easily that [itex]\sigma[/itex] preserves the midpoints and centroids of the smaller triangles. So we can subdivide the triangles into even smaller triangles whose vertices are preserved. We keep doing this process and eventually we find a set S dense in the triangle such that [itex]\sigma[/itex] is fixed on that dense set. If [itex]\sigma[/itex] is continuous, then [itex]\sigma[/itex] is the identity on the triangle.

To prove continuity, we show that certain rhombuses are preserved. Look at Figure 2.3 on page 20 of McCallum. We have shown that the vertices of arbitrary triangles are preserved. Putting two such triangles together gives a rhombus. We will show that [itex]\sigma[/itex] sends the interior of any rhombus ABCD into the rhombus ABCD. Since the rhombus can be made arbitrarily small around an arbitrary point, it follows that [itex]\sigma[/itex] is continuous.

By composing with a suitable linear map, we restrict to the following situation:

LEMMA: Let A=(0,0), B=(1,0), C=(1,1) and D=(0,1) and let [itex]\Sigma[/itex] be the square ABCD. Suppose that [itex]\sigma:\Sigma\rightarrow \mathbb{R}^2[/itex] sends lines to lines and suppose that [itex]\sigma[/itex] is fixed on A,B,C and D. Then [itex]\sigma(\Sigma)\subseteq \Sigma[/itex].

Take S on CB. We can make a construction analogous to Figure 2.4, p.22 in McCallum. So we let TS be horizontal, TU have slope -1 and VU be vertical. We define Q as the intersection of AS and VU. If S has coordinates [itex](1,s)[/itex] for some s, then we can easily check that Q has coordinates [itex](s,s^2)[/itex]. In particular, Q lies in the upper half-plane (= everything above AB).

Since S is on CB and since C and B are fixed, we see that [itex]\sigma(S)\in CB[/itex]. Let's say that [itex]\sigma(S)=(1,t)[/itex] for some t. The line TS is horizontal and [itex]\sigma[/itex] maps it to a horizontal line. So [itex]\sigma(T)[/itex] has the form (0,t). The line TU has slope -1. So [itex]\sigma(U)[/itex] has the form (t,0). Finally, it follows that [itex]\sigma(Q)[/itex] has the form [itex](t,t^2)[/itex]. In particular, [itex]\sigma(Q)[/itex] is in the upper half plane.

So we have proven that if S is on CB, then the ray AS emanating from A is sent into the upper half plane. Let P be an arbitrary point in the square; then it is an element of a ray AS for some S. This ray is taken into the upper half plane. So [itex]\sigma(P)[/itex] is in the upper half plane.

So this proves that the square ABCD is sent by [itex]\sigma[/itex] into the upper half plane. Similar constructions show that the square is also sent into the lower, left and right half planes. So taking all of these things together: ABCD is sent into ABCD. This proves the lemma.

So, right now we have shown that [itex]\sigma[/itex] is the identity on some small equilateral triangle in [itex]U[/itex]. So [itex]\varphi[/itex] is a projectivity on some small open subset [itex]U^\prime[/itex] of U (namely on the interior of the triangle). We prove now that [itex]\varphi[/itex] is a projectivity on all of U.

Around any point P in U, we can find some equilateral triangle. And we proved for such triangles that [itex]\varphi[/itex] is a projectivity and thus analytic. The uniqueness of analytic continuation now proves that [itex]\varphi[/itex] is a projectivity on all of U.
 
  • #90
Nice proof!
If I understand it correctly this proves that the most general transformations that take straight lines to straight lines are the linear fractional ones.
To get to the linear case one still needs to impose the condition mentioned above about the continuity of the transformation, right?
Classically (Pauli, for instance) this was done just by assuming the euclidean (minkowskian) space as the underlying geometry.
 
  • #91
TrickyDicky said:
If I understand it correctly this proves that the most general transformations that take straight lines to straight lines are the linear fractional ones.
To get to the linear case one still needs to impose the condition mentioned above about the continuity of the transformation, right?
It's sufficient to assume that the map that takes straight lines to straight lines is defined on the entire vector space, rather than a proper subset. It's not necessary to assume that the map is continuous. (If you want the map to be linear, rather than linear plus a translation, you must also assume that it takes 0 to 0).
 
  • #92
DrGreg said:
I've just realized there's a simple geometric proof, for Fredrik's special case, for the case of the whole of [itex]\mathbb{R}^2[/itex], which I suspect would easily extend to higher dimensions.

Let [itex]T : \mathbb{R}^2 \rightarrow \mathbb{R}^2[/itex] be a bijection that maps straight lines to straight lines. It must map parallel lines to parallel lines, otherwise two points on different parallel lines would both be mapped to the intersection of the non-parallel image lines, contradicting bijectivity. So it maps parallelograms to parallelograms. But, if you think about it, that's pretty much the defining property of linearity (assuming T(0)=0).

There are a few I's to dot and T's to cross to turn the above into a rigorous proof, but I think I'm pretty much there -- or have I omitted too many steps in my thinking? (I think you may have to assume T is continuous to extend the additive property of linearity to the scalar multiplication property.)
I've been examining the proof in Berger's book more closely. (Change the .se to your own country domain if the url is giving you trouble). His strategy is very close to yours, but there's a clever trick at the end that allows us to drop the assumption of continuity. Consider the following version of the theorem:
Suppose that X=ℝ2. If T:X→X is a bijection that takes straight lines to straight lines and 0 to 0, then T is linear.​
For this theorem, the steps are as follows:

1. If K and L are two different lines through 0, then T(K) and T(L) are two different lines through 0.
2. If K and L are two parallel lines, then T(K) and T(L) are two parallel lines.
3. For all x,y such that {x,y} is linearly independent, T(x+y)=Tx+Ty. (This is done by considering a parallelogram as you suggested).
4. For all vectors x and all real numbers a, T(ax)=aTx. (Note that this result implies that T(x+y)=Tx+Ty when {x,y} is linearly dependent).

The strategy for step 4 is as follows: Let x be an arbitrary vector and a an arbitrary real number. If either x or a is zero, we have T(ax)=0=aTx. If both are non-zero, we have to be clever. Since Tx is on the same straight line through 0 as T(ax), there's a real number b such that T(ax)=bTx. We need to prove that b=a. Let B be the map ##t\mapsto tx##. Let C be the map ##t\mapsto tTx##. Let f be the restriction of T to the line through x and 0. Define ##\sigma:\mathbb R\to\mathbb R## by ##\sigma=C^{-1}\circ f\circ B##. Since
$$\sigma(a)=C^{-1}\circ f\circ B(a) =C^{-1}(f(B(a))) =C^{-1}(T(ax)) =C^{-1}(bTx)=b,$$ what we need to do is to prove that σ is the identity map. Berger does this by proving that σ is a field isomorphism. Since both the domain and codomain are ℝ, this makes it an automorphism of ℝ, and by the lemma that micromass proved so elegantly above, that implies that it's the identity map.
 
Last edited:
  • #93
Fredrik said:
It's sufficient to assume that the map that takes straight lines to straight lines is defined on the entire vector space, rather than a proper subset. It's not necessary to assume that the map is continuous. (If you want the map to be linear, rather than linear plus a translation, you must also assume that it takes 0 to 0).
What I meant is that one must impose that the transformation must map finite coordinates to finite coordinates, which I think is equivalent to what you are saying here.
 
  • #94
micromass said:
Here is a proof for the plane.
Thank you Micromass.
Your posts deserve to be polished and turned into a library item, so I'll mention a couple of minor typos I noticed:
[...] again a perspectivity [...]
Even though this is a synonym, I presume it should be "projectivity", since that's the word you used earlier.

Also,
[...] verticles [...]
 
  • #95
Just out of curiosity, do people use the term "line" for curves that aren't straight? Do we really need to say "straight line" every time?
 
  • #96
strangerep said:
Even though this is a synonym, I presume it should be "projectivity", since that's the word you used earlier.

Ah yes, thank you! It should indeed be projectivity.
A perspectivity is something slightly different. I don't know why I used that term...
 
  • #97
Fredrik said:
Just out of curiosity, do people use the term "line" for curves that aren't straight? Do we really need to say "straight line" every time?
Yes, at least historically line was just used to mean any curve. I think Euclid defined a line to be a "breadthless length", and defined a straight line to be a line that "lies evenly with itself", whatever that means.

EDIT: If you're interested, you can see the definitions here.
 
Last edited:
  • #98
I think I have completely understood how to prove the following theorem using the methods described in Berger's book.
If ##T:\mathbb R^2\to\mathbb R^2## is a bijection that takes lines to lines and 0 to 0, then ##T## is linear.​
I have broken it up into ten parts. Most of them are very easy, but there are a few tricky ones.

Notation: If L is a line, then I will write TL instead of T(L).

  1. If K is a line through 0, then so is TK.
  2. If K,L are lines through 0 such that K≠L, then TK≠TL. (Note that this implies that if {x,y} is linearly independent, then so is {Tx,Ty}).
  3. If K is parallel to L, then TK is parallel to TL.
  4. For all x,y such that {x,y} is linearly independent, T(x+y)=Tx+Ty.
  5. If x=0 or a=0, then T(ax)=aTx.
  6. If x≠0 and a≠0, then there's a b such that T(ax)=bTx. (Note that this implies that for each x≠0, there's a map σ such that T(ax)=σ(a)Tx. The following steps determine the properties of σ for an arbitrary x≠0).
  7. σ is a bijection from ℝ to ℝ.
  8. σ is a field homomorphism.
  9. σ is the identity map. (Combined with 5-6, this implies that T(ax)=aTx for all a,x).
  10. For all x,y such that {x,y} is linearly dependent, T(x+y)=Tx+Ty.

I won't explain all the details of part 8, because they require a diagram. But I will describe the idea. If you want to understand part 8 completely, you need to look at the diagrams in Berger's book.

Notation: I will denote the line through x and y by [x,y].

  1. Since T takes lines to lines, TK is a line. Since T0=0, 0 is on TK.
  2. Suppose that TK=TL. Let x be an arbitrary non-zero point on TK. Since x is also on TL, ##T^{-1}(x)## is in both K and L. But K and L intersect only at 0, so ##T^{-1}(x)=0##, and hence x=T(0)=0, which contradicts that x≠0.
  3. If K=L, then obviously TK=TL. If K≠L, then they are either parallel or intersect somewhere, and the argument of part 2 tells us that TK and TL don't intersect.
  4. Let x,y be arbitrary vectors such that {x,y} is linearly independent. Part 2 tells us that {Tx,Ty} is linearly independent. Define
    K=[0,x] (This is the range of ##t\mapsto tx##).
    L=[0,y] (This is the range of ##t\mapsto ty##).
    K'=[x+y,y] (This is the range of ##t\mapsto y+tx## so this line is parallel to K).
    L'=[x+y,x] (This is the range of ##t\mapsto x+ty## so this line is parallel to L).
    Since x+y is at the intersection of K' and L', T(x+y) is at the intersection of TK' and TL'. We will show that Tx+Ty is also at that intersection. Since x is on L', Tx is on TL'. Since L' is parallel to L, TL' is parallel to TL (the line spanned by {Ty}). These two results imply that TL' is the range of the map B defined by B(t)=Tx+tTy. Similarly, TK' is the range of the map C defined by C(t)=Ty+tTx. So there's a unique pair (r,s) such that T(x+y)=C(r)=B(s). The latter equality can be written as Ty+rTx=Tx+sTy. This is equivalent to (r-1)Tx+(1-s)Ty=0, and since {Tx,Ty} is linearly independent, this implies r=s=1. So T(x+y)=B(1)=Tx+Ty.
  5. Let x be an arbitrary vector and a an arbitrary real number. If either of them is zero, we have T(ax)=0=aT(x).
  6. Let x be non-zero but otherwise arbitrary. 0, x, and ax are all on the same line, K. So 0, Tx and T(ax) are all on the line TK. This implies that there's a b such that T(ax)=bTx. (What we did here proves this statement when a≠0 and x≠0, and part 5 shows that it's also true when a=0 or x=0).
  7. The map σ can be defined explicitly in the following way. Define B by B(t)=tx for all t. Define C by C(t)=tTx for all t. Let K be the range of B. Then the range of C is TK. Define ##\sigma=C^{-1}\circ T|_K\circ B##. This map is a bijection (ℝ→ℝ), since it's the composition of three bijections (ℝ→K→TK→ℝ). To see that this is the σ that was discussed in the previous step, let b be the real number such that T(ax)=bTx, and note that
    $$\sigma(a)=C^{-1}\circ T|_K\circ B(a) =C^{-1}(T(B(a))) =C^{-1}(T(ax)) =C^{-1}(bTx)=b.$$
  8. Let a,b be arbitrary real numbers. Using the diagrams in Berger's book, we can show that there are two lines K and L such that (a+b)x is at the intersection of K and L. This implies that the point at the intersection of TK and TL is T((a+b)x)=σ(a+b)Tx. Then we use the diagram (and its image under T) to argue that T(ax)+T(bx) must also be at that same intersection. This expression can be written (σ(a)+σ(b))Tx, so these results tell us that
    $$(\sigma(a)+\sigma(b)-\sigma(a+b))Tx=0.$$ Since Tx≠0, this implies that σ(a+b)=σ(a)+σ(b). Then we use similar diagrams to show that σ(ab)=σ(a)σ(b), and that if a<b, then σ(a)<σ(b). (The book doesn't include a diagram for that last part, but it's easy to imagine one).
  9. This follows from 8 and the lemma that says that the only automorphism of ℝ is the identity.
  10. Suppose that {x,y} is linearly dependent. Let k be the real number such that y=kx. Part 9 tells us that T(x+y)=T((1+k)x)=(1+k)Tx=Tx+kTx=Tx+T(kx)=Tx+Ty.
 
Last edited:
  • #99
This is a very interesting thread. Sorry I'm late to the conversation. I appreciate all the contributions. But I'm getting a little lost.

The question of the OP was asking about what kind of transformation keeps the following invariant:

[tex]c^2t^2 - x^2 - y^2 - z^2 = 0[/tex]
[tex]c^2t'^2 - x'^2 - y'^2 - z'^2 = 0 [/tex]

But Mentz114 in post 3 interprets this to mean that the transformation preserves

[tex]-dt'^2 + dx'^2 = -dt^2 + dx^2[/tex]

And Fredrik in post 8 interprets this to mean

If Λ is linear and g(Λx,Λx)=g(x,x) for all x∈ℝ4, then Λ is a Lorentz transformation.

And modifies this in post 9 to be

If Λ is surjective, and g(Λ(x),Λ(y))=g(x,y) for all x,y∈ℝ4, then Λ is a Lorentz transformation.

Are these all the same answer in different forms? Or is there a side question being addressed about linearity? Thank you.
 
Last edited:
  • #100
friend said:
And Fredrik in post 8 interprets this to mean

If Λ is linear and g(Λx,Λx)=g(x,x) for all x∈ℝ4, then Λ is a Lorentz transformation.

And modifies this in post 9 to be

If Λ is surjective, and g(Λ(x),Λ(y))=g(x,y) for all x,y∈ℝ4, then Λ is a Lorentz transformation.
Those aren't interpretations of the original condition. I would interpret the OP's assumption as saying that g(Λx,Λx)=0 for all x∈ℝ4 such that g(x,x)=0 (i.e. for all x on the light cone). This assumption isn't strong enough to imply that Λ is a Lorentz transformation, so I described two similar but stronger assumptions that are strong enough. The two statements you're quoting here are theorems I can prove.
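For concreteness: for a linear Λ, the condition g(Λx,Λy)=g(x,y) for all x,y is the matrix condition ##\Lambda^T\eta\Lambda=\eta## with ##\eta=\mathrm{diag}(1,-1,-1,-1)##. A quick numeric illustration with a boost (my own sketch; units with c=1):
[code=python]
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric g, signature (+,-,-,-)

def boost_x(v):
    # Lorentz boost along x with speed v (units where c = 1)
    gam = 1.0 / np.sqrt(1.0 - v**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gam
    L[0, 1] = L[1, 0] = -gam * v
    return L

L = boost_x(0.6)
print(np.allclose(L.T @ eta @ L, eta))   # True: g(Lx, Ly) = g(x, y) for all x, y

x = np.array([1.0, 0.2, -0.5, 0.3])
print(np.isclose(x @ eta @ x, (L @ x) @ eta @ (L @ x)))   # True
[/code]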

There is another approach to relativity that's been discussed in a couple of other threads recently. In this approach, the speed of light isn't mentioned at all. (Note that the g in my theorems is the Minkowski metric, so the speed of light is mentioned there). Instead, we interpret the principle of relativity as a set of mathematically precise statements, and see what we get if we take those statements as axioms. The axioms are telling us that the set of functions that change coordinates from one inertial coordinate system to another is a group, and that each of them takes straight lines to straight lines.

The problem I'm interested in is this: If space and time are represented in a theory of physics as a mathematical structure ("spacetime") with underlying set ℝ4, then what is the structure? When ℝ4 is the underlying set, it's natural to assume that those functions are defined on all of ℝ4. The axioms will then include the statement that those functions are bijections from ℝ4 into ℝ4. (Strangerep is considering something more general, so he is replacing this with something weaker).

The theorems we've been discussing lately tell us that a bijection ##T:\mathbb R^4\to\mathbb R^4## takes straight lines to straight lines if and only if there's an ##a\in\mathbb R^4## and a linear ##\Lambda:\mathbb R^4\to\mathbb R^4## such that ##T(x)=\Lambda x+a## for all ##x\in\mathbb R^4##. The set of inertial coordinate transformations with a=0 is a subgroup, and it has a subgroup of its own that consists of all the proper and orthochronous transformations with a=0.
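(The "if" direction of that theorem is the easy one, and simple to illustrate numerically: an affine map really does take lines to lines. A sketch with random data of my own choosing:)
[code=python]
import numpy as np

rng = np.random.default_rng(1)
Lam = rng.standard_normal((4, 4))   # a generic Lambda (invertible with probability 1)
a = rng.standard_normal(4)
T = lambda x: Lam @ x + a           # affine map T(x) = Lambda x + a

p, d = rng.standard_normal(4), rng.standard_normal(4)       # the line t -> p + t d
imgs = np.array([T(p + t * d) for t in np.linspace(-2, 2, 5)])
print(np.linalg.matrix_rank(imgs - imgs[0]))   # 1: the image points are collinear
[/code]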

What we find when we use the axioms is that this subgroup is either the group of Galilean boosts and proper and orthochronous rotations, or it's isomorphic to the restricted (i.e. proper and orthochronous) Lorentz group. In other words, we find that "spacetime" is either the spacetime of Newtonian mechanics, or the spacetime of special relativity. Those are really the only options when we take "spacetime" to be a structure with underlying set ℝ4.

Of course, if we had lived in 1900, we wouldn't have been very concerned with mathematical rigor in an argument like this. We would have been trying to guess the structure of spacetime in a new theory, and in that situation, there's no need to prove that theorem about straight lines. We can just say "let's see if there are any theories in which Λ is linear", and move on.

In 2012 however, I think it makes more sense to do this rigorously all the way from the axioms that we wrote down as an interpretation of the principle of relativity, because this way we know that there are no other spacetimes that are consistent with those axioms.
 
Last edited:
  • #101
Fredrik said:
Of course, if we had lived in 1900, we wouldn't have been very concerned with mathematical rigor in an argument like this. We would have been trying to guess the structure of spacetime in a new theory, and in that situation, there's no need to prove that theorem about straight lines. We can just say "let's see if there are any theories in which Λ is linear", and move on.

In 2012 however, I think it makes more sense to do this rigorously all the way from the axioms that we wrote down as an interpretation of the principle of relativity, because this way we know that there are no other spacetimes that are consistent with those axioms.

OK. Thank you for all these explanations. But don't you think that the "obsession" with preservation of straight lines is entirely due to our false and old-fashioned use of the definition of what an inertial observer is? What do I mean? An inertial observer is not an observer without acceleration, but an observer on which no force is acting. And these are not the same thing within a generalized theory of relativity, where F = d(mv)/dt = m·(dv/dt) + (dm/dt)·v, so F = 0 does not imply that the acceleration is 0.
 
  • #102
Those formulas do imply that ##F=0\Leftrightarrow \dot v=0##.

$$\gamma=\frac{1}{\sqrt{1-v^2}},\qquad m=\gamma m_0$$
$$\dot\gamma=-\frac{1}{2}(1-v^2)^{-\frac{3}{2}}(-2v\dot v)=\gamma^3v\dot v$$
$$\dot m=\dot\gamma m_0=\gamma^3v\dot v m_0$$
\begin{align}
F &=\frac{d}{dt}(mv)=\dot m v+m\dot v=\gamma^3v^2\dot v m_0+\gamma m_0\dot v =\gamma m_0\dot v(\gamma^2v^2+1)\\
& =\gamma m_0\dot v\left(\frac{v^2}{1-v^2}+\frac{1-v^2}{1-v^2}\right) =\gamma^3 m_0\dot v
\end{align}
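(The algebra above is easy to confirm with sympy; a one-off check, symbol names mine:)
[code=python]
import sympy as sp

t, m0 = sp.symbols('t m_0', positive=True)
v = sp.Function('v')(t)              # speed as a function of time, units with c = 1
gamma = 1 / sp.sqrt(1 - v**2)

F = sp.diff(m0 * gamma * v, t)       # F = d(mv)/dt with m = gamma * m0
print(sp.simplify(F - gamma**3 * m0 * sp.diff(v, t)))   # 0, i.e. F = gamma^3 m0 vdot
[/code]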

A complete specification of a theory of physics must include a specification of what measuring devices to use to test the theory's predictions. In particular, a theory about space, time and motion must describe how to measure lengths. It's not enough to just describe a meter stick, because the properties of a stick will to some degree depend on what's being done to it. So the theory must also specify the ideal conditions under which the measuring devices are expected to work the best. It's going to be very hard to specify a theory without ever requiring that an accelerometer displays 0. I don't even know if it can be done.

So non-accelerated motion is probably always going to be an essential part of all theories of physics. In all of our theories, motion is represented by curves in the underlying set of a structure called "spacetime". I will denote that set by M. A coordinate system is a function from a subset of M into ℝ4. If ##C:(a,b)\to M## is a curve in M, U is a subset of M, and ##x:U\to\mathbb R^4## is a coordinate system, then ##x\circ C## is a curve in ℝ4. So each coordinate system takes curves in spacetime to curves in ℝ4. If such a curve is a straight line, then the object has constant velocity in that coordinate system. If a coordinate system takes all the curves that represent non-accelerating motion to straight lines, then it assigns a constant velocity to every non-accelerating object. Those are the coordinate systems we call "inertial". There's nothing particularly old-fashioned about that.

Edit: Fixed four (language/typing/editing) mistakes in the last paragraph.
 
Last edited:
  • #103
Fredrik said:
Those formulas do imply that ##F=0\Leftrightarrow \dot v=0##.

$$\gamma=\frac{1}{\sqrt{1-v^2}},\qquad m=\gamma m_0$$
$$\dot\gamma=-\frac{1}{2}(1-v^2)^{-\frac{3}{2}}(-2v\dot v)=\gamma^3v\dot v$$
$$\dot m=\dot\gamma m_0=\gamma^3v\dot v m_0$$
\begin{align}
F &=\frac{d}{dt}(mv)=\dot m v+m\dot v=\gamma^3v^2\dot v m_0+\gamma m_0\dot v =\gamma m_0\dot v(\gamma^2v^2+1)\\
& =\gamma m_0\dot v\left(\frac{v^2}{1-v^2}+\frac{1-v^2}{1-v^2}\right) =\gamma^3 m_0\dot v
\end{align}

A complete specification of a theory of physics must include a specification of what measuring devices to use to test the theory's predictions. In particular, a theory about space, time and motion must describe how to measure lengths. It's not enough to just describe a meter stick, because the properties of a stick will to some degree depend on what's being done to it. So the theory must also specify the ideal conditions under which the measuring devices are expected to work the best. It's going to be very hard to specify a theory without ever requiring that an accelerometer displays 0. I don't even know if it can be done.

So non-accelerated motion is probably always going to be an essential part of all theories of physics. In all of our theories, motion is represented by curves in the underlying set of a structure called "spacetime". I will denote that set by M. A coordinate system is a function from a subset of M into ℝ4. If ##C:(a,b)\to M## is a curve in M, U is a subset of M, and ##x:U\to\mathbb R^4## is a coordinate system, then ##x\circ C## is a curve in ℝ4. So each coordinate system takes curves in spacetime to curves in ℝ4. If such a curve is a straight line, then the object has constant velocity in that coordinate system. If a coordinate system takes all the curves that represent non-accelerating motion to straight lines, then it assigns a constant velocity to every non-accelerating object. Those are the coordinate systems we call "inertial". There's nothing particularly old-fashioned about that.

Ok, well done and well explained (thanks). But all this concerns only special relativity. Where do you see that the question asked by the OP (and recalled by friend) imposes linearity? For me it only imposes Christoffel's work; see the other discussion "O-S model of star collapse" post 109, Foundations of the GTR by A. Einstein, translated by Bose, [793], (25). My impression (perhaps false) is that SR is based on a coherent but circular way of thinking that includes "linearity" for easily understandable historical reasons. The preservation of a length element (which is the initial question here) does not impose a flat geometry. Don't you think so?
 
  • #104
Fredrik said:
The problem I'm interested in is this: If space and time are represented in a theory of physics as a mathematical structure ("spacetime") with underlying set ℝ4, then what is the structure? When ℝ4 is the underlying set, it's natural to assume that those functions are defined on all of ℝ4. The axioms will then include the statement that those functions are bijections from ℝ4 into ℝ4
I find this confusing. If you start by assuming a spacetime structure that admits bijections from ℝ4 into ℝ4 (that is, E^4 or M^4) as the underlying structure because it seems natural to you, you are already imposing linearity for the transformations that respect the relativity principle. This leaves only the two possible transformations you comment on below. The second postulate of SR is what allows us to pick which of the two is the right transformation.

But if you follow this path, it is completely superfluous to prove anything about mapping straight lines to straight lines (in order to get the most general transformation that does that, and then restrict to the linear ones with a plausible physical assumption), since you are already starting with linear transformations.
Fredrik said:
What we find when we use the axioms is that this subgroup is either the group of Galilean boosts and proper and orthochronous rotations, or it's isomorphic to the restricted (i.e. proper and orthochronous) Lorentz group. In other words, we find that "spacetime" is either the spacetime of Newtonian mechanics, or the spacetime of special relativity. Those are really the only options when we take "spacetime" to be a structure with underlying set ℝ4.
Just a minor correction: the Lorentz transformations are locally isomorphic to the restricted group.
 
Last edited:
  • #105
TrickyDicky said:
I find this confusing. If you start by assuming a spacetime structure that admits bijections from ℝ4 into ℝ4 (that is, E^4 or M^4) as the underlying structure because it seems natural to you, you are already imposing linearity for the transformations that respect the relativity principle. This leaves only the two possible transformations you comment on below.
How am I "already imposing linearity"? I'm starting with "takes straight lines to straight lines", because that is the obvious property of inertial coordinate transformations, and then I'm using the theorem to prove that (when spacetime is ℝ4) an inertial coordinate transformation is the composition of a linear map and a translation. I don't think linearity is obvious. It's just an algebraic condition with no obvious connection to the concept of inertial coordinate transformations.

TrickyDicky said:
The second postulate of SR is what allows us to pick which of the two is the right transformation.
Right, if we add that to our assumptions, we can eliminate the Galilean group as a possibility. But I would prefer to just say this: These are the two theories that are consistent with a) the idea that ℝ4 is the underlying set of "spacetime", and b) our interpretation of the principle of relativity as a set of mathematically precise statements about transformations between global inertial coordinate systems. Now that we have two theories, we can use experiments to determine which one of them makes the better predictions.

TrickyDicky said:
Just a minor correction the Lorentz transformations are locally isomorphic to the restricted group.
How is that a correction? It seems like an unrelated statement.
 
