Math Challenge - January 2021

In summary: the thread covers problems on linear programming, trigonometry, calculus, PDEs, a differential matrix equation, function theory, linear algebra, irrationality, group theory, and ring theory. The discussion includes solving equations, calculating integrals, proving the irrationality of certain numbers, determining values of variables, and establishing the existence of certain solutions with analytical and topological tools. One highlight is a solution that can be seen as a corollary of the theorem on continuous functions on compact sets, while that same theorem also yields a direct proof; it is a beautiful theorem and proof.
  • #36
StoneTemplePython said:
... so I am trying to figure out a slicker approach that yes hides a lot of ugliness via a (nice?) induction.
... c'mon, one lousy rotation ...

Maybe you can wait until the 7th before posting it. Would be a nice reminiscence!
 
  • Like
Likes StoneTemplePython and sysprog
  • #37
fresh_42 said:
10. Let ##f(x)=2x^5-6x+6\in \mathbb{Z}[x]##. In which of the following rings is ##f## irreducible and why?
(a) ##\mathbb{Z}[x]##
(b) ##(S^{-1}\mathbb{Z})[x]## with ##S=\{2^n\,|\,n\in \mathbb{N}_0\}##
(c) ##\mathbb{Q}[x]##
(d) ##\mathbb{R}[x]##
(e) ##\mathbb{C}[x]##
(abc). By Eisenstein's criterion, ##f(x)## is irreducible over ##\mathbb{Q}##. Proof: Choose p = 3; p divides each non-leading coefficient (-6, 6), but does not divide the leading coefficient (2). Furthermore, p^2 does not divide the constant term (6).
Since ##\mathbb{Z}## and ##S^{-1}\mathbb{Z}## are subrings of ##\mathbb{Q}##, it follows that ##f(x)## is irreducible over them as well.

(d). Compute the discriminant of ##f(x)##, for example via row reduction of the Sylvester matrix. The discriminant is then ##12^4\cdot (5^5 - 3\cdot 4^4) = 12^4 \cdot 2357##. Since the discriminant is positive, all roots are real, and ##f(x)## splits into linear factors over ##\mathbb{R}[x]##.

(e). By the fundamental theorem of algebra, every non-constant univariate polynomial with coefficients in ##\mathbb{C}## splits into linear factors over ##\mathbb{C}[x]##.
 
  • #38
suremarc said:
(abc). By Eisenstein's criterion, ##f(x)## is irreducible over ##\mathbb{Q}##. Proof: Choose p = 3; p divides each non-leading coefficient (-6, 6), but does not divide the leading coefficient (2). Furthermore, p^2 does not divide the constant term (6).
Since ##\mathbb{Z}## and ##S^{-1}\mathbb{Z}## are subrings of ##\mathbb{Q}##, it follows that ##f(x)## is irreducible over them as well.

(d). Compute the discriminant of ##f(x)##, for example via row reduction of the Sylvester matrix. The discriminant is then ##12^4\cdot (5^5 - 3\cdot 4^4) = 12^4 \cdot 2357##. Since the discriminant is positive, all roots are real, and ##f(x)## splits into linear factors over ##\mathbb{R}[x]##.

(e). By the fundamental theorem of algebra, every non-constant univariate polynomial with coefficients in ##\mathbb{C}## splits into linear factors over ##\mathbb{C}[x]##.
You have been sloppy, which made one answer wrong. (d) can be answered more easily if we only look at ##x\to \pm\infty##.
 
  • #39
fresh_42 said:
You have been sloppy, which made one answer wrong. (d) can be answered easier, if we only look at ##x\to \pm\infty##.
Is it because 2 is irreducible in ##\mathbb{Z}##, and so is technically an irreducible 0-degree "polynomial"? Now I understand why the theorems I read mentioned the non-primitive case. Oh, and yeah, I guess since ##f## is an odd-degree polynomial, it has to have one real root. D'oh.
 
  • #40
suremarc said:
Is it because 2 is irreducible in ##\mathbb{Z}##, and so is technically an irreducible 0-degree "polynomial"? Now I understand why the theorems I read mentioned the non-primitive case. Oh, and yeah, I guess since ##f## is an odd-degree polynomial, it has to have one real root. D'oh.
Yes, ##2## isn't a unit in ##\mathbb{Z}##, so ##2x^5-6x+6=2(x^5-3x+3)## is a legitimate split. Eisenstein makes it irreducible over ##\mathbb{Q}##, and ##S## in part (b) makes ##2## a unit there, too, but not in part (a).
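For anyone who wants to cross-check this part numerically, here is a small sketch (assuming sympy is available; the factorizations over ##\mathbb{Z}## and ##\mathbb{Q}##, the discriminant, and the real-root count are only for comparison with the discussion above):

[CODE=python]
from sympy import symbols, Poly, discriminant, ZZ, QQ

x = symbols('x')
f = 2*x**5 - 6*x + 6

# Over Z the content 2 splits off: f = 2 * (x^5 - 3x + 3), so f is reducible in Z[x].
print(Poly(f, x, domain=ZZ).factor_list())

# Over Q the primitive part x^5 - 3x + 3 stays in one piece (Eisenstein with p = 3).
print(Poly(f, x, domain=QQ).factor_list())

# Discriminant of f, for comparison with the value quoted earlier.
print(discriminant(f, x))

# Number of real roots: positive, consistent with the odd-degree argument.
print(Poly(f, x).count_roots())
[/CODE]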
 
  • Like
Likes suremarc
  • #41
Problem 1 is a strange one. One readily finds a small-dimensional counterexample to the first bullet point, e.g. [itex]-1\cdot x = 1[/itex] is not solvable for [itex]x\geq 0[/itex]. Am I missing something?
 
  • #42
nuuskur said:
Problem 1 is a strange one. One readily finds a small dimensional counterexample to first bullet point. E.g [itex]-1 x = 1[/itex] is not solvable for [itex]x\geq 0[/itex]. Am I missing something?
Maybe ...
fresh_42 said:
... exactly one of the following two statements is true ...
 
  • #43
fresh_42 said:
8. We define ##e=\displaystyle{\sum_{k=0}^\infty \dfrac{1}{k!}}.## Prove that ##e^2## is irrational.

Obviously the rationality of ##e^2## is related to that of ##e##: the irrationality of ##e## has been done before, but I just wanted to note that the present problem is non-trivial in the sense that there exist irrational numbers, say ##\sqrt{2}##, whose square (namely ##2##) is rational.
 
  • #44
My guess is [itex]\sum _{k=0}^\infty \frac{2^k}{k!} = \frac{p}{q}[/itex] leads to some kind of contradiction. I sense some technical work ahead.
 
  • Like
Likes fresh_42
  • #45
@nuuskur Might need the Cauchy product as well, ##e^2:=\left( \sum _{k=0}^\infty\frac{1}{k!}\right) \left( \sum _{j=0}^\infty\frac{1}{j!}\right) = \sum _{k=0}^\infty \sum _{j=0}^k\frac{1}{j!\,(k-j)!} = \frac{p}{q}##
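A quick sanity check of that Cauchy product, for what it's worth (a sketch using only the Python standard library; the inner sum collapses to ##2^k/k!##, matching the series in the previous post):

[CODE=python]
from fractions import Fraction
from math import e, factorial

# Term-by-term check: sum_{j=0}^k 1/(j!(k-j)!) == 2^k/k!
for k in range(10):
    inner = sum(Fraction(1, factorial(j) * factorial(k - j)) for j in range(k + 1))
    assert inner == Fraction(2**k, factorial(k))

# Partial sums of sum_k 2^k/k! approach e^2.
approx = sum(2**k / factorial(k) for k in range(30))
print(approx, e**2)  # both roughly 7.389056...
[/CODE]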
 
  • #46
... or ##qe=pe^{-1}## ... given that the formula for ##e^{-1}## can be thought of as given, because it can easily be verified.
 
  • #47
From ##qe - pe^{-1} = 0## we can write:

\begin{align*}
q \sum_{n=0}^\infty \frac{1}{n!} - p \sum_{n=0}^\infty \frac{(-1)^n}{n!} = 0
\end{align*}

Multiplying by ##r!## we get:

\begin{align*}
& q [r! + r! + 3 \cdot 4 \cdots r + \cdots + (r-1) r + r + 1] + q \left( \frac{1}{r+1} + \frac{1}{(r+1) (r+2)} + \cdots \right) - \\
& - p [r! - r! + 3 \cdot 4 \cdots r - \cdots + (-1)^{r-2} (r-1) r + (-1)^{r-1}r + (-1)^r ] + \\
& + (-1)^r p \left( \frac{1}{r+1} - \frac{1}{(r+1) (r+2)} + \cdots \right) = 0
\end{align*}

The numbers inside the square brackets are integers. Let's move the integer ##q [\cdots] - p [\cdots]## to the RHS:\begin{align*}
& q \left( \frac{1}{r+1} + \frac{1}{(r+1) (r+2)} + \cdots \right) + (-1)^r p \left( \frac{1}{r+1} - \frac{1}{(r+1) (r+2)} + \cdots \right) = \text{integer}
\end{align*}

From now on we take ##r## to be even, so:

\begin{align*}
& q \left( \frac{1}{r+1} + \frac{1}{(r+1) (r+2)} + \cdots \right) + p \left( \frac{1}{r+1} - \frac{1}{(r+1) (r+2)} + \cdots \right) = \text{integer} \quad (*)
\end{align*}

The LHS is obviously positive, and so the LHS of ##(*)## is ##> 0##. But

\begin{align*}
& q \left( \frac{1}{r+1} + \frac{1}{(r+1) (r+2)} + \cdots \right) + p \left( \frac{1}{r+1} - \frac{1}{(r+1) (r+2)} + \cdots \right) \\
& < q \left( \frac{1}{r+1} + \frac{1}{(r+1)^2} + \cdots \right) + p \left( \frac{1}{r+1} + \frac{1}{(r+1)^2} + \cdots \right) \\
& = (q + p) \frac{1/(r+1)}{1 - 1/(r+1)} \\
& = (q + p) \frac{1}{r} \\
& = (q + p) \frac{1}{2m} \qquad \text{where } r = 2m
\end{align*}

Taking ##m## to be large enough we have ##0 < \text{LHS of } (*) < 1## . A contradiction.
 
Last edited:
  • Like
Likes nuuskur and fresh_42
  • #48
One readily verifies the subset [itex]O(n) \subseteq GL(n,\mathbb R)[/itex] of orthogonal matrices is a group. A matrix is orthogonal iff its column vectors are pairwise orthonormal, therefore [itex]O(n)[/itex] is bounded. The map [itex]f(A)=A^tA[/itex] is continuous, therefore [itex]O(n) = f^{-1}(E)[/itex] is closed (because singletons are closed). By Heine-Borel [itex]O(n)[/itex] is compact.

Let [itex]A[/itex] be a real [itex]n\times n[/itex] matrix. Denote
[tex]
\mathcal V(A) := \sum _{i\neq j} a_{ij}^2.
[/tex]
I.e. the sum of the squares of the off-diagonal entries.
IIRC this lemma is due to Jacobi, but I might be mistaken.

Statement. Let [itex]A[/itex] be a real symmetric matrix. If [itex]\mathcal V(A) >0[/itex], then there exists a real orthogonal matrix [itex]B[/itex] such that [itex]\mathcal V(B^tAB)<\mathcal V(A)[/itex].

Sketch of Proof. Let [itex]a_{k\ell} \neq 0[/itex], where [itex]k\neq \ell[/itex]. Choose [itex]B[/itex] in the following manner. Start with [itex]E[/itex] and put [itex]b_{kk} = b_{\ell\ell} = \cos\alpha[/itex] and [itex]b_{k\ell} = -b_{\ell k} = \sin\alpha[/itex] for some suitable [itex]\alpha[/itex]. Clearly, [itex]B\in O(n)[/itex].

Put [itex]C:= B^tAB[/itex]. Then
[tex]
c_{k\ell} = \sum _{i,j} b_{ik}a_{ij}b_{j\ell}
[/tex]
For [itex]s,t\notin \{k,\ell\}[/itex] we have [itex]c_{st} = a_{st}[/itex]. Otherwise, for [itex]s\notin \{k,\ell\}[/itex]
[tex]
c_{sk} = a_{sk}\cos\alpha - a_{s\ell}\sin\alpha
[/tex]
and
[tex]
c_{s\ell} = a_{s\ell}\cos\alpha + a_{sk}\sin\alpha.
[/tex]
Analogously for [itex]t\notin \{k,\ell\}[/itex]. We then have
[tex]
c_{sk}^2 + c_{s\ell}^2 = a_{sk}^2 + a_{s\ell}^2\ \ (s\notin \{k,\ell\}) \quad\mbox{and}\quad c_{kt}^2 + c_{\ell t}^2 = a_{kt}^2 + a_{\ell t}^2 \ \ (t\notin \{k,\ell\}).
[/tex]
Now all that's left is to pick [itex]\alpha[/itex] such that [itex]c_{k\ell} = 0[/itex], then [itex]\mathcal V(B^tAB) = \mathcal V(A) - 2a_{k\ell}^2 < \mathcal V(A)[/itex]. Note that
[tex]
c_{k\ell} = (a_{kk} - a_{\ell\ell})\sin\alpha\cos\alpha + a_{k\ell}\cos 2\alpha.
[/tex]
By IVT [itex]c_{k\ell} = 0[/itex] for some [itex]0<\alpha<\pi/2[/itex].

Let [itex]A[/itex] be a real symmetric matrix. Consider [itex]h:O(n) \to \mathrm{Mat}_n,\quad X\mapsto X^tAX[/itex]. Since this map is continuous, the image [itex]\mathrm{im}(h)[/itex] is compact. Thus, [itex]\mathcal V : \mathrm{im}(h) \to \mathbb R[/itex] attains its minimum at, say, [itex]M\in\mathrm{im}(h)[/itex].

Put [itex]M=Q^tAQ[/itex] for some [itex]Q\in O(n)[/itex]. Suppose [itex]\mathcal V(M)>0[/itex]. Then for some orthogonal matrix [itex]B[/itex]
[tex]
\mathcal V((QB)^tAQB) < \mathcal V (Q^tAQ),
[/tex]
which is impossible. So [itex]M[/itex] must be a diagonal matrix and [itex]A[/itex] is diagonalisable.
 
Last edited:
  • Like
Likes fresh_42
  • #49
nuuskur said:
One readily verifies the subset [itex]O(n) \subseteq GL(n,\mathbb R)[/itex] of orthogonal matrices is a group. A matrix is orthogonal iff its column vectors are pairwise orthonormal, therefore [itex]O(n)[/itex] is bounded. The map [itex]f(A)=A^tA[/itex] is continuous, therefore [itex]O(n) = f^{-1}(E)[/itex] is closed (because singletons are closed). By Heine-Borel [itex]O(n)[/itex] is compact.

Let [itex]A[/itex] be a real [itex]n\times n[/itex] matrix. Denote
[tex]
\mathcal V(A) := \sum _{i\neq j} a_{ij}^2.
[/tex]
I.e sums of squares not on the diagonal.
Iirc, this lemma is by Jacobi, but might be mistaken.

Statement. Let [itex]A[/itex] be a real symmetric matrix. If [itex]\mathcal V(A) >0[/itex], then there exists a real orthogonal matrix [itex]B[/itex] such that [itex]\mathcal V(B^tAB)<\mathcal V(A)[/itex].

Sketch of Proof. Let [itex]a_{k\ell} \neq 0[/itex], where [itex]k\neq \ell[/itex]. Choose [itex]B[/itex] in the following manner. Start with [itex]E[/itex] and put [itex]b_{kk} = b_{\ell\ell} = \cos\alpha[/itex] and [itex]b_{k\ell} = -b_{\ell k} = \sin\alpha[/itex] for some suitable [itex]\alpha[/itex]. Clearly, [itex]B\in O(n)[/itex].

Put [itex]C:= B^tAB[/itex]. Then
[tex]
c_{k\ell} = \sum _{i,j} b_{ik}a_{ij}b_{j\ell}
[/tex]
For [itex]s,t\notin \{k,\ell\}[/itex] we have [itex]c_{st} = a_{st}[/itex]. Otherwise, for [itex]s\notin \{k,\ell\}[/itex]
[tex]
c_{sk} = a_{sk}\cos\alpha - a_{s\ell}\sin\alpha
[/tex]
and
[tex]
c_{s\ell} = a_{s\ell}\cos\alpha + a_{sk}\sin\alpha.
[/tex]
Analogously for [itex]t\notin \{k,\ell\}[/itex]. We then have
[tex]
c_{sk}^2 + c_{s\ell}^2 = a_{sk}^2 + a_{e\ell}^2\ \ (s\notin \{k,\ell\}) \quad\mbox{and}\quad c_{kt}^2 + c_{\ell t}^2 = a_{kt}^2 + a_{\ell t}^2 \ \ (t\notin \{k,\ell\}).
[/tex]
Now all that's left is to pick [itex]\alpha[/itex] such that [itex]c_{k\ell} = 0[/itex], then [itex]\mathcal V(B^tAB) = \mathcal V(A) - 2a_{k\ell}^2 < \mathcal V(A)[/itex]. Note that
[tex]
c_{k\ell} = (a_{kk} - a_{\ell\ell})\sin\alpha\cos\alpha + a_{k\ell}\cos 2\alpha.
[/tex]
By IVT [itex]c_{k\ell} = 0[/itex] for some [itex]0<\alpha<\pi/2[/itex].

Let [itex]A[/itex] be a real symmetric matrix. Consider [itex]h:O(n) \to \mathrm{Mat}_n,\quad X\mapsto X^tAX[/itex]. Since this map is continuous, the image [itex]\mathrm{im}(h)[/itex] is compact. Thus, [itex]\mathcal V : \mathrm{im}(h) \to \mathbb R[/itex] attains minimum on, say, [itex]M\in\mathrm{im}(h)[/itex].

Put [itex]M=Q^tAQ[/itex] for some [itex]Q\in O(n)[/itex]. Suppose [itex]\mathcal V(M)>0[/itex]. Then for some orthogonal matrix [itex]B[/itex]
[tex]
\mathcal V((QB)^tAQB) < \mathcal V (Q^tAQ),
[/tex]
which is impossible. So [itex]M[/itex] must be a diagonal matrix and [itex]A[/itex] is diagonalisable.
This proof is from Herb Wilf, who passed away nine years ago to the day (*). I like how its purely analytic nature proves a statement from linear algebra. It shows that proof techniques do not necessarily belong to the area the statement stems from. Another famous example is the fundamental theorem of algebra, which has various proofs.

(*) Thanks for waiting, @nuuskur.
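If it helps to see the rotation step in action, here is a small numerical sketch (assuming numpy; the angle comes from solving ##c_{k\ell}=0## in closed form, which is one of the values the IVT argument guarantees, and the helper names are mine):

[CODE=python]
import numpy as np

def offdiag(A):
    # V(A): the sum of squares of the off-diagonal entries
    return np.sum(A**2) - np.sum(np.diag(A)**2)

def rotate(A, k, l):
    # one step of the proof: B^t A B with the (k, l) entry rotated to zero
    n = A.shape[0]
    # angle solving (a_kk - a_ll) sin(a) cos(a) + a_kl cos(2a) = 0
    alpha = 0.5 * np.arctan2(2 * A[k, l], A[l, l] - A[k, k])
    B = np.eye(n)
    B[k, k] = B[l, l] = np.cos(alpha)
    B[k, l] = np.sin(alpha)
    B[l, k] = -np.sin(alpha)
    return B.T @ A @ B

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # a random real symmetric matrix

while offdiag(A) > 1e-12:
    # pick the largest off-diagonal entry; V(A) drops by 2*a_kl^2 each step
    off = np.abs(A - np.diag(np.diag(A)))
    k, l = np.unravel_index(np.argmax(off), off.shape)
    A = rotate(A, k, l)

print(np.round(np.diag(A), 6))         # approximate eigenvalues on the diagonal
[/CODE]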
 
  • Like
Likes nuuskur
  • #50
Problem 4.
I solve the PDE$$
xu_x+yu_y+(x^2 +y^2)u_z=0
$$
by the "method of characteristics". Dividing by ##x## the PDE becomes
$$

u_x + \frac{y}{x}u_y + \frac{(x^2 + y^2)}{x}u_z=0
$$
The characteristic ODEs are,
$$
\frac{dx}{ds}=1
$$
$$
\frac{dy}{ds}=\frac{y}{x}
$$
$$
\frac{dz}{ds}=\frac{(x^2 + y^2)}{x}
$$
where ##s## is the characteristic parameter. Solving these ODEs we find,
$$
x=s
$$
$$
y=y_0x
$$
$$
z=z_0 + \frac{1}{2}(x^2 + y^2)
$$
The solution is any function of the constant curves,
$$
u(x,y,z)=F(y_0,z_0)
$$
Solving for ##y_0## and ##z_0## I find,
$$u(x,y,z)=F(\frac{y}{x},z-\frac{1}{2}(x^2 + y^2))
$$
In anticipation of the initial conditions I displace the ##z## coordinate by ##\frac{\pi}{2}## and guess the solution.
$$
u(x,y,z)=\frac{y}{x}\left ( z-\frac{\pi}{2}-\frac{1}{2}(x^2 + y^2)\right )
$$
Taking the partial derivatives I find,
$$
u_x=-\frac{y}{x^2}\left ( z-\frac{\pi}{2}-\frac{1}{2}(x^2 + y^2)\right )-y
$$
$$
u_y=\frac{1}{x}\left ( z-\frac{\pi}{2}-\frac{1}{2}(x^2 + y^2)\right )-\frac{y^2}{x}
$$
$$
u_z=\frac{y}{x}
$$
This function satisfies the initial conditions and the original PDE.
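A quick symbolic check of that final function, in case anyone wants it (a sketch with sympy, simply substituting ##u## into the PDE):

[CODE=python]
from sympy import symbols, diff, simplify, pi

x, y, z = symbols('x y z')
u = (y / x) * (z - pi / 2 - (x**2 + y**2) / 2)

# plug u into x*u_x + y*u_y + (x^2 + y^2)*u_z
residual = x * diff(u, x) + y * diff(u, y) + (x**2 + y**2) * diff(u, z)
print(simplify(residual))  # 0
[/CODE]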
 
  • Like
Likes benorin and fresh_42
  • #51
Fred Wright said:
Problem 4.
I solve the PDE$$
xu_x+yu_y+(x^2 +y^2)u_z=0
$$
by the "method of characteristics". Dividing by ##x## the PDE becomes
$$

u_x + \frac{y}{x}u_y + \frac{(x^2 + y^2)}{x}u_z=0
$$
The characteristic ODEs are,
$$
\frac{dx}{ds}=1
$$
$$
\frac{dy}{ds}=\frac{y}{x}
$$
$$
\frac{dz}{ds}=\frac{(x^2 + y^2)}{x}
$$
where ##s## is the characteristic parameter. Solving these ODEs we find,
$$
x=s
$$
$$
y=y_0x
$$
$$
z=z_0 + \frac{1}{2}(x^2 + y^2)
$$
The solution is any function of the constant curves,
$$
u(x,y,z)=F(y_0,z_0)
$$
Solving for ##y_0## and ##z_0## I find,
$$u(x,y,z)=F(\frac{y}{x},z-\frac{1}{2}(x^2 + y^2))
$$
In anticipation of the initial conditions I displace the ##z## coordinate by ##\frac{\pi}{2}## and guess the solution.
$$
u(x,y,z)=\frac{y}{x}\left ( z-\frac{\pi}{2}-\frac{1}{2}(x^2 + y^2)\right )
$$
Taking the partial derivatives I find,
$$
u_x=-\frac{y}{x^2}\left ( z-\frac{\pi}{2}-\frac{1}{2}(x^2 + y^2)\right )-y
$$
$$
u_y=\frac{1}{x}\left ( z-\frac{\pi}{2}-\frac{1}{2}(x^2 + y^2)\right )-\frac{y^2}{x}
$$
$$
u_z=\frac{y}{x}
$$
This function satisfies the initial conditions and the original PDE.
Well, ##F## should be differentiable, not just any. We have infinitely many possible solutions and the initial values are useless. Those initial values given here corresponded to ##F(a,b)=a^2\sin(b),## but this is not a unique solution, as your solution shows.
 
  • #52
My contribution to problem 7. Lots of simple inequalities and few indices, which is more my style. The little lemmas also give some other valuable information about symmetric matrices, via very simple means.

Let ##A## be a real symmetric matrix in all cases. And ##\mathbf e_k## denotes the relevant kth standard basis vector.

lemma 1:
a ##2\times 2## real symmetric ##A## with quadratic form
##f: \{\mathbf x\in \mathbb R^2 : \big \Vert \mathbf x\big \Vert_2=1\} \longrightarrow \mathbb R## given by ##f\big(\mathbf x\big)=\mathbf x^T A \mathbf x## (notice the domain is the unit circle), attains a maximum, ##m_x##

proof:
##f## is continuous, applied to a compact set, so its image is compact i.e. in this case closed and bounded in ##\mathbb R##, so it attains a maximum, ##m_x##.

lemma 2:
If ##A## is a ##2\times 2##, non-diagonal matrix with (at least) one zero on the diagonal, then ##m_x \gt 0##.

proof:
WLOG suppose ##a_{1,1}=0## (if not, re-label and run the argument on ##A':= P^TAP=PAP## where ##P## is the (non-identity) type 2 elementary matrix)

##f\big(\mathbf e_1\big) = 0##
now ##m_x \leq 0\implies f\big(\mathbf x_{m_x}\big) \leq 0 \implies \sigma^2 \cdot f\big( \mathbf x_{m_x}\big) =f\big(\sigma \cdot \mathbf x_{m_x}\big) \leq 0##
thus for notational ease we relax the length one constraint on ##\mathbf x## for this lemma only (i.e. let the domain be all of ##\mathbb R^2## for this lemma only)

so consider ##\mathbf x:=\mathbf e_1 + \beta \mathbf e_2##
##f\big(\mathbf x\big) = \mathbf x^TA\mathbf x = 2\cdot a_{1,2}\cdot x_{1}\cdot x_{2} + a_{2,2}\cdot x_{2}^2 =2\cdot \beta \cdot a_{1,2}+ \beta^2 a_{2,2} = \beta\big(\beta a_{2,2} + 2a_{1,2}\big) ##
i.) if ##a_{1,2} \gt 0## then the RHS is necessarily positive for small enough modulus ##\beta \gt 0##
(i.e. if ##a_{2,2}=0## then any positive ##\beta## works, otherwise we have
##\beta a_{2,2} + 2a_{1,2}\geq 2a_{1,2}-\beta \big \vert a_{2,2}\big \vert\geq \frac{3}{2}a_{1,2}\gt 0##
by selecting ##\beta \in \big(0, \frac{a_{1,2}}{2 \vert a_{2,2}\vert }\big)##)
ii.) similarly if ##a_{1,2} \lt 0## then the RHS is necessarily positive for small enough modulus ##\beta \lt 0##
iii.) ##a_{1,2} \neq 0## because ##A## is symmetric but not diagonal

this completes the proof of the lemma, i.e. ##m_x \gt 0##

lemma 3:
For ##2\times 2## real symmetric ##A##, ##m_x \geq \max\big(a_{1,1},a_{2,2}\big)## with equality *iff* ##A## is diagonal.

For the first direction: if ##A## is diagonal, then
##m_x=f\big(\mathbf x_{m_x}\big) = a_{1,1}\cdot x_{1}^2 + a_{2,2}\cdot x_{2}^2\leq \max\big(a_{1,1},a_{2,2}\big)\cdot \big(x_1^2 + x_2^2)= \max\big(a_{1,1},a_{2,2}\big)##
and ##\max\Big(f\big(\mathbf e_{1}\big), f\big(\mathbf e_{2}\big)\Big) =\max\big(a_{1,1},a_{2,2}\big)\leq m_x ## so this is met with equality.

For the other direction, suppose ##A## is not diagonal. Then ##B:=A-\max\big(a_{1,1},a_{2,2}\big)\cdot I##
is symmetric ##2\times 2## but not diagonal, with a zero on the diagonal. Applying the prior lemma to ##B## tells us that the maximal quadratic form is positive. i.e. there must exist (length one) ##\mathbf x## where

##0 \lt \mathbf x^T \Big(A-\max\big(a_{1,1},a_{2,2}\big)\cdot I\Big)\mathbf x = f\big(\mathbf x\big) - \max\big(a_{1,1},a_{2,2}\big)\mathbf x^T \mathbf x = f\big(\mathbf x\big)-\max\big(a_{1,1},a_{2,2}\big) \leq m_x -\max\big(a_{1,1},a_{2,2}\big)##
##\implies \max\big(a_{1,1},a_{2,2}\big) \lt m_x##
(where ##m_x## refers to maximum value of the quadratic form with respect to A, not B, i.e ##\mathbf x^T A \mathbf x##)

**now assume** ##A## is ##n\times n##
##f\big(\mathbf x\big)=\mathbf x^T A \mathbf x## (again with the constraint of ##\Big \Vert \mathbf x\big \Vert_2 = 1##) attains a maximum ##m_x## again for reasons of applying a continuous function to a compact domain, with the real line as the codomain. Define orthogonal ##Q_1## to have its first col vector ##\mathbf x_{m_x}## (i.e. some unit length vector such that ##f\big(\mathbf x_{m_x}\big) = m_x## and extend to an orthonormal basis.)

##B:= Q_1^T A Q_1 = \begin{bmatrix}
m_x & *^T\\
* & A_{n-1\times n-1}\\
\end{bmatrix}##
note: ##B## is symmetric, so is ##A_{n-1\times n-1}## and we claim ##*=\mathbf 0_{n-1}##. Suppose this is not the case and there is ##b_{i,1}=\gamma \neq 0## for some ##i\in \big\{2,3,...,n\big\}##. Let ##P## be the elementary type 2 matrix that swaps element i with element 2 (which is a symmetric permutation matrix).

Then ##P^TBP## has a leading ##2\times 2## principal submatrix given by
##\begin{bmatrix}
m_x & \gamma\\
\gamma & b_{i,i}\\ \end{bmatrix} ##

By lemma 3, since this ##2 \times 2## symmetric matrix is not diagonal there is some maximizing ##\mathbf x## on the unit circle such that ##\eta =\mathbf x^T\begin{bmatrix}
m_x & \gamma\\
\gamma & b_{i,i}\\\end{bmatrix} \mathbf x \gt \max\big(b_{1,1},b_{i,i}\big)= \max\big(m_x,b_{i,i}\big)\geq m_x##

Finally, define
##\mathbf v :=Q_1P\begin{bmatrix}
\mathbf x \\
\mathbf 0_{n-2}\\\end{bmatrix} ##, noting that ##\Big \Vert \mathbf v\Big \Vert_2 = 1## because ##Q_1## and ##P## are orthogonal

thus ##\mathbf v^T A\mathbf v = \eta \gt m_x## which is a contradiction.

Thus we've proven
##B= Q_1^T A Q_1 = \begin{bmatrix}
m_x & \mathbf 0_{n-1}^T\\
\mathbf 0_{n-1} & A_{n-1\times n-1}\\
\end{bmatrix}## and by inductive hypothesis (with the trivial base case of ##A_{n-1\times n-1}## being ##1\times 1##) there is an orthogonal matrix ##Q_{n-1\times n-1}## such that ##Q_{n-1\times n-1}^T A_{n-1\times n-1} Q_{n-1\times n-1} = D_{n-1}## so
##Q_2:= \begin{bmatrix}
1 & \mathbf 0_{n-1}^T\\
\mathbf 0_{n-1} & Q_{n-1\times n-1}\\
\end{bmatrix}##

##Q:= Q_1Q_2## and ##Q^TAQ = \begin{bmatrix}
m_x & \mathbf 0_{n-1}^T\\
\mathbf 0_{n-1} & D_{n-1\times n-1}\\
\end{bmatrix}=D## as desired
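Here is a tiny numerical illustration of the ##2\times 2## lemmas (a sketch assuming numpy; the grid over the unit circle only approximates the true maximizer, and the eigenvalue comparison at the end is an extra cross-check, not part of the argument):

[CODE=python]
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((2, 2))
A = (M + M.T) / 2                               # random symmetric 2x2, almost surely non-diagonal

theta = np.linspace(0.0, 2.0 * np.pi, 200_001)
X = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
quad = np.sum(X * (A @ X), axis=0)              # x^T A x for each point

m_x = quad.max()
print(m_x, max(A[0, 0], A[1, 1]))               # m_x exceeds the larger diagonal entry
print(np.isclose(m_x, np.linalg.eigvalsh(A).max()))  # and matches the top eigenvalue
[/CODE]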
 
  • Informative
  • Like
Likes nuuskur and fresh_42
  • #53
Just one question about
StoneTemplePython said:
lemma 2:
If ##A## is a ##2\times 2##, non-diagonal matrix with (at least) one zero on the diagonal, then ##m_x \gt 0##.

proof:
WLOG suppose ##a_{1,1}=0## (if not, re-label and run the argument on ##A':= P^TAP=PAP## where ##P## is the (non-identity) type 2 elementary matrix)

##f\big(\mathbf e_1\big) = 0##
Why isn't the proof done here? If we have a point with ##f(\mathbf{e_1})=0,## isn't the maximum ##f(\mathbf{x}_{m_x})## automatically non-negative? The case ##f(\mathbf{x}_{m_x})=0## is easy to exclude.
 
  • #54
fresh_42 said:
Just one question about

Why isn't the proof done here? If we have a point with ##f(\mathbf{e_1})=0,## isn't the maximum ##f(\mathbf{x}_{m_x})## automatically non-negative? The case ##f(\mathbf{x}_{m_x})=0## is easy to exclude.
Agreed that non-negativity is immediate. I wanted to exclude ##f(\mathbf{x}_{m_x})=0## in a methodical way (without using eigenvectors, bilinear forms, etc.). How would you do it?
 
  • #55
StoneTemplePython said:
Agreed that non-negativity is immediate. I wanted to exclude ##f(\mathbf{x}_{m_x})=0## in a methodical way (without using eigenvectors, bilinear forms, etc.). How would you do it?
If ##a_{22}\neq 0## then ##\mathbf{x}=(0,1)## is positive, and if ##a_{22}=0 \wedge a_{12}\neq 0## then ##\mathbf{x}=(a_{12}^{-1},1)## does it, and if ##a_{22}=0 \wedge a_{12}= 0## then ##A=0##, in which case there is nothing to do. But you do not need to distinguish the cases; it can already be seen from the equation for ##f(\mathbf{x})##, even without extending the domain.

Would Liouville work? O.k. it is shooting sparrows with cannons, but ...
 
  • #56
fresh_42 said:
If ##a_{22}\neq 0## then ##\mathbf{x}=(0,1)## is positive...
I was thinking of ##a_{2,2} \lt 0## so ##f(\mathbf e_2)\lt 0##...

the fact that
##\beta\big(\beta a_{2,2} + 2a_{1,2}\big) ##
equals 0 with ##\beta =0## and the sign changes in some small ball around zero, which has a nice analytic feel to it since it basically becomes ##2\beta\cdot a_{1,2}## in that neighborhood.

fresh_42 said:
Would Liouville work? O.k. it is shooting sparrows with canons, but ...

I originally had a mind to lift something from complex analysis since ##\mathbb R^2## is a bit like ##\mathbb C##, though I didn't see holomorphicity. There may be some cannons that can be used, though.
 
  • #57
I'm tired and want to turn in. Here's a go at Problem 3:

We are given:

\begin{align*}
z(t) \leq C + L \left| \int_{t_0}^t z (t_1) dt_1 \right| .
\end{align*}

Note that because ##z(t)## is a non-negative function on ##[a,b]##, and since the above inequality is required to hold for all ##t \in [a,b]##, it is required that ##C \geq 0##.

First assume that ##t \geq t_0##, so we can write:

\begin{align*}
z(t) \leq C + L \int_{t_0}^t z (t_1) dt_1 .
\end{align*}Since ##L \geq 0## we may substitute the inequality into the RHS which gives a first iteration:

\begin{align*}
z(t) & \leq C+ L \int_{t_0}^t \left( C + L \int_{t_0}^{t_1} z (t_2) dt_2 \right) dt_1 \\
& = C + C L \int_{t_0}^t dt_1 + L^2 \int_{t_0}^t \int_{t_0}^{t_1} z (t_2) dt_2 dt_1
\end{align*}

A second iteration gives:

\begin{align*}
z(t) & \leq C + C L \int_{t_0}^t dt_1 + L^2 \int_{t_0}^t \int_{t_0}^{t_1} \left( C + L \int_{t_0}^{t_2} z (t_3) dt_3 \right) dt_2 dt_1 \\
& = C + C L \int_{t_0}^t dt_1 + C L^2 \int_{t_0}^t \int_{t_0}^{t_1} dt_2 dt_1 + L^3 \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} z (t_3) dt_3 dt_2 dt_1
\end{align*}

An ##n##th iteration gives

\begin{align*}
z(t) & \leq C + C \sum_{i=1}^n A^{(i)} + L^{n+1} \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} \cdots \int_{t_0}^{t_n} z (t_{n+1}) dt_{n+1} dt_n dt_{n-1} \dots dt_1 \quad (*)
\end{align*}

where

\begin{align*}
A^{(i)} = L^i \int_{t_0}^t \int_{t_0}^{t_1} \cdots \int_{t_0}^{t_{i-1}} dt_{i} dt_{i-1} \cdots dt_1
\end{align*}

By construction, we have

\begin{align*}
t_0 \leq t_n \leq t_{n-1} \leq \cdots \leq t_1 \leq t .
\end{align*}As ##z(t)## is a non-negative real-valued continuous function over the closed interval ##[a,b]##, it attains its maximum value, denoted by ##z_\max##. We deal with the last term in (*):

\begin{align*}
& L^{n+1} \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} \cdots \int_{t_0}^{t_n} z (t_{n+1}) dt_{n+1} dt_n dt_{n-1} \dots dt_1 \\
& \leq z_\max L^{n+1} \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} \cdots \int_{t_0}^{t_n} dt_{n+1} dt_n dt_{n-1} \dots dt_1 \\
& = \frac{z_\max L^{n+1}}{(n+1)!} \int_{t_0}^t \int_{t_0}^t \int_{t_0}^t \cdots \int_{t_0}^t dt_{n+1} dt_n dt_{n-1} \dots dt_1 \\
& = \frac{z_\max L^{n+1}}{(n+1)!} (t - t_0)^{n+1} \\
& \leq \frac{z_\max L^{n+1}}{n!} (b - a)^{n+1}
\end{align*}

This tends to zero as ##n \rightarrow \infty##, since ##x^n/n! \to 0## for any fixed ##x## (here ##x = L(b-a)##): let ##m## be the largest integer less than or equal to ##2x## and let ##n > m##, then

\begin{align*}
\frac{x^n}{n!} & = \frac{x^m}{m!} \frac{x}{m+1} \frac{x}{m+2} \cdots \frac{x}{n} \\
& \leq \frac{x^m}{m!} \left( \frac{1}{2} \right)^{n - m}
\end{align*}

Since

\begin{align*}
\lim_{n \rightarrow \infty} \frac{x^m}{m!} \left( \frac{1}{2} \right)^{n - m} = 0
\end{align*}

we have

\begin{align*}
\lim_{n \rightarrow \infty} \frac{x^n}{n!} = 0 .
\end{align*}

So as ##n \rightarrow \infty##

\begin{align*}
z(t) & \leq C + C \sum_{i=1}^\infty A^{(i)}
\end{align*}

where

\begin{align*}
A^{(i)} & = \frac{1}{i!} L^i \int_{t_0}^t \int_{t_0}^t \cdots \int_{t_0}^t dt_{i} dt_{i-1} \cdots dt_1 \\
& = \frac{1}{i!} L^i (t - t_0)^i
\end{align*}

Finally we have

\begin{align*}
z(t) & \leq C + C \sum_{i=1}^\infty A^{(i)} \\
& = C + C \sum_{i=1}^\infty \frac{1}{i!} L^i (t - t_0)^i \\
& = C e^{L (t - t_0)}
\end{align*}Now assume that ##t_0 > t##, then

\begin{align*}
z(t) \leq C+ L \left| \int_{t_0}^t z (t_1) dt_1 \right| = C + L \int_t^{t_0} z (t_1) dt_1
\end{align*}

and giving the same argument again we obtain

\begin{align*}
z(t) \leq C e^{L (t_0 - t)} = C e^{L |t - t_0|} .
\end{align*}
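Regarding the iterated integral over the ordered region ##t_0\leq t_n\leq\cdots\leq t_1\leq t## that appears twice above, here is a small symbolic check for small ##n## (a sketch with sympy; the helper name is mine):

[CODE=python]
from sympy import symbols, integrate, factorial

t, t0 = symbols('t t_0')
ts = symbols('t_1:7')                 # t_1, ..., t_6

def simplex_integral(n):
    # iterated integral of 1 over t_0 <= t_n <= ... <= t_1 <= t
    expr = 1
    upper = [t] + list(ts[:n - 1])    # upper limits: t, t_1, ..., t_{n-1}
    for i in reversed(range(n)):
        expr = integrate(expr, (ts[i], t0, upper[i]))
    return expr

for n in range(1, 6):
    assert (simplex_integral(n) - (t - t0)**n / factorial(n)).expand() == 0
print("iterated integral = (t - t_0)^n / n!  for n = 1..5")
[/CODE]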
 
Last edited:
  • Like
Likes benorin and fresh_42
  • #58
julian said:
I'm tired and want to turn in. Here's a go at Problem 3:

We are given:

\begin{align*}
z(t) \leq C + L \left| \int_{t_0}^t z (t_1) dt_1 \right| .
\end{align*}

Note that because ##z(t)## is a non-negative function on ##[a,b]##, and since the above inequality is required to hold for all ##t \in [a,b]##, this is why we it is required that ##C \geq 0##.

First assume that ##t \geq t_0##, so we can write:

\begin{align*}
z(t) \leq C + L \int_{t_0}^t z (t_1) dt_1 .
\end{align*}Since ##L \geq 0## we may substitute the inequality into the RHS which gives a first iteration:

\begin{align*}
z(t) & \leq C+ L \int_{t_0}^t \left( C + L \int_{t_0}^{t_1} z (t_2) dt_2 \right) dt_1 \\
& = C + C L \int_{t_0}^t + L^2 \int_{t_0}^t \int_{t_0}^{t_1} z (t_2) dt_2 dt_1
\end{align*}

A second iteration gives:

\begin{align*}
z(t) & \leq C + C L \int_{t_0}^t + L^2 \int_{t_0}^t \int_{t_0}^{t_1} \left( C + L \int_{t_0}^{t_2} z (t_3) dt_3 \right) dt_2 dt_1 \\
& = C + C L \int_{t_0}^t + C L^2 \int_{t_0}^t \int_{t_0}^{t_1} dt_2 dt_1 + L^3 \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} z (t_3) dt_3 dt_2 dt_1
\end{align*}

An ##n##th iteration gives

\begin{align*}
z(t) & \leq C + C \sum_{i=1}^n A^{(i)} + L^{n+1} \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} \cdots \int_{t_0}^{t_n} z (t_{n+1}) dt_{n+1} dt_n dt_{n-1} \dots dt_1 \quad (*)
\end{align*}

where

\begin{align*}
A^{(i)} = L^i \int_{t_0}^t \int_{t_0}^{t_1} \cdots \int_{t_0}^{t_{i-1}} dt_{i} dt_{i-1} \cdots dt_1
\end{align*}

By construction, we have

\begin{align*}
t \leq t_n \leq t_{n-1} \leq \cdots \leq t_1 \leq t .
\end{align*}As ##z(t)## is real valued continuous function over the closed interval ##[a,b]## it attains its maximum value, denoted by ##z_\max##. We deal with the last term in (*):

\begin{align*}
& L^{n+1} \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} \cdots \int_{t_0}^{t_n} z (t_{n+1}) dt_{n+1} dt_n dt_{n-1} \dots dt_1 \\
& < z_\max L^{n+1} \int_{t_0}^t \int_{t_0}^{t_1} \int_{t_0}^{t_2} \cdots \int_{t_0}^{t_n} dt_{n+1} dt_n dt_{n-1} \dots dt_1 \\
& = \frac{z_\max L^{n+1}}{n!} \int_{t_0}^t \int_{t_0}^t \int_{t_0}^t \cdots \int_{t_0}^t dt_{n+1} dt_n dt_{n-1} \dots dt_1 \\
& = \frac{z_\max L^{n+1}}{n!} (t - t_0)^n \\
& \leq \frac{z_\max L^{n+1}}{n!} (b - a)^n
\end{align*}
The first ##<## should be ##\leq##.
This tends to zero as ##n \rightarrow \infty##: let ##m## be the largest integer less than or equal to ##2x## and let ##n > m## then

\begin{align*}
\frac{x^n}{n!} & = \frac{x^m}{m!} \frac{x}{m+1} \frac{x}{m+2} \cdots \frac{x}{n} \\
& \leq \frac{x^m}{m!} \left( \frac{1}{2} \right)^{n - m}
\end{align*}

Since

\begin{align*}
\lim_{n \rightarrow \infty} \frac{x^m}{m!} \left( \frac{1}{2} \right)^{n - m}
\end{align*}

we have

\begin{align*}
\lim_{n \rightarrow \infty} \frac{x^n}{n!} = 0 .
\end{align*}

So as ##n \rightarrow \infty##

\begin{align*}
z(t) & \leq C + C \sum_{i=1}^\infty A^{(i)}
\end{align*}

where

\begin{align*}
A^{(i)} & = \frac{1}{i!} L^i \int_{t_0}^t \int_{t_0}^t \cdots \int_{t_0}^t dt_{i} dt_{i-1} \cdots dt_1 \\
& = \frac{1}{i!} L^i (t - t_0)^i
\end{align*}
A remark that this follows from the convergence of the Taylor series of the exponential function would have been sufficient. Instead you should have proven the iterated integral which you used twice, or at least mentioned that it is the volume formula for a simplex.
Finally we have

\begin{align*}
z(t) & \leq C + C \sum_{i=1}^\infty A^{(i)} \\
& = C + C \sum_{i=1}^\infty \frac{1}{i!} L^i (t - t_0)^i \\
& = C e^{L (t - t_0)}
\end{align*}Now assume that ##t_0 > t##, then

\begin{align*}
z(t) \leq C+ L \left| \int_{t_0}^t z (t_1) dt_1 \right| = C + L \int_t^{t_0} z (t_1) dt_1
\end{align*}

and giving the same argument again we obtain

\begin{align*}
z(t) \leq C e^{L (t_0 - t)} = C e^{L |t - t_0|} .
\end{align*}
Well done. You could have avoided the entire iteration, if you had used a function ##F## and considered ##\dfrac{F'}{F}\leq L##. This is as good as the exponential function and shifts the iteration into the derivative.

The statement is known as Grönwall's lemma.
 
Last edited:
  • #59
fresh_42 said:
Imagine a thread with this topic placed in our Topology and Analysis forum rather than in the Linear And Abstract Algebra forum where it perfectly fits without being moved. The first sentences could be:
The orthogonal real matrices build a compact subset of the Euclidean real space, because ... Now we define a continuous function ...

The present formulation of Problem 5 makes it look like the path ##x## is also prescribed, while I believe you want the solver to prove that such a path ##x## exists and is unique for a prescribed ##A## and a prescribed initial condition. This (and the typo that was already corrected) was what I hinted at.
 
  • #60
S.G. Janssens said:
The present formulation of Problem 5 makes it look like the path ##x## is also prescribed, while I believe you want the solver to prove that such a path ##x## exists and is unique for a prescribed ##A## and a prescribed initial condition. This (and the typo that was already corrected) was what I hinted at.
Yes.

I think that quotation and problem reference (#5) do not match here:
https://www.physicsforums.com/threads/math-challenge-january-2021.997970/#post-6438289
which was my mistake.

The problem means: Solve the differential equation system ##\dot x=Ax## and show that ##x(0)=x_0## makes the solution unique. (I added this to the problem, thanks.)
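For reference, the intended statement in problem 5 can be illustrated numerically along these lines (a sketch assuming scipy; ##x(t)=e^{tA}x_0## is the closed-form solution, compared here against a general-purpose ODE solver; the matrix and initial value are just made-up examples):

[CODE=python]
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t_end = 2.0

# closed-form solution x(t) = exp(t A) x0
x_closed = expm(t_end * A) @ x0

# numerical solution of x' = A x with x(0) = x0
sol = solve_ivp(lambda t, x: A @ x, (0.0, t_end), x0, rtol=1e-10, atol=1e-12)
print(x_closed, sol.y[:, -1])   # the two agree up to solver tolerance
[/CODE]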
 
  • #61
fresh_42 said:
1. Let ##A\in \mathbb{M}_{m,n}(\mathbb{R})## and ##b\in \mathbb{R}^m##. Then exactly one of the following two statements is true:
  • ##Ax=b\, , \,x\geq 0 \,,## is solvable for a ##x\in \mathbb{R}^n.##
  • ##A^\tau y\leq 0\, , \,b^\tau y> 0\,,## is solvable for some ##y\in \mathbb{R}^m.##
The ordering is meant componentwise.
Lemma 1. ##\mathbf{x\geq 0}\iff (\forall \mathbf{y\geq 0})\,\,\mathbf{x^\top y}\geq 0.##
The latter implies the former since one can choose ##\mathbf{y=e}_i##.
Corollary. ##\; \mathbf{x\leq 0}\iff (\forall \mathbf{y\geq 0})\,\,\mathbf{x^\top y}\leq 0.##
This follows by substituting ##\mathbf{-x}## for ##\mathbf{x}##.

Lemma 2. ##(\forall \mathbf{x}\in\mathbb{R}^n)\,\, A\mathbf{x=b}\iff(\exists\mathbf{w}\in\mathbb{R}^n)\,\,\mathbf{x}=A^+\mathbf{b}+(I-A^+A)\mathbf{w}.##
##A^+## denotes the (possibly nonunique) Moore-Penrose pseudo-inverse of ##A##.

Lemma 3. The map ##\mathbf{v}^\top:V\rightarrow \mathbb{R}## for a nonzero vector ##\mathbf{v}## is surjective.

We handle the case ##\mathbf{b\neq 0}##, as the case for ##\mathbf{b=0}## can be verified trivially.

We have the two statements $$\begin{align} & \quad(\exists\mathbf{x\geq 0}\in\mathbb{R}^n)\,\,A\mathbf{x}=\mathbf{b} \\ & \quad (\exists\mathbf{y}\in\mathbb{R}^m)\,\, A^\top\mathbf{y\leq 0},\,\mathbf{b}^\top\mathbf{y}>0\end{align}$$ To prove that exactly one of these two statements is true, we prove that the negation of ##(2)## is equivalent to ##(1)##. The negation of ##(2)##, which we will call ##\neg(2)##, has the following expression: $$(\forall\mathbf{y}\in\mathbb{R}^m)\,\, A^\top\mathbf{y\leq 0}\Rightarrow \mathbf{b^\top y}\leq 0.$$ We proceed by proving ##\neg(2)\Rightarrow(1)## and ##(1)\Rightarrow\neg(2)##:
First, we will rewrite ##\neg(2)## in terms of preimages of sets: $$(A^\top)^{-1}(\mathbb{R}^n_{\leq 0})\subseteq(\mathbf{b}^\top)^{-1}(\mathbb{R}_{\leq0})$$ By Lemma 2, the preimage of any set under ##A^\top## splits thus: $$(A^\top)^{-1}(\mathbb{R}^n_{>0})=A^\top(\mathbb{R}^n_{>0})\oplus\mathrm{Im}\big(I-(A^+)^\top A^\top\big)$$ Since ##\mathbf{b}^\top## is surjective, it follows that ##(\mathbf{b}^\top\circ(\mathbf{b}^\top)^{-1})(S)=S##. Applying ##\mathbf{b^\top}## to both sides, we get $$(A\mathbf{b})^\top(\mathbb{R}^n_{\leq 0})+((I-AA^+)\mathbf{b})^\top(\mathbb{R})\subseteq\mathbb{R}_{\leq 0}$$ where ##+## here denotes the sumset or Minkowski sum. If ##((I-AA^+)\mathbf{b})^\top## is nonzero, then it is surjective by Lemma 3, which would imply that ##\mathbb{R}\subseteq\mathbb{R}_{\leq 0}##. Hence ##(I-AA^+)\mathbf{b}## is zero, and one has $$(A^+\mathbf{b})^\top(\mathbb{R}^n_{\leq 0})\subseteq\mathbb{R}_{\leq 0}.$$ This is equivalent to $$(A^+\mathbf{b})^\top(-1\cdot\mathbb{R}^n_{\geq 0})\subseteq-1\cdot\mathbb{R}_{\geq 0}$$ and by linearity we have $$(A^+\mathbf{b})^\top(\mathbb{R}^n_{\geq 0})\subseteq\mathbb{R}_{\geq 0}$$ in other words, ##A^+\mathbf{b}\geq\mathbf{0}## (Lemma 1). Moreover, since ##(I-AA^+)\mathbf{b}## is zero, or ##\mathbf{b}=AA^+\mathbf{b}##, it follows that if ##\mathbf{x}=A^+\mathbf{b}##, then ##A\mathbf{x}=AA^+\mathbf{b=b}.## We have proven that ##\neg(2)\Rightarrow(1).##

The proof in the other direction is much simpler. Suppose ##(\exists\mathbf{x}\geq 0\in\mathbb{R}^n)\,\,A\mathbf{x=b}.## By Lemma 1, it follows that if ##A^\top\mathbf{y\leq 0}##, then ##\mathbf{b^\top y}=\mathbf{x}^\top A^\top\mathbf{y}\leq 0.## We have proved that ##(1)\Rightarrow\neg(2)##. ##\blacksquare##
 
  • #62
suremarc said:
Lemma 2. ##(\forall \mathbf{x}\in\mathbb{R}^n)\,\, A\mathbf{x=b}\iff(\exists\mathbf{w}\in\mathbb{R}^n)\,\,\mathbf{x}=A^+\mathbf{b}+(I-A^+A)\mathbf{w}.##
##A^+## denotes the (possibly nonunique) Moore-Penrose pseudo-inverse of ##A##.
Do you have a reference for this? I'm not familiar with the Moore-Penrose pseudo-inverse.

And most important: why can you switch from preimages to the pseudo-inverse? And why is ##AA^\top(\mathbf{b})=\mathbf{b}\,?## You basically claim that ##A## is invertible on the subspace we need.
Lemma 3. The map ##\mathbf{v}^\top:V\rightarrow \mathbb{R}## for a nonzero vector ##\mathbf{v}## is surjective.
I assume you mean the map ##\mathbf{v}^\top(\mathbf{u})= \mathbf{v}^\tau\cdot \mathbf{u}##.

Edit: One of my solutions uses a hyperplane (strict separation theorem), and yours looks to me like a complicated version of it.
 
Last edited:
  • #63
fresh_42 said:
Do you have a reference for this? I'm not familiar with the Moore-Penrose pseudo-inverse.
The Wikipedia page, under Applications —> Obtaining all solutions of a linear system. ##A^+## essentially projects onto the image of ##A## and inverts it there.

Actually I think I have committed some hand-waving now that I look at my proof again, specifically in the expression for the preimage ##(A^\top)^{-1}(S)##.
 
  • #64
fresh_42 said:
And why is ##AA^\top(\mathbf{b})=\mathbf{b}##? You basically claim that ##A## is invertible on the subspace we need.
##AA^\top\mathbf{b}\neq\mathbf{b}##, but my claim was that ##AA^+\mathbf{b}=\mathbf{b}##.
 
  • #65
suremarc said:
##AA^\top\mathbf{b}\neq\mathbf{b}##, but my claim was that ##AA^+\mathbf{b}=\mathbf{b}##.
Sorry, that was a typo. But why is it true? ##A^+## is not a real inverse, so why does the identity hold for ##\mathbf{b}\,?##
My proof goes ##\lnot (1) \Longrightarrow (2)##.
 
  • #66
fresh_42 said:
Sorry, was a typo. Why is it? It is no real inverse, so why is it for ##\mathbf{b}\,?##.
My proof goes ##\lnot (1) \Longrightarrow (2)##.
Well, I claim that ##(A^+\mathbf{b})^\top(\mathbb{R}^n_{\leq 0})+((I-AA^+)\mathbf{b})^\top(\mathbb{R}^m)\subseteq\mathbb{R}_{\leq 0}.## If ##((I-AA^+)\mathbf{b})\neq 0## then the second term is all of ##\mathbb{R}## since (in this case) nonzero linear functionals are surjective. Thus the sum of that set and the first term would also be all of ##\mathbb{R}##, but my original claim was that this is a subset of ##\mathbb{R}_{\leq 0}##, so this leads to a contradiction. Ergo, ##((I-AA^+)\mathbf{b})## must be zero.

edit: made a typo, it’s ##((I-AA^+)\mathbf{b})^\top(\mathbb{R}^m)##, not ##((I-AA^+)\mathbf{b})^\top(\mathbb{R})##
 
  • #67
suremarc said:
edit: made a typo, it’s ##((I-AA^+)\mathbf{b})^\top(\mathbb{R}^m)##, not ##((I-AA^+)\mathbf{b})^\top(\mathbb{R})##
And I think there is another one in the second equation, where ##>## should be ##\leq##.
 
  • #68
fresh_42 said:
And I think another one in the second equation where ##>## should be ##\leq##, I think.
Ah, yeah, you’re right.
I’m going to redo that section of the proof anyway, I don’t think the switch from preimage to pseudoinverse was justified now that I’m trying to work it out rigorously. D’oh.
 
  • #69
I think it is too complicated. A simple hyperplane would do (##\mathbb{R}_{\geq 0}^m## is a cone!), or the strong duality theorem. Your proof mimics the separation by a hyperplane I think.
 
  • Like
Likes suremarc
  • #70
I applied the hyperplane separation theorem, and things were pretty simple, surprisingly.

The negation of ##(1)## is ##(\forall \mathbf{x\geq 0}\in\mathbb{R}^n)\,\,A\mathbf{x\neq b},## or in terms of sets, $$\mathbf{b}\notin A(\mathbb{R}^n_{\geq 0}).$$ ##A(\mathbb{R}^n_{\geq 0})## inherits the form of a closed cone from ##\mathbb{R}^n_{\geq 0}## due to the invariance of the property ##(\forall\mathbf{u,v}\in C)(\forall\alpha,\beta\in\mathbb{R}_{\geq 0}) \,\,\mathbf{\alpha u+\beta v}\in C## for a cone ##C##. We apply the hyperplane separation theorem to the convex sets ##\{\mathbf{b}\}## and ##A(\mathbb{R}^n_{\geq 0})##: $$(\exists\mathbf{y}\in\mathbb{R}^m-\{\mathbf{0}\})\,\,\mathbf{y^\top b}\notin\mathbf{y}^\top A(\mathbb{R}^n_{\geq 0})$$
Because ##\mathbb{R}^n_{\geq 0}## is a cone, ##\mathbf{y}^\top A(\mathbb{R}^n_{\geq 0})## is also a cone, which means that it is either ##\mathbb{R}_{\geq 0}##, ##\mathbb{R}_{\leq 0}##, or the zero cone (it cannot be all of ##\mathbb{R}##, since ##\mathbf{y^\top b}## lies outside of it). If it is ##\mathbb{R}_{\geq 0}##, replace ##\mathbf{y'=-y}## to get some ##\mathbf{y'}## for which ##\mathbf{y'^\top} A(\mathbb{R}^n_{\geq 0})\subseteq\mathbb{R}_{\leq 0}##. This implies that ##\mathbf{y'^\top b}\notin\mathbb{R}_{\leq 0}##: in other words, ##\mathbf{b^\top y'}>0## (one half of ##(2)##). Finally, note that ##\mathbf{y'}^\top A(\mathbb{R}^n_{\geq 0})=(A^\top\mathbf{y'})^\top(\mathbb{R}^n_{\geq 0})\subseteq\mathbb{R}_{\leq 0}##, which implies that ##A^\top\mathbf{y'\leq 0}.## We have proven ##(2)## from ##\neg(1).##

To prove the other direction, observe that ##(\exists\mathbf{y}\in\mathbb{R}^m)\,\,A^\top\mathbf{y}\leq 0\land\mathbf{b^\top y}>0## implies the following: $$(\exists\mathbf{y}\in\mathbb{R}^m)(\forall\mathbf{x}\in\mathbb{R}^n:\,A\mathbf{x=b})\,\,\mathbf{y}^\top A\mathbf{x}=\mathbf{b^\top y}=\mathbf{x}^\top(A^\top\mathbf{y})>0,$$ implying that ##\mathbf{x}\not\geq 0## since ##A^\top\mathbf{y}\leq 0## but ##x^\top(A^\top\mathbf{y})>0##.
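To see the alternative in action on nuuskur's tiny example from post #41, here is a numerical sketch (assuming scipy; the function name is mine, and the box bound on ##\mathbf y## is only there to keep the second LP bounded):

[CODE=python]
import numpy as np
from scipy.optimize import linprog

def farkas_alternative(A, b):
    # return ('x', x) if Ax = b has a solution x >= 0,
    # otherwise ('y', y) with A^T y <= 0 and b^T y > 0
    m, n = A.shape
    # (1): feasibility of Ax = b, x >= 0 (zero objective)
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n, method='highs')
    if res.success:
        return 'x', res.x
    # (2): maximize b^T y subject to A^T y <= 0, with y in a box to keep the LP finite
    res = linprog(c=-b, A_ub=A.T, b_ub=np.zeros(n), bounds=[(-1, 1)] * m, method='highs')
    return 'y', res.x

A = np.array([[-1.0]])   # nuuskur's example: -x = 1 has no solution with x >= 0
b = np.array([1.0])
which, v = farkas_alternative(A, b)
print(which, v)          # 'y' [1.]  (A^T y = -1 <= 0 and b^T y = 1 > 0)
[/CODE]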
 
  • Like
Likes fresh_42
