- #1
Dethrone
Hi,
I guess this could be a rather silly question, but I got a bit confused about the "numerator layout" and "denominator layout" notations when working with matrix differentiation: https://en.wikipedia.org/w...a.org/wiki/Matrix_calculus#Layout_conventions
It says that with the denominator layout, the derivative of a scalar with respect to a vector is interpreted as a column vector: $\frac{\mathrm{d}L}{\mathrm{d}w_1}=\left[\frac{\mathrm{d}L}{\mathrm{d}w_{11}} \; \frac{\mathrm{d}L}{\mathrm{d}w_{12}} \; \cdots \; \frac{\mathrm{d}L}{\mathrm{d}w_{1n}}\right]^T$, where $L$ is a scalar and $w_1$ is an $n \times 1$ vector.
But what if we represent the scalar $L$ differently? E.g., $L=w^Tx$, where $w, x \in \Bbb{R}^{n \times 1}$.
Then we get $\frac{\mathrm{d}L}{\mathrm{d}w}=\frac{\mathrm{d}(w^Tx)}{\mathrm{d}w}=\frac{\mathrm{d}(x^Tw)}{\mathrm{d}w}=x^T$, which is a $1 \times n$ vector. Doesn't this result disagree with the denominator layout? I read somewhere on the wiki that one should stick to one type of notation, but if certain types of calculations favor one notation over the other, wouldn't that be problematic or confusing?
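To pin down where the shapes come from, here is the same derivative written component by component. This is only a sketch of how I read the two conventions on the wiki page; the index $i$ is introduced here for illustration.

$$
L = w^T x = \sum_{k=1}^{n} w_k x_k
\quad\Longrightarrow\quad
\frac{\partial L}{\partial w_i} = x_i, \qquad i = 1, \dots, n.
$$

$$
\text{denominator layout: } \frac{\mathrm{d}L}{\mathrm{d}w} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x \;\; (n \times 1),
\qquad
\text{numerator layout: } \frac{\mathrm{d}L}{\mathrm{d}w} = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} = x^T \;\; (1 \times n).
$$

The $n$ partial derivatives are identical in both cases; only their arrangement into a row or a column differs. Read this way, the $x^T$ result seems to be the numerator-layout form of the same derivative, and its denominator-layout counterpart would be $x$.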
I came across this when trying to calculate $\frac{\mathrm{d}L}{\mathrm{d}W}=\left[\frac{\mathrm{d}L}{\mathrm{d}w_1} \; \frac{\mathrm{d}L}{\mathrm{d}w_2} \; \cdots \; \frac{\mathrm{d}L}{\mathrm{d}w_c}\right]$, where $W$ is $n \times c$ and each $\frac{\mathrm{d}L}{\mathrm{d}w_i}$ is the derivative of $L$ with respect to the column vector $w_i$. As you can see fairly quickly, I started off with what the wiki calls the "denominator layout", but since each $\frac{\mathrm{d}L}{\mathrm{d}w_i}$ ended up being $1 \times n$, the result didn't make much sense: placing $c$ row vectors of size $1 \times n$ side by side does not give an $n \times c$ matrix. Basically, what I'm trying to say is that writing the scalar $L$ as $w^Tx$ led me to a result in numerator layout, but since I started off in denominator layout, my answer gets messed up.
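As a sanity check on the shapes, here is a small numerical sketch in plain Python. Everything specific in it is an assumption made for illustration: the loss $L(W)=\sum_j w_j^T x_j$, the helper names `loss` and `column_gradient`, and the sample numbers are all hypothetical, not part of my actual problem. It estimates each $\frac{\mathrm{d}L}{\mathrm{d}w_j}$ by central differences and then places the results as columns, which is what the denominator layout asks for.

```python
def loss(W, X):
    """Hypothetical scalar L(W) = sum_j w_j^T x_j; W and X are lists of columns."""
    return sum(wi * xi
               for w_col, x_col in zip(W, X)
               for wi, xi in zip(w_col, x_col))


def column_gradient(W, X, j, h=1e-6):
    """Central-difference estimates of the n partials dL/dw_{ij} for column j.

    Returns a flat list: no row/column layout is implied at this point.
    """
    grads = []
    for i in range(len(W[j])):
        up = [col[:] for col in W]
        dn = [col[:] for col in W]
        up[j][i] += h
        dn[j][i] -= h
        grads.append((loss(up, X) - loss(dn, X)) / (2 * h))
    return grads


n, c = 3, 2
W = [[0.3, 0.7, -1.2], [1.0, 0.0, 2.0]]   # c columns, each of length n
X = [[2.0, -1.0, 0.5], [4.0, 3.0, -2.0]]  # same storage convention as W

cols = [column_gradient(W, X, j) for j in range(c)]

# Denominator layout: each dL/dw_j is an n x 1 column, so stacking the
# columns side by side gives a matrix with n rows and c columns, like W.
dL_dW = [[cols[j][i] for j in range(c)] for i in range(n)]

for row in dL_dW:
    print(row)
```

For this particular loss, column $j$ of the result should match $x_j$ up to finite-difference error, so the stacked matrix has the same $n \times c$ shape as $W$. Stacking the $1 \times n$ rows instead would give the transpose, which is presumably the numerator-layout arrangement.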