- #1
Dethrone
I'm trying to optimize the function below, but I'm not sure where I made a mistake. (This is an application in machine learning.)$$J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$$
where $\theta$ is an $n$ by $k$ matrix and each $x^{(i)}$ is an $n$ by 1 vector.
Writing out the inner sum,
$$J(\theta)=\sum_{i=1}^n \left((\theta^Tx^{(i)}-y^{(i)})_1^2+\dots+(\theta^Tx^{(i)}-y^{(i)})_k^2 \right)$$
Differentiating,
$$\frac{\partial J}{\partial \theta_{pq}}=2\sum_{i=1}^n \left( (\theta^Tx^{(i)}-y^{(i)})_1\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})_1+\dots+(\theta^Tx^{(i)}-y^{(i)})_k\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})_k\right)$$
But if we look at the first term, $(\theta^Tx^{(i)}-y^{(i)})_1$ is a $k$ by 1 vector and $\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})$ is also a $k$ by 1 vector, so we can't multiply them together (unless, maybe, we use tensors). Where did I make a mistake?
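One way to probe the expression above is a numerical check. The sketch below is only an illustration under stated assumptions: it reads each subscripted term $(\cdot)_j$ as the $j$-th scalar component of the vector, so that $\frac{\partial}{\partial \theta_{pq}}(\theta^Tx^{(i)})_j$ is $x^{(i)}_p$ when $j=q$ and $0$ otherwise. To avoid reusing $n$ for both the number of examples and the input dimension, it uses the hypothetical names `m` (examples), `d` (input dimension) and `k` (output dimension), and stacks the $x^{(i)}$ and $y^{(i)}$ as rows of `X` and `Y`. None of these names come from the post.

```python
import numpy as np

def cost(theta, X, Y):
    # J(theta) = sum_i sum_j (theta^T x_i - y_i)_j^2; row i of X @ theta is (theta^T x_i)^T
    R = X @ theta - Y
    return float(np.sum(R ** 2))

def grad_componentwise(theta, X, Y):
    # Assumed reading: only the j = q term of the inner sum depends on theta[p, q],
    # and its derivative is the scalar x_i[p].
    d, k = theta.shape
    G = np.zeros((d, k))
    for p in range(d):
        for q in range(k):
            for i in range(X.shape[0]):
                r_q = theta[:, q] @ X[i] - Y[i, q]  # scalar component (theta^T x_i - y_i)_q
                G[p, q] += 2.0 * r_q * X[i, p]
    return G

def grad_matrix(theta, X, Y):
    # The same sums collected into one matrix product.
    return 2.0 * X.T @ (X @ theta - Y)

def grad_finite_diff(theta, X, Y, eps=1e-6):
    # Central differences, used here only as an independent check.
    G = np.zeros_like(theta)
    for p in range(theta.shape[0]):
        for q in range(theta.shape[1]):
            step = np.zeros_like(theta)
            step[p, q] = eps
            G[p, q] = (cost(theta + step, X, Y) - cost(theta - step, X, Y)) / (2 * eps)
    return G

# Hypothetical sizes and random data, chosen only for the check.
rng = np.random.default_rng(0)
m, d, k = 5, 3, 2
X = rng.normal(size=(m, d))
Y = rng.normal(size=(m, k))
theta = rng.normal(size=(d, k))

print(np.allclose(grad_componentwise(theta, X, Y), grad_finite_diff(theta, X, Y), atol=1e-5))
```

If the scalar-component reading is the intended one, the three gradients should agree up to finite-difference error, and no tensor product is needed.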