Gradient of a function containing a matrix?

In summary: just write out the original function ##f = f(x_1, x_2, \ldots, x_n)## in detail so you can see what is happening; then, after finding the derivative, you can rewrite the answer using matrices again if you want:
$$f = \frac{1}{2} \sum_{i=1}^n x_i^2 + \log \left(\sum_{i=1}^m \exp \left(b_i + \sum_{j=1}^n a_{ij} x_j \right) \right)$$
  • #1
countzander

Homework Statement


http://i.imgur.com/TlDOllQ.png

Homework Equations


As stated.

The Attempt at a Solution


I'm not sure how to slay this beast. I know the gradient is just a partial derivative and that the solution likely involves multiple partial derivatives, one for each element in the vector x. But how would the partial derivatives be computed, given the matrices?
 
  • #2
countzander said:

http://i.imgur.com/TlDOllQ.png

I'm not sure how to slay this beast. I know the gradient is just a partial derivative and that the solution likely involves multiple partial derivatives, one for each element in the vector x. But how would the partial derivatives be computed, given the matrices?

Just type out the problem here; do not use thumbnails. Read the pinned post 'Guidelines for students and helpers', by Vela, to see why. I cannot read your thumbnail on some media, so I will make no attempt to help.
 
  • #3
I'll help you out with the formatting, at least. This might help you if you want to post future questions.

Let ##A \in \mathbb R^{m\times n}## and ##b \in \mathbb R^m##. Compute the gradient of$$f:\mathbb R^n \rightarrow \mathbb R, \quad f(x) = \frac{1}{2} x^T x + \log\left(e^T E(Ax+b)\right),$$ where ##e = \left(1, 1, \ldots, 1\right) \in \mathbb R^m## and ##E:\mathbb R^m \rightarrow \mathbb R^m## is the component-wise exponential function, i.e., ##(E(x))_i = \exp(x_i)## for ##i = 1, 2, \ldots, m.##
Use ##\operatorname{diag}(v)## for an ##m \times m## diagonal matrix with diagonal elements given by ##v \in \mathbb R^m##.
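For readers who want to experiment, here is a minimal NumPy sketch of the stated ##f##. The sizes ##m = 4##, ##n = 3## and the random ##A##, ##b## are arbitrary choices for illustration, not part of the problem:

```python
import numpy as np

# Hypothetical small instance: A is m x n, b is in R^m.
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def f(x):
    # f(x) = (1/2) x^T x + log(e^T E(Ax + b)),
    # where E applies exp component-wise and e is the all-ones vector,
    # so e^T E(z) is simply the sum of exp(z_i).
    z = A @ x + b
    return 0.5 * x @ x + np.log(np.sum(np.exp(z)))

x = rng.standard_normal(n)
print(f(x))
```

Note that ##e^T E(z)## collapses to a plain sum of exponentials, which is why no explicit all-ones vector appears in the code.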
 
  • #4
Thanks for the formatting help.

I attempted a solution by differentiating with respect to ##x_n##.

$$\frac{\partial f}{\partial x_n} = 2x_n + \frac{A^T e^T exp(Ax+b)}{e^T exp(Ax+b)}$$

But this isn't correct, I don't think. Shouldn't ##A^T## cancel out somewhere? Can the gradient contain a matrix in the final solution?
 
  • #5
countzander said:
Thanks for the formatting help.

I attempted a solution by differentiating with respect to ##x_n##.

$$\frac{\partial f}{\partial x_n} = 2x_n + \frac{A^T e^T exp(Ax+b)}{e^T exp(Ax+b)}$$

But this isn't correct, I don't think. Shouldn't ##A^T## cancel out somewhere? Can the gradient contain a matrix in the final solution?

Just write out the original function ##f = f(x_1,x_2, \ldots, x_n)## in detail so you can see what is happening; then, after finding the derivative, you can rewrite the answer using matrices again, if you want. So
[tex] f = \frac{1}{2} \sum_{i=1}^n x_i^2 + \log \left(\sum_{i=1}^m \exp \left(b_i + \sum_{j=1}^n a_{ij} x_j \right) \right) [/tex]
You could also use the fact that ##\log \left( \prod_i \exp(g_i) \right) = \sum_i g_i##, but I don't know if that makes things better or worse; you would need to try it for yourself. Also, you don't want just ##\partial f / \partial x_n##; you want all the ##\partial f / \partial x_k##, ##k = 1, \dots, n##.
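Following the summed form above, each ##\partial f / \partial x_k## comes out as ##x_k## plus a weighted sum down column ##k## of ##A##; in matrix form that is the candidate gradient ##x + A^T s## with ##s_i = \exp((Ax+b)_i) / \sum_j \exp((Ax+b)_j)##. A quick numerical sanity check of that candidate against central finite differences (sizes and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def f(x):
    z = A @ x + b
    return 0.5 * x @ x + np.log(np.sum(np.exp(z)))

def grad_f(x):
    # Candidate gradient: x + A^T s, with s the softmax of Ax + b.
    z = A @ x + b
    s = np.exp(z) / np.sum(np.exp(z))
    return x + A.T @ s

def grad_fd(x, h=1e-6):
    # Independent check: central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for k in range(len(x)):
        d = np.zeros_like(x)
        d[k] = h
        g[k] = (f(x + d) - f(x - d)) / (2 * h)
    return g

x = rng.standard_normal(n)
print(np.max(np.abs(grad_f(x) - grad_fd(x))))  # difference should be tiny
```

Note that no ##A^T## "cancels out": the matrix legitimately survives into the gradient, exactly as the column-sum form of ##\partial f / \partial x_k## suggests.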

BTW: in TeX/LaTeX you should use "\exp" instead of "exp" and "\log" instead of "log"; this applies also to the other standard functions (the trig functions, the hyperbolic functions plus things like "max", "min", "mod", etc.) The results really do look better: you get ##\exp## instead of ##exp##, ##\log## instead of ##log##, etc.
 

FAQ: Gradient of a function containing a matrix?

What is the gradient of a function containing a matrix?

The gradient of a function containing a matrix is a vector that contains the partial derivatives of the function with respect to each input variable. It represents the rate of change of the function in each coordinate direction.

How is the gradient of a function containing a matrix calculated?

The gradient of a function containing a matrix is calculated by taking the partial derivative of the function with respect to each input variable and arranging the results into a vector.

What is the significance of the gradient in a function containing a matrix?

The gradient in a function containing a matrix is significant because it provides information about the direction and rate of change of the function at a specific point. It is a valuable tool for optimization and finding local extrema.

Can the gradient of a function containing a matrix be a matrix itself?

Yes. If the function's argument is itself a matrix, the gradient collects the partial derivative with respect to each entry of that matrix, and therefore has the same dimensions as the matrix itself.

How is the gradient of a function containing a matrix used in machine learning?

In machine learning, the gradient of a function containing a matrix is used to update the parameters of a model during the training process. It is also used in gradient descent algorithms to find the minimum of a cost function and optimize the model's performance.
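As a sketch of that use, here is plain gradient descent on the ##f## from this thread; the step size and iteration count are ad-hoc choices, and the ##\frac{1}{2}x^Tx## term makes ##f## strongly convex, so the iteration settles at the unique minimizer:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def f(x):
    z = A @ x + b
    return 0.5 * x @ x + np.log(np.sum(np.exp(z)))

def grad_f(x):
    z = A @ x + b
    s = np.exp(z) / np.sum(np.exp(z))
    return x + A.T @ s

x = rng.standard_normal(n)
for _ in range(500):
    x = x - 0.05 * grad_f(x)  # fixed step size, chosen ad hoc

print(f(x), np.linalg.norm(grad_f(x)))  # gradient norm is near zero at the minimizer
```

In real machine-learning code the step size would typically be tuned or adapted rather than fixed, but the update rule is the same.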
