Derivative of Log Likelihood Function

In summary, I differentiated the log likelihood of a mixture-of-Gaussians model and found that the numerator term comes from differentiating the multivariate Gaussian density, while the denominator term comes from differentiating the log.
  • #1
NATURE.M
Looking through my notes, I can't seem to understand how to get from one step to the next. I have attached a screenshot of the two lines I'm confused about. Thanks.

BTW: the equations are for the log likelihood of a mixture-of-Gaussians model.

EDIT: To elaborate, I am particularly confused about how they get the numerator term ##\pi_{k} \mathcal{N}(x_{n}|\mu_{k}, \Sigma_{k})##. I can't see how differentiating produces that term. I understand how they obtain the denominator term from differentiating the log, but that's about all. To differentiate the multivariate Gaussian I would think the log function needs to be used to break up the internal terms, but I can't put this intuition together.
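
For concreteness, assuming the standard mixture-of-Gaussians log likelihood (a reconstruction of the screenshot's setup, so the notation below is an assumption):
$$\ln p(X) = \sum_{n=1}^{N} \ln\left[\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n|\mu_j, \Sigma_j)\right]$$
the numerator arises because, when differentiating with respect to a parameter of component ##k## (say ##\mu_k##), the chain rule brings down the reciprocal of the inner sum (the denominator term), and of the inner sum only the ##j=k## term survives:
$$\frac{\partial \ln p(X)}{\partial \mu_k} = \sum_{n=1}^{N} \frac{\pi_k\, \frac{\partial}{\partial \mu_k}\mathcal{N}(x_n|\mu_k, \Sigma_k)}{\sum_{j} \pi_j\, \mathcal{N}(x_n|\mu_j, \Sigma_j)} = \sum_{n=1}^{N} \frac{\pi_k\, \mathcal{N}(x_n|\mu_k, \Sigma_k)}{\sum_{j} \pi_j\, \mathcal{N}(x_n|\mu_j, \Sigma_j)}\, \Sigma_k^{-1}(x_n - \mu_k),$$
where the last step uses ##\frac{\partial}{\partial \mu_k}\mathcal{N}(x_n|\mu_k, \Sigma_k) = \mathcal{N}(x_n|\mu_k, \Sigma_k)\, \Sigma_k^{-1}(x_n - \mu_k)##, since only the exponent depends on ##\mu_k##. The ##\pi_k\, \mathcal{N}(x_n|\mu_k, \Sigma_k)## numerator is simply that single surviving term.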
 

Attachments

  • Screen Shot 2015-11-15 at 5.37.12 PM.png
  • #2
I think it's because ##\Sigma_k## appears both inside and outside (as an inverse) the exponent in the density function ##\mathscr{N}##.
So
$$\frac{\partial}{\partial \Sigma_k}\mathscr{N}(\mu,\Sigma_k)=
\frac{\partial}{\partial \Sigma_k}\left[C\Sigma_k{}^{-1}\exp[f(\mu,\Sigma_k)]\right]$$
for known constant ##C## and function ##f##.
By the product rule, this is then equal to
$$C\exp[f(\mu,\Sigma_k)]\left[\frac{\partial}{\partial \Sigma_k}\Sigma_k{}^{-1}+\Sigma_k{}^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$

There will be some messy algebra involved.

You might find it easier to first work through the univariate case, differentiating wrt ##\sigma## and seeing if you can obtain an analogous expression. If that works out, it shouldn't be too hard to extend it to the multivar case.
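
A quick SymPy sketch of that univariate exercise (my own illustration, not from the thread; the variable names are assumptions, and it just confirms that the product rule yields one term from the prefactor and one from the exponent, as described above):
```python
import sympy as sp

x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# Univariate Gaussian density: sigma appears both in the 1/sigma prefactor
# and inside the exponent, mirroring Sigma_k in the multivariate case.
N = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / (sp.sqrt(2 * sp.pi) * sigma)

# Differentiate wrt sigma; the product rule produces two terms.
dN = sp.diff(N, sigma)

# Factoring the density back out exposes the analogue of the bracketed
# expression in post #2: dN/dsigma = N * ((x - mu)**2/sigma**3 - 1/sigma).
print(sp.simplify(dN / N))  # ((x - mu)**2 - sigma**2)/sigma**3, up to ordering
```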
 
  • #3
andrewkirk said:
I think it's because ##\Sigma_k## appears both inside and outside (as an inverse) the exponent in the density function ##\mathscr{N}##.
So
$$\frac{\partial}{\partial \Sigma_k}\mathscr{N}(\mu,\Sigma_k)=
\frac{\partial}{\partial \Sigma_k}\left[C\Sigma_k{}^{-1}\exp[f(\mu,\Sigma_k)]\right]$$
for known constant ##C## and function ##f##.
By the product rule, this is then equal to
$$C\exp[f(\mu,\Sigma_k)]\left[\frac{\partial}{\partial \Sigma_k}\Sigma_k{}^{-1}+\Sigma_k{}^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$

There will be some messy algebra involved.

You might find it easier to first work through the univariate case, differentiating wrt ##\sigma## and seeing if you can obtain an analogous expression. If that works out, it shouldn't be too hard to extend it to the multivar case.
I don't understand how you got ##C\Sigma_{k}^{-1}##. In the multivariate Gaussian we have ##\frac{1}{|\Sigma_{k}|}##. How did you convert that determinant into an inverse? Maybe you meant the same thing but forgot the determinant sign?
 
  • #4
NATURE.M said:
I don't understand how you got ##C\Sigma_{k}^{-1}##. In the multivariate Gaussian we have ##\frac{1}{|\Sigma_{k}|}##. How did you convert that determinant into an inverse?
I didn't. What I wrote is only broadly indicative of the structure. I didn't look up the multivariate Gaussian formula. With your correction that line becomes:

$$C\exp[f(\mu,\Sigma_k)]\left[\frac{\partial}{\partial \Sigma_k}|\Sigma_k|^{-1}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$
which is
$$C\exp[f(\mu,\Sigma_k)]\left[-|\Sigma_k|^{-2}\frac{\partial |\Sigma_k|}{\partial \Sigma_k}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$

I think if you work through the univariate case first it'll become much clearer.
 
  • #5
andrewkirk said:
I didn't. What I wrote is only broadly indicative of the structure. I didn't look up the multivariate Gaussian formula. With your correction that line becomes:

$$C\exp[f(\mu,\Sigma_k)]\left[\frac{\partial}{\partial \Sigma_k}|\Sigma_k|^{-1}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$
which is
$$C\exp[f(\mu,\Sigma_k)]\left[-|\Sigma_k|^{-2}\frac{\partial |\Sigma_k|}{\partial \Sigma_k}+|\Sigma_k|^{-1}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$

I think if you work through the univariate case first it'll become much clearer.

Okay, so rewriting with an exponent of -1/2 (as in the Gaussian) and repeating the operation, we would have:
$$C\exp[f(\mu,\Sigma_k)]\left[-\frac{1}{2}|\Sigma_k|^{-3/2}\frac{\partial |\Sigma_k|}{\partial \Sigma_k}+|\Sigma_k|^{-1/2}\frac{\partial}{\partial \Sigma_k}f(\mu,\Sigma_k)\right]$$
So the problem becomes the extra ##|\Sigma_k|^{-1}## that gets left over after we factor out ##|\Sigma_k|^{-1/2}##. Any ideas?
 
  • #6
So I think I resolved my troubles using a few properties outlined in the Matrix Cookbook.
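
The post doesn't say which properties were used, but the standard Matrix Cookbook identities for this derivation are presumably these (stated for symmetric, invertible ##\Sigma_k##, ignoring symmetry correction terms):
$$\frac{\partial \ln|\Sigma_k|}{\partial \Sigma_k} = \Sigma_k^{-1}, \qquad \frac{\partial}{\partial \Sigma_k}\left[(x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k)\right] = -\Sigma_k^{-1}(x_n-\mu_k)(x_n-\mu_k)^T\,\Sigma_k^{-1}.$$
Writing ##|\Sigma_k|^{-1/2} = \exp(-\frac{1}{2}\ln|\Sigma_k|)## moves the determinant into the exponent, so the whole density becomes ##C\exp[g(\mu_k,\Sigma_k)]## and a single chain-rule step replaces the product rule. The leftover ##|\Sigma_k|^{-1}## from post #5 also cancels directly via the first identity, since ##\frac{\partial|\Sigma_k|}{\partial\Sigma_k} = |\Sigma_k|\,\Sigma_k^{-1}##, leaving a clean factor of ##|\Sigma_k|^{-1/2}##.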
 

Related to Derivative of Log Likelihood Function

1. What is the purpose of taking the derivative of the log likelihood function?

The derivative of the log likelihood function allows us to find the maximum likelihood estimate (MLE) of a statistical model's parameters. This is important because the MLE identifies the parameter values under which the observed data are most probable, which in turn supports more accurate predictions and inferences.

2. How is the derivative of the log likelihood function calculated?

The derivative of the log likelihood function is calculated by taking the partial derivative of the function with respect to each parameter. Setting these partial derivatives equal to zero yields a system of equations whose solution gives the parameter values that maximize the log likelihood.
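
As a toy illustration (my own sketch, not part of the thread above): for a single univariate Gaussian, solving the zero-derivative equations by hand gives the sample mean and the biased sample variance, and a numerical optimizer driving the gradient to zero reproduces them:
```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=1000)

def neg_log_likelihood(params, x):
    """Negative log likelihood of N(mu, sigma^2); optimize log(sigma) so sigma > 0."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return np.sum(0.5 * np.log(2 * np.pi) + np.log(sigma)
                  + (x - mu) ** 2 / (2 * sigma ** 2))

# Minimizing the negative log likelihood drives its gradient to zero.
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# Closed-form MLEs obtained by setting the partial derivatives to zero by hand.
print(mu_hat, data.mean())          # both ~= sample mean
print(sigma_hat, data.std(ddof=0))  # both ~= biased sample standard deviation
```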

3. What does it mean if the derivative of the log likelihood function is 0?

If the derivative of the log likelihood function is zero, the parameter values are at a stationary point of the log likelihood. For a well-behaved model this is typically the maximum likelihood estimate, meaning the observed data are most probable under these parameter values, though strictly one should check second-order conditions to confirm the stationary point is a maximum rather than a minimum or saddle point.

4. Can the derivative of the log likelihood function ever be negative?

Yes. The log likelihood is a function of the parameters, and it decreases as a parameter moves away from its optimal value, so its derivative is negative wherever the function is decreasing in that parameter. At the maximum likelihood estimate itself the derivative is zero, and elsewhere its sign indicates the direction in which the likelihood increases.

5. How does the derivative of the log likelihood function relate to statistical hypothesis testing?

The derivative of the log likelihood function enters hypothesis testing through the score test, where the gradient of the log likelihood evaluated under the null hypothesis measures the evidence against it. Log likelihoods are also central to likelihood ratio tests, where the difference between the maximized log likelihoods under the null and alternative hypotheses is compared to a critical value to make a decision about the hypotheses.
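
For reference, the likelihood ratio statistic mentioned above takes the standard form
$$\lambda = -2\left[\ell(\hat{\theta}_0) - \ell(\hat{\theta}_1)\right],$$
where ##\ell## denotes the maximized log likelihood under the null (##\hat{\theta}_0##) and alternative (##\hat{\theta}_1##) models; by Wilks' theorem, ##\lambda## is asymptotically ##\chi^2##-distributed with degrees of freedom equal to the difference in the number of free parameters.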
