Log likelihood function and EM algorithm

In summary, we discussed an example of a multinomial distribution and its likelihood function. We then took the log likelihood and applied log rules to simplify it. Next, we partitioned one of the variables to obtain the complete-data log likelihood. Finally, we worked through the E and M steps of the EM algorithm and plotted the resulting estimates for the given data.
  • #1
Jameson
I'm going to start by asking about an example from class and then hopefully use that to work on the problem I need to solve. Here is an example:

Let's say we have a multinomial distribution $x \sim M(n;.5+.25\theta,0.25(1-\theta),0.25(1-\theta),0.5\theta)$.

The likelihood function of $\theta$ given the vector $x$ is:

\(\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(.5+.25\theta)^{x_1}(0.25(1-\theta))^{x_2+x_3}(0.5\theta)^{x_4}\).

This part is fine. It's this next step that confuses me.

\(\displaystyle \log (f(x|\theta))=x_1 \log(2+ \theta)+(x_2+x_3) \log(1-\theta)+x_4 \log(\theta)+\text{constant}\)

I understand the standard log rules I believe but I don't see how $\log(.5+.25\theta)^{x_1}=x_1 \log(2+ \theta)$. Obviously $x_1$ can be brought down in front of the expression but the stuff inside the parentheses makes no sense. :confused:
 
  • #2
I would look at it as:

\(\displaystyle \log(.5+.25\theta)^{x_1}=x_1\log\left(\frac{1}{2}+\frac{\theta}{4} \right)=x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)+\text{constant}\)
 
  • #3
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$
 
  • #4
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$

Yes, it looks to me like you have properly applied the appropriate properties of logs. (Sun)
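
If it helps to double-check the algebra, here is a quick symbolic sanity check of both points (just a sketch of my own with SymPy, nothing from your notes): the four cell probabilities should sum to 1 for every $\theta$, and the difference between the full log-likelihood and your simplified form should not depend on $\theta$ (it is the "constant").

Code:
import sympy as sp

theta = sp.symbols('theta', positive=True)
x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4', positive=True)

# Cell probabilities for the problem above.
p = [theta/4, (2 + theta)/4, (1 - 2*theta)/2, theta/2]

# The probabilities of a multinomial must sum to 1 for every theta.
print(sp.simplify(sum(p) - 1))  # -> 0

# Log-likelihood, up to the additive constant log(n!/(x1! x2! x3! x4!)).
loglik = sum(cnt * sp.log(prob) for cnt, prob in zip([x1, x2, x3, x4], p))
claimed = (x1*sp.log(theta) + x2*sp.log(2 + theta)
           + x3*sp.log(1 - 2*theta) + x4*sp.log(theta))

# The difference should be constant in theta, so its derivative is 0.
print(sp.simplify(sp.diff(loglik - claimed, theta)))  # -> 0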
 
  • #5
So the next step according to my notes is to make a partition of one of the $x$ variables by writing $x_n = y_1+y_2$ for some $n$. My professor suggested partitioning $x_2$.

So $x_2=y_1+y_2$, where the two pieces correspond to cell probabilities $0.5$ and $0.25\theta$ respectively, and $(y_1,y_2)\,|\,x_2 \sim \text{Bin}\big(x_2;\ 0.5/(0.5+0.25\theta),\ 0.25\theta/(0.5+0.25\theta)\big)$.

Set the complete data to be $y=(x_1,y_1,y_2,x_3,x_4)$

The complete log likelihood function should now be $(x_1+y_2) \log(\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$
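
(For my own bookkeeping, I think this follows from writing out the multinomial for the complete data $y=(x_1,y_1,y_2,x_3,x_4)$, whose cell probabilities are $\theta/4,\ 1/2,\ \theta/4,\ (1-2\theta)/2,\ \theta/2$:

$\displaystyle \log f(y|\theta) = x_1\log\tfrac{\theta}{4} + y_1\log\tfrac{1}{2} + y_2\log\tfrac{\theta}{4} + x_3\log\tfrac{1-2\theta}{2} + x_4\log\tfrac{\theta}{2} + \text{constant},$

and everything not involving $\theta$ gets absorbed into the constant.)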

I'm unfortunately not even at the $E$ step of the EM algorithm yet, but I believe this is how you must do it. Am I on the right track? I realize this might be a topic not many are familiar with, but I've read that this algorithm is widely used.
 
  • #6
I'm not familiar with this topic at all, but it appears that to get the complete log likelihood function you cite, we require:

\(\displaystyle y_2\log(\theta)=x_2\log(2+\theta)+\text{constant}\)

Does this look right?
 
  • #7
Good thinking, but I'm not sure that's the way to go. Luckily the next step was already done in the notes for the cell with probability $0.5+0.25\theta$. Don't ask me what any of the next stuff means though (Wasntme). I am going from my notes and would love it if someone could make sense of it. I do believe this is correct though and I'm making progress.

We first evaluate $\displaystyle E[y_2|\theta_{k},x]=\frac{\theta_{k}}{2+\theta_{k}}x_2$ and we can finally find:

$\displaystyle Q(\theta|\theta_{k},x)=\left(\frac{\theta_{k}}{ 2+\theta_{k}}x_2+x_1 \right)\log(\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta)$
 
  • #8
Yeah, I have to admit I am lost with those next steps as well, but hopefully one of our advanced statistics folks can help you make sense of it. (Nerd)
 
  • #9
Well here is the best it's going to get for me for now.

Now for the M-step, we solve \(\displaystyle \frac{\partial Q}{\partial \theta}=0\) for $\theta$, which gives the update:

$\displaystyle \theta_{k+1}=\frac{(\theta_k+2)x_1+\theta_k x_2 +(\theta_k+2)x_4}{2((\theta_k+2)x_1+\theta_k x_2+(\theta_k+2)(x_3+x_4))}$.

When applied for $x = [6,52,28,14]$ and $n=100$ I get the following plot.

View attachment 1455

I'll try to make sense of it tomorrow. Any insight is appreciated!
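
In case anyone wants to reproduce the plot, a minimal sketch of the iteration might look like the following (my own quick script, with an arbitrary starting value; matplotlib is assumed for the plot):

Code:
# EM iteration using the M-step update above, with x = [6, 52, 28, 14].
import matplotlib.pyplot as plt

x1, x2, x3, x4 = 6, 52, 28, 14

def em_update(t):
    """One EM step: the closed-form M-step after plugging in E[y2 | theta_k, x]."""
    num = (t + 2) * x1 + t * x2 + (t + 2) * x4
    den = 2 * ((t + 2) * x1 + t * x2 + (t + 2) * (x3 + x4))
    return num / den

theta = 0.25            # arbitrary starting value in (0, 0.5)
history = [theta]
for _ in range(30):     # converges after only a few iterations here
    theta = em_update(theta)
    history.append(theta)

print(history[-1])      # roughly 0.24

plt.plot(history, marker='o')
plt.xlabel('iteration k')
plt.ylabel(r'$\theta_k$')
plt.show()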
 

  • #10
Jameson said:
I'm going to start by asking about an example from class and then hopefully use that to work on the problem I need to solve. Here is an example:

Let's say we have a multinomial distribution $x \sim M(n;.5+.25\theta,0.25(1-\theta),0.25(1-\theta),0.5\theta)$.

Hmm, in a multinomial distribution the sum of the probabilities should be 1.
However, in your case it isn't.
Is it possible the last probability should be $0.25\theta$ instead of $0.5\theta$?
MarkFL said:
I would look at it as:

\(\displaystyle \log(.5+.25\theta)^{x_1}=x_1\log\left(\frac{1}{2}+\frac{\theta}{4} \right)=x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)+\text{constant}\)

I like to think of it as:
$$x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)-x_1 \log 4$$
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$

Your likelihood function is actually:
$$\mathcal{L}(\theta|x) = f(x|\theta)$$
It is rather important that it is treated as a function of $\theta$, with $x$ held fixed.
Therefore all terms containing only $x$ and not $\theta$ are effectively constants.
That is why all the $\log x_1!$ stuff and such disappears into a constant.

Hope this helps. :)
 
  • #11
Hi I like Serena!

Yep, you are correct. Here is a screenshot of the slide with this example problem.

View attachment 1458

Good point about the likelihood being a function of $\theta$ and not $x$. It makes more sense now why we group the terms not containing $\theta$ into one constant term.

The EM algorithm process has two main steps, E and M. The E step is still really tricky for me. It seems I need to first rewrite one of the variables in terms of two variables $y_1$ and $y_2$ and then find:

\(\displaystyle Q(\theta|\theta_k,x)=E[\log f(y|\theta)\,|\,\theta_k,x]\)

In the example provided to us this step is very computationally intensive. Is it always so?
 

  • #12
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

It seems you did not copy the expression properly.
The probabilities are not correct.
If you correct them you should get the proper result.
Jameson said:
Well here is the best it's going to get for me for now.

Now for the M-step, we find \(\displaystyle \frac{\partial Q}{\partial \theta}=0\).

$\displaystyle \theta_{k+1}=\frac{(\theta_k+2)x_1+\theta_k x_2 +(\theta_k+2)x_4}{2((\theta_k+2)x_1+\theta_k x_2+(\theta_k+2)(x_3+x_4))}$.

When applied for $x = [6,52,28,14]$ and $n=100$ I get the following plot.

View attachment 1455

I'll try to make sense of it tomorrow. Any insight is appreciated!

It appears they have turned it into an algorithm that converges to the value of $\theta$ with maximum likelihood.
The result is $\theta \approx 0.24$.
The second graph might be of $\log \theta$ I guess.

It looks a bit convoluted to me though.
Let me give a simpler approach.

If we take the expression for the log-likelihood function:
$$\log \mathcal{L}(\theta|x) = x_1\log(2+\theta) + (x_2 + x_3) \log(1-\theta) + x_4 \log \theta + C$$
Since we want to find $\theta$ such that the likelihood function takes its maximal value, we take the derivative and set it equal to 0:
$$\frac d{d\theta} \log \mathcal{L}(\theta|x) = \frac{x_1}{2+\theta} - \frac{x_2 + x_3}{1-\theta} + \frac {x_4}\theta = 0$$
Solving this with the numbers given yields the maximum likelihood estimate (MLE) of $\theta$:
$$\theta_{MLE} = 0.15$$
Hmm, that seems to be different from the given result. :confused:
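
For what it's worth, a quick numerical check of that score equation (a sketch of my own; SciPy assumed) reproduces the same value:

Code:
from scipy.optimize import brentq

x1, x2, x3, x4 = 6, 52, 28, 14

# d/dtheta of the example's log-likelihood, set to zero at the MLE.
def score(t):
    return x1 / (2 + t) - (x2 + x3) / (1 - t) + x4 / t

print(brentq(score, 1e-9, 1 - 1e-9))  # about 0.15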
 
  • #13
I believe you are looking at the example problem. The problem I was given to solve was $x \sim M(n;0.25\theta,0.25(2+\theta),0.5(1-2\theta),0.5\theta)$. Does my log likelihood function look good now? (By the way, I realize now I never explicitly gave the distribution for the problem I need to solve, so this mix-up is definitely my fault!)

The method you used is something I've seen but we were told to always partition one of the variables into two others, although I'm not really sure why. It seems like once you do this it is normally referenced as "complete data".

Anyway, when using the probabilities from the example instead of the ones in this post I get the following graph which seems to have the same result as the one you got. You are correct that the second graph is a log plot.

View attachment 1459

If your method works the same then I don't understand why I have to go through this tedious mess of partitioning one variable to get "complete data". On page three of this pdf they use both methods it seems. :confused:
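
Just to convince myself the two approaches agree on my actual problem as well, I ran the same sort of check on the score equation for my probabilities (again just a sketch of my own, SciPy assumed), and it lands right back at the value the EM iterations converge to:

Code:
from scipy.optimize import brentq

x1, x2, x3, x4 = 6, 52, 28, 14

# d/dtheta of my problem's log-likelihood, valid for theta in (0, 0.5).
def score(t):
    return (x1 + x4) / t + x2 / (2 + t) - 2 * x3 / (1 - 2 * t)

print(brentq(score, 1e-9, 0.5 - 1e-9))  # about 0.24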
 

  • #14
Jameson said:
I believe you are looking at the example problem. The problem I was given to solve was $x \sim M(n;0.25\theta,0.25\theta(2+\theta),0.5(1-2\theta),0.5\theta)$. Does my log likelihood function look good now?

Yes. Then it looks perfect. :)

Jameson said:
The method you used is something I've seen but we were told to always partition one of the variables into two others, although I'm not really sure why. It seems like once you do this it is normally referenced as "complete data".

I don't understand either yet, but see below.
Jameson said:
Anyway, when using the probabilities from the example instead of the ones in this post I get the following graph which seems to have the same result as the one you got. You are correct that the second graph is a log plot.

Good!
Jameson said:
If your method works the same then I don't understand why I have to go through this tedious mess of partitioning one variable to get "complete data". On page three of this pdf they use both methods it seems. :confused:

It appears they are stepping up to a more complex problem than a standard maximum likelihood estimation.
The problem they describe is where you don't have all the data, but only a subset of it.
This is what would happen in genetics where you only observe the result of the dominant genes, but not the actual combination of the genes involved.
By splitting up the "incomplete data" into "complete data" they can obtain a more accurate estimate of the unknowns involved.
 
  • #15
I see. So although we have complete data for this problem we can construct an algorithm that will work even when one value is missing?

Here is the slide concerning the "E step" of the EM algorithm in the example problem. I really want to know if this process can be simplified. I will continue reading and try to find out but on the quiz we had today I couldn't do this step. :(

View attachment 1461
 

  • #16
Jameson said:
I see. So although we have complete data for this problem we can construct an algorithm that will work even when one value is missing?

Actually, I think it's the other way around.
They start with a multinomial model with complete data.
But then they say: suppose one of those parameters is really a combination of 2 invisible parameters, what will happen then?
With that more advanced model you don't have complete data anymore.

As long as we had complete data, we could do a "standard" maximum likelihood estimation (an "M" step).
With incomplete data, apparently we do "E" steps (expectation of the unknown data), and then we do the standard maximum likelihood estimation (the "M" step). After that we repeat the process.
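
For what it's worth, my reading of the specific "E" step in your problem (not from your slides, so take it with a grain of salt) is that it is just the mean of that binomial split: conditionally on $x_2$, the count $y_2$ is binomial with success probability $\dfrac{0.25\theta_k}{0.5+0.25\theta_k}$, so

$\displaystyle E[y_2\,|\,\theta_k,x] = x_2\cdot\frac{0.25\theta_k}{0.5+0.25\theta_k} = \frac{\theta_k}{2+\theta_k}\,x_2,$

which is exactly the factor that shows up in your $Q(\theta|\theta_k,x)$.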
Jameson said:
Here is the slide concerning the "E step" of the EM algorithm in the example problem. I really want to know if this process can be simplified. I will continue reading and try to find out but on the quiz we had today I couldn't do this step. :(

Which sub step in this "E" step is causing you problems?
Or is it that an "E" step is used at all?
 
  • #17
I like Serena said:
...
Which sub step in this "E" step is causing you problems?
Or is it that an "E" step is used at all?

(I needed to take some time off from this problem to let things swirl around in my head.)

Ah, that makes more sense. Now I get why we call it "complete data" and "incomplete data".

The $E$ step is just tedious, and I don't feel it will become very quick to calculate even if I did many of these problems, though I'm sure I would get better. After talking to some of my classmates about this, they seem to feel the same, so it's not just me. I'm satisfied with my understanding of this method for now as we're already moving on to a different topic.

Thanks again for your helpful comments.
 

FAQ: Log likelihood function and EM algorithm

What is a log likelihood function?

A log likelihood function is a mathematical function used in statistics to measure the probability of observing a set of data given a particular statistical model. It is typically written $\ell(\theta)=\log L(\theta)$, where $\theta$ represents the parameters of the model. The log likelihood function is used in maximum likelihood estimation, where the goal is to find the values of the parameters that maximize the likelihood of the observed data.

How is the log likelihood function related to the EM algorithm?

The EM algorithm, or Expectation-Maximization algorithm, is an iterative method used to estimate the parameters of a statistical model when there is missing or incomplete data. Each iteration computes the expected complete-data log likelihood given the observed data and the current parameter estimates (the E step) and then maximizes that expectation over the parameters (the M step). This procedure never decreases the observed-data log likelihood, which is why repeating it leads to better parameter estimates.

What is the role of the log likelihood function in model selection?

The log likelihood function plays a crucial role in model selection. It is used to compare different models by computing their respective log likelihood values and selecting the model with the highest value, usually after penalizing for model complexity (as in criteria such as AIC or BIC), since a more flexible model will always fit the observed data at least as well.

What are the advantages of using the log likelihood function in statistical analysis?

The log likelihood function has several advantages in statistical analysis. Firstly, it is a widely accepted and used measure of the fit of a model to the observed data. Secondly, it is a mathematically convenient way to express the likelihood of a set of data given a model. Additionally, it allows for comparison between different models and is often used in hypothesis testing.

Can the log likelihood function be used for any type of statistical model?

Yes, the log likelihood function can be used for any type of statistical model as long as the model can be expressed as a probability distribution. This includes commonly used models such as linear regression, logistic regression, and Gaussian mixture models. However, it is important to note that the log likelihood function may be more complex and difficult to compute for certain models compared to others.
