MHB Log likelihood function and EM algorithm

AI Thread Summary
The discussion revolves around the application of the EM algorithm to a multinomial distribution, specifically focusing on the log likelihood function and its transformation. Participants clarify the steps involved in deriving the log likelihood from the likelihood function, emphasizing the importance of grouping terms that do not depend on the parameter of interest, θ. They explore the E and M steps of the EM algorithm, noting the need to partition variables to handle incomplete data effectively. The conversation highlights the complexity of these calculations and the necessity of understanding both complete and incomplete data scenarios in statistical modeling. Overall, the thread illustrates the intricacies of applying the EM algorithm in practice while seeking clarity on the underlying concepts.
Jameson
Insights Author
Gold Member
MHB
Messages
4,533
Reaction score
13
I'm going to start by asking about an example from class and then hopefully use that to work on the problem I need to solve. Here is an example:

Let's say we have a multinomial distribution $x \sim M(n;.5+.25\theta,0.25(1-\theta),0.25(1-\theta),0.5\theta)$.

The likelihood function of $\theta$ given the vector $x$ is:

$$f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(.5+.25\theta)^{x_1}(0.25(1-\theta)^{x_2+x_3}(0.5\theta)^{x_4}$$.

This part is fine. It's this next step that confuses me.

$$\log (f(x|\theta))=x_1 \log(2+ \theta)+(x_2+x_3) \log(1-\theta)+x_4 \log(\theta)+\text{constant}$$

I understand the standard log rules I believe but I don't see how $\log(.5+.25\theta)^{x_1}=x_1 \log(2+ \theta)$. Obviously $x_1$ can be brought down in front of the expression but the stuff inside the parentheses makes no sense. :confused:
 
Physics news on Phys.org
I would look at it as:

$$\log(.5+.25\theta)^{x_1}=x_1\log\left(\frac{1}{2}+\frac{\theta}{4} \right)=x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)+\text{constant}$$
 
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$
 
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$

Yes, looks like you have properly applied the appropriate properties of logs to me. (Sun)
 
So the next step according to my notes is to make a partition of one of the $x$ variables by writing $x_n = y_1+y_2$ for some $n$. My professor suggested partitioning $x_2$.

So $x_2=y_1+y_2 \rightarrow y_1 = 0.5 \text{ and } y_2=0.25\theta$, where $(y_1,y_2) \sim \text{Bin }(x_1; 0.5/(0.5+0.25\theta), 0.25\theta/(0.5+0.25\theta))$

Set the complete data to be $y=(x_1,y_1,y_2,x_3,x_4)$

The complete log likelihood function should now be $(x_1+y_2) \log(\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$

I unfortunately am not even at the $E$ step yet for the EM algorithm but I believe this is how you must do it. Am I on the right track? I realize this might be a topic not many are familiar with but I've read that this algorithm is widely used.
 
I'm not familiar with this topic at all, but, it appears to be that to get the complete log likelihood function you cite, we require:

$$y_2\log(\theta)=x_2\log(2+\theta)+\text{constant}$$

Does this look right?
 
Good thinking, but I'm not sure that's the way to go. Luckily the next step was already done in the notes for $x_2=0.5+0.25\theta$. Don't ask me what any of the next stuff means though (Wasntme). I am going from my notes and would love it if someone could make sense of it. I do believe this is correct though and I'm making progress.

We first evaluate $\displaystyle E[y_2|\theta_{k},x]=\frac{\theta_{k}}{2+\theta_{k}}x_2$ and we can finally find:

$\displaystyle Q(\theta|\theta_{k},x)=\left(\frac{\theta_{k}}{ 2+\theta_{k}}x_2+x_1 \right)\log(\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta)$
 
Yeah, I have to admit I am lost with those next steps as well, but hopefully one of our advanced statistics folks can help you make sense of it. (Nerd)
 
Well here is the best it's going to get for me for now.

Now for the M-step, we find $$\frac{\partial Q}{\partial \theta}=0$$.

$\displaystyle \theta_{k+1}=\frac{(\theta_k+2)x_1+\theta_k x_2 +(\theta_k+2)x_4}{2((\theta_k+2)x_1+\theta_k x_2+(\theta_k+2)(x_3+x_4))}$.

When applied for $x = [6,52,28,14]$ and $n=100$ I get the following plot.

View attachment 1455

I'll try to make sense of it tomorrow. Any insight is appreciated!
 

Attachments

  • hw6_1.png
    hw6_1.png
    1.4 KB · Views: 125
  • #10
Jameson said:
I'm going to start by asking about an example from class and then hopefully use that to work on the problem I need to solve. Here is an example:

Let's say we have a multinomial distribution $x \sim M(n;.5+.25\theta,0.25(1-\theta),0.25(1-\theta),0.5\theta)$.

Hmm, in a multinomial distribution the sum of the probabilities should be 1.
However, in your case it isn't.
Is it possible the last probability should be $0.25\theta$ instead of $0.5\theta$?
MarkFL said:
I would look at it as:

$$\log(.5+.25\theta)^{x_1}=x_1\log\left(\frac{1}{2}+\frac{\theta}{4} \right)=x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)+\text{constant}$$

I like to think of it as:
$$x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)-x_1 \log 4$$
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$

Your likelihood function is actually:
$$\mathcal{L}(\theta|x) = f(x|\theta)$$
It is rather important that it's a function of $\theta$ and in particular not a function of $x$.
Therefore all terms containing only $x$ and not $\theta$ are effectively constants.
That is why all the $\log x_1!$ stuff and such disappears into a constant.

Hope this helps. :)
 
  • #11
Hi I like Serena!

Yep, you are correct. Here is a screenshot of the slide with this example problem.

View attachment 1458

Good point about the function being a function of $\theta$ and not $x$. It makes more sense now why we group of the terms not containing $\theta$ into one constant term.

The EM algorithm process has two main steps, E and M. The E step is still really tricky for me. It seems I need to first rewrite one of the variables in terms of two variables $y_1$ and $y_2$ and then find:

$$Q(\theta|\theta_k,x)=E[\log(f(y|\theta)|\theta_k,x)]$$

In the example provided to us this step is very computational intensive. Is it always so?
 

Attachments

  • Screen Shot 2013-10-09 at 2.58.57 PM.png
    Screen Shot 2013-10-09 at 2.58.57 PM.png
    31.8 KB · Views: 133
  • #12
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

It seems you did not copy the expression properly.
The probabilities are not correct.
If you correct them you should get the proper result.
Jameson said:
Well here is the best it's going to get for me for now.

Now for the M-step, we find $$\frac{\partial Q}{\partial \theta}=0$$.

$\displaystyle \theta_{k+1}=\frac{(\theta_k+2)x_1+\theta_k x_2 +(\theta_k+2)x_4}{2((\theta_k+2)x_1+\theta_k x_2+(\theta_k+2)(x_3+x_4))}$.

When applied for $x = [6,52,28,14]$ and $n=100$ I get the following plot.

View attachment 1455

I'll try to make sense of it tomorrow. Any insight is appreciated!

It appears they have turned it into an algorithm that approaches $\theta$ such that it has the maximum likelihood.
The result is $\theta \approx 0.24$.
The second graph might be of $\log \theta$ I guess.

It looks a bit convoluted to me though.
Let me give a simpler approach.

If we take the expression for the log-likelihood function:
$$\log \mathcal{L}(\theta|x) = x_1\log(2+\theta) + (x_2 + x_3) \log(1-\theta) + x_4 \log \theta + C$$
Since we want to find $\theta$ such that the likelihood function takes its maximal value, we take the derivative and set it equal to 0:
$$\frac d{d\theta} \log \mathcal{L}(\theta|x) = \frac{x_1}{2+\theta} - \frac{x_2 + x_3}{1-\theta} + \frac {x_4}\theta = 0$$
Solving this with the numbers given, yields the maximum likelihood estimation (MLE) of $\theta$:
$$\theta_{MLE} = 0.15$$
Hmm, that seems to be different from the given result. :confused:
 
Last edited:
  • #13
I believe you are looking at the example problem. The problem I was given to solve was $x \sim M(n;0.25\theta,0.25\theta(2+\theta),0.5(1-2\theta),0.5\theta)$. Does my log likelihood function look good now? (by the way I realize now I never explicitly gave the distribution for the problem I need to solve so this mix up is definitely my fault!)

The method you used is something I've seen but we were told to always partition one of the variables into two others, although I'm not really sure why. It seems like once you do this it is normally referenced as "complete data".

Anyway, when using the probabilities from the example instead of the ones in this post I get the following graph which seems to have the same result as the one you got. You are correct that the second graph is a log plot.

View attachment 1459

If your method works the same then I don't understand why I have to go through this tedious mess of partitioning one variable to get "complete data". On page three of this pdf they use both methods it seems. :confused:
 

Attachments

  • mhb_em.png
    mhb_em.png
    1.5 KB · Views: 109
  • #14
Jameson said:
I believe you are looking at the example problem. The problem I was given to solve was $x \sim M(n;0.25\theta,0.25\theta(2+\theta),0.5(1-2\theta),0.5\theta)$. Does my log likelihood function look good now?

Yes. Then it looks perfect. :)

The method you used is something I've seen but we were told to always partition one of the variables into two others, although I'm not really sure why. It seems like once you do this it is normally referenced as "complete data".

I don't understand either yet, but see below.
Anyway, when using the probabilities from the example instead of the ones in this post I get the following graph which seems to have the same result as the one you got. You are correct that the second graph is a log plot.

Good!
If your method works the same then I don't understand why I have to go through this tedious mess of partitioning one variable to get "complete data". On page three of this pdf they use both methods it seems. :confused:

It appears they are stepping up to a more complex problem than a standard maximum likelihood estimation.
The problem they describe is where you don't have all the data, but only a subset of it.
This is what would happen in genetics where you only observe the result of the dominant genes, but not the actual combination of the genes involved.
By splitting up the "incomplete data" into "complete data" they can obtain a more accurate estimate of the unknowns involved.
 
  • #15
I see. So although we have complete data for this problem we can construct an algorithm that will work even when one value is missing?

Here is the slide concerning the "E step" of the EM algorithm in the example problem. I really want to know if this process can be simplified. I will continue reading and try to find out but on the quiz we had today I couldn't do this step. :(

View attachment 1461
 

Attachments

  • Screen Shot 2013-10-09 at 4.50.21 PM.png
    Screen Shot 2013-10-09 at 4.50.21 PM.png
    29.9 KB · Views: 106
  • #16
Jameson said:
I see. So although we have complete data for this problem we can construct an algorithm that will work even when one value is missing?

Actually, I think it's the other way around.
They start with a multinomial model with complete data.
But then they say: suppose one of those parameters is really a combination of 2 invisible parameters, what will happen then?
With that more advanced model you don't have complete data anymore.

As long as we had complete data, we could do a "standard" maximum likelihood estimation (an "M" step).
With incomplete data, apparently we do "E" steps (expectation of the unknown data), and then we do the standard maximum likelihood estimation (the "M" step). After that we repeat the process.
Here is the slide concerning the "E step" of the EM algorithm in the example problem. I really want to know if this process can be simplified. I will continue reading and try to find out but on the quiz we had today I couldn't do this step. :(

Which sub step in this "E" step is causing you problems?
Or is it that an "E" step is used at all?
 
  • #17
I like Serena said:
...
Which sub step in this "E" step is causing you problems?
Or is it that an "E" step is used at all?

(I needed to take some time off from this problem to let things swirl around in my head.)

Ah, that makes more sense. I get it now why we call it "complete data" and "incomplete data".

The $E$ step is just tedious and not something I feel will become very quick to calculate even if I did many of these problems, but I'm sure I would get better. After talking to some of my classmates about this they seem to feel the same, so it's not just me. I'm satisfied with my understanding of this method for now as we're already moving on to a different topic.

Thanks again for your helpful comments.
 

Similar threads

Back
Top