Log likelihood function and EM algorithm

In summary, we discussed an example of a multinomial distribution and its likelihood function, then applied log rules to simplify the log likelihood. Next, we partitioned one of the variables to obtain the complete-data log likelihood and carried out the E-step. Finally, we discussed the M-step and produced a plot of the resulting estimates for the given data.
  • #1
Jameson
I'm going to start by asking about an example from class and then hopefully use that to work on the problem I need to solve. Here is an example:

Let's say we have a multinomial distribution $x \sim M(n;.5+.25\theta,0.25(1-\theta),0.25(1-\theta),0.5\theta)$.

The likelihood function of $\theta$ given the vector $x$ is:

\(\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(.5+.25\theta)^{x_1}(0.25(1-\theta))^{x_2+x_3}(0.5\theta)^{x_4}\).

This part is fine. It's this next step that confuses me.

\(\displaystyle \log (f(x|\theta))=x_1 \log(2+ \theta)+(x_2+x_3) \log(1-\theta)+x_4 \log(\theta)+\text{constant}\)

I believe I understand the standard log rules, but I don't see how $\log(.5+.25\theta)^{x_1}=x_1 \log(2+ \theta)$. Obviously $x_1$ can be brought down in front of the expression, but the part inside the parentheses makes no sense to me. :confused:
 
  • #2
I would look at it as:

\(\displaystyle \log(.5+.25\theta)^{x_1}=x_1\log\left(\frac{1}{2}+\frac{\theta}{4} \right)=x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)+\text{constant}\)
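
A quick numerical check of that identity, if it helps (just a throwaway Python snippet, nothing more):

```python
# Check that log(0.5 + 0.25*theta) = log(2 + theta) - log(4) for a few values of theta
import math

for theta in (0.1, 0.5, 0.9):
    lhs = math.log(0.5 + 0.25 * theta)
    rhs = math.log(2 + theta) - math.log(4)
    print(theta, lhs, rhs, abs(lhs - rhs) < 1e-12)
```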
 
  • #3
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$
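
As a sanity check on lumping everything else into the constant, here is a rough Python sketch (the counts below are arbitrary made-up values, and the function names are mine) comparing this simplified form with the full multinomial log-pmf; the difference should not depend on $\theta$:

```python
import math

def log_pmf(theta, x):
    """Full multinomial log-pmf with cell probabilities
    0.25*theta, 0.25*(2+theta), 0.5*(1-2*theta), 0.5*theta."""
    n = sum(x)
    p = [0.25 * theta, 0.25 * (2 + theta), 0.5 * (1 - 2 * theta), 0.5 * theta]
    out = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    return out + sum(xi * math.log(pi) for xi, pi in zip(x, p))

def log_lik(theta, x):
    """Simplified form above, with the theta-free terms dropped."""
    x1, x2, x3, x4 = x
    return (x1 + x4) * math.log(theta) + x2 * math.log(2 + theta) + x3 * math.log(1 - 2 * theta)

x = [3, 5, 2, 4]  # arbitrary counts, just for the check
print([log_pmf(t, x) - log_lik(t, x) for t in (0.05, 0.15, 0.30)])
# all three differences are equal, so the dropped terms really are constant in theta
```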
 
  • #4
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$

Yes, it looks to me like you have properly applied the properties of logs. (Sun)
 
  • #5
So the next step according to my notes is to make a partition of one of the $x$ variables by writing $x_n = y_1+y_2$ for some $n$. My professor suggested partitioning $x_2$.

So $x_2=y_1+y_2$, where the two sub-cells have probabilities $0.5$ and $0.25\theta$, and $(y_1,y_2) \sim \text{Bin}(x_2;\, 0.5/(0.5+0.25\theta),\, 0.25\theta/(0.5+0.25\theta))$.

Set the complete data to be $y=(x_1,y_1,y_2,x_3,x_4)$

The complete log likelihood function should now be $(x_1+y_2) \log(\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$
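
To convince myself the partition behaves as claimed, here is a rough simulation sketch in Python (the value of $\theta$, the number of draws, and the conditioning value are arbitrary choices of mine): draw from the five-cell complete-data model and check that, given $x_2$, the hidden count $y_2$ looks binomial.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n = 0.2, 100  # arbitrary theta in (0, 0.5) and sample size

# five-cell "complete data" model: x1, y1, y2, x3, x4
p = [0.25 * theta, 0.5, 0.25 * theta, 0.5 * (1 - 2 * theta), 0.5 * theta]
draws = rng.multinomial(n, p, size=200_000)
y1, y2 = draws[:, 1], draws[:, 2]
x2 = y1 + y2

k = 55  # condition on one particular observed value of x2
sel = y2[x2 == k]
print(sel.mean(), k * (0.25 * theta) / (0.5 + 0.25 * theta))
# the empirical mean matches the Binomial(x2, 0.25*theta/(0.5+0.25*theta)) prediction
```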

Unfortunately I am not even at the $E$ step of the EM algorithm yet, but I believe this is how you must do it. Am I on the right track? I realize this might be a topic not many are familiar with, but I've read that this algorithm is widely used.
 
  • #6
I'm not familiar with this topic at all, but it appears that to get the complete log likelihood function you cite, we require:

\(\displaystyle y_2\log(\theta)=x_2\log(2+\theta)+\text{constant}\)

Does this look right?
 
  • #7
Good thinking, but I'm not sure that's the way to go. Luckily the next step was already done in the notes for the cell with probability $0.5+0.25\theta$ (our $x_2$). Don't ask me what any of the next stuff means though (Wasntme). I am going from my notes and would love it if someone could make sense of it. I do believe this is correct though, and I'm making progress.

We first evaluate $\displaystyle E[y_2|\theta_{k},x]=\frac{\theta_{k}}{2+\theta_{k}}x_2$ and we can finally find:

$\displaystyle Q(\theta|\theta_{k},x)=\left(\frac{\theta_{k}}{ 2+\theta_{k}}x_2+x_1 \right)\log(\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta)$
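
For what it's worth, here is a small Python sketch of those two quantities (function names are mine; constants in $\theta$ are dropped, as above):

```python
import math

def e_y2(theta_k, x2):
    """E[y2 | theta_k, x]: mean of Bin(x2, 0.25*theta_k / (0.5 + 0.25*theta_k)),
    which simplifies to x2 * theta_k / (2 + theta_k)."""
    return x2 * theta_k / (2 + theta_k)

def Q(theta, theta_k, x):
    """Expected complete-data log-likelihood Q(theta | theta_k, x), up to a constant."""
    x1, x2, x3, x4 = x
    return ((e_y2(theta_k, x2) + x1) * math.log(theta)
            + x3 * math.log(1 - 2 * theta)
            + x4 * math.log(theta))
```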
 
  • #8
Yeah, I have to admit I am lost with those next steps as well, but hopefully one of our advanced statistics folks can help you make sense of it. (Nerd)
 
  • #9
Well here is the best it's going to get for me for now.

Now for the M-step, we find \(\displaystyle \frac{\partial Q}{\partial \theta}=0\).

$\displaystyle \theta_{k+1}=\frac{(\theta_k+2)x_1+\theta_k x_2 +(\theta_k+2)x_4}{2((\theta_k+2)x_1+\theta_k x_2+(\theta_k+2)(x_3+x_4))}$.
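
To spell out where this comes from (my own intermediate step): writing $a_k=\frac{\theta_{k}}{2+\theta_{k}}x_2$,

\(\displaystyle \frac{\partial Q}{\partial \theta}=\frac{a_k+x_1+x_4}{\theta}-\frac{2x_3}{1-2\theta}=0 \;\Rightarrow\; \theta_{k+1}=\frac{a_k+x_1+x_4}{2(a_k+x_1+x_3+x_4)},\)

and multiplying numerator and denominator by $2+\theta_k$ gives the expression above.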

When applied for $x = [6,52,28,14]$ and $n=100$ I get the following plot.

View attachment 1455

I'll try to make sense of it tomorrow. Any insight is appreciated!
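
For reference, a minimal Python sketch of the iteration (starting value and number of iterations are arbitrary):

```python
x1, x2, x3, x4 = 6, 52, 28, 14

theta = 0.25  # arbitrary starting value in (0, 0.5)
history = [theta]
for _ in range(50):
    a = theta * x2 / (2 + theta)                       # E-step: E[y2 | theta_k, x]
    theta = (a + x1 + x4) / (2 * (a + x1 + x3 + x4))   # M-step: maximize Q
    history.append(theta)

print(history[:4], history[-1])  # settles down to roughly 0.238
```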
 

  • #10
Jameson said:
I'm going to start by asking about an example from class and then hopefully use that to work on the problem I need to solve. Here is an example:

Let's say we have a multinomial distribution $x \sim M(n;.5+.25\theta,0.25(1-\theta),0.25(1-\theta),0.5\theta)$.

Hmm, in a multinomial distribution the sum of the probabilities should be 1.
However, in your case it isn't.
Is it possible the last probability should be $0.25\theta$ instead of $0.5\theta$?
MarkFL said:
I would look at it as:

\(\displaystyle \log(.5+.25\theta)^{x_1}=x_1\log\left(\frac{1}{2}+\frac{\theta}{4} \right)=x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)+\text{constant}\)

I like to think of it as:
$$x_1\log\left(\frac{2+\theta}{4} \right)=x_1 \log(2+ \theta)-x_1 \log 4$$
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}\left( \frac{\theta}{4}\right)^{x_1}\left( \frac{2+\theta}{4} \right)^{x_2}\left( \frac{1-2\theta}{2} \right)^{x_3} \left(\frac{\theta}{2} \right)^{x_4}$

$\log (f(x|\theta))=x_1 \log(\theta)+x_2 \log(2+\theta)+ x_3 \log(1-2\theta)+x_4 \log(\theta) +\text{constant}$

Your likelihood function is actually:
$$\mathcal{L}(\theta|x) = f(x|\theta)$$
It is rather important that it's a function of $\theta$ and in particular not a function of $x$.
Therefore all terms containing only $x$ and not $\theta$ are effectively constants.
That is why all the $\log x_1!$ stuff and such disappears into a constant.

Hope this helps. :)
 
  • #11
Hi I like Serena!

Yep, you are correct. Here is a screenshot of the slide with this example problem.

View attachment 1458

Good point about the function being a function of $\theta$ and not $x$. It makes more sense now why we group the terms not containing $\theta$ into one constant term.

The EM algorithm process has two main steps, E and M. The E step is still really tricky for me. It seems I need to first rewrite one of the variables in terms of two variables $y_1$ and $y_2$ and then find:

\(\displaystyle Q(\theta|\theta_k,x)=E\left[\log f(y|\theta)\,\middle|\,\theta_k,x\right]\)

In the example provided to us this step is very computationally intensive. Is it always so?
 

  • #12
Jameson said:
That's it! Nice eye. :)

Ok, so here is the problem I need to do. How does this look?

$\displaystyle f(x|\theta) = \frac{n!}{x_1!x_2!x_3!x_4!}(0.25\theta)^{x_1}(0.25(2+\theta))^{x_2}(0.5(1-2\theta))^{x_3}(0.5\theta)^{x_4}$

It seems you did not copy the expression properly.
The probabilities are not correct.
If you correct them you should get the proper result.
Jameson said:
Well here is the best it's going to get for me for now.

Now for the M-step, we find \(\displaystyle \frac{\partial Q}{\partial \theta}=0\).

$\displaystyle \theta_{k+1}=\frac{(\theta_k+2)x_1+\theta_k x_2 +(\theta_k+2)x_4}{2((\theta_k+2)x_1+\theta_k x_2+(\theta_k+2)(x_3+x_4))}$.

When applied for $x = [6,52,28,14]$ and $n=100$ I get the following plot.

View attachment 1455

I'll try to make sense of it tomorrow. Any insight is appreciated!

It appears they have turned it into an algorithm that converges to the value of $\theta$ with maximum likelihood.
The result is $\theta \approx 0.24$.
The second graph might be of $\log \theta$, I guess.

It looks a bit convoluted to me though.
Let me give a simpler approach.

If we take the expression for the log-likelihood function:
$$\log \mathcal{L}(\theta|x) = x_1\log(2+\theta) + (x_2 + x_3) \log(1-\theta) + x_4 \log \theta + C$$
Since we want to find $\theta$ such that the likelihood function takes its maximal value, we take the derivative and set it equal to 0:
$$\frac d{d\theta} \log \mathcal{L}(\theta|x) = \frac{x_1}{2+\theta} - \frac{x_2 + x_3}{1-\theta} + \frac {x_4}\theta = 0$$
Solving this with the numbers given yields the maximum likelihood estimate (MLE) of $\theta$:
$$\theta_{MLE} = 0.15$$
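
(My working, for reference: with $x_1=6$, $x_2+x_3=80$, $x_4=14$, clearing the denominators gives
$$6\theta(1-\theta) - 80\theta(2+\theta) + 14(2+\theta)(1-\theta) = 0 \;\Rightarrow\; 25\theta^2+42\theta-7=0,$$
whose positive root is $\theta = \frac{-42+\sqrt{2464}}{50} \approx 0.153$.)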
Hmm, that seems to be different from the given result. :confused:
 
  • #13
I believe you are looking at the example problem. The problem I was given to solve was $x \sim M(n;0.25\theta,\,0.25(2+\theta),\,0.5(1-2\theta),\,0.5\theta)$. Does my log likelihood function look good now? (By the way, I realize now that I never explicitly gave the distribution for the problem I need to solve, so this mix-up is definitely my fault!)

The method you used is something I've seen, but we were told to always partition one of the variables into two others, although I'm not really sure why. It seems that once you do this, the result is normally referred to as "complete data".

Anyway, when using the probabilities from the example instead of the ones in this post, I get the following graph, which seems to give the same result as the one you got. You are correct that the second graph is a log plot.

View attachment 1459

If your method gives the same result, then I don't understand why I have to go through this tedious mess of partitioning one variable to get "complete data". On page three of this pdf they seem to use both methods. :confused:
 

  • #14
Jameson said:
I believe you are looking at the example problem. The problem I was given to solve was $x \sim M(n;0.25\theta,\,0.25(2+\theta),\,0.5(1-2\theta),\,0.5\theta)$. Does my log likelihood function look good now?

Yes. Then it looks perfect. :)

Jameson said:
The method you used is something I've seen, but we were told to always partition one of the variables into two others, although I'm not really sure why. It seems that once you do this, the result is normally referred to as "complete data".

I don't understand either yet, but see below.
Jameson said:
Anyway, when using the probabilities from the example instead of the ones in this post, I get the following graph, which seems to give the same result as the one you got. You are correct that the second graph is a log plot.

Good!
Jameson said:
If your method gives the same result, then I don't understand why I have to go through this tedious mess of partitioning one variable to get "complete data". On page three of this pdf they seem to use both methods. :confused:

It appears they are stepping up to a more complex problem than a standard maximum likelihood estimation.
The problem they describe is where you don't have all the data, but only a subset of it.
This is what would happen in genetics where you only observe the result of the dominant genes, but not the actual combination of the genes involved.
By splitting up the "incomplete data" into "complete data" they can obtain a more accurate estimate of the unknowns involved.
 
  • #15
I see. So although we have complete data for this problem we can construct an algorithm that will work even when one value is missing?

Here is the slide concerning the "E step" of the EM algorithm in the example problem. I really want to know if this process can be simplified. I will continue reading and try to find out but on the quiz we had today I couldn't do this step. :(

View attachment 1461
 

  • #16
Jameson said:
I see. So although we have complete data for this problem we can construct an algorithm that will work even when one value is missing?

Actually, I think it's the other way around.
They start with a multinomial model with complete data.
But then they say: suppose one of those parameters is really a combination of 2 invisible parameters, what will happen then?
With that more advanced model you don't have complete data anymore.

As long as we had complete data, we could do a "standard" maximum likelihood estimation (an "M" step).
With incomplete data, apparently we do "E" steps (expectation of the unknown data), and then we do the standard maximum likelihood estimation (the "M" step). After that we repeat the process.
Jameson said:
Here is the slide concerning the "E step" of the EM algorithm in the example problem. I really want to know if this process can be simplified. I will continue reading and try to find out but on the quiz we had today I couldn't do this step. :(

Which sub step in this "E" step is causing you problems?
Or is it that an "E" step is used at all?
 
  • #17
I like Serena said:
...
Which sub step in this "E" step is causing you problems?
Or is it that an "E" step is used at all?

(I needed to take some time off from this problem to let things swirl around in my head.)

Ah, that makes more sense. I get it now why we call it "complete data" and "incomplete data".

The $E$ step is just tedious and not something I feel will become very quick to calculate even if I did many of these problems, but I'm sure I would get better. After talking to some of my classmates about this they seem to feel the same, so it's not just me. I'm satisfied with my understanding of this method for now as we're already moving on to a different topic.

Thanks again for your helpful comments.
 

Related to Log likelihood function and EM algorithm

1. What is a log likelihood function?

A log likelihood function is the natural logarithm of the likelihood function: it measures how probable an observed set of data is under a particular statistical model, viewed as a function of the model's parameters. It is typically written $\log L(\theta)$, where $\theta$ represents the parameters of the model. The log likelihood function is used in maximum likelihood estimation, where the goal is to find the parameter values that maximize the likelihood of the observed data.

2. How is the log likelihood function related to the EM algorithm?

The EM algorithm, or Expectation-Maximization algorithm, is an iterative method used to estimate the parameters of a statistical model when there is missing data. The algorithm involves maximizing the log likelihood function at each iteration. This is because the log likelihood function provides a measure of how well the model fits the observed data, and maximizing it leads to better parameter estimates.

3. What is the role of the log likelihood function in model selection?

The log likelihood function plays a crucial role in model selection. It is used to compare different models by computing their respective log likelihood values and selecting the model with the highest value. The model with the highest log likelihood is considered to be the best fit for the observed data.

4. What are the advantages of using the log likelihood function in statistical analysis?

The log likelihood function has several advantages in statistical analysis. Firstly, it is a widely accepted and used measure of the fit of a model to the observed data. Secondly, it is a mathematically convenient way to express the likelihood of a set of data given a model. Additionally, it allows for comparison between different models and is often used in hypothesis testing.

5. Can the log likelihood function be used for any type of statistical model?

Yes, the log likelihood function can be used for any type of statistical model as long as the model can be expressed as a probability distribution. This includes commonly used models such as linear regression, logistic regression, and Gaussian mixture models. However, it is important to note that the log likelihood function may be more complex and difficult to compute for certain models compared to others.
