bowlbase
Homework Statement
I have a DNA sequence generated by L throws of a 4-faced die with probabilities ##\pi_A, \pi_C, \pi_G, \pi_T##. Each probability is unknown. Task: estimate the probability of each side of the die. Hint: use a random variable defined by the sequence that has a binomial distribution, then use likelihood maximization.
Homework Equations
The Attempt at a Solution
So, as always with these problems, my first attempt is wildly incorrect despite making perfect sense to me. My naive approach would be to count the occurrences of each of A, C, G, T in the sequence and divide by the sequence length L to get an approximate probability for that letter at any position in the sequence. But this doesn't use either part of the hint.
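To make the counting approach concrete, here is a minimal Python sketch. The function name `estimate_base_probabilities` and the example sequence are made up for illustration, and it assumes the sequence is non-empty and contains only the letters A, C, G, T.

```python
from collections import Counter


def estimate_base_probabilities(sequence):
    """Estimate each letter's probability as (count of letter) / (sequence length).

    Assumes a non-empty sequence made up only of A, C, G, T (hypothetical input).
    """
    counts = Counter(sequence.upper())
    length = len(sequence)
    # Letters that never appear get an estimate of 0.0.
    return {base: counts[base] / length for base in "ACGT"}


# Made-up example sequence, not real data.
example = "ACGTTGCAAC"
estimates = estimate_base_probabilities(example)
for base, p in estimates.items():
    print(base, p)
```

Because the four counts add up to L, the four estimates always sum to 1.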
The other way I would do this is to still treat each letter separately, which gives four binomial random variables: for each letter, the number of times it appears in the sequence (##N_A## for A, and so on). Each of these gives me the probability distribution of the count of a particular letter in the sequence. Then for each binomial distribution I would take the derivative with respect to that letter's probability and set it to 0 to solve for the probability.
For instance: ##F(\pi_A)=\binom{L}{N_A}\pi_A^{N_A}(1-\pi_A)^{L-N_A}##
Taking the log of both sides, differentiating with respect to ##\pi_A##, and setting the derivative to 0 gives:
##0=\frac{N_A}{\pi_A}- \frac{L-N_A}{1-\pi_A}##
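A sketch of where this condition leads, assuming ##0 < \pi_A < 1## so that I can multiply through by ##\pi_A(1-\pi_A)## (the hat on ##\hat{\pi}_A## is my own notation for the estimate):

##N_A(1-\pi_A) = (L-N_A)\,\pi_A##

##N_A = L\,\pi_A##

##\hat{\pi}_A = \frac{N_A}{L}##

If that algebra holds, this is just the count-and-divide estimate from my first approach, and since ##N_A+N_C+N_G+N_T=L## the four estimates would sum to 1.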
I'm about to be late for class, but at first glance this looks like I made a mistake, or like my method is wrong here as well. The idea, though, is that I do this for all four letters.
Thanks for the help.