Help with proof for binomial distribution

In summary: the number of fragments that have made a given number of steps after a given number of copying cycles follows a binomial pattern, but the student is unsure how to prove the resulting summation identity and struggles to connect it to the binomial theorem.
  • #1
DaanV
Greetings to you, Physics Forums regulars!

Please allow me to introduce myself a bit first. I'm a student in the Life Sciences, so I don't really have a lot of knowledge on mathematics past the basics.

I'm not sure if my problem belongs here. This is my first visit to this friendly-looking forum, so please forgive my ignorance if it should be elsewhere.

The problem I will describe here is what I came across during an internship. Also I'm sorry for the enormous amount of text in here. It's the only way I can make all of this make sense in my own head.

I'm fairly certain this whole thing can be solved without all the context, so if you wish you can skip straight to the bottom and see what it is I actually need help with.

Thanks a lot in advance for any help provided!


Homework Statement


Background information:
A fragment of DNA is fixed to a solid surface. It is subjected to successive cycles of varying conditions, and with each cycle the fragment is copied once. The new fragment will be 1 'step length' away from the parent fragment (in a random direction).

During the next cycle, both the parent and the daughter fragment are copied again, bringing the total to 4 fragments (1 at the origin, 2 that have made one step, and 1 that has made two steps).

Obviously the total number of fragments ([itex]f_{tot}[/itex]) in this instance can be described by
[itex]f_{tot}=2^c[/itex]
With [itex]c[/itex] the number of cycles

The distribution of the number of fragments that have made a certain number of steps, as a function of the number of cycles ([itex]F(s,c)[/itex]), follows a binomial pattern:
[itex]F(s,c)=\binom{c}{s}[/itex]
With [itex]s[/itex] the number of steps of each fragment

Sadly, life is hardly ever perfect. So it happens that the copying efficiency is not perfect. The efficiency, call it [itex]E_f[/itex], is a number between 0 and 1 (efficiency 0 means no copies are ever made, efficiency 1 means perfect replication). This efficiency is assumed to be constant throughout the process.

For example, with an efficiency of 0.9, the distribution would look (on average) as follows after [itex]c=2[/itex] cycles:

1 fragment at the origin (the original fragment), 1.8 fragments at one step length away (2*0.9 from the original) and 0.81 fragments at two step lengths away (0.9 * the 0.9 that were at [itex]s=1[/itex] after 1 cycle), for a total of 3.61 fragments.
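To make the example concrete, here is roughly what the bookkeeping looks like in Python (a sketch of my own; the recursion N[s] → N[s] + E_f·N[s−1] is just the copying rule written out, and the function name is mine):

```python
from math import comb

def expected_counts(c, ef):
    """Expected number of fragments at each step count s = 0..c after c cycles.
    Each cycle, every fragment stays put and on average contributes ef copies
    one step further away: N[s] <- N[s] + ef * N[s-1]."""
    counts = [1.0]  # start: one original fragment at the origin (s = 0)
    for _ in range(c):
        counts = ([counts[0]]
                  + [counts[s] + ef * counts[s - 1] for s in range(1, len(counts))]
                  + [ef * counts[-1]])
    return counts

# the recursion reproduces the closed form F(s, c) = C(c, s) * ef^s
counts = expected_counts(2, 0.9)
closed = [comb(2, s) * 0.9 ** s for s in range(3)]
```

For c = 2 and E_f = 0.9 this yields the 1, 1.8 and 0.81 fragments described above, 3.61 in total.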

The new equations now become:
[itex]f_{tot}=(E_f+1)^c[/itex]
[itex]F(s,c)=\binom{c}{s}E_f\,^s[/itex]

In order to calculate the average number of steps after a given number of cycles, I need the total number of steps, [itex]s_{tot}[/itex]. This is simply a summation over all possible numbers of steps (no more than there are cycles), weighting each by the number of fragments that have made that many steps. In formula form:

[itex]s_{tot}=\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s[/itex]

Now, here comes the question:
I am interested in finding a more 'direct' and elegant closed-form expression for this sum.
In my quest for this, I have found that the above formula can apparently be rewritten as:
[itex]s_{tot}=c\,E_f\,(E_f+1)^{c-1}[/itex]

However, I am unsure how to prove this rigorously. So far I have only evaluated the two expressions numerically for various values of [itex]c[/itex] and [itex]E_f[/itex], and they seem to match perfectly. I'm aware, though, that this is no real proof.

Homework Equations


The mission:
Prove that
[itex]\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s=c\,E_f\,(E_f+1)^{c-1}[/itex]

It seems fairly obvious that the Binomial Theorem comes into play:
[itex](x+y)^n=\sum_{k=0}^{n}\binom{n}{k}x^k\,y^{n-k}[/itex]

I'm struggling however with the differences between it and my formula.

The Attempt at a Solution


I tried a proof by induction.

Obviously the equation holds without a problem for [itex]c=0[/itex]:
[itex]s_{tot}=0[/itex]
Which also makes sense: After 0 cycles, no fragments will have made any steps.

In case the basis [itex]c=0[/itex] is not sufficient, here is another:
[itex]c=2[/itex]

[itex]\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s=0\cdot 1\cdot E_f\,^0+1\cdot 2\cdot E_f\,^1+2\cdot 1\cdot E_f\,^2=2E_f+2E_f\,^2[/itex]

[itex]c\,E_f\,(E_f+1)^{c-1}=2E_f(E_f+1)^1=2E_f+2E_f\,^2[/itex]


Next up is what I consider the hard part: the actual induction step. I guess the problem is that I don't know how to handle the summation when its upper limit becomes 'k+1':

Assume that for any given value k, the following is true:
[itex]\sum_{s=0}^{k}s\binom{k}{s}E_f\,^s=k\,E_f\,(E_f+1)^{k-1}[/itex]

Now it's time to prove that the same is true for k+1:
[itex]\sum_{s=0}^{k+1}s\binom{k+1}{s}E_f\,^s\overset{?}{=}(k+1)\,E_f\,(E_f+1)^{(k+1)-1}[/itex]

So.. Is there some identity I'm missing that deals with summations up to an unknown value? How can I rewrite this?

Any help given would be very much appreciated. If things are unclear, please don't hesitate to ask.

Regards,

Daan
 
  • #2
DaanV said:
[...]

The mission:
Prove that
[itex]\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s=c\,E_f\,(E_f+1)^{c-1}[/itex]

[...]

Use the easily-demonstrated fact that ##s {c \choose s}## is ##0## if ##s = 0## and is equal to ##c s {c-1 \choose s-1}## if ##1 \leq s \leq c##.
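For reference, the coefficient identity (with the stray factor of ##s## dropped; the follow-up posts confirm it was a typo) can be verified directly from the factorial form of the binomial coefficient:

```latex
s\binom{c}{s}
  = s\cdot\frac{c!}{s!\,(c-s)!}
  = \frac{c!}{(s-1)!\,(c-s)!}
  = c\cdot\frac{(c-1)!}{(s-1)!\,\bigl((c-1)-(s-1)\bigr)!}
  = c\binom{c-1}{s-1},
\qquad 1\le s\le c.
```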
 
  • #3
Thanks for your reply.

I presume you meant [itex]s\left(^c_s\right)[/itex] is equal to [itex]c\left(^{c-1}_{s-1}\right)[/itex], rather than [itex]cs\left(^{c-1}_{s-1}\right)[/itex].

Thank you so much so far, that has really helped me a lot. I'm not quite there yet though.

Using the Binomial Theorem with x=Ef, y=1, k=s-1 and n=c-1 gives me:

[itex]c\,E_f\left(E_f+1\right)^{c-1}\;=\;c\,E_f\sum_{s-1=0}^{c-1}\binom{c-1}{s-1}E_f\,^{s-1}[/itex]

[itex]=\;\sum_{s=1}^{c-1}c\binom{c-1}{s-1}E_f\,^{s}\;=\;\sum_{s=0}^{c-1}s\binom{c}{s}E_f\,^{s}[/itex]

So.. Basically the only problem left now is that the summation only goes up to c-1, or have I done something wrong in the above?

Anyway, thanks a lot for your help so far!

Daan
 
  • #4
DaanV said:
[...]

So.. Basically the only problem left now is that the summation only goes up to c-1, or have I done something wrong in the above?

Yes, sorry: that was a typo. In the original summation, s goes from 1 to c (because we can leave off the s=0 term, which is zero in ##s {c \choose s} E^s##). Thus, t = s-1 goes from 0 to c-1 in the new sum, so we have
[tex] \sum_{s=0}^c s {c \choose s} E^s = \sum_{s=1}^c s {c \choose s} E^s
= \sum_{s=1}^c c {c-1 \choose s-1} E^s = c E \sum_{s-1=0}^{c-1} {c-1 \choose s-1} E^{s-1}\\ = cE \sum_{t=0}^{c-1} {c-1 \choose t} E^t = cE(1+E)^{c-1}.[/tex]
 
  • #5
Ah yes, I see where I went wrong now.

Thanks again for the excellent help given!
Now on to the next problem..
 

Related to Help with proof for binomial distribution

What is a binomial distribution?

A binomial distribution is a probability distribution that is used to model the number of successes in a fixed number of independent trials. It is based on the assumption that each trial has only two possible outcomes, success or failure, and the probability of success remains constant throughout all the trials.

How is the binomial distribution calculated?

The binomial distribution can be calculated using the formula P(x) = nCx * p^x * (1-p)^(n-x), where n is the total number of trials, x is the number of successes, and p is the probability of success in each trial. Alternatively, it can also be calculated using statistical software or a binomial distribution table.
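As an illustration (a sketch of my own, not from the thread), the formula translates directly into Python; the sanity check uses the fact that the probabilities over all possible outcomes sum to 1:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# probabilities over all outcomes x = 0..n sum to 1
total = sum(binom_pmf(x, 10, 0.3) for x in range(11))
```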

What are the characteristics of a binomial distribution?

A binomial distribution has the following characteristics:

  • The number of trials is fixed.
  • Each trial has two possible outcomes.
  • The probability of success remains constant throughout all the trials.
  • The trials are independent of each other.

What is the use of a binomial distribution in real life?

The binomial distribution is commonly used in real life to model and analyze situations where there are only two possible outcomes, such as in coin tosses, election results, and success or failure of a new product launch. It can also be used in quality control, market research, and medical studies.

How is the binomial distribution related to the normal distribution?

The binomial distribution is related to the normal distribution through the central limit theorem, which states that as the sample size (n) increases, the binomial distribution approaches a normal distribution with mean np and standard deviation √(np(1-p)). This relationship is useful in approximating binomial probabilities when the sample size is large.
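The stated mean and standard deviation can be checked numerically with a short sketch (function and variable names are mine):

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    # P(x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 50, 0.4
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
# expected: mean = n*p and standard deviation = sqrt(n*p*(1-p))
```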
