Help with proof for binomial distribution

In summary: the number of fragments that have made a given number of steps after a given number of copying cycles follows a binomial pattern, but the student is unsure how to prove the resulting summation identity and struggles to connect it to the binomial theorem.
  • #1
DaanV
Greetings to you, Physics Forums regulars!

Please allow me to introduce myself a bit first. I'm a student in the Life Sciences, so I don't really have a lot of knowledge on mathematics past the basics.

I'm not sure if my problem belongs here. This is my first visit to this friendly-looking forum, so please forgive my ignorance if it should be elsewhere.

The problem I will describe here is what I came across during an internship. Also I'm sorry for the enormous amount of text in here. It's the only way I can make all of this make sense in my own head.

I'm fairly certain this whole thing can be solved without all the context, so if you wish you can skip straight to the bottom and see what it is I actually need help with.

Thanks a lot in advance for any help provided!


Homework Statement


Background information:
A fragment of DNA is fixed to a solid surface. It is subjected to successive cycles of varying conditions, and with each cycle the fragment is copied once. The new fragment will be 1 'step length' away from the parent fragment (in a random direction).

During the next cycle, both the parent and the daughter fragment are copied again, bringing the total to 4 fragments (1 at the origin, 2 that have made one step, and 1 that has made two steps).

Obviously the total number of fragments ([itex]f_{tot}[/itex]) in this instance can be described by
[itex]f_{tot}=2^c[/itex]
With [itex]c[/itex] the number of cycles

The distribution of the number of fragments that have made a certain number of steps, as a function of the number of cycles ([itex]F(s,c)[/itex]), follows a binomial pattern:
[itex]F(s,c)=\binom{c}{s}[/itex]
With [itex]s[/itex] the number of steps of each fragment

Sadly, life is hardly ever perfect. So it happens that the copying efficiency is not perfect. The efficiency, call it [itex]E_f[/itex], is a number between 0 and 1 (efficiency 0 means no copies are ever made, efficiency 1 means perfect replication). This efficiency is assumed to be constant throughout the process.

For example, with an efficiency of 0.9, the distribution would look (on average) as follows after [itex]c=2[/itex] cycles:

1 fragment at the origin (the original fragment), 1.8 fragments at one step length away (2*0.9 from the original) and 0.81 fragments at two step lengths away (0.9 * the 0.9 that were at [itex]s=1[/itex] after 1 cycle), for a total of 3.61 fragments.
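To make the example concrete, here is roughly what the bookkeeping looks like in Python (a sketch of my own; the recursion N[s] → N[s] + E_f·N[s−1] is just the copying rule written out, and the function name is mine):

```python
from math import comb

def expected_counts(c, ef):
    """Expected number of fragments at each step count s = 0..c after c cycles.
    Each cycle, every fragment stays put and on average contributes ef copies
    one step further away: N[s] <- N[s] + ef * N[s-1]."""
    counts = [1.0]  # start: one original fragment at the origin (s = 0)
    for _ in range(c):
        counts = ([counts[0]]
                  + [counts[s] + ef * counts[s - 1] for s in range(1, len(counts))]
                  + [ef * counts[-1]])
    return counts

# the recursion reproduces the closed form F(s, c) = C(c, s) * ef^s
counts = expected_counts(2, 0.9)
closed = [comb(2, s) * 0.9 ** s for s in range(3)]
```

For c = 2 and E_f = 0.9 this yields the 1, 1.8 and 0.81 fragments described above, 3.61 in total.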

The new equations now become:
[itex]f_{tot}=(E_f+1)^c[/itex]
[itex]F(s,c)=\binom{c}{s}E_f\,^s[/itex]

In order to calculate the average number of steps after a given number of cycles, I need the total number of steps, [itex]s_{tot}[/itex]. This is simply a summation over all possible numbers of steps (no more than there are cycles), weighting each by the number of fragments that have made that many steps. In formula form:

[itex]s_{tot}=\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s[/itex]

Now, here comes the question:
I am interested in finding a more 'direct' and elegant closed-form expression for this sum.
In my quest for this, I have found that the above formula can apparently be rewritten as:
[itex]s_{tot}=c\,E_f\,(E_f+1)^{c-1}[/itex]

However, I am unsure how to prove this rigorously. So far I have only evaluated the two expressions numerically for various values of [itex]c[/itex] and [itex]E_f[/itex], and they seem to match perfectly. I'm aware, though, that this is no real proof.

Homework Equations


The mission:
Prove that
[itex]\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s=c\,E_f\,(E_f+1)^{c-1}[/itex]

It seems fairly obvious that the Binomial Theorem comes into play:
[itex](x+y)^n=\sum_{k=0}^{n}\binom{n}{k}x^k\,y^{n-k}[/itex]

I'm struggling however with the differences between it and my formula.

The Attempt at a Solution


I tried a proof by induction.

Obviously the equation holds without a problem for [itex]c=0[/itex]:
[itex]s_{tot}=0[/itex]
Which also makes sense: After 0 cycles, no fragments will have made any steps.

In case the basis [itex]c=0[/itex] is not sufficient, here is another:
[itex]c=2[/itex]

[itex]\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s=0\cdot 1\cdot E_f\,^0+1\cdot 2\cdot E_f\,^1+2\cdot 1\cdot E_f\,^2=2E_f+2E_f\,^2[/itex]

[itex]c\,E_f\,(E_f+1)^{c-1}=2E_f(E_f+1)^1=2E_f+2E_f\,^2[/itex]


Next up is what I consider the hard part: the actual induction step. I guess the problem is that I don't know how to handle the summation when its upper limit becomes 'k+1':

Assume that for any given value k, the following is true:
[itex]\sum_{s=0}^{k}s\binom{k}{s}E_f\,^s=k\,E_f\,(E_f+1)^{k-1}[/itex]

Now it's time to prove that the same is true for k+1:
[itex]\sum_{s=0}^{k+1}s\binom{k+1}{s}E_f\,^s\overset{?}{=}(k+1)\,E_f\,(E_f+1)^{(k+1)-1}[/itex]

So.. Is there some identity I'm missing that deals with summations up to an unknown value? How can I rewrite this?

Any help given would be very much appreciated. If things are unclear, please don't hesitate to ask.

Regards,

Daan
 
  • #2
DaanV said:
[...]

The mission:
Prove that
[itex]\sum_{s=0}^{c}s\binom{c}{s}E_f\,^s=c\,E_f\,(E_f+1)^{c-1}[/itex]

[...]

Use the easily-demonstrated fact that ##s {c \choose s}## is ##0## if ##s = 0## and is equal to ##c s {c-1 \choose s-1}## if ##1 \leq s \leq c##.
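For reference, the coefficient identity (with the stray factor of ##s## dropped; the follow-up posts confirm it was a typo) can be verified directly from the factorial form of the binomial coefficient:

```latex
s\binom{c}{s}
  = s\cdot\frac{c!}{s!\,(c-s)!}
  = \frac{c!}{(s-1)!\,(c-s)!}
  = c\cdot\frac{(c-1)!}{(s-1)!\,\bigl((c-1)-(s-1)\bigr)!}
  = c\binom{c-1}{s-1},
\qquad 1\le s\le c.
```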
 
  • #3
Thanks for your reply.

I presume you meant [itex]s\left(^c_s\right)[/itex] is equal to [itex]c\left(^{c-1}_{s-1}\right)[/itex], rather than [itex]cs\left(^{c-1}_{s-1}\right)[/itex].

Thank you so much so far, that has really helped me a lot. I'm not quite there yet though.

Using the Binomial Theorem with x=Ef, y=1, k=s-1 and n=c-1 gives me:

[itex]c\,E_f\left(E_f+1\right)^{c-1}\;=\;c\,E_f\sum_{s-1=0}^{c-1}\binom{c-1}{s-1}E_f\,^{s-1}[/itex]

[itex]=\;\sum_{s=1}^{c-1}c\binom{c-1}{s-1}E_f\,^{s}\;=\;\sum_{s=0}^{c-1}s\binom{c}{s}E_f\,^{s}[/itex]

So.. Basically the only problem left now is that the summation only goes up to c-1, or have I done something wrong in the above?

Anyway, thanks a lot for your help so far!

Daan
 
  • #4
DaanV said:
[...]

So.. Basically the only problem left now is that the summation only goes up to c-1, or have I done something wrong in the above?

Yes, sorry: that was a typo. In the original summation, s goes from 1 to c (because we can leave off the s=0 term, which is zero in ##s {c \choose s} E^s##). Thus, t = s-1 goes from 0 to c-1 in the new sum, so we have
[tex] \sum_{s=0}^c s {c \choose s} E^s = \sum_{s=1}^c s {c \choose s} E^s
= \sum_{s=1}^c c {c-1 \choose s-1} E^s = c E \sum_{s-1=0}^{c-1} {c-1 \choose s-1} E^{s-1}\\ = cE \sum_{t=0}^{c-1} {c-1 \choose t} E^t = cE(1+E)^{c-1}.[/tex]
 
  • #5
Ah yes, I see where I went wrong now.

Thanks again for the excellent help given!
Now on to the next problem..
 

Related to Help with proof for binomial distribution

What is a binomial distribution?

A binomial distribution is a probability distribution that is used to model the number of successes in a fixed number of independent trials. It is based on the assumption that each trial has only two possible outcomes, success or failure, and the probability of success remains constant throughout all the trials.

How is the binomial distribution calculated?

The binomial distribution can be calculated using the formula P(x) = nCx * p^x * (1-p)^(n-x), where n is the total number of trials, x is the number of successes, and p is the probability of success in each trial. Alternatively, it can also be calculated using statistical software or a binomial distribution table.
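As an illustration (a sketch of my own, not from the thread), the formula translates directly into Python; the sanity check uses the fact that the probabilities over all possible outcomes sum to 1:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# probabilities over all outcomes x = 0..n sum to 1
total = sum(binom_pmf(x, 10, 0.3) for x in range(11))
```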

What are the characteristics of a binomial distribution?

A binomial distribution has the following characteristics:

  • The number of trials is fixed.
  • Each trial has two possible outcomes.
  • The probability of success remains constant throughout all the trials.
  • The trials are independent of each other.

What is the use of a binomial distribution in real life?

The binomial distribution is commonly used in real life to model and analyze situations where there are only two possible outcomes, such as in coin tosses, election results, and success or failure of a new product launch. It can also be used in quality control, market research, and medical studies.

How is the binomial distribution related to the normal distribution?

The binomial distribution is related to the normal distribution through the central limit theorem, which states that as the sample size (n) increases, the binomial distribution approaches a normal distribution with mean np and standard deviation √(np(1-p)). This relationship is useful in approximating binomial probabilities when the sample size is large.
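The stated mean and standard deviation can be checked numerically with a short sketch (function and variable names are mine):

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    # P(x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 50, 0.4
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
# expected: mean = n*p and standard deviation = sqrt(n*p*(1-p))
```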
