What is the binomial distribution?

In summary, the four concepts labelled by pairs are: (0, 0) the probability mass function (discrete only), (1, 0) the probability density function (continuous only), and (n, 1) the cumulative distribution functions, discrete and continuous. The "distribution" itself is the rule that assigns the probabilities, which these functions characterize.
  • #1
Rasalhague
I'm trying to learn how the names density and distribution, and related terms, are used in statistics and probability theory. Here are four concepts which I've labelled by pairs:

[tex](0, 0) = f:\mathbb{R} \to [0,1] \; : \; f(x)=P(X=x)=P(\{s \in S : X(s)=x\})[/tex]

where s is an element of a sample space, S.

[tex](0, 1) = F(x)=P(X \leq x)=\sum_{t=-\infty}^{x}f(t)[/tex]

[tex](1, 0) = g(x) : [/tex]

[tex](i) \, g(x) \geq 0;[/tex]

[tex](ii) \, \int_{-\infty}^{\infty}g(x) \, dx = 1;[/tex]

[tex](iii) \, \int_{a}^{b} g(x) \, dx = P(a < X < b)[/tex]

where a and b are any two values of X such that a < b.

[tex](1, 1) = G(x)=P(X\leq x) = \int_{-\infty}^{x}g(t) \, dt[/tex]

Hoel, in Introduction to Mathematical Statistics, calls concepts (n, 0) probability densities. He calls concepts (n, 1) probability distributions. Wolfram Mathworld gives the same definitions, but adds that some people use the term "cumulative distribution function", CDF, in place of probability distribution. Wikipedia calls (0, 0) a probability mass function (only discrete), (1, 0) a probability density function (only continuous), and (n, 1) cumulative distribution functions, discrete and continuous.
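To make the four labels concrete, here is a minimal Python sketch. It assumes a binomial variable for the discrete pair and a standard normal for the continuous pair; the helper names `binom_pmf` and `binom_cdf` are made up for illustration, while `NormalDist` comes from Python's standard `statistics` module.

```python
from math import comb
from statistics import NormalDist


def binom_pmf(k, n, p):
    """(0, 0): f(k) = P(X = k) for X ~ Binomial(n, p), k an integer."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """(0, 1): F(k) = P(X <= k), the running sum of the pmf up to k."""
    return sum(binom_pmf(t, n, p) for t in range(0, min(k, n) + 1))


# Continuous pair, illustrated with a standard normal variable.
std_normal = NormalDist(mu=0.0, sigma=1.0)
g = std_normal.pdf  # (1, 0): density, integrates to 1
G = std_normal.cdf  # (1, 1): G(x) = P(X <= x)

print(binom_pmf(1, 3, 0.5), binom_cdf(1, 3, 0.5))
print(g(0.0), G(0.0))
```

Note that `g(0.0)` is a density value, not a probability, whereas `binom_pmf(1, 3, 0.5)` is a genuine probability.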

However, Mathworld seems to use the term distribution in a different sense with reference to, for example, the binomial and normal "distributions". The formula they define the binomial distribution by here [ http://mathworld.wolfram.com/BinomialDistribution.html ] is called a "probability density function" by Wolfram Alpha and a "probability mass function" by Wikipedia. Hoel also seems to neglect his earlier definitions and use the terms distribution and density interchangeably with respect to the binomial and normal distribution (or density).

I get the impression that for everyone in the context of the binomial and normal distributions, and for Wikipedia in general, "(probability) distribution" may refer to something broader than (Inclusive of? Related to?) Hoel's densities and distributions, so that the bell curve is only the graph of the probability density function associated with the normal distribution, and neither this density nor the corresponding cumulative distribution function is to be identified as the normal distribution itself. Is that right? Could someone explain more exactly what distribution is here?

I'm afraid I don't understand enough of the supporting terminology to understand Wikipedia's definition of probability distribution [ http://en.wikipedia.org/wiki/Probability_distribution ], but I notice that it avoids identifying it with either probability mass function / probability density, or cumulative distribution function. Instead it just says that probability distributions are "characterized by" these. What exactly is the relationship?

This article uses the notation Pr[X = x] and Pr[X < or = x]. So their probability distribution seems to be a function of two variables, Pr(x, r), where x is a real number, and r is one of two relations, either "=" or "< or =", so that, depending on the relation, it could manifest as Hoel's/Mathworld's density, or Hoel's/Mathworld's (cumulative) distribution. Is that anywhere near the mark?
 
  • #2
Would this be a fair dictionary-entry style answer to the question "What does distribution mean?" in terms of f, g, F, G, as these are defined in #1?

(1) A function Q:Rx{ =, <or= }x{ discrete, nondiscrete } --> [0,1], such that

(a) Q(x,=,discrete) = f(x)
PMF (probability mass function), "discrete density"

(b) Q(x,<or=,discrete) = F(x)
CDF (cumulative distribution function, discrete)

(c) Q(x,=,nondiscrete) = g(x)
PDF (probability density function), "continuous density"

(d) Q(x,<or=,nondiscrete) = G(x)
CDF (cumulative distribution function, continuous)

(Or more generally, Q:R^n x{ =, <or= }x{ discrete, nondiscrete } --> [0,1], with the appropriate generalisations of f, g, F, G to multivariable functions.)

(2) A synonym for CDF functions, i.e. functions of the form F(x) and G(x).

*

I see that Excel does something similar, in that you can enter formulas of the form DISTRIBUTION1-NAME[PARAMETER1,...,PARAMETERn,FALSE] or DISTRIBUTION1-NAME[___,...,___,TRUE] to select for PMF/PDF/density (false), or CDF/distribution2 (true).
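The "one name, one flag" idea can be sketched in Python as follows. The function `binom_dist` is hypothetical, modelled loosely on the argument order of Excel's BINOM.DIST (number of successes, trials, success probability, cumulative flag); it is only an illustration of the pattern, not Excel's implementation.

```python
from math import comb


def binom_dist(number_s, trials, probability_s, cumulative):
    """Return P(X = number_s) if cumulative is False, else P(X <= number_s)."""

    def pmf(k):
        return comb(trials, k) * probability_s**k * (1 - probability_s) ** (trials - k)

    if cumulative:
        # Sense (2) of "distribution": the CDF, a running sum of the pmf.
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)


print(binom_dist(2, 3, 0.5, False))  # P(X = 2)  -> 0.375
print(binom_dist(2, 3, 0.5, True))   # P(X <= 2) -> 0.875
```

Both calls refer to the same underlying distribution; the flag only chooses which of its two descriptions is evaluated.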
 
  • #3
This discussion doesn't rely on a measure-theoretic approach. You can find that kind of discussion in a text on probability theory (Chung's "A Course in Probability Theory" is a good one).

If [tex] X [/tex] is a continuous random variable, its (cumulative) distribution function [tex] F [/tex] satisfies

[tex]
P(X \le x) = F(x)
[/tex]

The density function in the case of a continuous random variable is a function [tex] f [/tex] that is non-negative and satisfies

[tex]
\int_{-\infty}^\infty f(x) \, dx = 1, \qquad F' = f
[/tex]


Either the cdf or the density can be used for calculation. The classic example is

[tex]
\begin{align*}
P(a \le X \le b) & = F(b) - F(a) \\
P(a \le X \le b) & = \int_a^b f(x) \, dx
\end{align*}
[/tex]
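As a quick numerical check that the two routes agree, here is a small Python sketch. It assumes a standard normal variable and arbitrary endpoints a = -1, b = 2; `midpoint_integral` is an illustrative helper written for this example, not a library function.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution
a, b = -1.0, 2.0                   # assumed example endpoints

# Route 1: difference of the cumulative distribution function.
via_cdf = X.cdf(b) - X.cdf(a)


def midpoint_integral(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Route 2: integrate the density over [a, b].
via_pdf = midpoint_integral(X.pdf, a, b)

print(via_cdf, via_pdf)  # both are approximately 0.8186
```

The agreement is just the fundamental theorem of calculus applied to F' = f.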



In the discrete case the cumulative distribution function [tex] G [/tex] still satisfies

[tex]
P(X \le x) = G(x)
[/tex]

The function that is in some sense analogous to the density is often called the probability mass function. It satisfies

[tex]
P(X = x) = g(x)
[/tex]

Neither [tex] F [/tex] nor [tex] G [/tex] is the actual distribution: that name refers to the "rule" that governs the assignment of probabilities: normal distribution, exponential distribution, binomial distribution, Poisson, are some examples.
 
  • #4
statdad said:
Neither [tex] F [/tex] nor [tex] G [/tex] is the actual distribution: that name refers to the "rule" that governs the assignment of probabilities: normal distribution, exponential distribution, binomial distribution, Poisson, are some examples.

Okay, so a distribution, in this sense, is not even a function, just the probability-assigning rule associated with these various related functions (if discrete: pmf and cdf, if continuous: pdf, cdf). Thanks for your comments, statdad, and for the book recommendation. The Hoel book was originally published in 1947, although revised in 1983. But given the Mathworld articles, presumably some people still use that terminology, distribution function for cdf; still, I suppose if they always use the word function in that expression, it wouldn't be ambiguous, only confusing to a novice :~)
 
  • #5
Rasalhague said:
Okay, so a distribution, in this sense, is not even a function, just the probability-assigning rule associated with these various related functions (if discrete: pmf and cdf, if continuous: pdf, cdf). Thanks for your comments, statdad, and for the book recommendation. The Hoel book was originally published in 1947, although revised in 1983. But given the Mathworld articles, presumably some people still use that terminology, distribution function for cdf; still, I suppose if they always use the word function in that expression, it wouldn't be ambiguous, only confusing to a novice :~)

Yes, distribution function and cdf are often used interchangeably. Terminology is always fun - the most obvious difference in language is the use of "normal distribution" in the US for what much, if not most, of the rest of the world refers to as the "Gaussian distribution".
 
  • #6
statdad said:
Yes, distribution function and cdf are often used interchangeably. Terminology is always fun - the most obvious difference in language is the use of "normal distribution" in the US for what much, if not most, of the rest of the world refers to as the "Gaussian distribution".

Ha ha, tooth-grindingly fun! Thanks for that tip too. Just to be more specific, say the experiment is to flip a coin 3 times. Is the binomial distribution for this experiment the rule that assigns an equal probability of 1/8 to each simple event, i.e. each set containing one element of the underlying set, S, of a sample space? Or is it the function which uses this rule to map subsets of S to the interval [0,1]? Or is it a rule that also specifies what counts as a success, e.g. getting heads? Is it a rule, or a function, that depends on the random variable? Hoel describes the random variable as generating a new sample space. In this case, if the random variable X:S-->R such that X(s), for each s in S, is the number of heads obtained, would the binomial distribution perhaps be the probability rule that depends both on X and on the probabilities assigned to subsets of S and that assigns unequal probabilities to the integers {0, 1, 2, 3} (i.e. the rule of the PMF)?
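One way to make the last reading concrete is to enumerate the sample space and push the equal probabilities through X. This is a minimal Python sketch, assuming a fair coin and X = number of heads; the names `S`, `P_simple`, `X` and `pmf` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

# Sample space S: all 8 sequences of three flips, e.g. ('H', 'T', 'H').
S = list(product("HT", repeat=3))
P_simple = Fraction(1, len(S))  # equal probability 1/8 for each simple event


def X(s):
    """Random variable X: S -> R, the number of heads in the outcome s."""
    return s.count("H")


# Probabilities induced on {0, 1, 2, 3}: add up P over each set {s : X(s) = k}.
pmf = Counter()
for s in S:
    pmf[X(s)] += P_simple

# The same values from the binomial formula with n = 3, p = 1/2.
formula = {k: Fraction(comb(3, k), 2**3) for k in range(4)}

print(dict(pmf) == formula)  # True
print(sorted(pmf.items()))   # probabilities 1/8, 3/8, 3/8, 1/8 for k = 0..3
```

On this reading the uniform rule on S and the unequal rule on {0, 1, 2, 3} are two different probability assignments, and "binomial" names the second one, which depends on both X and the probabilities on S.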
 
