Meaning of PDFs in the context of statistics

  • Thread starter Cole A.
  • Tags
    Statistics
  • #1
Cole A.
This is a really basic point that I am getting held up on.

In probability theory, we use PDFs and PMFs to describe random variables. (The term "random variable" carries a certain connotation of "unpredictable variations when repeated.") For example, the number of heads we will see in 50 flips of a coin is a random variable. This number will change from one set of 50 flips to the next. And the changes cannot be predicted beforehand. The random variable could be described by a Bin(50, p) PMF. We could even set p = 0.5 if the coin was fair.
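The coin example can be tried out with a short simulation. This is only a sketch: the seed and the helper names `binomial_pmf` and `count_heads` are made up for illustration, and the coin is assumed fair (p = 0.5).

```python
import random
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k heads in n flips) for a Bin(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def count_heads(n, p, rng):
    """Flip a coin n times and count the heads."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
n, p = 50, 0.5

# Five separate sets of 50 flips: the count varies from set to set.
print([count_heads(n, p, rng) for _ in range(5)])

# The PMF itself does not change, even though each outcome is unpredictable.
print(binomial_pmf(25, n, p))
```

The individual counts differ from run to run, but the PMF that describes them stays the same.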

But in statistics, PDFs and PMFs still seem to be used to describe any quantity that we do not know for sure, even when there is no scope for repetition associated with that quantity, i.e. the quantity has a real, fixed, unchanging value that is just presently unknown to us. This is what is confusing me: If someone looks at a building and says that its height in feet is described by N(100, 50), and another claims that its height is described by Unif(0, 200), what are they saying exactly? The building's height is an absolute, fixed, unchanging number. What meaning does a PDF possibly have in this context?
 
  • #2
We use the probability functions because, even though the result of a particular event is not predictable, the pattern of the unpredictability is, itself, predictable. You should have seen this already. It is very useful, for example, to realize that the sum of two dice is most likely to be 7, and cannot be more than 12 or less than 2. There are fortunes being made by being able to predict just how often each possible number will show up in many rolls.
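The dice claim can be checked by listing every outcome. A minimal sketch, assuming two fair six-sided dice (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, prob in pmf.items():
    print(total, prob)

# The sum with the largest probability.
most_likely = max(pmf, key=pmf.get)
print("most likely sum:", most_likely)
```

No single roll is predictable, but the table of probabilities is fixed and known in advance.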

In the case of a physical object's height, it is an open question whether it has one certain and absolute height at a particular time, as there is no way of telling.

All we can do is measure it to finite precision with our equipment. When someone reports a PDF for a distribution of possible heights, they are telling you that the outcome of many repeated measurements of the height will return values that conform with that distribution.
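A rough sketch of that idea follows. The true height, the size of the instrument error, and the choice of a normal error model are all assumptions made up for the illustration, not anything stated in the thread.

```python
import random
import statistics

TRUE_HEIGHT_FT = 100.0   # hypothetical fixed height, unknown to the surveyor
ERROR_SD_FT = 0.5        # hypothetical spread of the instrument's random error

def measure(rng):
    """One measurement: the fixed height plus a random, zero-mean error."""
    return TRUE_HEIGHT_FT + rng.gauss(0.0, ERROR_SD_FT)

rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable
readings = [measure(rng) for _ in range(10_000)]

# The height never changes; only the readings scatter around it.
print(statistics.mean(readings))
print(statistics.stdev(readings))
```

Here the distribution describes the measurement process, while the quantity being measured stays fixed.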

If someone says "I will build you a building with a height distributed as follows..." then they are telling you how well they can build to the specification. This is why we have margins for error and error analysis.
 
  • #3
Simon Bridge said:
When someone reports a PDF for a distribution of possible heights, they are telling you that the outcome of many repeated measurements of the height will return values that conform with that distribution.

This clarified things immensely for me. Thank you.
 
  • #4
Cole A. said:
If someone looks at a building and says that its height in feet is described by N(100, 50), and another claims that its height is described by Unif(0, 200), what are they saying exactly? The building's height is an absolute, fixed, unchanging number. What meaning does a PDF possibly have in this context?

Unless you are studying Bayesian statistics, you don't find such statements in a statistics text.
In "frequentist" statistics, the kind normally studied in introductory courses, you would not find the height of one building described by a probability distribution. You might find the height of a randomly selected building from a population of buildings described by a distribution. You might find the measured height of a single building described by a probability distribution if the measurement has a random error.

If you are studying Bayesian statistics, you might use a probability distribution for the height of one building. The distribution can be regarded as stating a "belief" about the height, or you can pretend that when the building was built, its height was selected at random from a population of possible heights that might have occurred.
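As a hedged illustration of the "belief" reading, here is the textbook update of a normal prior by one normally distributed measurement with known error. The prior N(100, sd 50), the reading of 112 ft, and the instrument error of 5 ft are invented numbers, and `update_normal_belief` is a hypothetical helper name.

```python
def update_normal_belief(prior_mean, prior_sd, reading, reading_sd):
    """Combine a normal prior with one normal measurement of known error.

    Returns the mean and standard deviation of the updated (posterior) belief.
    """
    prior_precision = 1.0 / prior_sd**2
    reading_precision = 1.0 / reading_sd**2
    post_precision = prior_precision + reading_precision
    # Precision-weighted average of the prior mean and the reading.
    post_mean = (prior_mean * prior_precision
                 + reading * reading_precision) / post_precision
    return post_mean, post_precision ** -0.5

# Vague prior belief about the height, then one fairly precise measurement.
post_mean, post_sd = update_normal_belief(100.0, 50.0, 112.0, 5.0)
print(post_mean, post_sd)
```

The building's height does not move; what narrows is the distribution that expresses what is known about it.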
 
  • #5
Hmmm... and N(100, 50) feet seems a little odd. Reading the 50 as a standard deviation, that would be a normal distribution with a mean of 100' and a variance of 2500 square feet, suggesting there is a non-zero probability that the "building" is actually a basement. (Quite aside from that, the tails of the normal distribution never reach zero, so there's a faint chance that the building is actually a space elevator or a tunnel through the Earth.)
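The size of that "basement" probability depends on how the second parameter of N(100, 50) is read. The sketch below works out P(height < 0) under both readings; `prob_below_zero` is a hypothetical helper, and the two readings are assumptions, since the thread does not say which was meant.

```python
from math import erfc, sqrt

def prob_below_zero(mean, sd):
    """P(X < 0) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * erfc(mean / (sd * sqrt(2)))

mean_ft = 100.0
readings = {
    "50 read as standard deviation": 50.0,
    "50 read as variance": sqrt(50.0),
}

for label, sd in readings.items():
    print(label, prob_below_zero(mean_ft, sd))
```

With 50 as the standard deviation, the negative-height probability is a couple of percent; with 50 as the variance it is positive but vanishingly small.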
 

FAQ: Meaning of PDFs in the context of statistics

What does PDF stand for?

PDF stands for Probability Density Function.

What is the meaning of PDF in statistics?

In statistics, a PDF is a function that describes how the probability of a continuous random variable is distributed over its possible values. The probability that the variable falls within a range is the area under the PDF over that range; the probability of any single exact value is zero. It is used to model the distribution of data and to calculate the probability of outcomes.

How is a PDF different from a CDF?

A PDF is the derivative of a Cumulative Distribution Function (CDF). The PDF gives the probability density at a value, which is not itself a probability; the CDF gives the probability that the variable is less than or equal to that value. In other words, the PDF shows the relative likelihood near a value, while the CDF accumulates that density over everything up to the value.
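A small numerical sketch may help here. For a continuous variable the PDF value is a density, so it can exceed 1, and actual probabilities come from differences of the CDF. The narrow normal distribution below is an arbitrary choice made for the illustration.

```python
from statistics import NormalDist

# Hypothetical narrow distribution, chosen so the density exceeds 1 near the mean.
X = NormalDist(mu=0.0, sigma=0.1)

density_at_mean = X.pdf(0.0)              # a density, not a probability
prob_in_range = X.cdf(0.1) - X.cdf(-0.1)  # P(-0.1 < X <= 0.1)

# Numerical check that the PDF is the slope of the CDF.
h = 1e-5
slope_of_cdf = (X.cdf(h) - X.cdf(-h)) / (2 * h)

print(density_at_mean, prob_in_range, slope_of_cdf)
```

The density at the mean is larger than 1, while the probability of landing within one standard deviation is still below 1, and the slope of the CDF matches the PDF.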

What is the importance of PDFs in statistics?

PDFs are important in statistics because they allow us to understand the distribution of data and make predictions about future outcomes. They are used in a variety of statistical analyses, such as hypothesis testing, regression, and modeling.

How are PDFs used in real-world applications?

PDFs are used in a wide range of real-world applications, including finance, engineering, and social sciences. They are used to model and analyze data in fields such as risk assessment, market analysis, and population studies. PDFs are also used in machine learning and artificial intelligence algorithms to make predictions and decisions based on data.
