Understanding probability: is probability defined?

In summary, the concept of probability is often described in terms of events, outcomes, and relative frequency, but it is not formally defined this way. Probability theory is built upon abstract definitions and mathematical principles rather than concrete observations of relative frequency. While the frequentist interpretation of probability may align with the formalism in some cases, it is not the only way to understand probability. Ultimately, probability theory makes statements about the probabilities of events; it provides no definitive guarantee about actual outcomes.
  • #1
bobby2k
I have taken a course in probability and statistics, and did well, but I still feel that I do not grasp the core of what holds the theory together. It is a little weird to use a lot of theory when I do not get the simple building blocks of that theory.

I am basically wondering if probability is defined in some way?

In the statistics books I have looked at, probability is not defined. At the beginning of the book they give a description of how we can look at probability, and this is usually the relative frequency model, but they never define it to be this.

These steps are what I seem to see in statistics books; do they seem fair?

1. Probability is described in terms of events, outcomes and relative frequency, but never defined.
2. A lot of theory is then built regarding probability.
3. Then, with the help of Chebyshev's inequality, we are able to show that the relative frequency model is correct. That is, if the probability for an event is p, and X is a Bernoulli random variable, then mean(X) will converge to p.

Do you see my problem? If we say that the probability of an event is p, then we can show that the relative frequency of the event in the long run is p. In order to show this, we used all the theory of linear combinations, variance, etc. But this means that the relative frequency model is a consequence of our theory, correct?
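To make the empirical side of this concrete, here is a minimal simulation sketch (Python, standard library only; the function name and parameters are ours, not from any book): it estimates a Bernoulli probability by relative frequency, without proving anything.

[code]
# Hedged sketch: estimate P(success) = p by the relative frequency of
# successes in n independent Bernoulli(p) trials.
import random

def relative_frequency(p, n, seed=0):
    """Fraction of successes in n independent Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(p=0.3, n=n))
# The printed fractions tend toward 0.3, but no single run is guaranteed to.
[/code]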

I mean, we cannot say that the probability is the relative frequency, then develop a lot of theory, and then prove that p equals the relative frequency; then we are going in a circle?
 
  • #2
bobby2k said:
I mean, we cannot say that the probability is the relative frequency, then develop a lot of theory, and then prove that p equals the relative frequency; then we are going in a circle?

If you look closely at what theorems in probability say about relative frequency, they only talk about the probability of relative frequency taking on a certain value. They may have wording such as "the probability approaches 1 as the number of trials approaches infinity", but this is still not a guarantee that relative frequency will behave in a certain manner - it just probably will.

Probability theory is not circular in the way that you describe, but it is circular in that results of probability theory are results about probabilities of things, not guarantees of actual outcomes.

In the axiomatic statement of probability theory, probability is not defined in terms of relative frequency. It is defined abstractly as a "measure". If you look closely at the axiomatic development of probability theory (the high-class approach, not the approach taken in elementary texts) you will find that there isn't any discussion of whether an event actually happens or not. There isn't even any assumption that you can take a random sample - there are only statements that random variables of various kinds (which people think of as representing random samples) have certain distributions.

The mathematical theory of probability does not describe any way to measure "probability" in the same way that physical theories describe how to measure a quantity like "mass" or "force". It is not clear whether probability has any physical reality. If it does then it is rather mysterious. Consider how the probability of an event changes. If a prize is placed "at random" behind one of 3 doors, the probability of it being behind the second door is 1/3. If we open the first door and the prize is not there then the probability of it being behind the second door changes to 1/2. Does this involve a physical change in the doors? Does the probability change from 1/3 to 1/2 instantaneously or does it go from 1/3 up to 1/2 in a finite amount of time? The mathematical theory of probability does not deal with such questions. A person who applies probability theory may tackle them, but mathematically he is on his own.
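For concreteness, the door example can be checked by brute enumeration. A minimal sketch (Python; it assumes, as stated, that the prize placement is uniform over the three doors):

[code]
# The three equally likely states of the world: the door hiding the prize.
placements = [1, 2, 3]
# Condition on the observation "the prize is not behind door 1":
consistent = [door for door in placements if door != 1]
# Updated probability that the prize is behind door 2.
print(consistent.count(2) / len(consistent))  # 0.5, up from 1/3
[/code]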
 
  • #3
Stephen Tashi said:
In the axiomatic statement of probability theory, probability is not defined in terms of relative frequency. It is defined abstractly as a "measure". If you look closely at the axiomatic development of probability theory (the high-class approach, not the approach taken in elementary texts) you will find that there isn't any discussion of whether an event actually happens or not. There isn't even any assumption that you can take a random sample - there are only statements that random variables of various kinds (which people think of as representing random samples) have certain distributions.

^ This.
The most popular formalism for probability consists of (i) states of the world, (ii) events (where an event is just a collection of states), and (iii) a number attached to each event, which is just called the "probability" of said event.

The formalism exists even without any interpretation.
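A toy instance of that formalism, just to show it needs no interpretation to be well defined (a Python sketch with invented names; finite case only):

[code]
# (i) states of the world: two coin flips.
states = {"HH", "HT", "TH", "TT"}

# (iii) a number attached to each event; here, equally weighted states.
def P(event):
    """Probability of an event = fraction of the states it contains."""
    return len(event & states) / len(states)

# (ii) an event is just a collection of states.
at_least_one_head = {"HH", "HT", "TH"}
print(P(at_least_one_head))  # 0.75
print(P(states))             # 1.0: the whole space gets measure 1
[/code]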

bobby2k said:
Then... we are able to show that the relative frequency model is correct. That is, if the probability for an event is p, and X is a Bernoulli random variable, then mean(X) will converge to p.

Indeed, one popular interpretation of probability is the frequentist one. The (strong) law of large numbers, the theorem to which you alluded here, suggests that the formalism somehow agrees with the frequentist interpretation. It suggests that if somebody really wants to think of probability in terms of long-run frequencies, then it usually won't lead them astray in the rigorous study of probability theory.
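For reference, one common textbook form of the strong law (a sketch, with the usual assumptions): if [itex]X_1, X_2, \ldots[/itex] are independent and identically distributed with finite mean [itex]\mu[/itex], then [itex]P\left(\lim_{n \rightarrow \infty} \frac{1}{n}\sum_{i=1}^{n} X_i = \mu\right) = 1[/itex]. Note that even here the convergence claim sits inside a probability.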
 
  • #4
Stephen Tashi said:
In the axiomatic statement of probability theory, probability is not defined in terms of relative frequency. It is defined abstractly as a "measure". If you look closely at the axiomatic development of probability theory (the high-class approach, not the approach taken in elementary texts) you will find that there isn't any discussion of whether an event actually happens or not. There isn't even any assumption that you can take a random sample - there are only statements that random variables of various kinds (which people think of as representing random samples) have certain distributions.

So probability, at its core, is just a measure of likelihood, with 0 meaning it will not happen, 1 meaning it is certain to happen, and if P(A) > P(B), then it is more likely that event A happens than event B? We can say no more of probability as it is defined at the bottom of all the theory?

It may sound stupid, but I still feel that there is a gap between saying that probability is a measure of something and being able to calculate probabilities, make confidence intervals and all that stuff.

From what I see, what we can do is this:

1. define probability as a measure of likelihood, as you said
2. define events, outcomes, etc.
3. define random variables, both continuous and discrete, etc.
4. define probability distribution functions for the random variables
5. define expected values and variance
6. calculate expected values of linear combinations, and show that the law of large numbers etc. holds (Chebyshev; a sketch of the bound involved follows this list)
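A sketch of the Chebyshev step mentioned in item 6 (standard textbook form; here [itex]\bar{X}_n[/itex] denotes the mean of n independent Bernoulli(p) samples): [itex]P(|\bar{X}_n - p| \geq \epsilon) \leq \frac{\mathrm{Var}(\bar{X}_n)}{\epsilon^2} = \frac{p(1-p)}{n\epsilon^2} \rightarrow 0[/itex] as n grows. Note that the conclusion is itself a statement about a probability.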

If I do the things in this list, I run into a problem at step 5. If I do not already have the relative frequency model in the back of my mind, step 5 does not make any sense. I mean, when I learned to understand the expected value, I thought of the probability as relative frequencies for expected values to make sense (it was the average in the long run; for this to work, we have to look at probability as frequencies). But I cannot really do this, because this comes in step 6, after expected value has been defined. How is this explained?

Thanks for your time, guys; this is really important for me to understand.
 
Last edited:
  • #5
bobby2k said:
1. define probability as a measure of likelihood, as you said

I did not say probability is defined as a "measure of likelihood". I just said it was defined as a "measure". A "measure" in mathematics is an abstraction of ideas connected with the physical ideas of length, area, volume, etc. When we apply probability theory, we think of probability as a tendency for a thing to happen - but that thought is not expressed in the axioms of probability theory.

An attempt to define probability as a "tendency for something to happen" or a "likelihood" merely offers undefined words such as "tendency" or "likelihood" for the undefined word "probability". Such a definition has no mathematical content. (As a matter of fact, the word "likelihood" has a technical definition in probability and statistics that is different from the man-in-the-street's idea of what likelihood means.)

You apparently are seeking a formulation of probability theory that somehow guarantees some connection between the mathematics of probability and applications to real world problems. There is no such mathematical theory. Applications of any sort of math to the real world involve assuming certain math is a correct model. There is no mathematical proof that mathematics applies to the real world. There is no mathematical proof or definition that says probability is a frequency of occurrence. The only connection between probability theory and observed frequency is that probability theory tells you about the probability of various frequencies.

The expectation of a random variable can be thought of as the average of taking infinitely many independent samples of the random variable, but such a thought is a way of thinking about how to apply probability theory. It isn't part of the mathematical theory of probability.
 
  • #6
Thanks. I still have some follow-ups, I hope that's OK; I am getting closer to the end, though.

Does this step-by-step seem fair, then?

1. Probability is a "measure" but undefined. However, we say that it is a measure of how likely something is to happen.
2. We define the basic probability axioms; these are mathematical (P(S)=1, etc.).
3. We define dependent and independent events. We define the measure that two independent events both happen to be the product of the two individual measures.
4. We define expected value and variance mathematically; we don't give them any other meaning.
5. Since we now have defined the measure of independent events, if X is a Bernoulli random variable, we get that the measure of mean(x) being close to p approaches one as the number of events goes to infinity. All this is still only mathematical, and all it means is that the measure goes to 1.

Then we start assuming things:
6. Let's say there is a prize behind one of three doors. Since we assume that it is equally likely that each door has the prize, P(door 1 has prize)=1/3. Still, this is just a measure of how likely the prize is there.
7. Then we choose a door many times, and count how many times we are correct. Now, in our real physical world, we assume that it is equally likely to get the prize each time, no matter what we got the previous times. Now we adopt the mathematical model: we say that since the trials are physically independent, we assume that their probabilities can be multiplied. Then we get that the probability that we will guess correctly 1/3 of the time approaches 1 as the number of trials goes to infinity.

What more do I need to do/assume, to be able to say that the relative frequency of the number of correct guesses will approach 1/3? Is it ok to say that since one axiom defines probability to be maximum 1, then we can say that it is extremely likely that the relative frequency will approach 1/3?
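A minimal simulation of the scenario in steps 6-7 (a hedged Python sketch; nothing here proves convergence, it only illustrates it):

[code]
# Guess one of three doors at random each round; track the fraction correct.
import random

rng = random.Random(42)
trials = 100_000
correct = 0
for _ in range(trials):
    prize = rng.randrange(3)   # door hiding the prize, assumed equally likely
    guess = rng.randrange(3)   # our guess, independent of the placement
    correct += (guess == prize)
print(correct / trials)        # typically close to 1/3, but only "probably"
[/code]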
 
  • #7
One of the most important lessons in philosophy is that nearly nothing can be defined properly. Every time you define something, you use some other concepts in the definition, and the definitions of the used concepts become new problems.

The concept of probability is one of those eternal philosophical problems. It seems intuitive, but cannot be defined.

Mathematicians have rigorous definitions for measures and random variables, but these definitions don't answer what probability is. In the mathematical approach, the intuitive idea of probability is taken as accepted at the beginning, and the theory is then developed with rigorous mathematical definitions onto which some intuitive interpretations are attached.

Science is not only about knowing as much as possible, but also about knowing what you don't know.
 
  • #8
Keep in mind that likelihood has its own meaning in statistical inference. Likelihood and probability are different things, and in fact probability is needed in the definition of likelihood.

http://en.wikipedia.org/wiki/Likelihood_function

So likelihood should not be used rhetorically when attempting to define probability.
 
  • #9
bobby2k said:
Does this step-by-step seem fair, then?

Can you explain what the goal of these steps is supposed to be?

1. Probability is a "measure" but undefined. However, we say that it is a measure of how likely something is to happen.

You aren't paying attention to the previous posts. It doesn't do any good, mathematically, to say that "probability" is a measure of "how likely" something is to happen. The idea of "how likely" contains no more information than the word "probability".

2. We define the basic probability axioms; these are mathematical (P(S)=1, etc.).
3. We define dependent and independent events. We define the measure that two independent events both happen to be the product of the two individual measures.
4. We define expected value and variance mathematically; we don't give them any other meaning.

You are correct that the basics of probability theory are implemented as definitions.

5. Since we now have defined the measure of independent events, if X is a Bernoulli random variable, we get that the measure of mean(x) being close to p approaches one as the number of events goes to infinity.
I think you want to phrase that in terms of N independent realizations of X and in terms of the mean of those realizations, not in terms of the single random variable X.

All this is still only mathematical, and all it means is that the measure goes to 1.
Limits of things involving probabilities are complicated to state exactly. They are more complicated than the limits used in ordinary calculus. To make your statement precise, you'll have to study the various kinds of limits involved in probability theory.

Then we start assuming things:

6. Let's say there is a prize behind one of three doors. Since we assume that it is equally likely that each door has the prize, P(door 1 has prize)=1/3. Still, this is just a measure of how likely the prize is there.
7. Then we choose a door many times, and count how many times we are correct. Now, in our real physical world, we assume that it is equally likely to get the prize each time, no matter what we got the previous times. Now we adopt the mathematical model: we say that since the trials are physically independent, we assume that their probabilities can be multiplied. Then we get that the probability that we will guess correctly 1/3 of the time approaches 1 as the number of trials goes to infinity.

Again, limits of probabilities are complicated. If the number of trials is not a multiple of 3, the fraction that are Bernoulli "successes" can't be exactly 1/3. So the limit of the probability of getting exactly 1/3 successes doesn't approach 1 as the number of trials approaches infinity. Expressing the general idea that you have in mind takes more complicated language.
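The last point can be seen numerically. A hedged Python sketch (the pmf is computed in log space to avoid float underflow at large n; the helper name is ours):

[code]
# P(exactly k successes in n trials) for a Bernoulli(p) sequence,
# via the log-gamma form of the binomial coefficient.
from math import lgamma, exp, log

def binom_pmf(n, k, p):
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

# Probability of getting *exactly* n/3 successes with p = 1/3:
for n in (3, 30, 300, 3000):
    print(n, binom_pmf(n, n // 3, 1/3))
# The values shrink toward 0 (roughly like 1/sqrt(n)), even though n/3
# is the single most likely count.
[/code]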

What more do I need to do/assume, to be able to say that the relative frequency of the number of correct guesses will approach 1/3?

You can't say that it "will" by any standard assumptions of probability theory. If you express your idea precisely, you can say "it probably will".

Is it ok to say that since one axiom defines probability to be maximum 1, then we can say that it is extremely likely that the relative frequency will approach 1/3?

Realize that when you say "extremely likely", you aren't saying anything that has mathematical consequences. You are just using words that make you feel psychologically more comfortable. There is no mathematical definition for "extremely likely" except in terms of "probability".

Look at the formal statement of the weak and strong laws of large numbers and look at the sophisticated concepts of limits that are used ("convergence in probability" and "almost sure convergence").
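For reference, the two kinds of limits mentioned (standard definitions, stated informally): a sequence of random variables [itex]X_n[/itex] converges to [itex]X[/itex] in probability if [itex]\lim_{n \rightarrow \infty} P(|X_n - X| > \epsilon) = 0[/itex] for every [itex]\epsilon > 0[/itex], and converges almost surely if [itex]P\left(\lim_{n \rightarrow \infty} X_n = X\right) = 1[/itex]. In both cases the limit claim is itself wrapped in a probability.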

You aren't going to get around the fact that probability theory provides no guarantees about the observed frequency of events, or about the limits of observed frequencies except for those theorems that say something about the probability of those frequencies. You are presenting your series of steps as if the goal is to say something non-probabilistic about observed frequencies or to prove that "probability" amounts to some kind of observed frequency. This is not the goal of probability theory.
 
  • #10
Stephen Tashi said:
Can you explain what the goal of these steps is supposed to be?

Thanks for still being in the thread, really appreciate it!
My goal in the steps is to have a pathway from the basic building blocks to the more complex usage and results. For instance, I really liked that the theory of integration and differentiation can be built from the 10 basic axioms plus the axiom of the least upper bound. It is very interesting to see the complex theorems being built starting from these axioms, then getting the extreme value theorem, the intermediate value theorem, etc., and going on. I want to see something similar in probability theory, but it is difficult.


You aren't going to get around the fact that probability theory provides no guarantees about the observed frequency of events, or about the limits of observed frequencies except for those theorems that say something about the probability of those frequencies. You are presenting your series of steps as if the goal is to say something non-probabilistic about observed frequencies or to prove that "probability" amounts to some kind of observed frequency. This is not the goal of probability theory.

Ok, I get that we can say that the probability of those frequencies goes to 1. But what does this mean then? That it is "probable" that the frequencies will behave like this?
 
  • #11
bobby2k said:
Ok, I get that we can say that the probability of those frequencies goes to 1. But what does this mean then?

As I said, if you want to know what it means, you have to deal with the various ways that limits involving probabilities are defined. To say "the probability of those frequencies goes to 1" is not a precise statement. (In fact, the probability of observing a frequency of successes exactly equal to the probability of success for a Bernoulli random variable that is the subject of a large number of independent trials goes to zero as the number of trials increases.) If you want to understand what probability theory says about the limiting probability of observed frequencies, you have to be willing to deal with the details of how the various limits are defined.
 
  • #12
Can you recommend a good book so that I will be able to learn what I need, to understand what I want?
I'd like it to be not so long, and easy to read if possible. I have not taken real analysis yet (or measure theory), but I have read about logic and set theory on my own, so I can deal with that if the book contains it.
 
  • #13
bobby2k said:
Can you recommend a good book so that I will be able to learn what I need, to understand what I want?
I'd like it to be not so long, and easy to read if possible. I have not taken real analysis yet (or measure theory), but I have read about logic and set theory on my own, so I can deal with that if the book contains it.

I didn't encounter the various types of limits used in probability theory until I took graduate courses, so I can't recommend a book. I'll keep my eyes out for something online that explains the various types of limits (usually referred to as types of "convergence of sequences of functions").

Perhaps some other forum member knows a good book.
 
  • #14
bobby2k, it looks like you are looking for something that cannot be found (at our time, at least). IMO you have already understood the essentials. You only need to calm down, take a step back, and try to see the big picture.

bobby2k said:
I mean, we cannot say that the probability is the relative frequency, then develop a lot of theory, and then prove that p equals the relative frequency; then we are going in a circle?

Yes, you are seeing a real problem with circular thinking. If you define (or attempt to define) probability with frequencies, and then use the probability concept to prove some basic frequency results, you are going in a circle.

Some of the basic results related to the frequencies are important, so I wouldn't speak ill of them. But if we are discussing attempts to define probability (which is philosophy, IMO), the circular thinking should be recognized.

bobby2k said:
Ok, I get that we can say that the probability of those frequencies goes to 1. But what does this mean then? That it is "probable" that the frequencies will behave like this?

(The choice of words is confusing, but you can tell what the point of the quote is.)

It means that we have assumed the probability concept defined and accepted, and then we have proven something technical / mathematical about probabilities of some sequences.
 
  • #15
bobby2k said:
My goal in the steps is to have a pathway from the basic building blocks to the more complex usage and results. For instance, I really liked that the theory of integration and differentiation can be built from the 10 basic axioms plus the axiom of the least upper bound. It is very interesting to see the complex theorems being built starting from these axioms, then getting the extreme value theorem, the intermediate value theorem, etc., and going on. I want to see something similar in probability theory, but it is difficult.

Probability theory is more of a tangle than single variable calculus. (In fact, I've read that developing probability theory is one of the main reasons that calculus was extended to include ideas like Stieltjes integration and the more general idea of "measures".)

In my opinion, mathematical topics have a "flat" and simple character when they involve the interaction between one kind of thing and a distinct kind of thing. For example, in introductory calculus, you study the limit of a function in the situation where the limit is a number. When a mathematical subject begins to study the interaction of a thing with the same kind of thing, it takes on the complexities of the "snake eating its tail" sort. For example, in real analysis you study the situation where the limit of a sequence of functions is another function. It turns out that this type of limit can be defined in several different non-equivalent ways, so even the definition of limits becomes complicated.
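(A standard illustration of that non-equivalence, for concreteness: [itex]f_n(x) = x^n[/itex] on [itex][0,1][/itex] converges pointwise to a limit function that is 0 for [itex]x < 1[/itex] and 1 at [itex]x = 1[/itex], yet the convergence is not uniform, since the limit is discontinuous while every [itex]f_n[/itex] is continuous.)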

In probability theory, if you think of the object of study as a single "random variable" then the situation appears "flat". However, as soon as you begin to study anything involving several samples from that random variable, you introduce other random variables. Typically you have one random variable (with its associated mean, variance, etc.) and you have some sampling procedure for it. The sample value is itself a random variable. (Technically you aren't supposed to say things like "the mean of the sample 'is' 2.35", since "2.35" is only a "realization" of the sample mean. Of course both non-statisticians and statisticians say such things!) Since the sample mean is a random variable, it has its own mean and variance. The variance of the sample is also a random variable and has its own mean and variance. There is even ambiguity about how the quantity "the sample variance" is defined.

jostpuur says to be calm. I'll put it this way. As you study math, you will find calm quiet areas where complex things are developed from simple things. However, there are also many turbulent places where things are developed from the same kind of thing. Don't get upset when this happens. Don't get upset because theorems in probability theory only tell you about probabilities.
 
  • #16
I just understood some stuff about this thread.

For example, in calculus, beginner students often feel like there is something about infinitesimals that they have not understood, but which could be understood (which is true). They also observe that they are unable to solve some technical calculation problems. Then, from the point of view of the beginner student, it might make sense to contemplate the infinitesimals, because it seems like it could be that a better understanding of infinitesimals would lead to an improved capability to solve technical calculation problems.

Then it takes some time for the student to learn that a better understanding of infinitesimals will not actually improve their capability to solve technical calculation problems; but anyway, it seemed a reasonable idea from the point of view of a beginner.

In this thread bobby2k began with a quite philosophical touch (IMO), asking about circular definitions, sequences and so on. But then...

bobby2k said:
My goal in the steps is to have a pathway from the basic building blocks to the more complex usage and results.

OK, so now bobby2k only wants to learn the definitions for the sake of practical applications?

These are elements of the thread:

There are some philosophical problems related the definition of probability.

There are complicated probability problems, whose mathematical treatment isn't obvious (at least not to everyone, beginners, us...)

Perhaps better understanding of the definitions would lead to better capability to solve technical problems?

Well, there's no way to know in advance what turns out to be useful and what useless. You have to keep an open mind, and remember not to get stuck on ideas that don't seem to lead anywhere.
 
  • #17
The idea that probability is relative frequency is not part of the mathematical structure of probability theory. The mathematical theory just defines abstract mathematical concepts with names like measure and expectation. When we say that probability is relative frequency, we are interpreting the mathematics and giving the abstract concepts operational meaning, so that the mathematics has the possibility of being used to describe and predict the results of experiments.

It is the same with geometry. Points and lines are abstract concepts. When you think of a point as the mark you make with a pencil on paper, then you are interpreting the mathematics.

In both cases, the mathematics exists without the science. Probability theory exists without relative frequency, and geometry exists without pencil and paper. The idea of probability as relative frequency, or that a point is something you draw with a pencil on paper are additional things you add so that you can pass from mathematics to science.

In Bayesian interpretations of probability, probability is not necessarily relative frequency. Some Bayesian interpretations, like de Finetti's, are beautiful, if impractical to carry out exactly. Others are very practical and powerful, for example in providing evidence for a positive cosmological constant as in http://arxiv.org/abs/astro-ph/9812133 and http://arxiv.org/abs/astro-ph/9805201.
 
  • #18
Thanks for your patience, guys. I was maybe not clear enough in my question. Maybe a better formulation would have been: "why does probability theory work when the intuitive (relative frequency) part of probability is not defined in the axioms?"

jostpuur said:
I just understood some stuff about this thread.
OK, so now bobby2k only wants to learn the definitions for the sake of practical applications?
You may say that; it was not meant as a deep question. But it isn't really to do better in practical applications, because just by having the intuitive explanation in the back of my mind, I can solve all the problems. It is more rewarding, though, to understand why we can use this to solve the problems.


I think my main problem is/was that I struggle to see where we go from the math to making assumptions and building a model. It seems a lot easier in physics: there you may model a car with friction, and it is clear what the mathematical model is and what the real thing is. I mean, I thought that flipping coins hypothetically could be part of the mathematics, but it seems as though we have moved out of the probability world, even though it is just hypothetical.

I think I have finally wrapped my head around what many of you are saying: that the CLT only says something about what the probability of an event is, nothing more about what probability really is.

I made a picture to try to communicate how I view it now. Have I understood it?
prob2.png
 
Last edited:
  • #19
A strange thing about probability is that it is not like other fundamental theories - it is not time reversible even at the smallest scale.

An event (to happen at a particular time) is said to have a probability before that time, but afterwards, what happens to that probability? Does it disappear or change into a certainty?

That is, one can only use probabilities to describe things in the unknown future, not the certain past. One does not continue to say that the probability of a past event is 1/3 or 1/5; after the event the historical probability would have to be 1 or 0... but what if you don't know yet?

It gets more mysterious when a past event would seem to have either occurred or not, but we have not checked it yet to know which way it came out... can there be a probability for the person who has not checked, and a certainty for one who has?
 
  • #20
Those are some very interesting points, bahamagreen. :)

But do you guys think that at the basic level, my picture above describes the interaction with probability and using probability in an adequate way? I am eager to get closure. :)
 
  • #21
bobby2k said:
But do you guys think that at the basic level, my picture above describes the interaction with probability and using probability in an adequate way?

No. People who apply probability theory correctly know that probabilities do not represent relative frequencies. And you haven't yet dealt with the meaning of "mean(x) -> p".

To apply math, you need "understanding", not "closure".
 
  • #22
Stephen Tashi said:
No. People who apply probability theory correctly know that probabilities do not represent relative frequencies. And you haven't yet dealt with the meaning of "mean(x) -> p".

To apply math, you need "understanding", not "closure".

But is the text I have marked in red in my statistics book here then wrong? It deals with interpreting a confidence interval, but it may as well be interpreting the probability of a confidence interval.

confidence_interval.png
 
  • #23
bobby2k said:
But is the text I have marked in red in my statistics book here then wrong?

Yes. It's wrong. The problem is the statement that "A will occur 95% of the time". If you want to say something like "I'd be willing to bet that A occurs about 95% of the time" or "We will do calculations assuming A occurs 95% of the time", those statements could be called an "interpretation".

As I pointed out before, if an event has probability .95 of occurring and you conduct a large number of independent trials, the probability that the event happens with a relative frequency of exactly .95 in the trials approaches zero as the number of trials approaches infinity.

By the way, from that page, it looks like your textbook is about to make an important point regarding confidence intervals. Do you understand how to interpret confidence intervals correctly?
 
  • #24
bobby2k said:
But do you guys think that at the basic level, my picture above describes the interaction with probability and using probability in an adequate way? I am eager to get closure. :)

I'm not sure whether it is "exactly" ok, but it looks fine to me. My feeling is that one shouldn't lose sleep over this.

I don't think the problem is related only to probability theory. What is an electron? It is a particle that is deflected by an electric field. What is an electric field? It is a thing that deflects electrons. There is no problem if we consider electron and electric field as mathematical objects, but what happens if I give you an unidentified particle and ask you to show me that it is an electron?

So to connect mathematics with physics, it seems we always need some circularity. We accept as useful the mathematics and the interpretation as long as their predictions are consistent with observation.

From David O. Siegmund's Britannica article:
"Insofar as an event which has probability very close to 1 is practically certain to happen, this result justifies the relative frequency interpretation of probability. Strictly speaking, however, the justification is circular because the probability in the above equation, which is very close to but not equal to 1, requires its own relative frequency interpretation. Perhaps it is better to say that the weak law of large numbers is consistent with the relative frequency interpretation of probability."
 
Last edited:
  • #25
atyy said:
I'm not sure whether it is "exactly" ok, but it looks fine to me. My feeling is that one shouldn't lose sleep over this.

I don't think the problem is related only to probability theory. What is an electron? It is a particle that is deflected by an electric field. What is an electric field? It is a thing that deflects electrons. There is no problem if we consider electron and electric field as mathematical objects, but what happens if I give you an unidentified particle and ask you to show me that it is an electron?

So to connect mathematics with physics, it seems we always need some circularity. We accept as useful the mathematics and the interpretation as long as their predictions are consistent with observation.

From David O. Siegmund's Britannica article:
"Insofar as an event which has probability very close to 1 is practically certain to happen, this result justifies the relative frequency interpretation of probability. Strictly speaking, however, the justification is circular because the probability in the above equation, which is very close to but not equal to 1, requires its own relative frequency interpretation. Perhaps it is better to say that the weak law of large numbers is consistent with the relative frequency interpretation of probability."

My main problem when I started the thread was that I thought that the law of large numbers somehow validated that we could look at probabilities as relative frequencies (since both contain relative frequencies). But as we can read in the passage you give, the law of large numbers is consistent with the relative frequency interpretation. So this means that if we choose to use the relative frequency interpretation, then the mathematical theory seems "fair" to use?

Stephen Tashi said:
No. People who apply probability theory correctly know that probabilities do not represent relative frequencies. And you haven't yet dealt with the meaning of "mean(x) -> p".

To apply math, you need "understanding", not "closure".

But how would you connect the relative frequency interpretation to the axiomatic mathematical theory? I do not mean a specific connection in the sense of a theorem that guarantees something. But some connection there must surely be? And the only connection I see is that when people assume probabilities represent relative frequencies, then the axiomatic theory seems fair. Do you agree that the connection comes only when you choose how to view a probability?

Stephen Tashi said:
Yes. It's wrong. The problem is the statement that "A will occur 95% of the time". If you want to say something like "I'd be willing to bet that A occurs about 95% of the time" or "We will do calculations assuming A occurs 95% of the time", those statements could be called an "interpretation".

As I pointed out before, if an event has probability .95 of occurring and you conduct a large number of independent trials, the probability that the event happens with a relative frequency of exactly .95 in the trials approaches zero as the number of trials approaches infinity.

By the way, from that page, it looks like your textbook is about to make an important point regarding confidence intervals. Do you understand how to interpret confidence intervals correctly?
I have to admit that I interpret them as the book writes: if the confidence level is 0.95, then about 95 percent of confidence intervals in the long run will contain the parameter; because of this we can be "confident" that the parameter is in the interval we made, even if we just make one.
I have done some research on the subjects you have talked about, and the thing is that they are part of the upper courses in the bachelor's degree, or of a master's in mathematics, in for example stochastic analysis. At my school, at least, even people taking a master's in statistics do not learn about these subjects (measure theory etc.). And think about how many people learn statistics (economists, social scientists etc.); surely not all of these will have learned the advanced mathematical theory. This must mean that there is an adequate way to understand probability without a master's in mathematics?
 
  • #26
bobby2k said:
But how would you connect the relative frequency interpretation to the axiomatic mathematical theory?

The first thing to do would be to state precisely what is meant by "the relative frequency interpretation of probability". State the interpretation with the technical details. I don't think the relative frequency interpretation of probability is naive.

I have to admit that I interpret them as the book writes: if the confidence level is 0.95, then about 95 percent of confidence intervals in the long run will contain the parameter; because of this we can be "confident" that the parameter is in the interval we made, even if we just make one.

I think the point your book headed toward is that, for example, if you sample so you have a 95% confidence interval for the mean that is bounded by plus or minus 0.30, and the sample mean is 25, then you cannot say that there is a 95% probability that the population mean is in the interval 24.70 to 25.30.
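The long-run reading can be illustrated with a simulation. A hedged Python sketch (known variance assumed for simplicity; the numbers loosely echo the example above and are not from the textbook in question):

[code]
# Simulate many 95% confidence intervals for a known mean and count coverage.
# Any single realized interval either contains mu or it doesn't.
import random

rng = random.Random(1)
mu, sigma, n, z = 25.0, 1.0, 40, 1.96
trials = 10_000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half = z * sigma / n ** 0.5      # half-width, about 0.31 here
    covered += (xbar - half <= mu <= xbar + half)
print(covered / trials)              # close to 0.95 in the long run
[/code]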


I have done some research on the subjects you have talked about, and the thing is that they are part of the upper courses in the bachelor's degree, or of a master's in mathematics, in for example stochastic analysis. At my school, at least, even people taking a master's in statistics do not learn about these subjects (measure theory etc.). And think about how many people learn statistics (economists, social scientists etc.); surely not all of these will have learned the advanced mathematical theory. This must mean that there is an adequate way to understand probability without a master's in mathematics?

People can adequately work problems and apply various techniques without understanding them. Look at how many people can work calculus problems yet don't understand the definition of a limit.

If you look at applications of probability and statistics published in journals, there definitely are people who do good work without using measure theory. You should be able to understand that probability is not relative frequency without knowing measure theory.

(Published applications are peer-reviewed, and errors are likely to be caught. If you look at non-peer-reviewed applications of probability (for example, the source code of computer programs, or reports that are not published in journals), you find all sorts of errors in applying probability theory.)
 
Last edited:
  • #27
Stephen Tashi said:
I think the point your book headed toward is that, for example, if you sample so you have a 95% confidence interval for the mean that is bounded by plus or minus 0.30, and the sample mean is 25, then you cannot say that there is a 95% probability that the population mean is in the interval 24.70 to 25.30.

Yes, good point. Fortunately I have already spent some time reflecting on this, so this I understand.
Stephen Tashi said:
The first thing to do would be to state precisely what is meant by "the relative frequency interpretation of probability". State the interpretation with the technical details. I don't think the relative frequency interpretation of probability is naive.

OK, I will try to make a precise definition; I have never done this before, but I hope we can work with it.

I will make three helper definitions first.

definition: simple experiment [itex]S_{A}[/itex]
This is an experiment that can be repeated, and it contains an outcome we denote A.

definition: big experiment [itex]B^{id}_{S_{A}}[/itex]
This is an experiment where we conduct a simple experiment [itex]S_{A}[/itex] a number of times; we can conduct it however many times we want. The id is there for notational purposes and to emphasize that the results of each big experiment will be different.

In each big experiment we define a function:
[itex]F_{B^{id}_{S_{A}}}(n): N \rightarrow Q[/itex]
It is a function of n, but the notation tells us that each function will be different for each simple experiment and for each big experiment (each big experiment will have a different id).
Since we can repeat the simple experiment in the big experiment however many times we want, N is the domain.
The value of the function is calculated as follows: look at the first n simple experiments of the big experiment, and count in how many of these the event A happened; denote this [itex]n_{A}[/itex]. Then
[itex]F_{B^{id}_{S_{A}}}(n) = \frac{n_{A}}{n}[/itex]

Now we can define the frequency interpretation of probability.

definition: relative frequency probability
We say that an event A in a simple experiment [itex]S_{A}[/itex] has a relative frequency probability P(A)=[itex]p_{A}[/itex] if and only if the following applies:

For whichever big experiment [itex]B^{*}_{S_{A}}[/itex] (id=*) and whichever [itex]\epsilon>0[/itex], there exists a number [itex]n^{*}_{\epsilon}[/itex]
such that whenever n[itex]\geq n^{*}_{\epsilon}[/itex]:
[itex]|F_{B^{*}_{S_{A}}}(n)-p_{A}| < \epsilon [/itex]. Please note that [itex]n^{*}_{\epsilon}[/itex] depends both on [itex]\epsilon[/itex] and on the id; the latter means that how long it takes for the relative frequency to become as close as we want to [itex]p_{A}[/itex] will vary from time to time, but every time it will eventually happen.

end of definition

Is this something I can work with to make the connection I asked for?
 
Last edited:
  • #28
bobby2k said:
Is this something I can work with to make the connection I asked for?

Since I've had no interest in "the relative frequency interpretation of probability" prior to this thread, I don't know if your definition of the relative frequency interpretation of probability is a standard definition. Perhaps "the relative frequency interpretation" of probability is simply a philosophical way of looking at things (e.g. the "frequentist" approach to analyzing a situation, as opposed to the Bayesian approach) instead of a precise mathematical assumption. Since you are the one interested in "the relative frequency interpretation", I leave the task of researching what it really is to you. But we should be clear whether we are discussing some standard definition of "the relative frequency interpretation of probability" or your personal definition of it.
there exists a number [itex]n^{*}_{\epsilon}[/itex]
such that whenever n[itex]\geq n^{*}_{\epsilon}[/itex]:
[itex]|F_{B^{*}_{S_{A}}}(n)-p_{A}| < \epsilon [/itex]. Please note that [itex]n^{*}_{\epsilon}[/itex] depends both on [itex]\epsilon[/itex] and on the id; the latter means that how long it takes for the relative frequency to become as close as we want to [itex]p_{A}[/itex] will vary from time to time, but every time it will eventually happen.

You have made a precise definition of an axiom that I suspect is inconsistent with probability theory. For example, in flipping a fair coin (one flip = one "simple" experiment), you claim that, for each given [itex] \epsilon [/itex] and for each given sequence of flips, such an [itex] n^* [/itex] exists.

In your notation for [itex] id [/itex], when we talk about [itex] n > n^* [/itex], this must either be interpreted as a claim about the infinite set of experiments that give all possible ways the given experiment "[itex] id [/itex]" could come out, or it must be interpreted as a claim that [itex] id [/itex] is one particular infinite sequence of coin tosses. By the latter interpretation, certain infinite sequences of coin tosses are "prohibited" (the ones where the relative frequency of heads in the first [itex] n [/itex] terms does not eventually stay within [itex] \epsilon [/itex] of 1/2).

In the conventional probability space for tossing a fair coin, each initial finite sequence of coin tosses of a given length has the same probability, and each particular infinite sequence of tosses has zero probability. Can you define an alternative probability space for coin tosses where your assumption would hold?

Your assumption brings to mind an attempt to define probability theory that I once read about. I think the author was named Von Mieses (edit: the web says it was "Von Mises"), and he used infinite sequences that were called "Kollectives" or something like that. I don't know if his theory was shown to be inconsistent or whether it simply dropped out of fashion.
 
Last edited:
  • #29
bobby2k said:
My main problem when I started the thread was that I thought that the law of large numbers somehow validated that we could look at probabilities as relative frequencies (since both contain relative frequencies). But as we can read in the passage you give, the law of large numbers is consistent with the relative frequency interpretation. So this means that if we choose to use the relative frequency interpretation, then the mathematical theory seems "fair" to use?

That's my understanding.
 
  • #30
Hello again

I do not think I will pursue a better definition, because if I do get one, I am not sure what I will do with it. But I have searched a lot on the Internet and acquired some books with some points I would like to discuss.

As far as the definition is concerned, a book states it like this:

def.png


I have some comments about earlier posts, but first I must share some things from the books. It is about what you said, that people who use probability correctly know it is not a relative frequency.

Let's look at the three axioms. Obviously the first two agree with the relative frequency interpretation. And another cool book I found shows that the third also agrees with the classical interpretation, and hence it is easy to see that it agrees with the relative frequency interpretation.

Uten_navnaxiom3.png
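For readers without the book at hand, the three axioms referred to are usually stated as: (1) [itex]P(A) \geq 0[/itex] for every event A; (2) [itex]P(S) = 1[/itex] for the sample space S; and (3) countable additivity, [itex]P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots[/itex] for mutually exclusive events [itex]A_1, A_2, \ldots[/itex]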


It can also be shown that the definition of dependence and independence agrees well with the relative frequency interpretation, as shown below:
bet.png


Now, the relative frequency interpretation must also work with random variables, because here we just attach a distribution to a variable.

And lastly, the relative frequency interpretation must agree with independent linear combinations of stochastic variables, and the probability distribution of a sum must also agree with the relative frequency interpretation. This is because we can add them together two at a time, and when we want the probability that the sum of two variables is a certain number, we integrate/add over the individual pairs of values that give us the respective sum value; we have already shown that relative frequency probability satisfies multiplying independent events, and also the third axiom for adding mutually exclusive events, and the events we add are of course mutually exclusive.

And from this, the law of large numbers must also satisfy the relative frequency interpretation, since it is built from only the preceding concepts.

Now come my questions:

Stephen Tashi:
If it is not ok to view probability as relative frequencies, why does this interpretation seem to fit so perfectly with how they made probability theory? I have read a justification for choosing the third axiom, by thinking about it as a measure, or a combination of set theory and a belief that P(A) ≤ P(B) if A is a subset of B. But if we were not to look at relative frequency or classical probability, what is then the justification for the definition of dependence or independence? Above they showed that this definition works perfectly with the relative frequency interpretation, but if we are not bound by this interpretation, why not, for instance, define P(A|B)=P(A and B)/(P(B)*pi), or whatever we like, as long as it does not conflict with the axioms?
I am going to learn more about the weak and strong laws, as you mentioned earlier, but it will take some time to master the technical aspects, I think. But in one book, they say that both the strong and weak laws tell us that the relative frequency interpretation is a good one when used with Bernoulli variables (I think they said the strong one was better).

atyy:
atyy, we seemed to agree that the law of large numbers gave a justification for the relative frequency interpretation. But do you agree that it may be better to say that the connection is that all of our definitions and axioms in axiomatic probability theory are chosen in a way that does not conflict with the relative frequency interpretation, and hence this gives the connection? So when they chose to build mathematical probability theory this way, it is not at all a coincidence that the law of large numbers also coincides with the relative frequency interpretation.

Also, thank you very much to both of you for staying in the discussion; I hope we can end it very soon. I have now started a new subject in statistics at university. It is interesting, but I have to admit that it annoys me somewhat that we are now continuing with new theory when the very foundation of the theory was not examined and learned properly.
 
Last edited:
  • #31
It gets too confusing to discuss more than one or two issues per post. Let's start with the first passage you mentioned.

Look at the statement [itex] P(A) = \lim_{n \rightarrow \infty}\frac{n_A}{n} [/itex]. This is an intellectually dishonest statement. It treats a stochastic outcome [itex] n_A [/itex] as if it were an ordinary real-valued variable, and it invites the reader to think that [itex] \lim_{n \rightarrow \infty} [/itex] refers to the type of limit defined in elementary calculus.

Look for other sources to find a respectable statement of what "the relative frequency interpretation of probability" is.

You should also study the Law of Large Numbers. You like to mention it - figure out what it really says.
The limits involved in the weak and strong law of large numbers do not have the same definition as a limit in introductory calculus. These limits have modifiers like "in probability" or "almost surely". Until you work through how these limits are defined, you don't understand the law of large numbers.

(By the way, the Von Mises approach to probability theory shouldn't be completely written off. In browsing the web, I see that people may have fixed it up to be rigorous, and it apparently has a connection to attempts to connect randomness with computability.)
 
  • #32
Hello again, Stephen Tashi.

I have now found a discussion of the law of large numbers that incorporates the relative frequency.

allmost_sure.png

then we get
image.png


From the definition of almost sure convergence (given earlier in the text), this means that we can say the following.

If A is an event, and [itex]n_{A}[/itex] is the number of times that A occurs:
[itex]P\left(\lim_{n \rightarrow \infty}\frac{n_{A}}{n}=P(A)\right)=1[/itex]
Do you agree that this limit exists and is correctly defined? This is basically the strong law of large numbers, but with a fraction instead of the mean. I have taken this from a big book; I hope you can accept this limit.

If you agree with the above limit, then I almost have what I want. We can call the event
K:
[itex]\lim_{n \rightarrow \infty}\frac{n_{A}}{n}=P(A)[/itex], and we have that
P(K)=1

If you agree with the above, my only remaining problem is with the fact that at the end of the text they say "we are essentially certain that the relative frequency of an event will converge to the probability of the event". Because from what I have seen in probability theory, we have not defined what happens when P(B)=1 for an event B. I am wondering if we can use set theory to show that if P(B)=1, then the event B must contain every "simple outcome" in the sample space S.
Because if we can do this, we can define that the event B "will occur" if it contains every simple outcome in S.

Is the analysis above correct and precise enough? If this is enough, the only thing that remains for me to do is to show that if
P(B)=1 for an event B,
then B=S, or S[itex]\subseteq[/itex]B (since of course we already have B[itex]\subseteq[/itex]S).
 
Last edited:
  • #33
bobby2k said:
Do you agree that this limit exists and is correctly defined?

The limit in the theorem is a special kind of limit. That's why there is an "a.s" in the notation of the limit. This is not the kind of limit used in ordinary calculus. So you can't say an "a.s" limit implies the existence of the kind of limit used in ordinary calculus.

if we can use set theory to show that if P(B)=1, then the event B must contain every "simple outcome" in the sample space S.

That isn't true. For example, the probability that a number selected from a uniform distribution on [0,1] is a rational number is 1. [Edit: Per Mr. Anchovy's correction in the following post, I meant to type "is an irrational number". The probability of selecting a rational number is zero. ] But in the probability space for this distribution the set of simple outcomes consists of all numbers in [0,1].
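(A sketch of the computation behind the corrected statement: under the uniform distribution every singleton [itex]\{x\}[/itex] has probability 0, and the rationals in [0,1] are countable, so by countable additivity [itex]P(\text{rational}) = 0 + 0 + \cdots = 0[/itex] and hence [itex]P(\text{irrational}) = 1[/itex], even though the irrationals are a proper subset of [0,1].)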
 
Last edited:
  • #34
Stephen Tashi said:
For example, the probability that a number selected from a uniform distribution on [0,1] is a rational number is 1. But in the probability space for this distribution the set of simple outcomes consists of all numbers in [0,1].

Woah, hang on there:
  1. I didn't think it was possible to select a number from a uniform distribution on [0,1]
  2. if it is possible, then as there are countably many rational numbers but uncountably many irrational numbers in this interval, how can P(rational) = 1? If anything, I suppose P(irrational) could be 1, which would not change the implication on the statement regarding the set of simple outcomes.
 
  • #35
MrAnchovy said:
Woah, hang on there:
  1. I didn't think it was possible to select a number from a uniform distribution on [0,1]

1. I agree, but that doesn't deter probability theory from talking about it.

MrAnchovy said:
  2. if it is possible, then as there are countably many rational numbers but uncountably many irrational numbers in this interval, how can P(rational) = 1? If anything, I suppose P(irrational) could be 1, which would not change the implication on the statement regarding the set of simple outcomes.

2. Thanks for catching that. I meant to type "irrational number" in my post. I've edited it to fix that.
 