Relative frequency and nonmeasurable sets

  • #1
stevendaryl
There is a conceptual puzzle that I don't understand about nonmeasurable sets. Take the unit interval [itex][0, 1][/itex] and let [itex]S[/itex] be some subset. Now, generate (using a flat distribution) a sequence of reals in the interval:

[itex]r_1, r_2, r_3, ...[/itex]

Then we can define the relative frequency up to [itex]n[/itex] as follows:

[itex]f_n = \frac{1}{n} \sum_{j=1}^{n} \Pi_S(r_j)[/itex]

where [itex]\Pi_S(x)[/itex] is the characteristic function for [itex]S[/itex]: it returns 1 if [itex]x \in S[/itex] and 0 otherwise.

If [itex]S[/itex] is measurable, then with probability 1, [itex]\lim_{n \rightarrow \infty} f_n = \mu(S)[/itex], where [itex]\mu(S)[/itex] is the measure of [itex]S[/itex]. But if [itex]S[/itex] is not measurable, then what will be true about [itex]f_n[/itex]? Will it simply not have a limit? Or, since this is a random sequence, we can ask: what is the probability that it has a limit? Does that probability even exist?
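
To make the question concrete, here is a minimal Python simulation sketch (my own illustration, not part of the question; the example set [itex]S = [0, 1/3][/itex] and the function names are arbitrary). For a measurable [itex]S[/itex] the strong law of large numbers says the printed frequencies should approach [itex]\mu(S) = 1/3[/itex]; for a nonmeasurable [itex]S[/itex] there is in practice no explicit characteristic function to plug in at all, since such sets are constructed using the axiom of choice.

[code=python]
import random

def relative_frequency(indicator, n, rng=random):
    """Estimate f_n = (1/n) * sum_{j<=n} indicator(r_j) for uniform samples r_j in [0, 1)."""
    hits = sum(indicator(rng.random()) for _ in range(n))
    return hits / n

# Example with the measurable set S = [0, 1/3]: mu(S) = 1/3.
in_S = lambda x: 1 if x <= 1/3 else 0
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(in_S, n))
[/code]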
 
  • #2
stevendaryl said:
If [itex]S[/itex] is measurable, then with probability 1, [itex]\lim_{n \rightarrow \infty} f_n = \mu(S)[/itex], where [itex]\mu(S)[/itex] is the measure of [itex]S[/itex].

Why take a limit in order to create a paradox? Consider the probability distribution of the random variable [itex]f_1[/itex]. For a set [itex]S[/itex] that is measurable with respect to the uniform probability measure [itex]\mu[/itex] on [itex][0,1][/itex], we have [itex]P(f_1 = 1) = \mu(S)[/itex]. If [itex]S[/itex] is not measurable, then apparently [itex]P(f_1 = 1)[/itex] is undefined.

Until we define how [itex]f_1[/itex] is distributed, we can't proceed to the problem of defining [itex]\lim_{n \rightarrow \infty} f_n[/itex].
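
(Just to spell out the standard fact for a measurable [itex]S[/itex], which is not in the post above: each [itex]\Pi_S(r_j)[/itex] is a Bernoulli variable with parameter [itex]p = \mu(S)[/itex], so [itex]n f_n[/itex] has a Binomial[itex](n, p)[/itex] distribution, [itex]E[f_n] = p[/itex] and [itex]\mathrm{Var}(f_n) = p(1-p)/n[/itex], and the strong law of large numbers gives [itex]f_n \rightarrow p[/itex] almost surely. For a nonmeasurable [itex]S[/itex] already the first step, assigning a distribution to [itex]\Pi_S(r_1)[/itex], fails.)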
 
  • #3
The dilemmas connected with sampling are interesting to discuss and difficult to resolve. As far as I can see, there is no theoretical basis for talking about random samples in the measure-theoretic approach to probability. There is no axiom in measure theory that says it is possible to take random samples, and there are no formal definitions that allow us to discuss an event that has a probability and then "actually" happens or doesn't happen.

The current treatment of sampling is a matter of applied mathematics. The theoretical basis for assigning probabilities to outcomes of sampling is an application of conditional probability, but if we look at the measure-theoretic definition of conditional probability, it is defined abstractly as a ratio, not as a formalization of the idea that an event that has been assigned a probability changes its state from "potential" to "actual".

Is it possible to create a mathematical system that treats sampling in a rigorous manner? For example, how do we handle the dilemma that the probability of realizing any given random sample from a normal distribution is zero? Do we define sampling-with-finite-precision and then define taking a random sample from a continuous distribution as some sort of limit?

The practical person's outlook on unmeasurable sets is: "If I can take independent random samples, then I can come up with a practical estimate for the probability of an outcome being in a supposedly unmeasurable set." I don't have a good intuition about how such an attempt would fail!

Two thoughts:

1) Assume random sampling is done with finite precision, so a random sample is drawn from a discrete distribution, and "theoretical" random sampling from a continuous distribution is defined as some sort of limit as we let a family of discrete distributions approach (in some sense) the continuous distribution. Then perhaps the inability to assign a probability to an unmeasurable set is a situation where such a limit fails to exist. For example, perhaps two different ways of letting the discrete family approach the same continuous distribution would imply two different probabilities for the unmeasurable set (a small sketch of this mechanism follows after these two thoughts).

2) The objection to a Vitali set being measurable isn't that it can't be assigned a probability, but rather that we can't assign it a translation-invariant probability. I don't know whether all unmeasurable sets have this feature.
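
On thought 1), here is a small Python sketch of the mechanism (my own illustration; it uses the measurable null set of dyadic rationals rather than a genuinely nonmeasurable set, since membership in a Vitali-type set is not computable). Two families of grids, [itex]\{k/2^m\}[/itex] and [itex]\{k/3^m\}[/itex], both approximate the uniform distribution on [itex][0,1][/itex], yet the discrete relative frequencies of the same set converge to 1 along one family and to 0 along the other:

[code=python]
from fractions import Fraction

def is_dyadic(q):
    """True if the Fraction q equals a / 2**k in lowest terms (a dyadic rational)."""
    d = q.denominator
    while d % 2 == 0:
        d //= 2
    return d == 1

def grid_frequency(indicator, points):
    """Relative frequency of `indicator` over a finite grid of points."""
    return sum(1 for p in points if indicator(p)) / len(points)

for m in (2, 4, 6):
    dyadic_grid = [Fraction(k, 2**m) for k in range(2**m)]    # grid k / 2^m
    triadic_grid = [Fraction(k, 3**m) for k in range(3**m)]   # grid k / 3^m
    print(m, grid_frequency(is_dyadic, dyadic_grid), grid_frequency(is_dyadic, triadic_grid))
[/code]

Of course the dyadic rationals actually have Lebesgue measure 0, so here the "correct" answer exists and one grid family simply estimates it badly; the speculation above is that for a nonmeasurable set every approximating family would be equally arbitrary.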
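
On thought 2), recalling the standard argument for why a Vitali set [itex]V \subset [0,1)[/itex] admits no translation-invariant, countably additive probability: [itex][0,1)[/itex] is the disjoint union of countably many translates (mod 1) of [itex]V[/itex], so if [itex]m(V) = 0[/itex] then [itex]m([0,1)) = 0[/itex], and if [itex]m(V) > 0[/itex] then [itex]m([0,1)) = \infty[/itex]; either way we contradict [itex]m([0,1)) = 1[/itex]. Nothing in this argument by itself rules out assigning [itex]V[/itex] some value under a merely finitely additive, or a non-translation-invariant, extension, which seems to be the point being made above.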
 
  • #4
Stephen Tashi said:
The objection to a Vitali set being measurable isn't that it can't be assigned a probability, but rather that we can't assign it a translation-invariant probability. I don't know whether all unmeasurable sets have this feature.

If we assume the continuum hypothesis (yes, I know that there is no good reason to assume that), then we can come up with a total ordering [itex]\prec[/itex] of [itex][0,1][/itex] such that for every real [itex]x[/itex] in [itex][0,1][/itex], the set of all [itex]y[/itex] such that [itex]y \prec x[/itex] is countable (and the set of all [itex]y[/itex] such that [itex]x \prec y[/itex] is uncountable). Then letting [itex]S[/itex] be the set of all pairs [itex](x,y)[/itex] such that [itex]0 \leq x \leq 1[/itex], [itex]0 \leq y \leq 1[/itex], and [itex]x \prec y[/itex], [itex]S[/itex] can't be measurable in the normal product measure on [itex]R^2[/itex].
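
For completeness, a sketch of the standard Fubini-type argument, which is not spelled out above: if [itex]S[/itex] were measurable in the product measure, then for each fixed [itex]y[/itex] the horizontal section [itex]\{x : x \prec y\}[/itex] is countable and hence has measure 0, so Fubini's theorem gives [itex]\mu(S) = \int_0^1 0 \, dy = 0[/itex]; but for each fixed [itex]x[/itex] the vertical section [itex]\{y : x \prec y\}[/itex] is co-countable and hence has measure 1, so the same theorem gives [itex]\mu(S) = \int_0^1 1 \, dx = 1[/itex]. The contradiction shows [itex]S[/itex] is not product-measurable; this is essentially Sierpiński's classical example.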
 
  • #5
It would be interesting to formulate a definition of "realizable" sets that would formalize the intuitive notion of sets that can actually be the result of random sampling. For a continuous real-valued random variable, there are measurable sets (such as a single numerical value, a "point") that cannot be realized due to limitations on the precision of measurements - or, more abstractly, limitations on the information that is provided by a sampling procedure.

We could skirt the issue of how and whether probable events become "actual" events by defining "realizable sets" as those that exist in the context of sampling from a discrete distribution. Perhaps that's a cowardly approach, but, at the moment, I don't see a good alternative.

Using that approach, to speak of "realizable sets" of a continuous distribution, we have to make some connection between discrete distributions and the continuous distribution. The first thought that comes to mind is to partition the range of the random variable into disjoint intervals (each having nonzero probability) and consider the discrete distribution that assigns, as the probability of a given interval being realized, the probability measure assigned to that interval by the continuous distribution. The realizable sets would be the disjoint intervals and finite (or countable?) unions of such intervals. Intersections of realizable sets would not necessarily be realizable (e.g. two intervals intersecting at a point), so the realizable sets would not form a sigma-algebra.
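
As a purely illustrative Python sketch of that construction (the function names and the example CDF are my own, not part of the proposal): given cut points partitioning [0,1] into intervals and a continuous distribution with CDF F, build the induced discrete distribution and sample which interval is realized.

[code=python]
import random

def interval_probabilities(cut_points, cdf):
    """Probabilities of the intervals determined by cut points
    c_0 < c_1 < ... < c_k, under a continuous distribution with the given
    cumulative distribution function."""
    return [cdf(b) - cdf(a) for a, b in zip(cut_points, cut_points[1:])]

def sample_interval(cut_points, cdf, rng=random):
    """Report only which interval a draw from the continuous distribution
    fell into; i.e., sample from the induced discrete distribution."""
    probs = interval_probabilities(cut_points, cdf)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Example: the uniform distribution on [0, 1] with an uneven partition.
cuts = [0.0, 0.1, 0.5, 1.0]
uniform_cdf = lambda x: x
print(interval_probabilities(cuts, uniform_cdf))  # [0.1, 0.4, 0.5]
print(sample_interval(cuts, uniform_cdf))         # index of the realized interval
[/code]

A "realizable set" in the sense above would then be a finite or countable union of the intervals determined by the cut points; the sampling procedure only ever reports which interval occurred, never an exact real number.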

( Are there clever ways to use sampling that has a limited interval of precision to realize sets that are more interesting than unions of intervals? Can anything clever be done by transforming the random variable and realizing samples from the transformed distribution? )

The terminology "r is a realizable set of the continuous distribution F" would mean that there exists a partition of the range of F into disjoint intervals, each having nonzero probability, such that r can be expressed as a countable union of some of these intervals.

I've been ambiguous about whether an "interval" should be an open interval, closed interval, half-open interval, etc., because I think any type of interval can be allowed.

----
Another way of considering practical sampling is to consider what is practical in terms of computer simulations. For example, if we have an algorithm that generates pseudo-random numbers from a uniform discrete distribution on {1, 2, ..., N}, then we can approximate sampling from a uniform distribution on [0,1] by creating the sample in stages. First, pick one of N equal subintervals of [0,1]. Then subdivide that subinterval into N sub-subintervals and pick one of those, etc.
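
A minimal Python sketch of that staged procedure (assuming only a generator of uniform integers in {0, ..., N-1}; the function name is my own):

[code=python]
import random

def staged_uniform_sample(N, stages, rng=random):
    """Approximate a Uniform[0, 1] draw by repeatedly choosing one of N equal
    subintervals; after `stages` steps the sample is pinned down to an
    interval of width N**(-stages)."""
    left, width = 0.0, 1.0
    for _ in range(stages):
        digit = rng.randrange(N)   # pick one of the N current subintervals
        width /= N
        left += digit * width
    return left, left + width      # the interval known to contain the "sample"

print(staged_uniform_sample(N=10, stages=6))
[/code]

Letting the number of stages grow recovers the base-N digit expansion of a uniform sample; at any finite stage the only realizable events are unions of the N^stages equal subintervals, which again points toward interval-based realizable sets.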

It seems that this point of view also leads to defining realizable sets in terms of intervals. However, a simulation isn't limited by the interval-based precision of physical measuring instruments, so perhaps there is room for more imagination.
 

FAQ: Relative frequency and nonmeasurable sets

1. What is relative frequency?

Relative frequency is the proportion or ratio of the number of times an event occurs to the total number of trials or observations. It is often expressed as a decimal or percentage.

2. How is relative frequency used in science?

Relative frequency is used to estimate the probability of an event occurring based on past observations. It is also used to analyze and interpret data, particularly in statistics and experimental studies.

3. What are nonmeasurable sets?

Nonmeasurable sets are subsets (of the real line, for example) that cannot be assigned a measure in a way consistent with the usual requirements, such as the countable additivity and translation invariance of Lebesgue measure. This concept arises in measure theory, and the standard examples, such as Vitali sets, require the axiom of choice to construct.

4. Why are nonmeasurable sets important?

Nonmeasurable sets are important because they show that not every subset of the reals can be assigned a length or a probability, which is why measure theory restricts attention to a sigma-algebra of measurable sets. They also underlie striking results such as the Banach-Tarski paradox and clarify the role of the axiom of choice in analysis.

5. Can nonmeasurable sets be studied and understood?

Yes, nonmeasurable sets can be studied and understood using the tools of measure theory and set theory. While they cannot be assigned a Lebesgue measure, their properties, including non-measurability itself, can be analyzed rigorously.
