Optimizing Data Sampling for Probabilities

In summary, there is no one mathematical method to determine the optimum sampling of data for probabilities. It depends on the specific instance and the data being sampled.
  • #1
Loren Booda
Is there a mathematical method to determine the optimum sampling of data for probabilities?

Flip a coin. Simplistically speaking from experience, it has a 1/2 chance of landing on either side. But what if it can land on its edge? What if it can fall through a crack? What if lava from a fissure invading the room can envelop and melt the coin? What if it can quantum mechanically flip itself after landing? Other examples of probability, like the nonlinear trajectory of a particle, have determinism not immediately apparent.

Even an electronic random number generator run by a quantum computer is susceptible to decoherence between the device and the observer. It seems that we must have extensive practical knowledge about the system under observation, and then apply Occam's razor, if we are to determine the set of data required. But how may this be done systematically?
 
  • #2
Loren Booda said:
Flip a coin. Simplistically speaking from experience, it has a 1/2 chance of landing on either side. But what if it can land on its edge? What if it can fall through a crack? What if lava from a fissure invading the room can envelop and melt the coin? What if it can quantum mechanically flip itself after landing? Other examples of probability, like the nonlinear trajectory of a particle, have determinism not immediately apparent.
You could flip it until it behaves normally. :-p

In fact, that's a common practical solution in the gaming world -- one keeps rerolling a die until it doesn't fall off the table or lean against something or whatever.
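That reroll-until-valid rule is rejection sampling: conditioning on an admissible outcome leaves the valid faces equally likely. A minimal Python sketch, assuming a made-up 1% chance that any given roll is void:

```python
import random

def roll_until_valid(sides=6, p_void=0.01, max_tries=1000):
    """Reroll until the die lands cleanly (rejection sampling).

    p_void is an assumed probability that a roll doesn't count
    (falls off the table, leans against something, etc.).
    """
    for _ in range(max_tries):
        if random.random() < p_void:
            continue  # void roll: ignore it and try again
        return random.randint(1, sides)  # a clean, admissible outcome
    raise RuntimeError("no valid roll within max_tries")

# Conditioned on being valid, each face still has probability 1/6.
print(roll_until_valid())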
 
  • #3
By "normal" one might mean "average." How does one determine mathematically how many tries one needs to achieve average? Don't methods like standard deviation incorporate their own error, ad infinitum?

Overall, how and when can we be assured of precision's reproducibility?
 
  • #4
But that's not what was meant when I said normal. I meant for "behaving normally" to be "lands heads up or lands tails up".
 
  • #5
Duly noted.

Please allow me to repeat [with editing]:
How does one determine mathematically [with a huge number of interacting physical variables] how many tries one needs to achieve [a significant] average?

Don't [results] like standard deviation incorporate [errors of their own, each with their own statistical deviations] ad infinitum?

Overall, how and when can we be assured of precision's reproducibility?
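On the "errors of their own, ad infinitum" point, the regress is real but tame: the sample standard deviation is itself a noisy estimate, yet its noise shrinks roughly like 1/sqrt(n) rather than compounding. A small simulation sketch (fair 0/1 flips; the sample sizes and replication count are arbitrary choices):

import random
import statistics

random.seed(0)
# For each sample size n, estimate the SD many times and measure
# how much the SD estimate itself varies from sample to sample.
for n in (10, 100, 1000):
    sds = [statistics.stdev(random.choices([0, 1], k=n))
           for _ in range(2000)]
    print(f"n={n}: mean SD ~ {statistics.mean(sds):.3f}, "
          f"spread of SD estimates ~ {statistics.stdev(sds):.4f}")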
 
  • #6
You decide beforehand what constitutes the sample space - either heads or tails. Any other outcome is deemed inadmissible. Why? Because that is what we want, and it has nothing to do with mathematics. Mathematics is merely a tool for modelling, in this instance. Whether real life behaves sufficiently close to the model for the model to be valid is a different matter. There are plenty of tests to work out whether sample data is likely to have come from a population with assumed properties; some of them, such as confidence intervals, are taught to high-school students, so I'm surprised you've not met them. Then there is the strong law of large numbers, chi-squared tests, t-tests, ANOVA, etc.
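For instance, the usual normal-approximation sample-size formula for a proportion, n = z^2 p(1-p)/E^2, answers "how many tries" directly; a sketch (z = 1.96 for roughly 95% confidence, worst case p = 0.5):

import math

def flips_needed(margin, z=1.96, p=0.5):
    """Sample size to estimate a proportion within +/- margin at ~95%
    confidence (normal approximation; p = 0.5 is the worst case)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(flips_needed(0.01))  # about 9604 flips to pin a coin to +/- 1%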
 
  • #7
Your examples are worth studying. Do you know of an online tutorial that compares most of them?
 

FAQ: Optimizing Data Sampling for Probabilities

What is the importance of prioritizing statistical data?

Prioritizing statistical data is crucial because it allows scientists to focus on the most relevant and significant information. By prioritizing data, scientists can make informed decisions and draw accurate conclusions, leading to more meaningful and impactful research.

How do you determine which statistical data to prioritize?

The process of prioritizing statistical data involves identifying the research question or objective, understanding the available data sources, and evaluating the quality and relevance of each dataset. It is essential to consider the reliability, validity, and representativeness of the data to determine which should be prioritized.

What are the potential challenges in prioritizing statistical data?

One of the main challenges in prioritizing statistical data is the availability and accessibility of data. Some datasets may be difficult to obtain or may contain incomplete or inaccurate information. Additionally, there may be a bias in the data, which can lead to skewed results if not properly addressed.

How can prioritizing statistical data improve the research process?

Prioritizing statistical data can improve the research process in several ways. By focusing on relevant and high-quality data, scientists can save time and resources. It also allows for more accurate and reliable results, which can strengthen the validity and impact of the research.

What are some common methods used to prioritize statistical data?

Some common methods for prioritizing statistical data include data mining, exploratory data analysis, and machine learning algorithms. These methods can help identify patterns, trends, and relationships within the data and determine which variables are most important for the research question at hand.
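As a toy illustration of the exploratory end of that list, one can rank candidate variables by the strength of their association with the outcome. The data and variable names below are invented purely for the example; a real pipeline would prefer cross-validated, model-based importance measures:

import statistics

# Invented data: rank candidate predictors by |Pearson correlation|
# with the outcome, a crude first pass at prioritizing variables.
outcome = [2.0, 4.1, 5.9, 8.2, 9.8]
candidates = {
    "var_a": [1, 2, 3, 4, 5],   # tracks the outcome closely
    "var_b": [5, 1, 4, 2, 3],   # essentially noise
}
ranked = sorted(candidates,
                key=lambda name: abs(statistics.correlation(candidates[name],
                                                            outcome)),
                reverse=True)
print(ranked)  # ['var_a', 'var_b']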
