Statistics Definition and 998 Threads

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.
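As a minimal sketch of the descriptive summaries mentioned above, the following Python snippet computes one pair of central-tendency measures (mean, median) and one pair of dispersion measures (sample and population standard deviation). The data values and the helper name `describe` are illustrative assumptions, not taken from any thread on this page.

```python
import statistics

# Hypothetical measurements, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def describe(values):
    """Summarize central tendency and dispersion of a list of numbers."""
    return {
        "mean": statistics.mean(values),               # central tendency
        "median": statistics.median(values),           # central tendency
        "sample_sd": statistics.stdev(values),         # dispersion, divides by n - 1
        "population_sd": statistics.pstdev(values),    # dispersion, divides by n
    }

summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Whether the sample or population form of the standard deviation is appropriate depends on whether the data are treated as a sample drawn from a larger population or as the whole population.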
A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or between a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected and an actual relationship between populations is missed, giving a "false negative"). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems.
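The two error types can be illustrated with a small simulation. The sketch below is one possible setup, not a prescribed procedure: it assumes normally distributed data with known standard deviation 1, a two-sided z-test, a sample size of 30, a true effect of 0.5 under the alternative, and a significance level of 0.05. All of these numbers and the function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, with known standard deviation sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, n=30, trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which H0: mean == 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error (false positive).
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false: every failure to reject is a Type II error (false negative).
type_ii_rate = 1.0 - rejection_rate(true_mean=0.5, seed=1)

print(f"Estimated Type I error rate:  {type_i_rate:.3f}")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

Under these assumptions the estimated Type I rate should land close to the chosen significance level, while the Type II rate depends on the sample size and the size of the true effect.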

View More On Wikipedia.org
  1. Agent Smith

    Regarding Simulations and Sample Size

    TL;DR Summary: Sims and sample size A statistics question I have in my notes goes like this: Our significance level ##\alpha = 0.01## The percentage of left-handed people in the general population is ##10\%##. Liliana is curious if this is true for her arts class and so she takes a random...
  2. TomVassos

    I Help us calculate the likelihood an intelligent alien species exists!

    Hello everyone, I need some statistics help relating to one aspect of the Fermi Paradox. The Universe is a very big place and even if there are millions of other alien civilizations out there, they are likely so far away from us that we will never ever meet them. But another factor that will...
  3. Agent Smith

    B Experiment to test for causality

    We want to check whether specially treated bitter gourd, marketed as BitterHeal, is effective in lowering blood sugar levels in diabetics. They take a random sample of diabetics and assign them randomly to ##3## groups: Group A: Are given BitterHeal Group B: Are given untreated bitter gourd...
  4. rcktbr

    Searching for knowledge about probability and statistics

    I am currently studying Economics at undergraduate level and want to enhance my knowledge about probability and statistics in order to better understand econometrics.
  5. G

    Probability distributions for Maxwell-Boltzmann, B-E, F-D

    I don't even understand what question is being posed here. The answers given by the author are as follows: These are numbers, potentially very large ones.
  6. A

    I How do I express that a 100% occurrence in a small sample is low "confidence"?

    In experiment A: I observe an event 2 times in 2 trials. In experiment B: I observe an event 100 times in 100 trials. In both cases, I calculate a frequency of 100%. In both cases, I calculate a 95% confidence interval of (1, 1). But intuitively the result of experiment B is "stronger" than...
  7. L

    Mathematica Plot normal distribution with measurement results

    Hi, I have completed an experiment at university as part of my internship and have now received several measurement results which I would like to analyze statistically and plot the results as a normal distribution in Mathematica. Is this even possible with Mathematica? Unfortunately, I haven't...
  8. badr

    I Probability of passing a disease down to a child or a group of siblings?

    Hello. So I got a question about heredity. Let's say the probability of inheriting schizophrenia is 6% if one parent is affected. So I know that for 6% probability, there is 1.2 kid out of 5 who will inherit that illness. So is it better not to have kids in this case?
  9. T

    NHST statistics for % of students late to class

    This is a homework question in my daughter’s maths class. When I did stats I always had examples where there were two variables : an example being swallow wingspan and sex. The statement that males have larger wingspan would therefore be the H_1 and the H_0 would be that there is no impact of...
  10. S

    B Should I be treating the data I have as a Population or Sample?

    A study on strength properties of high-performance concrete obtained by using super-plasticizers and certain binders recorded the following data on flexural strength (in mega-pascals, MPa) from 28 tests: 6.1, 5.6, 7.1, 7.3, 6.6, 8.0, 6.8, 6.6, 7.6, 6.8, 6.7, 6.6, 6.8, 7.6, 9.3, 8.2, 8.7, 7.7...
  11. Gary Venter

    How Does a Stat Modeler's Approach Influence Learning Physics?

    I studied foundations of mathematics from mathematical and philosophical angles in grad school but then went on to a career of building and testing statistical risk models. The guiding philosophy there, which I call Boxian Skepticism, derives from a quote of George Box: "All models are wrong but...
  12. J

    A Bivariate Smoothing Splines

    Does anyone know of a bivariate smoothing spline package that lets you set your own loss function? All of the public domain software I've been able to find (e.g., SCIPY) appears to minimize the sum of squared errors. For example, I'd like to set the spline coefficients to maximize the...
  13. chwala

    Test whether the diets are different from one another at ##α=5\%##

    Looking at stats today, In my working i have; Let ##H_0 = μ_1=μ_2## v/s ##H_1 = μ_1-μ_2≠ 0## then, ##\bar x = \dfrac{134+83+...+123}{12}=120## ##\bar y = \dfrac{70+118...+94}{7}=101## ##t=\dfrac{\bar x- \bar y}{S_p ⋅\sqrt {\dfrac{1}{n_1}+\dfrac{1}{n_2}}}## ##t=\dfrac{120-101}{21.21...
  14. TomVassos

    I Calculating the End of the Universe Using Standard Deviation Statistics

    One possible end to the Universe is called vacuum decay, where a Higgs boson could transition from a false vacuum to a true vacuum state. This would create a vacuum decay bubble (known as bubble nucleation) that would expand at light speed, destroying everything in its path. According to Anders...
  15. R

    I Variation of the Liar's Paradox

    A variation of the Liar's Paradox occurred to me: "Statistics are wrong 90% of the time". This statement seems to refute itself, but does so in a less straightforward way. I would appreciate any insights! And what about, "Statistics are wrong 50% of the time"? (Even odds.)
  16. F

    B AI Detection - Phase 1: sample collection

    I know two programs that claim to be able to detect whether a text has been written by a machine or by a human. A (ZeroGPT): https://www.zerogpt.com/ B (OpenAI): https://openai-openai-detector.hf.space/ Character Count: https://www.lettercount.com/ If you have time and examples, please test...
  17. A

    B Definition of a random variable in quantum mechanics?

    In a line of reasoning that involves measurement outcomes in quantum mechanics, such as spins, photons hitting a detection screen (with discrete positions, like in a CCD), atomic decays (like in a Geiger detector counting at discrete time intervals, etc.), I would like to define rigorously the...
  18. T

    Statistics problem: Comparing written work with & w/out use of AI

    I want to compare performance on written work under different conditions, for example with and without the use of AI, according to some specified criteria. Assume the written work is a critical analysis of specific content. The written work will be scored on a number of dimensions, such as...
  19. Memo

    How Many Type A Pigs Have 3 DYL Blood in a Herd of 1,000?

    Mentor note: Thread moved from technical section to here, so is missing the homework template. TL;DR Summary: The weight of DYL 3-blood hybrid pigs after correction of a farm is a random quantity with a normal distribution. Knowing that the probability of a pig weighing over 20 kg is 0.1587 and...
  20. A

    Meaning of "Average" Flux Tallies in MCNP

    Hello, I've been working with MCNP on and off for a few years now, but just recently realized that I don't entirely understand how tallies are actually calculated in MCNP, and what they signify. Taking the example of the F2 tally, the user manual (Section 3.3.5.1) states that F2 is the "flux...
  21. C

    I What Is the Probability Atom A Will Emit a Photon Before Atom B?

    For concreteness I'll use atoms and photons but this problem is actually just about probabilities. There's an atom A whose probability to emit a photon between times t and t+dt is given by a Gaussian distribution probability P_A centered around time T_A with variance V_A. There's a similar atom...
  22. M

    How Can Mathematical Physics and Information Theory Enhance Collaboration?

    Post-grad, my background is in mathematical physics, probability/statistics, and information theory. I am here for discussion and collaboration on things I find interesting from time to time.
  23. Artemisa

    Error floors in this Bayesian analysis

    In this article (https://arxiv.org/pdf/2001.04581.pdf), the authors use a Bayesian analysis based on the positions of astrophysical bodies and their errors in the medians. This statistical analysis uses Markov chain Monte Carlo chains. The uncertainties in the positions are large, so what...
  24. hagopbul

    What Mathematical Concepts Support Profits on MetaTrader?

    TL;DR Summary: Asking about the MetaTrader platform and what mathematical theories I should read about. Hello: A claim recently caught my attention about the MetaTrader platform and how you can use it as a supplementary income source. What is this platform exactly? What should I read to be able to use...
  25. Graham87

    I Basic standard deviation calculation

    I don’t get how they got the equation for the standard deviation. Why do they only square with the time in the denominator? Thanks!
  26. H

    Good introductory book on statistical/data analysis?

    TL;DR Summary: I'm looking for a book on statistical/data analysis. Hey all. I've been doing statistical analysis in my research (such as using PCA and LDA), but I have never received a formal education on statistical analysis or data mining, and what I know about analysis is quite scattered...
  27. P

    Understanding the meaning of "expected fraction" (Statistics)

    The first part of the question asked me to calculate the mean and standard deviation for the number of remain votes in the simple binomial model consisting of total sample size of 2091 people. I believe this is fairly straightforward, it was simply ##E(X) = \mu = 2091(0.5) = 1045.5## votes and...
  28. S

    Probability of Hypokalemia w/ 1 or Multiple Measurements

    TL;DR Summary: Finding the probability with one measurement and multiple measurements on separate days. Question: Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mmol/L. Let’s assume we know a patient whose measured potassium levels vary daily according to N(µ = 3.8...
  29. A

    Break a Stick Example: Random Variables

    Hello, I would like to confirm my answers to the following random variables question. Would anyone be willing to provide feedback and see if I'm on the right track? Thank you in advance. My attempt:
  30. shahbaznihal

    A Computing the Fisher Matrix numerically

    Hi, I have been studying the Fisher matrix to apply in a project. I understand how to compute a fisher matrix when you have a simple model for example which is linear in the model parameters (in that case the derivatives of the model with respect to the parameters are independent of the...
  31. P

    I Are Boltzmann's statistics compatible with a deterministic universe?

    Are Boltzmann's statistics compatible with a deterministic universe? Suppose that the gas molecules in a given container are perfectly elastic objects obeying Newton's laws. Suppose further that we select the initial conditions (impulse and position of each molecule) at random. Is it true that, if...
  32. A

    A How to derive the sampling distribution of some statistics

    Assume that ##T## has an Erlang distribution: $$\displaystyle f \left(t \, | \, k \right)=\frac{\lambda ^{k }~t ^{k -1}~e^{-\lambda ~t }}{\left(k -1\right)!}$$ and ##K## has a geometric distribution $$\displaystyle P \left( K=k \right) \, = \, \left( 1-p \right) ^{k-1}p$$ Then the compound...
  33. WMDhamnekar

    MHB Probability, Expected value, joint P.D.F. and order statistics

    I want to know how the author derived the red-underlined term in the example given below. Would any member of Math Help Board enlighten me in this regard? Any math help will be accepted.
  34. L

    [Statistics] Calculate the percentage

    My attempt: P(x>=90) = 85/90 = 17/18 Is my understanding of the equation correct? Thanks
  35. A

    I Modeling the concentration of gas constituents in a Force Field

    Say there is a gas made up of two gas molecules: Molecule A and Molecule B. Molecule A has a mass: ma and mole fraction: na. Molecule B has a mass: mb and mole fraction: nb. The gas is at thermal equilibrium and has a constant temperature throughout itself (T) everywhere. It is placed in a...
  36. D

    Learn About Ancillary Statistics & Their Role in Education

    Ancillary statistics! You don't know what this means? I didn't know either, so I looked it up: http://utstat.toronto.edu/reid/research/A20n41.pdf As a non-native speaker, I didn't even know what "ancillary" means, so I had to look it up, too. The word has its root in Latin "ancilla" which is...
  37. A

    Calculus Advanced Calculus with Applications in Statistics

    Has anyone heard of this book written by Andre I. Khuri (Professor Emeritus in science at the University of Florida)? Judging by the table of contents, the book seems to cover a lot of calculus and multivariable calculus, and in a rigorous way according to the preface (they argue that...
  38. S

    I Bayesian statistics in science

    [Moderator's note: This thread has been split off from a previous thread since its topic is best addressed in a separate discussion. This post has been edited to focus on the topic for separate discussion.] Jaynes has used in the derivation of the rules of probability as the logic of plausible...
  39. chwala

    Solve the variance problem below - statistics

    The question is below: below is my own working; the mark scheme for the question is below here; I am looking for any other approach that there may be...am now trying to refresh on stats...bingo!
  40. tixi

    Labwork Statistics help: Average of averages

    I have done the experiment, and have a lot of data. For each data point (we have five), we did ten repetitions, for which we need to do video analysis. The analysis works frame by frame and gives a velocity between each frame. So, to get the value of one repetition, we already need to calculate...
  41. Amitkumarr

    I Finding bias of the coin from noise corrupted signals

    Suppose there are two persons A and B such that both have a personal communication system which can transmit and receive bits. B has a biased coin whose bias is not known. A asks B to toss the coin 2000 times, send a 0 when a tail comes up and a 1 when a head comes up. It is known that whatever...
  42. V

    B Convince Covid-19 Vaccine Efficiency Through Statistics

    I have been trying to convince someone that it is wrong to compare the death percentages of two different populations (percentage of death of Covid-19 cases per category: vaccinated vs unvaccinated) in an uncontrolled setting (i.e. real-world data), and conclude that the Covid-19 vaccine does...
  43. ohwilleke

    I Why Do Physicists Use Gaussian Error Distributions?

    David C. Bailey. "Not Normal: the uncertainties of scientific measurements." Royal Society Open Science 4(1), 160600 (2017). How bad are the tails? According to Bailey in an interview, "The chance of large differences does not fall off exponentially as you'd expect in a normal bell curve," and...
  44. S

    How Has Computer Trading Affected Stock Trading Volume Statistics?

    Has the advent of computer trading greatly increased the size of statistics for trading volume? - or do those statistics (for individual stocks) somehow omit the flash trades done by computers? In the pre-computer days, there were people who had theories of stock trading based on both the...
  45. W

    A Using Statistics to Test for Normality of Pi

    Is there a "reasonable" way to test for the normality of ##\pi##, i.e., that every digit occurs with the same frequency? Someone suggested randomly sampling strings of size 20 and outputting the frequency. Then I guess we could average the frequencies among samples, use a chi-squared test...
  46. Falgun

    Prob/Stats Looking for a probability and statistics textbook

    I want to learn some probability & statistics on my own. I am well versed in Calc 1-3, elementary ODEs and very little linear algebra. I want a comprehensive, introductory textbook which is NOT COOKBOOK STYLE. I might be self studying AP statistics next term so if the book covers everything I...
  47. shahbaznihal

    A Galaxy statistics calculation in Saslaw's book

    I am trying to follow a calculation from the book of William C. Saslaw, The Distribution of the Galaxies: Gravitational Clustering in Cosmology. The calculation is shown on the pages following page 122 in chapter 14 where the author talks about the Correlation function. I am able to reproduce...
  48. chwala

    Discrete data vs continuous data in statistics

    I would like to seek your take on the two terms, discrete and continuous, in this context. In my understanding, when we look at the height of individuals (in cm), this measure in general or by definition implies continuous data. If we are to look at a specific math problem that involves the height of say...
  49. W

    I Bias in Linear Regression (x-intercept) vs Statistics

    Hi, in simple regression for machine learning, a model Y = mx + b is said, AFAIK, to have bias equal to b. Is there a relation between the use of bias here and the use of bias in terms of estimators for population parameters, i.e., the bias of an estimator P^ for a population parameter P is...
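Thread 6 in the list above asks how to express that 2 events in 2 trials is weaker evidence than 100 events in 100 trials, given that a naive interval collapses to (1, 1) in both cases. One common option, offered here only as a sketch and not as the answer that thread reached, is the Wilson score interval, which stays non-degenerate when the observed proportion is 0 or 1. The function name `wilson_interval` is an assumption made for illustration.

```python
from statistics import NormalDist

def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z * (p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials ** 2)) ** 0.5
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)

low_a, high_a = wilson_interval(2, 2)       # experiment A: 2 of 2
low_b, high_b = wilson_interval(100, 100)   # experiment B: 100 of 100

print(f"Experiment A: ({low_a:.3f}, {high_a:.3f})")
print(f"Experiment B: ({low_b:.3f}, {high_b:.3f})")
```

Both intervals have an upper end at 1, but the lower end is much further from 1 for the small experiment, which matches the intuition that experiment B is the stronger result.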