Data repeatability (statistics question)

In summary, the thread discusses using MATLAB to determine whether two sets of data represent the same values. After calculating a z-score with the formula provided, the poster accepts their null hypothesis that the two sets are different, but questions this result given the close means and large standard deviations. The reply explains that the standard deviation describes the scatter of individual values rather than the uncertainty of the mean, and that the difference between the means should be compared with the statistical uncertainty expected for them. The poster is commended for questioning the result and reminded to define the terms used in the question.
  • #1
Melawrghk

Homework Statement


I am trying to see if two sets of data represent the same values or not. I have:
Mean1 = 9.3155, stdev1 = 0.1334; mean2 = 9.3040, stdev2 = 0.1248;
N1 = N2 = 1000;
I got these values from my data using MATLAB (std() and mean());

Homework Equations



[itex]z = \frac{(mean1-mean2)}{\sqrt{stdev1^{2}/N1^{2}+stdev2^{2}/N2^{2}}}[/itex]

The Attempt at a Solution



Null hypothesis: Sets are different.
Alternative: Sets are the same.

Using the formula above I get a z-score of 63, which leads me to accept my null hypothesis that the two series are different.

However, I don't really understand why they would be considered different, given the fairly large standard deviations and close means. The way I'm thinking about it: the second mean fits within mean1 ± stdev1, so shouldn't the z-score be smaller?

Statistics isn't my strong suit, and this is for an electronics thing, but I'm curious what exactly I'm thinking wrong.
 
  • #2
The standard deviation s of a distribution describes how the individual entries of the distribution scatter. It does not describe the uncertainty u with which the mean is determined. The mean is determined from N numbers, not a single one, so its statistical uncertainty u is smaller than s, namely u² = s²/N (*). What you essentially want to do is compare the difference between the means with the statistical inaccuracies you expect for them, not compare the difference to the width of the distributions. The latter would only tell you to what extent you could take a single number and say which of the two probability distributions it probably belongs to (assuming the distributions are different, of course).

Btw.: Excellent question. It's nice to see when students question results that seem wrong to them.


(*): Note by comparison that your equation for the not-further-specified "z" is a bit fishy. Also note that it is a good habit to define/explain the terms you use. Just because everyone in your class knows what your teacher means by "z" does not imply that everyone around the world does.
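The reply's point can be checked numerically. Below is a minimal sketch in Python (the thread itself uses MATLAB, but the arithmetic is identical), plugging in the means and standard deviations quoted in the original post:

```python
import math

# Values reported in the thread (obtained from MATLAB's mean() and std())
mean1, s1, n1 = 9.3155, 0.1334, 1000
mean2, s2, n2 = 9.3040, 0.1248, 1000

# Uncertainty of each mean (standard error): u^2 = s^2 / N,
# much smaller than the scatter s of individual entries
u1 = s1 / math.sqrt(n1)
u2 = s2 / math.sqrt(n2)

# Two-sample z: compare the difference of the means with its expected uncertainty
z = (mean1 - mean2) / math.sqrt(u1**2 + u2**2)   # ~1.99

# The formula in the original post divides by N^2 instead of N, which
# shrinks the denominator far too much and inflates z to ~63
z_fishy = (mean1 - mean2) / math.sqrt(s1**2 / n1**2 + s2**2 / n2**2)
```

With the s²/N denominator the z-score comes out near 2 rather than 63, which matches the intuition in the question that two means this close should not look wildly different.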
 

Related to Data repeatability (statistics question)

What is data repeatability?

Data repeatability refers to the consistency or reproducibility of a set of data. It measures how likely it is for the same results to be obtained when the same experiment is repeated multiple times.

Why is data repeatability important in statistics?

Data repeatability is important in statistics because it allows us to determine the reliability and validity of our data. It helps us to identify any potential errors or biases in our data collection process and ensure that our results are accurate and consistent.

How do you calculate data repeatability?

Data repeatability can be calculated using statistical measures such as standard deviation, coefficient of variation, or intraclass correlation coefficient (ICC). These measures assess the variability of the data and determine the level of repeatability.
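As an illustration of the first two measures mentioned above, here is a short Python sketch computing the standard deviation and coefficient of variation for a set of repeated readings (the values are hypothetical, not taken from the thread):

```python
import statistics

# Hypothetical repeated readings of the same quantity (illustrative values only)
readings = [9.31, 9.28, 9.35, 9.30, 9.33]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)  # sample standard deviation: absolute spread
cv = stdev / mean                   # coefficient of variation: spread relative to the mean
```

A small coefficient of variation (here well under 1%) indicates that repeated measurements cluster tightly around the mean, i.e. good repeatability.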

What factors can affect data repeatability?

There are several factors that can affect data repeatability, including human error, measurement instruments, environmental conditions, and sample size. It is important to control for these factors to ensure the reliability of data.

How can data repeatability be improved?

Data repeatability can be improved by using standardized and validated measurement techniques, increasing sample size, and conducting multiple trials. It is also important to carefully document and track all data collection processes to identify any potential errors or biases.
