Calculating the confidence interval of your data

AI Thread Summary
To calculate the confidence interval for data accuracy, one must first decide on the desired confidence level, such as 95%. After loading data into a database, a random sample of lines should be checked to estimate the number of errors. For example, if 250 lines are checked and 25 are found to be corrupt, the estimated fraction of corruption is 10%. Using a Poisson distribution, the variance and standard deviation can be calculated to construct the confidence interval. This interval provides a range within which the true percentage of corrupted lines is likely to fall.
tsaitea
Hello guys,

I would like to calculate a confidence interval for the correctness of my data. In other words, I would like to know how much confidence we can have that the data is correct.

Could anyone direct me where to start?

Thank you.
 
Googling "confidence interval" is a good place to start.
Note: you decide how much confidence you want, e.g. 95%; the confidence interval is then the range within which you can have that confidence.
 
Thank you for your reply Simon.

That definitely helped me understand the confidence interval. I am struggling to put it into context now. Say for example I am loading 300 lines of data into a database. Now I want to figure out the # of errors that would occur during the load (data not loaded properly).

What I was thinking is maybe I would have to perform the load, and calculate the sample size I would need based on a 95% confidence interval and randomly check each line until I have checked up to the sample size. And based on the # of errors found, I could determine the confidence interval?
 
Let me try to restate your problem. Perhaps you want to load millions of lines into a database and want to estimate what percentage of the lines is corrupt. You decide to check, say, N=250 lines at random. Let's assume that the number of corrupt lines follows a Poisson distribution and you find that n=25 lines (i.e. 10%) are corrupt; hence your estimate of the probability or fraction of corrupted lines is p=n/N=0.1. For a Poisson distribution the variance equals the mean, so it is also estimated as 25, and the standard deviation is ## \sigma=\sqrt{n}=5 ##. Using a normal approximation you can construct a 95% confidence interval for the true fraction as ##[ (n- z \sigma)/N, (n+z \sigma)/N]##, where for a 95% interval z=2 (1.96 to be exact).
So, with z=2, the 95% confidence interval for the true fraction is [15/250, 35/250], i.e. 6% to 14%.
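As a rough check, the calculation above can be sketched in a few lines of Python. This is only an illustration of the Poisson/normal approximation described in this post; the function name `poisson_normal_interval` and its parameters are made up for the example, and the clipping to [0, 1] is an added assumption so the result stays a valid fraction.

```python
from math import sqrt
from statistics import NormalDist


def poisson_normal_interval(n_corrupt, n_checked, confidence=0.95):
    """Approximate confidence interval for the fraction of corrupt lines.

    Hypothetical helper: treats the corrupt count as Poisson, so the
    variance is estimated by the count itself, and then applies a
    normal approximation.
    """
    # Two-sided z value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Poisson: standard deviation is the square root of the count.
    sigma = sqrt(n_corrupt)
    # Clip to [0, 1] (an assumption added here, not part of the post).
    lower = max(0.0, (n_corrupt - z * sigma) / n_checked)
    upper = min(1.0, (n_corrupt + z * sigma) / n_checked)
    return lower, upper


# The numbers from the example: 25 corrupt lines out of 250 checked.
low, high = poisson_normal_interval(25, 250)
print(f"{low:.3f} to {high:.3f}")  # 0.061 to 0.139
```

With the exact z=1.96 instead of z=2 the range comes out slightly narrower than [15/250, 35/250]. Note that the approximation gets poor when only a handful of corrupt lines are found.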
 