How to compare two data sets with statistics?

elegysix · Dec 9, 2013

I have two questions:

I have a set of data, a measured spectrum. When I model the spectrum with a function, I calculate [itex]r^{2} = 1 - \frac{\sum(y - y_{model})^{2}}{\sum(y - \bar{y})^{2}}[/itex], where [itex]\bar{y}[/itex] is the mean of the measured y data.

Q1) However, I have reference data now, which is what the spectrum should be. So is it right to use the same calculation on it for r², but instead of using y_model, using y_reference?

Q2) The model function I was fitting to the data is
[itex]S_{\lambda} = \frac{2\pi h c^{2}}{\lambda^{5}\left(e^{hc/(\lambda k T)} - 1\right)}[/itex]
Is it correct to calculate goodness of fit in that way for such a distribution?
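For reference, the model function above can be written out as a short Python sketch. The function name `planck_exitance` and the constant names are illustrative, not from the thread; it assumes wavelength in metres and temperature in kelvin, with T as the parameter that would be adjusted in a fit.

```python
import math

H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
K_B = 1.380649e-23    # Boltzmann constant, J/K

def planck_exitance(wavelength_m, temperature_k):
    """Blackbody spectral exitance S_lambda, in W per m^2 per metre of wavelength.

    Assumes SI inputs. math.expm1 can raise OverflowError when
    hc/(lambda k T) is very large (short wavelength, low temperature).
    """
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * math.pi * H * C**2 / (wavelength_m**5 * math.expm1(x))

if __name__ == "__main__":
    # Evaluate near the visible peak for a roughly solar temperature (assumed 5800 K).
    print(planck_exitance(500e-9, 5800.0))
```

Because the curve is strongly peaked, any r² computed against it is dominated by the points near the peak, which is worth keeping in mind when judging the fit.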

Here is a plot of my two data sets

thanks!

Simon Bridge · Dec 10, 2013

Q1> what does it mean: "what the spectrum should be"
There is what the spectrum is and what the model predicts - surely it "should be" whatever it actually is.

Q2> To decide what to do you need, first, to define the problem.
What is it you are trying to find out?

If you want to see if the model is a good fit to the data, then a goodness-of-fit test is probably warranted.
Make sure that the approach you use answers the questions you are asking.

What I am reading above is that you have not asked a clear enough question to know how to proceed.

Suspect you may need these:
http://home.comcast.net/~szemengtan/
... "Inverse Problems" towards the bottom of the page.

Those data plots are seriously cool btw.

elegysix · Dec 10, 2013

Thanks, sorry for being unclear.
Forget that I mentioned a "model".

"What the spectrum should be" is the ASTM G173 reference spectrum.
We captured the solar spectrum and want to compare it with that reference spectrum (ASTM G173) to show that our measurements are accurate.

the question is - how can I properly use statistics to say how well these two data sets match?

Is it appropriate to use this calculation: [itex]r^{2} = 1 - \frac{\sum(y_{r} - y_{s})^{2} }{\sum(y_{r} - \bar{y_{r}})^{2} } [/itex]

where [itex] y_{r} [/itex] is the reference y data, and [itex] y_{s} [/itex] is our measured y data, and [itex] \bar{y_{r}} [/itex] is the mean of the reference y data.
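A minimal Python sketch of that calculation, assuming both data sets are already sampled at the same wavelengths (if they are not, one of them would need to be interpolated onto the other's grid first). The function name `r_squared` and the example numbers are made up for illustration.

```python
def r_squared(y_ref, y_meas):
    """r^2 = 1 - sum((y_r - y_s)^2) / sum((y_r - mean(y_r))^2).

    y_ref  : reference values (y_r)
    y_meas : measured values (y_s), same length and same wavelength grid
    """
    if len(y_ref) != len(y_meas):
        raise ValueError("both data sets must have the same number of points")
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - s) ** 2 for r, s in zip(y_ref, y_meas))   # mismatch
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)            # spread of reference
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Hypothetical values, not real spectrum data.
    y_ref = [1.0, 2.0, 3.0, 4.0]
    y_meas = [1.1, 1.9, 3.2, 3.9]
    print(r_squared(y_ref, y_meas))
```

Note that with this definition the result is 1 only for an exact match and can go negative when the measured data differ from the reference by more than the reference varies about its own mean.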

thanks

Simon Bridge · Dec 11, 2013

So you are testing the measuring method, to show that it is sound?

You want to use the coefficient of determination test?
I think you have the roles of the data-sets reversed.

There are other goodness-of-fit tests - e.g. chi-squared - what led you to choose this one?

elegysix · Dec 11, 2013

Simon Bridge said:

So you are testing the measuring method, to show that it is sound?

yes.

Simon Bridge said:

You want to use the coefficient of determination test?

Not necessarily. I want to use whatever test is appropriate for this.

Simon Bridge said:

There are other goodness-of-fit tests - e.g. chi-squared - what led you to choose this one?

I am not familiar with the others, that is why I made this thread. Which test should I use? what would you use?

thanks

Simon Bridge · Dec 11, 2013

I see ... I cannot see anything immediately ruling out a CoD test.
I would use Chi-squared... but that's me.

Really you are comparing two data-sets and asking if they are close enough to come from the same forward function rather than checking a data set against a theoretical model of a forward function.
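As a rough sketch of what a chi-squared comparison of two data sets could look like in Python: this assumes both sets share the same wavelength grid and that a per-point uncertainty `sigma` is available for the difference between them. That uncertainty is not given anywhere in this thread, so it is an assumption here, as are the function name and the example numbers. Since no parameters are fitted, the number of degrees of freedom is taken to be the number of points.

```python
def chi_squared(y_ref, y_meas, sigma):
    """Return (chi2, reduced chi2) for two data sets on the same grid.

    sigma : assumed 1-sigma uncertainty on each difference (y_meas - y_ref)
    """
    if not (len(y_ref) == len(y_meas) == len(sigma)):
        raise ValueError("all inputs must have the same length")
    chi2 = sum(((s - r) / e) ** 2 for r, s, e in zip(y_ref, y_meas, sigma))
    dof = len(y_ref)  # no fitted parameters in a direct comparison
    return chi2, chi2 / dof

if __name__ == "__main__":
    # Hypothetical values, not real spectrum data.
    y_ref = [1.0, 2.0, 3.0, 4.0]
    y_meas = [1.1, 1.9, 3.2, 3.9]
    sigma = [0.1, 0.1, 0.1, 0.1]
    print(chi_squared(y_ref, y_meas, sigma))
```

A reduced chi-squared near 1 would suggest the two sets agree to within the stated uncertainties; values much larger than 1 suggest real disagreement or underestimated uncertainties.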

The inverse problems papers I linked you to (post #2) give a lot of detail on different rationales for goodness of fit in different circumstances.
