Chi-squared fit with errors on both x and y

In summary, WWGD suggested that if you standardize your variables, the resulting expression is unitless.
  • #1
Malamala
Hello, I have some data points with errors on both the x and y coordinates. I want to fit a straight line to them, but I am not sure how to take the error on x into account. Normally, when I only have errors on y, I minimize $$\sum\frac{(y_{pred}(x)-y_{measured}(x))^2}{\sigma_y^2}$$
Can I just replace ##\sigma_y^2## with ##\sigma_x^2+\sigma_y^2##? The errors on x and y are not correlated. Thank you!
 
  • #2
  • #3
It is also called orthogonal distance regression.
 
  • #4
Dale said:
It is also called orthogonal distance regression.

Yes. You start with the obvious thing: a line y = mx + b. You then try to do a least-squares fit using the perpendicular distances between the points and the candidate line instead of the y-distances. The problem is that this doesn't always give you a unique, unbiased solution.

That's why you need to specify what you are looking for very carefully.
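The perpendicular-distance fit described above can be sketched in a few lines of Python. This is a minimal illustration, assuming equal, uncorrelated errors on x and y, so every point gets the same weight and both axes are treated alike. The function name `perpendicular_fit` is hypothetical. Minimizing the summed squared perpendicular distances to y = mx + b has a closed-form solution in terms of the centered sums S_xx, S_yy and S_xy, and the best line always passes through the centroid of the points.

```python
import math

def perpendicular_fit(xs, ys):
    """Fit y = m*x + b by minimizing the sum of squared perpendicular
    (orthogonal) distances from the points to the line.

    Assumes equal, uncorrelated errors on x and y. Returns (m, b).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxy == 0:
        # No linear correlation: the best orthogonal line is horizontal,
        # vertical, or not unique, so no finite slope is returned.
        raise ValueError("S_xy = 0: orthogonal-fit slope is undefined")
    # The slope solves sxy*m^2 + (sxx - syy)*m - sxy = 0; this root is
    # the one that minimizes the summed perpendicular distances.
    m = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # The fitted line passes through the centroid of the points.
    b = y_mean - m * x_mean
    return m, b
```

For points lying exactly on y = 2x + 1, `perpendicular_fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])` returns `(2.0, 1.0)`. One attraction of this fit is its symmetry: swapping the roles of x and y gives the reciprocal slope. The catch is that the perpendicular distance adds x and y distances directly, so the result depends on the units chosen for each axis.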
 
  • #6
Even though this appears to be a drive-by posting, I'll make one more comment.

If you minimize a function of Δy only, it's clear what you are doing. If you minimize something like ##\Delta x^2 + \Delta y^2##, it's not even guaranteed that you have a number with consistent dimensions: suppose y is temperature and x is time. What units would ##\Delta x^2 + \Delta y^2## even be in?

To get a well-defined answer, one needs to pose a much, much better defined question. And even then it may not exist.
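The units problem can be made concrete with a small Python sketch using made-up temperature-vs-time readings. The numbers and the helper name `perpendicular_slope` are purely illustrative and not from the thread. The sketch fits the same points with an unweighted perpendicular-distance fit twice, once with time in minutes and once in seconds, and converts both slopes to K/min. An ordinary y-only least-squares fit would give the same slope in both cases; the perpendicular fit does not, because its answer depends on how the x and y axes are scaled relative to each other.

```python
import math

def perpendicular_slope(xs, ys):
    """Slope of the line minimizing squared perpendicular distances
    (unweighted orthogonal regression); assumes S_xy != 0."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Hypothetical readings: temperature (K) against time.
t_min = [0.0, 1.0, 2.0, 3.0]            # time in minutes
temp = [293.0, 294.5, 295.4, 297.1]     # temperature in kelvin
t_sec = [60.0 * t for t in t_min]       # the same times in seconds

slope_from_minutes = perpendicular_slope(t_min, temp)        # K/min
slope_from_seconds = perpendicular_slope(t_sec, temp) * 60   # K/s -> K/min

print(f"fit done in minutes: {slope_from_minutes:.4f} K/min")
print(f"fit done in seconds: {slope_from_seconds:.4f} K/min")
```

In seconds, the spread in x dwarfs the spread in temperature, so the fit is essentially the ordinary y-on-x fit. In minutes, the two spreads are comparable and the perpendicular fit tilts the line, giving a noticeably different physical slope from the same data.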
 
  • #7
Vanadium 50 said:
Even though this appears to be a drive-by posting, I'll make one more comment.

If you minimize a function of Δy only, it's clear what you are doing. If you minimize something like ##\Delta x^2 + \Delta y^2##, it's not even guaranteed that you have a number with consistent dimensions: suppose y is temperature and x is time. What units would ##\Delta x^2 + \Delta y^2## even be in?

To get a well-defined answer, one needs to pose a much, much better defined question. And even then it may not exist.

Maybe if you standardize your variables you can avoid the issue with units? I understand that is one of the reasons for standardization.
 
  • #8
WWGD said:
Maybe if you standardize your variables you can avoid the issue with units? I understand that is one of the reasons for standardization.

What do you mean by this?
 
  • #9
Malamala said:
What do you mean by this?

I was replying to @Vanadium 50 regarding his statement on mixed units in the expression ##\sqrt{\Delta x^2 + \Delta y^2}##. If you standardize your variables (assuming normality of the data, or some other distribution), the resulting variable is unitless from algebra alone, because you're dividing two expressions with the same units. That way you avoid at least the issue of mixed units. Seems like something @Stephen Tashi may know about.
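Standardizing can be sketched minimally in Python with the standard-library `statistics` module. The time values and the helper name `standardize` are hypothetical. Each value is replaced by its z-score, (value - mean) / standard deviation. The numerator and denominator carry the same units, so the z-scores are dimensionless, and rescaling the variable (seconds vs. minutes) leaves them unchanged.

```python
import statistics

def standardize(values):
    """Return z-scores (v - mean) / sample standard deviation.
    Any unit factor multiplies both numerator and denominator and cancels."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

t_min = [0.0, 1.0, 2.0, 3.0]        # hypothetical times in minutes
t_sec = [60.0 * t for t in t_min]   # the same times in seconds

z_min = standardize(t_min)
z_sec = standardize(t_sec)
print(z_min)
print(z_sec)   # same values up to rounding: the unit has dropped out
```

Standardizing divides by the spread of the data, not by the measurement uncertainties σ_x and σ_y. The chi-squared in the original question instead divides each deviation by its own measurement uncertainty, which also makes every term dimensionless.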
 

FAQ: Chi-squared fit with errors on both x and y

What is a chi-squared fit with errors on both x and y?

A chi-squared fit with errors on both x and y is a statistical method used to determine the best fit line for a set of data points that have errors or uncertainties in both the x and y values. It takes into account the errors in both variables and calculates a chi-squared value, which is a measure of how well the data points fit the expected line.

How is a chi-squared fit with errors on both x and y different from a regular chi-squared fit?

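A regular chi-squared fit only takes into account errors in the y variable and treats the x values as exact. A chi-squared fit with errors on both x and y also accounts for the uncertainties in x. When the x errors are not negligible, each point then gets a weight that reflects its full uncertainty, so the chi-squared value and the uncertainties on the fitted parameters are more realistic.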

What is the purpose of using a chi-squared fit with errors on both x and y?

The purpose of using a chi-squared fit with errors on both x and y is to determine the best fit line for a set of data points that have errors or uncertainties in both the x and y values. This can provide a more accurate representation of the relationship between the variables and can be useful in making predictions or drawing conclusions from the data.

How is the chi-squared value calculated in a chi-squared fit with errors on both x and y?

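For each data point, the chi-squared contribution is the squared difference between the measured y value and the y value predicted by the fitted line, divided by the variance of that difference. These contributions are summed over all points to give the final chi-squared value. If only y has an uncertainty, that variance is just ##\sigma_y^2##. With independent errors on both coordinates and a straight line y = a + bx, the x uncertainty is propagated through the slope, giving an effective variance of ##\sigma_y^2 + b^2\sigma_x^2## (not simply ##\sigma_x^2 + \sigma_y^2##). Because this denominator depends on the slope, the minimization is no longer linear in the fit parameters and is usually done numerically.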
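For a straight line y = a + bx with independent errors on x and y, a common way to include the x errors is to divide each squared y-residual by the effective variance σ_y² + b²σ_x² instead of σ_y² alone. The Python below is a hedged sketch of such a fit. For a fixed slope b, the chi-squared is quadratic in a, so the best a is a weighted mean. The slope is then found with a one-dimensional golden-section search. This is only one simple way to do the minimization, and a general-purpose optimizer would also work. The function names, the slope bracket `[b_lo, b_hi]`, and the example data are all assumptions of this sketch, and the search assumes a single minimum inside the bracket.

```python
import math

def chi2(a, b, xs, ys, sx, sy):
    """Effective-variance chi-squared for the line y = a + b*x.
    Each residual y - a - b*x is divided by sy^2 + b^2 * sx^2, which
    propagates the x error through the slope (independent errors)."""
    return sum((y - a - b * x) ** 2 / (syi ** 2 + (b * sxi) ** 2)
               for x, y, sxi, syi in zip(xs, ys, sx, sy))

def best_intercept(b, xs, ys, sx, sy):
    """For fixed slope b, chi2 is quadratic in a; its minimum is the
    weighted mean of y - b*x with weights 1 / (sy^2 + b^2 * sx^2)."""
    w = [1.0 / (syi ** 2 + (b * sxi) ** 2) for sxi, syi in zip(sx, sy)]
    return sum(wi * (y - b * x) for wi, x, y in zip(w, xs, ys)) / sum(w)

def fit_line_xy_errors(xs, ys, sx, sy, b_lo, b_hi, tol=1e-10):
    """Minimize chi2 over (a, b) with a profiled out, using golden-section
    search for b in [b_lo, b_hi]. Assumes one minimum in the bracket and a
    nonzero effective variance for every point. Returns (a, b, chi2_min)."""
    def profile(b):
        return chi2(best_intercept(b, xs, ys, sx, sy), b, xs, ys, sx, sy)

    g = (math.sqrt(5.0) - 1.0) / 2.0      # inverse golden ratio, about 0.618
    lo, hi = b_lo, b_hi
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = profile(c), profile(d)
    while hi - lo > tol:
        if fc < fd:                        # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = profile(c)
        else:                              # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = profile(d)
    b = 0.5 * (lo + hi)
    a = best_intercept(b, xs, ys, sx, sy)
    return a, b, chi2(a, b, xs, ys, sx, sy)

# Hypothetical data with uncertainties on both coordinates.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
sx = [0.1] * 5
sy = [0.2] * 5
a, b, chi2_min = fit_line_xy_errors(xs, ys, sx, sy, b_lo=0.0, b_hi=5.0)
print(f"a = {a:.3f}, b = {b:.3f}, chi2 = {chi2_min:.2f}")
```

Setting every `sx` to zero reduces this to an ordinary weighted least-squares fit, which is a quick sanity check on the implementation. The bracket should be chosen around a rough slope estimate, for example one from a y-only fit.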

What does a low or high chi-squared value indicate in a chi-squared fit with errors on both x and y?

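A low chi-squared value indicates good agreement between the data points and the fitted line, while a high value indicates a poor fit. "Low" and "high" are judged relative to the number of degrees of freedom, which is the number of data points minus the number of fitted parameters (N - 2 for a straight line). The reduced chi-squared, chi-squared divided by the degrees of freedom, should be close to 1 for a good fit with correctly estimated uncertainties. A reduced value well above 1 suggests that the data do not follow the line or that the errors are underestimated, and one well below 1 suggests that the errors are overestimated.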
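The "close to 1" rule of thumb applies to the reduced chi-squared: chi-squared divided by the number of degrees of freedom, which is N - 2 for a straight line with N points. A minimal Python helper, with a hypothetical name and made-up numbers, could look like this:

```python
def reduced_chi2(chi2_value, n_points, n_params=2):
    """Chi-squared per degree of freedom; n_params=2 for a straight line."""
    dof = n_points - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi2_value / dof

# Hypothetical numbers: chi-squared of 4.5 from a 5-point straight-line fit.
print(reduced_chi2(4.5, 5))   # 4.5 / 3 = 1.5
```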
