Bjarke Nicolaisen
Hi all,
I am a science educator in high school. I have been thinking about how to make a simple estimate, one that 1st- and maybe 2nd-year students can follow, for propagating measurement errors to the uncertainty of the slope in linear regression. The typical situation is that they make some measurements (x_i, y_i) of, say, time and distance, and then use linear regression to find the slope. For the uncertainty of the slope they can use the standard empirical estimate inferred from the fit, which assumes constant and equal variance in the y-data, etc. But that estimate does not rely at all on the measured uncertainty of each data point, delta_y_i; it only uses the scatter of the data around the regression line. Which is of course fine, but if I want to teach the students to propagate their errors on each y_i, it gets very messy. So I thought maybe the following estimate would be an idea, to start out with:
- Find the relative uncertainty of each data point, delta_y_i/y_i, and take the maximal value (an overestimate).
- Divide by sqrt(N-2) for linear regression, or sqrt(N - #parameters) in general.
- This relative uncertainty is then a simple overestimate of the relative uncertainty of the result, in my example the slope of the linear regression (a rough numerical sketch of the comparison is below).
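To make the comparison concrete, here is a rough sketch in Python/NumPy of the three numbers I have in mind: the standard empirical slope uncertainty from the residual scatter, the uncertainty obtained by propagating the individual delta_y_i through the weighted least-squares formulas, and the proposed overestimate max(delta_y_i/y_i)/sqrt(N-2) read as a relative uncertainty of the slope. The data, the uncertainties and all variable names are invented purely for illustration:

```python
import numpy as np

# Made-up example data: time (s) vs distance (m), with per-point uncertainties delta_y_i.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
dy = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

N = len(x)

# Ordinary least-squares slope and the standard "empirical" slope uncertainty,
# which uses only the scatter of the residuals around the fitted line.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
s2 = np.sum(residuals**2) / (N - 2)                       # residual variance
se_slope_empirical = np.sqrt(s2 / np.sum((x - x.mean())**2))

# Proper propagation of the individual delta_y_i through the weighted
# least-squares formulas (the "messy" approach), with weights w_i = 1/delta_y_i^2.
w = 1.0 / dy**2
Delta = np.sum(w) * np.sum(w * x**2) - np.sum(w * x)**2
se_slope_weighted = np.sqrt(np.sum(w) / Delta)

# The proposed quick overestimate: maximal relative uncertainty of the y_i,
# divided by sqrt(N - 2), interpreted as a relative uncertainty of the slope.
rel_overestimate = np.max(dy / np.abs(y)) / np.sqrt(N - 2)
se_slope_proposed = rel_overestimate * abs(slope)

print(f"slope                  = {slope:.3f}")
print(f"empirical  SE(slope)   = {se_slope_empirical:.3f}")
print(f"propagated SE(slope)   = {se_slope_weighted:.3f}")
print(f"proposed overestimate  = {se_slope_proposed:.3f}")
```

Running something like this on a few realistic data sets is how I would check whether the proposed number actually stays above the other two.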
I am more than a little unsure whether this always ensures an overestimate. It does not need to be bulletproof; it is more about giving the students an awareness of uncertainty propagation. I also know that we are mixing two different methods of linear analysis here, one where we assume no knowledge of the individual y_i errors (empirical) and one where we know the errors. But still, I find it an interesting idea. What do you guys think? Maybe you have a similar but more correct estimate, or simply an opinion.