Extrapolating data points using models

In summary, the conversation discusses using measured data to predict the output power of a given laser at 9 amps. Ideally a linear relationship is expected, but higher-order polynomials fit the data better. The conversation also addresses how to balance underfitting against overfitting, and suggests looking at the reduced chi-square value and a graphical residual analysis. The recommendation is to use the higher-order model when the data support it statistically, while also considering whether experimental or measurement errors are introducing the non-linearity.
  • #1
roam

Homework Statement


I have made a number of measurements of current against optical power for a given laser. As shown below, my measurements only go up to 8 amps. I am trying to use the data to predict the output power at 9 amps.

JpLDtLA.png


In the ideal case, the behaviour is expected to be linear, but here higher order polynomials fit the data better.

I would like to know if there is a way to find a proper balance between underfitting and overfitting these data. Also, I want to know if there are better methods to extrapolate this data point.

Homework Equations



The Attempt at a Solution



Clearly, the two models give different predictions of what the power would be at 9 amps (the difference being ~ 600 mW).

Here are the corresponding ##r^2## values for the various fits:

$$
\begin{array}{c|c}
\text{degree} & r^{2}\\
\hline 1 & 0.9977\\
2 & 0.9998\\
3 & 1.0000\\
4 & 1.0000
\end{array}
$$

Is it possible to decide which model to use based on these values? Can you determine if the flexibility of the model is too high so that it's modeling noise? :confused:

Any suggestions are greatly appreciated.
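As a rough illustration of why the ##r^2## table alone cannot settle the question, here is a minimal sketch. It assumes made-up data (the arrays `current`, `noise` and `power` are hypothetical stand-ins, not the measurements in the plot) and uses NumPy's `polyfit`. Because each polynomial contains the lower-degree one as a special case, ##r^2## can only stay the same or increase as the degree goes up, whether or not the extra term is physically meaningful.

```python
import numpy as np

# Hypothetical data (NOT the measurements from this thread): a mostly linear
# response with a small quadratic droop plus fixed offsets standing in for noise.
current = np.arange(1.0, 9.0)                                      # 1..8 A
noise = np.array([12.0, -9.0, 5.0, -14.0, 10.0, -4.0, 8.0, -7.0])  # mW
power = 1000.0 * current - 15.0 * current**2 + noise               # mW

def r_squared(x, y, degree):
    """Coefficient of determination for a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals**2)          # unexplained variation
    ss_tot = np.sum((y - y.mean())**2)     # total variation about the mean
    return 1.0 - ss_res / ss_tot

r2_by_degree = {d: r_squared(current, power, d) for d in range(1, 5)}
for d, r2 in r2_by_degree.items():
    print(f"degree {d}: r^2 = {r2:.6f}")
```

Since the values never go down with increasing degree, a higher ##r^2## by itself says nothing about whether the added flexibility is fitting signal or noise.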
 

  • #2
roam said:
Is it possible to decide which model to use based on these values?
Yes. Hold the plot horizontal and look 'along the line'. The deviation from a straight line is clearly systematic.
A measure of this is the reduced chi square = chi square/degrees of freedom.
link from this thread said:
Stephen Tashi
In your case it should reduce sharply from linear to quadratic and not much from 2nd to 3rd order.
I'm not so familiar with ##R^2## -- except that it comes with Excel fits :wink:. But I suppose the small improvement from quadratic to 3rd order shows that the latter is not worth it.

[edit] google is our friend
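The reduced chi square suggested above could be computed along these lines. This is only a sketch under stated assumptions: the data are hypothetical, and `sigma` (a common 1-sigma uncertainty of 10 mW on every power reading) is an assumed value, since no error bars are given in the thread.

```python
import numpy as np

# Hypothetical data and uncertainty (NOT the thread's measurements).
current = np.arange(1.0, 9.0)                                      # 1..8 A
noise = np.array([12.0, -9.0, 5.0, -14.0, 10.0, -4.0, 8.0, -7.0])  # mW
power = 1000.0 * current - 15.0 * current**2 + noise               # mW
sigma = 10.0  # assumed 1-sigma uncertainty on each power reading, in mW

def reduced_chi_square(x, y, sigma, degree):
    """chi^2 / (degrees of freedom) for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    chi2 = np.sum((residuals / sigma) ** 2)
    dof = len(x) - (degree + 1)   # data points minus fitted parameters
    return chi2 / dof

red_chi2 = {d: reduced_chi_square(current, power, sigma, d) for d in (1, 2, 3)}
for d, value in red_chi2.items():
    print(f"degree {d}: reduced chi^2 = {value:.2f}")
```

For data like these, the value drops sharply from the linear to the quadratic fit and changes little after that, which is the pattern described above. Note that the absolute numbers depend entirely on the assumed `sigma`.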
 
  • #3
roam said:
Is it possible to decide which model to use based on these values?

A high ##R^2## value does not guarantee that the model fits the data well. As remarked by @BvU: Use your eyes to look 'along the line' or perform a graphical residual analysis to check whether the data-point deviations are randomly distributed around the fitted curve.
[PDF]
Curve Fitting Made Easy
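One crude numerical companion to the graphical residual check is to count how often the residuals change sign along the current axis. The sketch below uses hypothetical data and a hypothetical helper `count_sign_runs`; it is not a formal test, and with only 8 points it can do no more than flag an obvious pattern. Very few sign runs (long stretches on one side of the curve) point to a systematic deviation.

```python
import numpy as np

# Hypothetical data (NOT the thread's measurements).
current = np.arange(1.0, 9.0)                                      # 1..8 A
noise = np.array([12.0, -9.0, 5.0, -14.0, 10.0, -4.0, 8.0, -7.0])  # mW
power = 1000.0 * current - 15.0 * current**2 + noise               # mW

def count_sign_runs(residuals):
    """Number of consecutive same-sign stretches in the residuals (zeros ignored)."""
    signs = np.sign(residuals)
    signs = signs[signs != 0]
    if signs.size == 0:
        return 0
    return int(1 + np.sum(signs[1:] != signs[:-1]))

runs = {}
for degree in (1, 2):
    coeffs = np.polyfit(current, power, degree)
    residuals = power - np.polyval(coeffs, current)
    runs[degree] = count_sign_runs(residuals)
    print(f"degree {degree}: residuals = {np.round(residuals, 1)}, runs = {runs[degree]}")
```

A plot of the residuals remains the better tool; the run count only puts a number on what the eye sees when looking 'along the line'.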
 
  • #4
Hi @BvU and @Lord Jestocost,

I have a few follow-up questions. Here is a plot of my residuals:

tMNtmn5.png


The blue line shows the deviations from the straight line (linear fit). The residuals for the quadratic and cubic fits also appear to be non-random. What does this mean?

Regarding the reduced chi-squared test, as I understand it, the smaller the value of ##\chi^{2}/\text{degrees of freedom}##, the better the fit is. But if the improvement from one model to the next is small, then we should stay with the current model?

To calculate this I need to find the number of degrees of freedom for this data set. The reference in BvU's post gives this definition:

$$\text{Number of data points} - \text{Number of parameters calculated from the data points}$$

I've got 8 data points. What would be the "number of parameters calculated from the data points"? :confused:
 

  • #5
roam said:
quadratic and cubic also appear to be non-random
There is a clear 2nd order term in the residuals for the linear fit. What non-random behaviour do you see in the other two?
roam said:
What would be the "number of parameters calculated from the data points"?
For an average that is 1, for a straight line 2, for a parabola 3, etc.
You have 8 data points, so you could exactly calculate a seventh order polynomial through all points: zero degrees of freedom. But then you have basically modeled the noise, not the actual behaviour. In addition, that 'model' is extremely useless for extrapolation.
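The parameter counting and the seventh-order warning can be tried out with a small sketch. The data are hypothetical (an underlying quadratic plus offsets of about 10 mW), not the measurements in this thread. A polynomial of degree d has d + 1 fitted parameters, so with 8 points the degree-7 fit has zero degrees of freedom and passes through every point.

```python
import numpy as np

# Hypothetical data (NOT the thread's measurements).
current = np.arange(1.0, 9.0)                                      # 1..8 A
noise = np.array([12.0, -9.0, 5.0, -14.0, 10.0, -4.0, 8.0, -7.0])  # mW
power = 1000.0 * current - 15.0 * current**2 + noise               # mW

n_points = len(current)
max_residual = {}
prediction_9a = {}
for degree in (1, 2, 7):
    n_params = degree + 1                 # average: 1, line: 2, parabola: 3, ...
    dof = n_points - n_params
    coeffs = np.polyfit(current, power, degree)
    max_residual[degree] = np.max(np.abs(power - np.polyval(coeffs, current)))
    prediction_9a[degree] = np.polyval(coeffs, 9.0)
    print(f"degree {degree}: dof = {dof}, "
          f"max |residual| = {max_residual[degree]:.3g} mW, "
          f"P(9 A) = {prediction_9a[degree]:.0f} mW")
```

In this made-up case the degree-7 curve reproduces all 8 points, yet its value at 9 A differs from the quadratic prediction by far more than the size of the offsets, because the fit has turned the noise into large high-order coefficients.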

roam said:
But if the improvement from one model to the next is small, then we should stay with the current model?
Yes. The ##\chi^2/N## has a distribution that depends on N:
redchidensity.jpg

(picture from https://www.chem.purdue.edu/courses/chm621/text/stat/funcs/sampling/sampling.htm; n = 3, 5, 10, 20 shown)
With higher N it becomes sharper and more symmetric around 1. In other words: a deviation from 1 becomes more and more unlikely.

Read up a bit on that until you understand a phrase like:
The area under the reduced chi squared distribution, from the ##\chi^2_R## found, to ##\infty##, is the probability that you would find a higher ##\chi^2_R## if you repeated the experiment.

Remember though, that this is statistics -- for an experimentalist the physics takes precedence.

Note to self: I omit a treatise on internal/external errors which may be essential for the ##\int_{\chi^2}^\infty## phrase
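The tail-area statement above can be made concrete by literally "repeating the experiment" in a simulation. This is a sketch only: the function name `simulated_tail_probability` is hypothetical, and it assumes independent Gaussian errors with correctly known uncertainties, so that ##\chi^2## is a sum of N squared standard-normal deviates (N being the number of degrees of freedom).

```python
import random

def simulated_tail_probability(observed_red_chi2, dof, n_trials=20000, seed=0):
    """Estimate P(chi2_R >= observed) by simulating many repeated experiments.

    Assumes independent Gaussian errors with correctly known sigma, so chi^2
    is a sum of `dof` squared standard-normal deviates.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_trials):
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))
        if chi2 / dof >= observed_red_chi2:
            exceed += 1
    return exceed / n_trials

# Example: 8 points and a parabola leave 5 degrees of freedom.
p_near_one = simulated_tail_probability(1.0, dof=5)
p_large = simulated_tail_probability(3.0, dof=5)
print(f"P(chi2_R >= 1.0) ~ {p_near_one:.3f}")
print(f"P(chi2_R >= 3.0) ~ {p_large:.3f}")
```

A reduced chi square near 1 is exceeded in a large fraction of the simulated repeats, while a value of 3 is exceeded only rarely, so a value that large would cast doubt on the model or on the assumed error bars.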
 

  • #6
I would be interested to see the error bars on your data points.
 
  • #7
Because the deviations from the linear model are so systematic, they do not look like random errors to me. The regression models should statistically support the inclusion of the non-linear term. IMHO, you should use the higher-order model. That being said, if the theory strongly suggests a linear relationship, then you should ask yourself whether there is something about your experiment or measurement methods that is introducing the non-linear term. Even if that is true, the best extrapolation of the entire experiment and measurement process is the non-linear model.
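To see how much the choice of model matters for the 9 A prediction, one could extrapolate both fits and, as a sanity check, see how well each predicts the last measured point when it is left out of the fit. The hold-out step is an addition that nobody in this thread proposed, and the data below are hypothetical, so treat this as a sketch rather than a recipe.

```python
import numpy as np

# Hypothetical data (NOT the thread's measurements).
current = np.arange(1.0, 9.0)                                      # 1..8 A
noise = np.array([12.0, -9.0, 5.0, -14.0, 10.0, -4.0, 8.0, -7.0])  # mW
power = 1000.0 * current - 15.0 * current**2 + noise               # mW

prediction_9a = {}
holdout_error = {}
for degree in (1, 2):
    # Extrapolate to 9 A using all 8 points.
    coeffs_all = np.polyfit(current, power, degree)
    prediction_9a[degree] = np.polyval(coeffs_all, 9.0)

    # Hold-out check: fit the first 7 points, then predict the 8 A reading.
    coeffs_7 = np.polyfit(current[:-1], power[:-1], degree)
    holdout_error[degree] = abs(np.polyval(coeffs_7, current[-1]) - power[-1])

    print(f"degree {degree}: P(9 A) = {prediction_9a[degree]:.0f} mW, "
          f"error predicting the 8 A point = {holdout_error[degree]:.0f} mW")
```

With a systematic curvature like the one in these made-up data, the straight line overshoots both the held-out point and the 9 A estimate, while the quadratic predicts the held-out point to within roughly the size of the noise.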
 

FAQ: Extrapolating data points using models

1. What is extrapolation and how is it used in data analysis?

Extrapolation is the process of estimating values beyond the range of known data points using a mathematical model. It is commonly used in data analysis to make predictions or projections based on existing data.

2. What are the potential risks of extrapolating data points using models?

One of the main risks of extrapolation is the assumption that the underlying trend will continue beyond the known data points. This may not always be the case and can lead to inaccurate predictions. Additionally, extrapolation can also be influenced by outliers or errors in the data, which can further impact the accuracy of the results.

3. How do scientists choose the appropriate model for extrapolation?

Scientists typically choose a model based on the type of data and the underlying trend. For example, linear regression models are commonly used for data with a linear trend, while exponential or logarithmic models may be more suitable for data with a non-linear trend. The choice of model also depends on the purpose of the extrapolation and the level of accuracy required.

4. Can extrapolation be used to make accurate predictions in all situations?

No, extrapolation is not always a reliable method for making predictions. It is most effective when there is a clear and consistent trend in the data, and when the model used is appropriate for the data. In situations where there is a lot of variability or uncertainty in the data, extrapolation may not yield accurate results.

5. How can scientists account for potential errors or limitations when extrapolating data points using models?

Scientists can account for potential errors or limitations by using multiple models and comparing the results, or by incorporating a margin of error in the extrapolation. It is also important to carefully evaluate the data and consider any potential biases or anomalies that may affect the accuracy of the extrapolation.
