Nonlinear Least Squares or OLS for Nonlinear Models?

In summary, the discussion on "Nonlinear Least Squares or OLS for Nonlinear Models?" highlights the distinctions between Ordinary Least Squares (OLS) and Nonlinear Least Squares (NLS) methods in the context of fitting nonlinear models. OLS assumes linearity in parameters and is easier to compute, making it suitable for linear relationships. Conversely, NLS directly handles nonlinearity, allowing for more accurate parameter estimation in complex models. The choice between these methods depends on the nature of the data and the underlying model, with NLS typically preferred for true nonlinear relationships despite its computational complexity.
  • #1
fog37
TL;DR Summary
Difference between nonlinear least squares and ordinary least squares
Hello,

I understand that the method of ordinary least squares (OLS) is about finding the coefficients that minimize the sum ##\sum (y_{observed} - g(X))^2##, where ##g(X)## is the statistical model chosen to fit the data. Besides OLS, there are clearly other coefficient estimation methods (MLE, etc.).

In general, OLS is fair game when the model ##g(X)## is "linear with respect to the parameters" (linear regression, polynomial regression, etc.): any model that is a sum of terms, each term being the product of an estimated coefficient and some function of the variables, ##g(X) = \sum_j \beta_j f_j(X)##, where the ##f_j(X)## act like basis functions. For example, ##g(X) = \beta_0 + \beta_1 X + \beta_2 X^2## is linear and the basis functions are the three functions ##1, X, X^2##...

Of course, the OLS approach is valid as long as specific assumptions on the residuals are met. Additionally, after taking the first derivatives and setting them to zero, we arrive at nice analytical formulas for the coefficients.
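
For instance, here is a minimal sketch of those analytical formulas (the normal equations) for the quadratic example above, with made-up data and coefficients:

```python
# A minimal sketch of OLS via the closed-form normal equations for the
# quadratic example g(X) = beta_0 + beta_1*X + beta_2*X^2 above.
# (The data and true coefficients are made up for illustration.)
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Design matrix: one column per basis function 1, X, X^2.
A = np.column_stack([np.ones_like(x), x, x**2])

# Setting the derivatives of sum((y - A @ beta)^2) to zero gives the
# normal equations A^T A beta = A^T y, solvable in closed form.
beta = np.linalg.solve(A.T @ A, A.T @ y)
print(beta)  # estimates of (beta_0, beta_1, beta_2)
```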

That said, what is the issue with using OLS when ##g(X)## is a nonlinear model? I know that sometimes we "convert" a nonlinear model so that it assumes the form of a linear model. That strategy then allows us to use OLS on the new model based on the transformed variables... That is a useful hack.
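
As a concrete sketch of that hack, assuming an exponential model ##y = a e^{bX}## and made-up data: taking logs gives ##\log y = \log a + bX##, which is linear in the parameters, so plain OLS applies to the transformed data.

```python
# A sketch of the linearization hack for y = a*exp(b*X): fit OLS to
# log(y) = log(a) + b*X. (Model, data, and the multiplicative noise
# choice are assumptions; note the transform also changes how the
# error enters, so this is not equivalent to NLS on the raw data.)
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 40)
y = 2.0 * np.exp(1.5 * x) * rng.lognormal(sigma=0.05, size=x.size)

A = np.column_stack([np.ones_like(x), x])             # basis: 1, X
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)  # OLS on log(y)
a_hat, b_hat = np.exp(coef[0]), coef[1]
print(a_hat, b_hat)  # close to the true (2.0, 1.5)
```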

But I have been reading about "nonlinear least squares". Isn't it the same approach as OLS, except that the model is nonlinear and we directly plug the nonlinear ##g(X)## into ##\sum (y_{observed} - g(X))^2##? We may not end up with analytical estimators and may have to solve for the coefficients using some numerical method... But I don't see an issue with applying OLS to nonlinear models...
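
For comparison, a sketch of that direct route on the same kind of model, using scipy.optimize.curve_fit as the numerical solver (the model and data are made up):

```python
# A sketch of nonlinear least squares done directly: plug the nonlinear
# g(X) into sum((y - g(X))^2) and minimize numerically. scipy's curve_fit
# wraps an iterative least-squares solver; no closed-form estimator exists.
import numpy as np
from scipy.optimize import curve_fit

def g(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 40)
y = 2.0 * np.exp(1.5 * x) + rng.normal(scale=0.2, size=x.size)

# The solver iterates from an initial guess rather than using a formula.
params, cov = curve_fit(g, x, y, p0=[1.0, 1.0])
print(params)  # numerical estimates of (a, b), near (2.0, 1.5)
```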

Thank you.
 
  • #2
OLS minimizes the sum of squared errors of the actual samples versus the estimated values. If that is your goal, then that is the thing to do.
 
  • #3
FactChecker said:
OLS minimizes the sum of squared errors of the actual samples versus the estimated values. If that is your goal, then that is the thing to do.
That is the goal, but many resources I read state that OLS is only for linear models, and that puzzled me... Is it because the estimates resulting from applying least squares to a nonlinear model are not as good as they could be?
 
  • #4
I should probably not have called it OLS. If your goal is to minimize the sum-squared-errors, then do that, whether it requires OLS or a numerical technique.
These problems do not exist in a vacuum. You should have a reason for the model you propose and have something that you want to use the results for. That should determine what approach you can use. What you need to be aware of is that the statistical results like confidence intervals of the parameters may not be valid if certain assumptions are not met.
 
  • #5
FactChecker said:
I should probably not have called it OLS. If your goal is to minimize the sum-squared-errors, then do that, whether it requires OLS or a numerical technique.
These problems do not exist in a vacuum. You should have a reason for the model you propose and have something that you want to use the results for. That should determine what approach you can use. What you need to be aware of is that the statistical results like confidence intervals of the parameters may not be valid if certain assumptions are not met.
I see.

Inferential statistics is either about estimation, hypothesis testing, or both. Estimation is really just about coming up with a reasonably good numerical estimate of the parameter: unbiased, consistent, and low-variance.

Hypothesis testing focuses on a different task: it posits a value for the unknown population parameter and uses the limited sample data to check whether that hypothesis (##H_0##) is valid. Confidence intervals, standard errors, and p-values result from hypothesis testing, not from estimation, correct?

If the required assumptions are not met by the chosen model, estimation may still work just fine...but confidence intervals, standard errors, p-values, etc. will not be reliable, statistically speaking.

For example, in linear regression, the response variable ##Y## does not have to be normally distributed for the model to be sound and to give good estimates of the slope and intercept. The Gauss-Markov assumptions don't force ##Y## or the residuals to have a normal distribution at all... But confidence intervals, standard errors, and p-values, the outputs of hypothesis testing, will not be good if ##Y## is not normal, which implies that the residuals will also not be normally distributed...
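
A quick simulation sketch of the estimation half of that claim, assuming a simple linear model with deliberately non-normal (uniform) errors:

```python
# A simulation sketch: OLS estimates stay essentially unbiased even with
# deliberately non-normal (uniform) errors, since Gauss-Markov only needs
# zero-mean, equal-variance, uncorrelated errors. (Values are made up.)
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)
A = np.column_stack([np.ones_like(x), x])

estimates = []
for _ in range(2000):
    eps = rng.uniform(-1.0, 1.0, size=x.size)   # non-normal, mean-zero
    y = 1.0 + 0.5 * x + eps
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    estimates.append(beta)

print(np.mean(estimates, axis=0))  # very close to the true (1.0, 0.5)
```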

Am I thinking correctly here?
 
  • #6
fog37 said:
For example, in linear regression, the response variable ##Y## does not have to be normally distributed for the model to be sound and to give good estimates of the slope and intercept. The Gauss-Markov assumptions don't force ##Y## or the residuals to have a normal distribution at all... But confidence intervals, standard errors, and p-values, the outputs of hypothesis testing, will not be good if ##Y## is not normal, which implies that the residuals will also not be normally distributed...

Am I thinking correctly here?
When you talk about a normal distribution, you should be talking about the random term, ##\epsilon##, not about ##Y##. There can be many ways that random behavior influences ##Y##. I have not seen you mention that yet. You need to pay special attention to how the random term enters into the equation. Without that, your model is incomplete.
Some example models are:
##Y = a_0 + a_1 X_1 + a_2 X_2 + \epsilon##
or
##Y = \epsilon \cdot e^{a_0 + a_1 X_1 + a_2 X_2}##
or
##Y = g( X + \epsilon)##
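
A small sketch of how the entry point of the random term changes the data, generating samples from the first two example forms (parameter values are made up, and the multiplicative ##\epsilon## is taken positive/lognormal here, since a mean-zero multiplicative error would flip the sign of ##Y##):

```python
# Generating data from an additive-error model and a multiplicative-error
# model: the right fitting strategy differs between the two forms.
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.uniform(0.0, 1.0, 200)
x2 = rng.uniform(0.0, 1.0, 200)

# Additive error: OLS on Y is natural; residuals estimate epsilon directly.
y_add = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=200)

# Multiplicative error: log Y = a0 + a1*x1 + a2*x2 + log(epsilon), so OLS
# on log(Y), not on Y itself, matches how the randomness enters the model.
eps = rng.lognormal(sigma=0.1, size=200)
y_mul = eps * np.exp(1.0 + 2.0 * x1 + 3.0 * x2)
print(y_add.mean(), y_mul.mean())
```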
 
  • #7
I see. Your point is that the residuals can be normally distributed (and have equal variance) at each ##X## value... But that does not automatically imply that the observed response variable ##Y## also has normally distributed values...

However, I have always thought that if the error is normal, then ##Y## is also normally distributed...
 
  • #8
fog37 said:
I see. Your point is that the residuals can be normally distributed (and have equal variance) at each ##X## value... But that does not automatically imply that the observed response variable ##Y## also has normally distributed values...

However, I have always thought that if the error is normal, then ##Y## is also normally distributed...
IMO, we shouldn't talk about "residuals" and "error" as though they are simple normal random variables with a mean of 0. They are the errors of an estimated model versus the true model and can be changed by other errors in the estimated model.
Suppose we have an actual physical relationship ##Y = a_0 + a_1 X + \epsilon##, where ##\epsilon## is a normal variable with a mean of zero, and estimate it with a linear equation ##\hat Y = \hat {a_0} + \hat {a_1} X##.
Then the errors or residuals are ##\hat {\epsilon_i} = y_i - \hat {y_i} = (a_0 - \hat {a_0}) + (a_1 - \hat {a_1})x_i +\epsilon_i##
##\hat {\epsilon_i}## is different from the term ##\epsilon_i##; it includes a term that depends on ##x_i##.
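
A small simulation sketch of that decomposition, with made-up true parameters so the true ##\epsilon_i## are known:

```python
# The fitted residuals eps_hat_i differ from the true eps_i by
# (a0 - a0_hat) + (a1 - a1_hat)*x_i, which we can verify directly in a
# simulation where the true random term is known. (Values are made up.)
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 30)
eps = rng.normal(scale=1.0, size=x.size)   # the true random term
y = 2.0 + 0.7 * x + eps

A = np.column_stack([np.ones_like(x), x])
(a0_hat, a1_hat), *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - (a0_hat + a1_hat * x)          # fitted residuals eps_hat_i
print(np.max(np.abs(resid - eps)))         # nonzero: resid != true eps
print(resid - ((2.0 - a0_hat) + (0.7 - a1_hat) * x + eps))  # ~0 identically
```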
 

FAQ: Nonlinear Least Squares or OLS for Nonlinear Models?

What is Nonlinear Least Squares?

Nonlinear Least Squares (NLS) is a form of regression analysis used to fit a model to a set of data points when the relationship between the independent and dependent variables is nonlinear. Unlike Ordinary Least Squares (OLS), which assumes a linear relationship, NLS minimizes the sum of the squares of the residuals (the differences between observed and predicted values) for nonlinear models.

How does Nonlinear Least Squares differ from Ordinary Least Squares?

Ordinary Least Squares (OLS) assumes a linear relationship between the independent and dependent variables, leading to a straightforward minimization of the sum of squared residuals. Nonlinear Least Squares (NLS), on the other hand, deals with models where the relationship is nonlinear, requiring iterative optimization techniques to minimize the sum of squared residuals. This often involves more complex algorithms and computational effort compared to OLS.

What types of models can be fitted using Nonlinear Least Squares?

Nonlinear Least Squares can be used to fit a wide range of nonlinear models, including exponential, logarithmic, and power-law models, and more generally any model that is nonlinear in its parameters. It is particularly useful for models where the dependent variable is a nonlinear function of one or more independent variables, such as growth curves, dose-response curves, and certain types of time series models.

What are the common algorithms used in Nonlinear Least Squares optimization?

Common algorithms used in Nonlinear Least Squares optimization include the Gauss-Newton method, the Levenberg-Marquardt algorithm, and the Nelder-Mead simplex method. These algorithms iteratively adjust the parameters of the model to minimize the sum of squared residuals, each with its own strengths and weaknesses in terms of convergence speed and robustness to initial parameter estimates.
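
As a rough illustration of the Gauss-Newton idea (with a crude step-halving safeguard added for stability; Levenberg-Marquardt instead damps the ##J^T J## matrix), here is a minimal sketch on a made-up exponential model. scipy.optimize.least_squares provides production-quality versions of these methods.

```python
# A minimal damped Gauss-Newton sketch for y = a*exp(b*x): linearize the
# residuals at the current parameters and solve a small linear
# least-squares problem per iteration. (Model, data, and starting point
# are made up for illustration.)
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0.0, 2.0, 40)
y = 2.0 * np.exp(1.5 * x) + rng.normal(scale=0.2, size=x.size)

def r(theta):                       # residual vector for parameters (a, b)
    a, b = theta
    return a * np.exp(b * x) - y

theta = np.array([1.0, 1.0])        # initial guess
for _ in range(50):
    a, b = theta
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])  # dr/dtheta
    step = np.linalg.solve(J.T @ J, J.T @ r(theta))
    t = 1.0
    while np.sum(r(theta - t * step)**2) > np.sum(r(theta)**2) and t > 1e-8:
        t /= 2                      # shrink until the SSE actually drops
    theta = theta - t * step

print(theta)  # converges near the true (2.0, 1.5)
```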

What are the challenges associated with Nonlinear Least Squares?

Challenges associated with Nonlinear Least Squares include the potential for convergence to local minima rather than the global minimum, sensitivity to initial parameter estimates, and the computational intensity of the iterative optimization process. Additionally, interpreting the results can be more complex compared to linear models, and ensuring the model's assumptions are met can be more nuanced.
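
One common mitigation for local minima and sensitivity to starting values is a multi-start strategy: restart the optimizer from several random initial guesses and keep the fit with the smallest sum of squared residuals. A minimal sketch, with a made-up model, guess ranges, and data:

```python
# Multi-start nonlinear least squares: try several random initial guesses
# and keep the best fit, reducing the risk of settling in a local minimum.
import numpy as np
from scipy.optimize import curve_fit

def g(x, a, b, c):
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(7)
x = np.linspace(0.0, 5.0, 60)
y = 3.0 * np.exp(-1.2 * x) + 0.5 + rng.normal(scale=0.05, size=x.size)

best_params, best_sse = None, np.inf
for _ in range(20):
    p0 = rng.uniform([0.1, 0.1, -1.0], [5.0, 5.0, 1.0])  # random start
    try:
        params, _ = curve_fit(g, x, y, p0=p0, maxfev=2000)
    except RuntimeError:        # a start that fails to converge is skipped
        continue
    sse = np.sum((y - g(x, *params))**2)
    if sse < best_sse:
        best_params, best_sse = params, sse

print(best_params)  # close to the true (3.0, 1.2, 0.5)
```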
