Optimizing Regression Degree with Weighted Cost Function

In summary, the thread discusses polynomial regression and asks for a method to determine the "best degree" for a given set of data points. Such a method would need to balance low residual error against low degree, for example by minimizing a weighted cost that combines the degree with a suitably chosen error measure. The intended application is fitting a smooth, sensible curve to stock market prices.
  • #1
andrewcheong
Hello, all. I know what I want, but I just don't know what it's called.

This has to do with regression (polynomial fits). Given a set of N (x,y) points, we can compute a regression of degree K. For example, we could have a hundred (x,y) points and compute a linear regression (degree 1). Of course, there would be residual error because the line-of-best-fit won't go through every point perfectly. We could also compute quadratic (degree 2) or higher-degree regressions. This should reduce the residual error, or at least, be no worse an estimate than the lower-degree regressions.

Now, what I want is a regression that determines the "best degree". I mean, if I have N points, I can always get a perfect fit by computing a regression of degree N-1. For example, if I only have 2 points, a degree-1 regression (linear) can fit both points perfectly. If I only have 3 points, a degree-2 regression (quadratic) can fit all three points perfectly, etc. So if I have 100 points, one might say that a degree-99 regression is the "best degree". However, I look at higher degrees as a cost.
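To make the two claims above concrete, here is a minimal Python sketch. It assumes NumPy is available; the function name `sse_by_degree` and the synthetic data are made up for illustration. It fits polynomials of increasing degree by least squares and records the residual sum of squares, which should never go up with the degree and should drop to (numerically) zero once the degree reaches N-1.

```python
import numpy as np
from numpy.polynomial import Polynomial


def sse_by_degree(x, y, max_degree):
    """Return the residual sum of squares for each degree 0..max_degree."""
    out = []
    for k in range(max_degree + 1):
        # Polynomial.fit rescales x internally, which keeps the fit well conditioned.
        p = Polynomial.fit(x, y, k)
        r = y - p(x)
        out.append(float(r @ r))
    return out


# Synthetic noisy data (an assumption, purely for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
sse = sse_by_degree(x, y, 6)

# Three points are fit exactly by a quadratic (degree N-1 = 2).
x3 = np.array([0.0, 1.0, 2.0])
y3 = np.array([1.0, -2.0, 5.0])
sse3 = sse_by_degree(x3, y3, 2)
```

Very high degrees (such as 99 for 100 points) are numerically delicate in practice, so the sketch stops at small degrees.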

I want a method of determining a regression with a balance between low residual errors and low degree. I imagine that there must be some sort of a "cost" parameter that I have to set, because the computer alone cannot say what the "right" balance between residual error and degree is.

Can anyone point me to the name of such a technique? Perhaps the most commonly used form of it?

I want to apply this to stock market prices. As human beings, we can look at a plot of stock prices and mentally "fit" a smooth curve across the points that makes sense. But how does a computer do this? We can't just tell it to do a perfect fit, because then it'll pass through every point (e.g. a degree N-1 polynomial or an interpolating cubic spline).

Thanks in advance!
 
  • #2
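You can introduce a weight function ##W=c_1\cdot \deg p + c_2 \cdot \mu## and minimize it over the candidate polynomials ##p##, where ##c_1## and ##c_2## are positive weights that you choose to set the trade-off. Of course, you will first have to choose an appropriate measure ##\mu## of the residual error (for example, the sum or mean of the squared residuals) so that it is on a scale comparable to the degree.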
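A minimal sketch of this idea in Python, assuming NumPy. The choice of ##\mu## as the mean squared residual, the function name `best_degree`, and the default weights are all assumptions made for illustration, not something fixed by the formula above.

```python
import numpy as np
from numpy.polynomial import Polynomial


def best_degree(x, y, max_degree, c1=1.0, c2=1.0):
    """Pick the degree minimizing W = c1*deg(p) + c2*mu.

    mu is taken to be the mean squared residual (an assumption);
    any other error measure could be substituted.
    """
    best = None
    for k in range(max_degree + 1):
        p = Polynomial.fit(x, y, k)           # least-squares fit of degree k
        mu = float(np.mean((y - p(x)) ** 2))  # error measure for this fit
        w = c1 * k + c2 * mu                  # weighted cost W
        if best is None or w < best[0]:
            best = (w, k, p)
    return best[1], best[2]


# Synthetic example: a noisy quadratic (made-up data).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 3 * x**2 - x + 0.05 * rng.standard_normal(x.size)

k_balanced, p_balanced = best_degree(x, y, max_degree=6, c1=0.05, c2=1.0)
k_heavy, _ = best_degree(x, y, max_degree=6, c1=100.0, c2=1.0)
```

With a small penalty per degree the search settles on the quadratic. With a very large penalty it falls back to the constant fit, which shows that the answer depends entirely on the weights you pick.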
 

FAQ: Optimizing Regression Degree with Weighted Cost Function

What is Smooth Adaptive Regression?

Smooth Adaptive Regression (SAR) is a statistical method used to model relationships between variables in a dataset. It is a type of regression analysis that uses a non-parametric approach, meaning it does not assume a fixed functional form (such as a polynomial of a given degree) for the relationship being modeled.

How does SAR differ from traditional regression methods?

SAR differs from traditional regression methods in that it adapts to the complexity of the data and does not require the user to specify the functional form of the relationship between variables. This allows for more flexibility in modeling complex and non-linear relationships.

What are the advantages of using SAR?

Some advantages of using SAR include its ability to handle non-linear relationships, its flexibility in adapting to different types of data, and its ability to identify important variables in a dataset. Depending on the smoother used, it can also be less sensitive to outliers than a single global polynomial fit.

How is SAR implemented in practice?

SAR is typically implemented using smoothing techniques, such as kernel smoothing or splines, to create a smooth curve that best fits the data. The amount of smoothing can be adjusted to control the flexibility of the model. Cross-validation techniques can also be used to select the optimal amount of smoothing.
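As a generic illustration of choosing the amount of smoothing by cross-validation, here is a minimal Python sketch, assuming NumPy. It uses a Gaussian kernel smoother (the Nadaraya-Watson estimator) and leave-one-out cross-validation over a list of candidate bandwidths. It is not an implementation of any particular SAR package; the function names, candidate values, and synthetic data are hypothetical.

```python
import numpy as np


def kernel_smooth(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate at x_eval using a Gaussian kernel."""
    d = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d**2)
    return (w @ y_train) / w.sum(axis=1)


def loo_cv_error(x, y, bandwidth):
    """Leave-one-out mean squared prediction error for one bandwidth."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d**2)
    np.fill_diagonal(w, 0.0)  # each point is predicted without itself
    pred = (w @ y) / w.sum(axis=1)
    return float(np.mean((y - pred) ** 2))


def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out error."""
    errors = [loo_cv_error(x, y, h) for h in candidates]
    return candidates[int(np.argmin(errors))], errors


# Synthetic noisy data (made up for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

candidates = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
h_best, errors = select_bandwidth(x, y, candidates)
smooth = kernel_smooth(x, y, x, h_best)
```

A small bandwidth follows the noise and a large one flattens the curve; the cross-validated choice sits between the two. Bandwidths much smaller than the spacing between points are left out of the candidates because the kernel weights can then underflow to zero.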

What are the limitations of SAR?

One limitation of SAR is that it can be computationally intensive for large datasets. Additionally, it may not perform well if the data has a high level of noise or if there are missing values. It also requires the user to select the appropriate amount of smoothing, which can be subjective and may affect the results.
