Can't we use linear regression for classification/prediction?

In summary: Linear regression fits a line to the data in order to predict a continuous value; it is not designed to sort observations into distinct categories like "yes" or "no". A linear fit can be pressed into service as a classifier by thresholding its output, but this is not ideal and may not give accurate results; logistic regression is better suited to this type of classification task.
  • #1
shivajikobardan
Homework Statement
Difference between logistic and linear regression.
Relevant Equations
none
They say that linear regression is used to predict numerical/continuous values whereas logistic regression is used to predict categorical values, but I think we can predict yes/no from linear regression as well.

[Attached figure: a straight-line (linear regression) fit to the data.]


Just say that for x > some value, y = 0; otherwise, y = 1. What am I missing? And what is the difference between this and the figure below?

[Attached figure: 0/1 data points with an s-shaped (logistic) curve.]
 
  • #2
Branching can have profound effects on performance: for example, the overhead of accessing code modules that are not immediately available after an if/then/else branch. It is not as simple as you might think a priori. Complicated data analysis like you present can have this kind of problem too.

Here is a somewhat dated, but still very important read from Ulrich Drepper:
https://www.akkadia.org/drepper/cpumemory.pdf

I am sure someone will spout reasons why it is "bad", but your question is down in the weeds and low level db programmers are down there in the weeds with you and have to mess with branching effects all the time. I concede that very high level programming platforms can negate some of this. But the question asked above still remains for us weedy types.

Posted in error.
 
  • #3
jim mcnamara said:
Branching can have profound effects on performance...
Is this the reply to a different thread?
 
  • #4
shivajikobardan said:
What is the difference between this and the figure below?
Are you joking? One is a straight line (hence linear regression), the other is an s-curve.

The second plot is a terrible example of a threshold curve BTW - the input data points should not all be 0 or 1 because in that case there is no need to apply the threshold.
 
  • #6
pbuk said:
Are you joking? One is a straight line (hence linear regression), the other is an s-curve.
Of course I get that. What I am trying to say is: why can't we just say if x > 0.5, y = 1, else y = 0?
 
  • #7
shivajikobardan said:
Of course I get that. What I am trying to say is: why can't we just say if x > 0.5, y = 1, else y = 0?
You can use linear regression with nonlinear functions as long as the model remains linear in the parameters being estimated. The values of the nonlinear functions become the independent variables of the linear regression; i.e. ##Y = a_0 + a_1 f_1(X_1) + a_2 f_2(X_2) + \epsilon## is a model where linear regression can be used to find the ##a_i##s even if the ##f_i##s are nonlinear. The values ##z_{i,j}=f_i(x_j)## are the new independent variables.
One limitation on the use of linear regression for classification is that the classifications often cannot be defined by a real variable ##X##. How could the categories (man, woman, dog, cat, duck) be defined by a real variable ##X##?
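
For concreteness, here is a minimal sketch of the basis-function idea from the post above, assuming NumPy; the functions ##\sin x## and ##x^2## and the coefficient values are made up for illustration:

```python
import numpy as np

# Made-up data: y depends nonlinearly on x, but the model
# y = a0 + a1*sin(x) + a2*x^2 is still linear in the parameters a0, a1, a2.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = 1.0 + 2.0 * np.sin(x) + 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Design matrix whose columns are the transformed variables z_i = f_i(x).
Z = np.column_stack([np.ones_like(x), np.sin(x), x**2])

# Ordinary least squares finds the a_i's exactly as in plain linear regression.
coeffs, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(coeffs)  # approximately [1.0, 2.0, 0.5]
```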
 
  • #8
shivajikobardan said:
What I am trying to say is: why can't we just say if x > 0.5, y = 1, else y = 0?
Because that is not a linear relationship.
 

FAQ: Can't we use linear regression for classification/prediction?

Can linear regression be used for classification?

Linear regression is primarily used for predicting continuous numerical values, so it is not recommended for classification tasks. However, it can be used for binary classification by setting a threshold for the predicted values. If the predicted value is above the threshold, it is classified as one class, and if it is below the threshold, it is classified as the other class.
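
For example, here is a minimal sketch of both approaches on made-up one-dimensional data, assuming scikit-learn; the threshold of 0.5 and the data-generating rule are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up binary data: class 1 becomes more likely as x grows.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
p = 1 / (1 + np.exp(-(X[:, 0] - 5)))      # underlying probability of class 1
y = (rng.uniform(size=200) < p).astype(int)

# Linear regression as a classifier: fit a line to the 0/1 labels,
# then threshold the predicted value at 0.5.
lin = LinearRegression().fit(X, y)
pred_lin = (lin.predict(X) >= 0.5).astype(int)

# Logistic regression: models P(y = 1 | x) directly with an s-curve.
logit = LogisticRegression().fit(X, y)
pred_log = logit.predict(X)

print("linear fit + threshold accuracy:", (pred_lin == y).mean())
print("logistic regression accuracy:   ", (pred_log == y).mean())
```

On well-behaved data like this the two can give similar accuracy; the differences show up with outliers and with predicted values that fall outside the [0, 1] range.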

What are the limitations of using linear regression for prediction?

Linear regression assumes a linear relationship between the independent and dependent variables, which may not always be the case in real-world data. Additionally, it is sensitive to outliers and can be affected by multicollinearity, where the independent variables are highly correlated with each other.
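
As an illustration of the outlier sensitivity, a single extreme (but correctly labelled) point can shift the decision boundary implied by a thresholded linear fit; a minimal sketch, assuming scikit-learn and made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up labels: y = 1 for x >= 5, else y = 0.
x = np.arange(0, 10, 0.5)
y = (x >= 5).astype(float)

def boundary(x, y):
    """x value where the fitted line crosses 0.5, i.e. the implied decision boundary."""
    m = LinearRegression().fit(x.reshape(-1, 1), y)
    return (0.5 - m.intercept_) / m.coef_[0]

print("boundary without outlier:", boundary(x, y))

# One extreme but correctly labelled point at x = 100 drags the fitted line,
# moving the implied boundary even though the labelling rule has not changed.
x2, y2 = np.append(x, 100.0), np.append(y, 1.0)
print("boundary with outlier:   ", boundary(x2, y2))
```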

How does linear regression differ from other classification algorithms?

Linear regression is a regression algorithm that aims to predict a continuous numerical value, while classification algorithms aim to predict discrete categorical labels. Linear regression also uses a different cost function and optimization method: it is typically fit by minimizing squared error, whereas a classifier such as logistic regression minimizes a cross-entropy (log-loss) cost, as shown below.
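
For a single predictor ##x## and the ##a_i## notation used in post #7, the two cost functions look like this (ordinary least squares for linear regression, cross-entropy/log-loss for logistic regression):

$$J_{\text{linear}}(a_0, a_1) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (a_0 + a_1 x_i)\bigr)^2$$
$$J_{\text{logistic}}(a_0, a_1) = -\frac{1}{n}\sum_{i=1}^{n}\Bigl[y_i \ln \hat p_i + (1 - y_i)\ln(1 - \hat p_i)\Bigr], \qquad \hat p_i = \frac{1}{1 + e^{-(a_0 + a_1 x_i)}}$$

The first has a closed-form least-squares solution; the second is usually minimized iteratively (e.g. by Newton's method or gradient descent).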

Can linear regression be used for multi-class classification?

No, linear regression is not well suited to multi-class classification because it predicts a single continuous numerical value. Thresholding that value can separate at most two classes, so handling more classes would require fitting multiple linear regression models (for example, one per class), which is possible but not recommended, as sketched below.
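
A minimal sketch of that "one linear regression per class" (one-vs-rest) idea on made-up three-class data, assuming scikit-learn; it also shows one reason the approach is discouraged, since the middle class tends to be masked by the outer two:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up 1-D data: three classes centred at x = 0, 3 and 6.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.5, size=30) for c in (0.0, 3.0, 6.0)]).reshape(-1, 1)
y = np.repeat([0, 1, 2], 30)

# One linear regression per class, each fit to a 0/1 indicator target,
# then predict the class whose regression gives the largest score.
models = [LinearRegression().fit(X, (y == k).astype(float)) for k in range(3)]
scores = np.column_stack([m.predict(X) for m in models])
pred = scores.argmax(axis=1)

print("training accuracy:", (pred == y).mean())
print("middle class ever predicted?", (pred == 1).any())  # typically False: it gets masked
```

In practice one would instead use multinomial (softmax) logistic regression or one of the classifiers listed in the next answer.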

What are some alternatives to using linear regression for classification?

There are many classification algorithms that can be used instead of linear regression, such as logistic regression, decision trees, random forests, support vector machines, and neural networks. The choice of algorithm depends on the type of data and the problem at hand.
