Python: Help with bestfit line and outliers

In summary, the poster is struggling with outliers that skew the best fit line on a scatter plot in Python. They are using NumPy's polyfit function, but one or two points still throw off the slope. Searching the Python references has not turned up a solution. They want a fix that does not require limiting the interval or physically removing the bad points from the data, and they would also like a way to take measurement errors into account. The one reply suggests that the issue lies with the data itself rather than with the software.
  • #1
DMT
I've been having some trouble with outliers messing up my best fit line on my scatter plot in Python. I'm using NumPy's polyfit function to calculate the slope and y-intercept of the best fit line, but I always seem to get one or two points that throw off the slope enough to make quite a noticeable difference. I've already checked a few Python references and did a lengthy Google search, but haven't found a solution. Does anyone know of a good way to fix this problem without having to limit the interval or physically remove the bad points from my data?

Edit: Also, knowing a way to take errors into account would be very helpful as well.

Thanks!
 
  • #2
I have not used the polyfit function in Python, but I have used it a lot in Matlab. If you have points that are quite far from the best fit line, the best I can say is that those points are not good points. If you are plotting the results of an experiment, they might come from a badly performed measurement. Python, like Matlab, will always try to give you the best fit line for the data it is given. You said yourself that you haven't found anything on Google. To me that suggests the software is fine and the problem is in your data.
 

FAQ: Python: Help with bestfit line and outliers

What is the purpose of a bestfit line in Python?

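A bestfit line summarizes the relationship between two variables with a straight line that fits the data points as closely as possible, usually in the least-squares sense (the sum of squared vertical distances between the points and the line is minimized). The line can then be used to make predictions and analyze the data.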

How can I find the bestfit line in Python?

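To find the bestfit line in Python, you can use the "polyfit" function from the NumPy library (it is part of NumPy, not a Python built-in). It takes the x and y values of the data points plus the polynomial degree, and returns the coefficients ordered from the highest power down. For a degree of 1 that means the slope followed by the intercept.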
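As a minimal sketch of that call, the snippet below fits a straight line to a small made-up data set. The x and y values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data: points scattered around y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0])

# deg=1 requests a straight line; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fitted line at the original x values
y_fit = np.polyval([slope, intercept], x)
```

The array y_fit can be passed straight to a plotting call alongside x to draw the line over the scatter plot.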

What are outliers and how do they affect the bestfit line?

Outliers are data points that are significantly different from the rest of the data. Because a least-squares fit minimizes squared distances, a single point far from the trend has a disproportionate pull on the slope and intercept, making the line less representative of the bulk of the data. It is important to identify and handle outliers before trusting a bestfit line.
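One way to handle outliers without deleting any points is to down-weight them. The sketch below is not something proposed in the thread: it is an illustrative, NumPy-only implementation of iteratively reweighted least squares with Huber weights. The helper name robust_line_fit, the tuning constant k, and the data are all assumptions made for this example.

```python
import numpy as np

def robust_line_fit(x, y, n_iter=20, k=1.345):
    """Hypothetical helper: straight-line fit by iteratively reweighted
    least squares with Huber weights. Not part of NumPy."""
    w = np.ones_like(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    for _ in range(n_iter):
        # np.polyfit minimizes sum((w_i * residual_i)**2),
        # so the square root of the IRLS weights is passed in
        slope, intercept = np.polyfit(x, y, deg=1, w=np.sqrt(w))
        resid = y - (slope * x + intercept)
        # Robust scale estimate from the median absolute deviation
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        if scale == 0:
            break
        # Huber weights: 1 for small residuals, shrinking for large ones
        u = np.abs(resid) / (k * scale)
        w = 1.0 / np.maximum(u, 1.0)
    return slope, intercept

# Hypothetical data: y = 2x + 1 with small noise and one bad point
x = np.arange(10, dtype=float)
noise = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.0, 0.1])
y = 2.0 * x + 1.0 + noise
y[8] += 20.0  # injected outlier

plain_slope, plain_intercept = np.polyfit(x, y, deg=1)
robust_slope, robust_intercept = robust_line_fit(x, y)
```

Every point stays in the data set. The bad point simply ends up with a small weight, so the robust slope should sit much closer to the underlying trend than the plain least-squares slope.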

How can I detect and remove outliers in Python?

There are various methods for detecting and removing outliers in Python, such as using statistical measures like the interquartile range or visual methods like scatter plots. Once outliers are identified, they can be removed from the dataset or their values can be replaced with more reasonable values.
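The interquartile-range idea can be sketched as follows, assuming the rule is applied to the residuals of a first fit rather than to the raw y values. The 1.5 × IQR fences and the data are illustrative choices, not something taken from the thread. A boolean mask is used so the original arrays stay untouched.

```python
import numpy as np

# Hypothetical data: exact line y = 2x + 1 with one injected outlier
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[8] += 20.0

# First pass: ordinary fit, then residuals about that line
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (slope * x + intercept)

# Flag residuals outside the 1.5 * IQR fences
q1, q3 = np.percentile(resid, [25, 75])
iqr = q3 - q1
keep = (resid >= q1 - 1.5 * iqr) & (resid <= q3 + 1.5 * iqr)

# Second pass: refit using only the points that passed the test
slope_clean, intercept_clean = np.polyfit(x[keep], y[keep], deg=1)
```

Since x and y themselves are never modified, the flagged points can still be drawn on the scatter plot, for example in a different color.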

Can I customize the bestfit line in Python?

Yes, you can customize the bestfit line in Python by adjusting the parameters of the "polyfit" function, such as the degree of the polynomial ("deg") or the weights of the data points ("w"). Weights are also how measurement errors can be taken into account: points with larger uncertainties are given smaller weights. You can also use other libraries like matplotlib to add labels, colors, and other visual elements to the bestfit line.
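A minimal sketch of a weighted fit, assuming each y value has a known one-sigma uncertainty. The data and the sigma values are hypothetical. The NumPy documentation recommends w = 1/sigma (not 1/sigma**2) for Gaussian uncertainties, and cov=True additionally returns a covariance matrix for the coefficients.

```python
import numpy as np

# Hypothetical measurements with per-point 1-sigma uncertainties in y
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.9, 5.1, 9.5, 9.0, 11.1])
sigma = np.array([0.2, 0.2, 0.2, 2.0, 0.2, 0.2])  # the point at x=3 is poorly measured

# Weighted straight-line fit; cov=True also returns the covariance matrix
coeffs, cov = np.polyfit(x, y, deg=1, w=1.0 / sigma, cov=True)
slope, intercept = coeffs

# Approximate standard errors of the fitted parameters
slope_err, intercept_err = np.sqrt(np.diag(cov))
```

The point with the large uncertainty still contributes, but it pulls on the line far less than the well-measured points do.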
