Outlier Detection - Algorithm to Exclude Systematic Error from Data Set

  • Thread starter vibe3
  • Tags
    Detection
In summary, the conversation discusses a data set containing a systematic error around specific time points and the need to detect and exclude the affected samples. Suggestions include taking moving averages over windows of various sizes and plotting the absolute difference between adjacent bins, smoothing with a moving average first if the data are too noisy. The thread also distinguishes between an ad hoc algorithm with no statistical justification and one that has to withstand academic scrutiny.
  • #1
vibe3
Hi all, I have data similar to the following

plot.png


where the x-axis is time and the y-axis is magnetic field. At around t = 20 (and t = -80) there is a systematic error (probably due to some other current switching on and then switching off) which I want to get rid of in my data.

Can anyone recommend a good algorithm to detect when this happens in my time series and exclude it from my data set?

I also plotted the moving average, which seems to indicate that it is not as simple as searching for large deviations from the mean.
 
  • #2
vibe3 said:
I plotted the moving average.

Moving averages can be taken over windows of various sizes and the windows can include both the past and future. You could try various windows.
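To make "try various windows" concrete, here is a minimal Python sketch. The function name `moving_average`, the `centered` flag and the sample series are made up for illustration and are not from the original data. It compares a window that includes both past and future samples with one that only looks at the past.

```python
def moving_average(values, window, centered=True):
    """Return a moving average with the same length as `values`.

    centered=True  -> window covers past and future samples around each point
                      (an odd `window` is assumed so it is symmetric)
    centered=False -> window covers the current sample and past samples only
    Windows are truncated at the ends of the series.
    """
    n = len(values)
    half = window // 2
    out = []
    for i in range(n):
        if centered:
            lo, hi = max(0, i - half), min(n, i + half + 1)
        else:
            lo, hi = max(0, i - window + 1), i + 1
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


# Made-up series: flat signal with a step up and back down.
signal = [0.0] * 5 + [10.0] * 5 + [0.0] * 5
centered = moving_average(signal, window=3, centered=True)
trailing = moving_average(signal, window=3, centered=False)
```

The centered average starts to rise one sample before the step, while the trailing average only reacts once the step has arrived. That difference matters if you want the detected edges to line up with the actual switching times.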

Your goal isn't precisely defined yet. It could be either one of the following:

1) I want an algorithm to detect the regions of the curve affected by switching currents. Suggest an algorithm. I'll try it and decide myself if it works. There doesn't have to be any statistical justification for it. This is not for a published paper or anything that needs academic scrutiny.

2) I want an algorithm that can stand academic scrutiny and not attract criticism if I write up what I'm doing as a report.
 
  • #3
Option 1 would be fine for me
 
  • #4
Judging from the curve, you have very large differences between adjacent bins at the edges of those outliers. If you just plot ##|n_i-n_{i-1}|##, they should give two nice peaks. Use the moving average of a few bins instead of the original values if the dataset is too noisy.
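A rough sketch of that idea in Python, under some stated assumptions: the edges are sharp enough to show up as single large differences, jumps come in pairs (one entering and one leaving each disturbed interval), and the threshold is picked by eye from the difference plot. The function names and the sample readings are hypothetical.

```python
def adjacent_differences(values):
    """Return |n_i - n_{i-1}| for i = 1 .. len(values) - 1."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def find_jumps(values, threshold):
    """Indices i at which |n_i - n_{i-1}| exceeds `threshold`."""
    diffs = adjacent_differences(values)
    # diffs[k] compares values[k + 1] with values[k], hence the k + 1.
    return [k + 1 for k, d in enumerate(diffs) if d > threshold]


def flag_between_jumps(values, threshold):
    """Mark samples lying between each (entering, leaving) pair of jumps."""
    jumps = find_jumps(values, threshold)
    flagged = [False] * len(values)
    # Assumption: jumps come in pairs. An unpaired trailing jump is ignored.
    for start, end in zip(jumps[0::2], jumps[1::2]):
        for i in range(start, end):
            flagged[i] = True
    return flagged


# Made-up readings: baseline near 1, disturbed interval near 6.
data = [1.0, 1.1, 0.9, 1.0, 6.0, 6.1, 5.9, 6.0, 1.0, 1.1, 0.9]
jumps = find_jumps(data, threshold=2.0)
flagged = flag_between_jumps(data, threshold=2.0)
cleaned = [v for v, bad in zip(data, flagged) if not bad]
```

If the raw series is too noisy for the two edges to stand out, the same functions can be run on a lightly smoothed copy of the data instead of the original values.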
 
  • #5



Hello there,

Thank you for sharing your data and question with us. Outlier detection is an important step in analyzing data, as it helps to identify and exclude any errors or anomalies that may affect the overall results.

There are several algorithms that can be used for outlier detection, and the choice will depend on the specific characteristics of your data. One commonly used approach is the Z-score method, which expresses each point's distance from the mean in units of the standard deviation and flags points whose absolute Z-score exceeds a chosen threshold (3 is a common choice). This method can be effective for isolated outliers in approximately normally distributed data.
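For illustration, a minimal Z-score sketch using only the Python standard library. The function name `zscore_outliers`, the threshold of 3 and both sample series are assumptions for the example. The second series shows why a plain Z-score can miss a sustained offset like the one described in the first post: the offset itself inflates the mean and the standard deviation.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices of points more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Made-up series with one isolated spike: the spike is flagged.
spike_series = [1.0] * 10 + [1.2] * 10 + [50.0]
spike_hits = zscore_outliers(spike_series)

# Made-up series where half the samples sit at a shifted level:
# no single point is far from the mean in units of the standard deviation.
block_series = [0.0] * 10 + [5.0] * 10
block_hits = zscore_outliers(block_series)
```

Here `spike_hits` contains the index of the spike, while `block_hits` is empty even though half of the second series is shifted.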

Another approach is the use of box plots, which visually display the distribution of the data and can help to identify any extreme values. You can also use statistical tests, such as Grubbs' test, to determine whether any data points are significantly different from the rest of the data.
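The usual box plot convention (Tukey's fences) treats points more than 1.5 times the interquartile range beyond the quartiles as outliers. A small sketch of that rule follows; the function name `iqr_outliers` and the sample values are hypothetical, and `statistics.quantiles` is used with its default method, so the exact quartile values may differ slightly from other software. Grubbs' test is not shown because it needs critical values from the t-distribution, which the standard library does not provide.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Indices of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up sample: nine ordinary values and one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
hits = iqr_outliers(sample)
```

Like the Z-score, this rule looks at the overall distribution of values and ignores their order in time, so it is better suited to isolated extreme values than to an interval that is shifted as a whole.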

In your case, where the systematic error occurs at specific points in time, you may want to consider using a time series analysis approach. This involves modeling the data over time and identifying any deviations from the expected pattern. You can then exclude these points from your data set.
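One simple way to read "modeling the data over time" is to use a rolling median as the expected pattern and flag points that sit far from it. This is only a sketch under assumptions: the window has to be more than twice as long as the disturbed interval so the median stays on the undisturbed level, and the absolute threshold is chosen by inspecting the residuals. The function names and the sample series are made up.

```python
import statistics


def rolling_median(values, window):
    """Centered rolling median, truncated at the ends of the series."""
    n = len(values)
    half = window // 2
    return [
        statistics.median(values[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def flag_deviations(values, window, threshold):
    """Indices where |value - rolling median| exceeds `threshold`."""
    baseline = rolling_median(values, window)
    return [
        i for i, (v, b) in enumerate(zip(values, baseline))
        if abs(v - b) > threshold
    ]


# Made-up series: slow linear drift with a four-sample offset of +5.
series = [0.1 * i for i in range(30)]
for i in range(12, 16):
    series[i] += 5.0

bad = flag_deviations(series, window=11, threshold=2.0)
kept = [v for i, v in enumerate(series) if i not in set(bad)]
```

Because the baseline follows the slow drift, the offset interval is flagged even though its values would not look extreme against a global mean of a longer drifting series.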

I would also recommend consulting with a statistician or data scientist for further guidance on selecting the most appropriate algorithm for your specific data set. Best of luck in your analysis!
 

Related to Outlier Detection - Algorithm to Exclude Systematic Error from Data Set

1. What is an outlier in data analysis?

An outlier is a data point that significantly deviates from the rest of the data in a dataset. It is usually an extreme value that is much larger or smaller than the majority of the data points.

2. Why is it important to detect and exclude outliers from a dataset?

Detecting and excluding outliers is important because they can significantly skew statistical analyses and lead to inaccurate conclusions. Outliers can also be indicators of errors or anomalies in the data collection process.

3. What are some common methods for detecting outliers?

Some common methods for detecting outliers include statistical measures such as z-scores and graphical tools such as box plots and scatter plots. Machine learning algorithms like k-nearest neighbors and isolation forests can also be used to identify outliers.

4. How does outlier detection help to improve data quality?

By excluding outliers, the data becomes more representative of the true underlying pattern, leading to more accurate analyses and conclusions. This improves the overall quality and reliability of the data.

5. Are there any limitations to outlier detection methods?

Yes, there are some limitations to outlier detection methods. They may not be effective for identifying outliers in high-dimensional data, and they can also be influenced by the choice of parameters and assumptions made. It is important to carefully evaluate and interpret the results of outlier detection methods.
