Outlier Detection - Algorithm to Exclude Systematic Error from Data Set

vibe3 · Apr 12, 2013

Hi all, I have data similar to the following

where the x-axis is time and the y-axis is magnetic field. At around t = 20 (and t = -80) there is a systematic error (probably due to some other current switching on and then switching off) which I want to get rid of in my data.

Can anyone recommend a good algorithm to detect when this happens in my time series and exclude it from my data set?

I also plotted the moving average, which suggests the problem is not as simple as searching for large deviations from the mean.

Stephen Tashi · Apr 15, 2013

vibe3 said:

I plotted the moving average.

Moving averages can be taken over windows of various sizes and the windows can include both the past and future. You could try various windows.
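To make the window choice concrete, here is a minimal sketch in Python with NumPy (an assumption; the thread does not say what software is in use). The function name `moving_average` and its parameters are made up for illustration. A centered window uses both past and future samples, while a trailing window uses only the past.

```python
import numpy as np

def moving_average(y, window, centered=True):
    """Moving average of y; NaN wherever the window is incomplete.

    Hypothetical helper for illustration. Assumes len(y) >= window.
    """
    y = np.asarray(y, dtype=float)
    kernel = np.ones(window) / window
    # "valid" keeps only positions where the full window fits.
    valid = np.convolve(y, kernel, mode="valid")
    out = np.full(y.shape, np.nan)
    # Centered: the average is assigned to the middle sample of the window.
    # Trailing: the average is assigned to the last sample of the window.
    start = (window - 1) // 2 if centered else window - 1
    out[start:start + valid.size] = valid
    return out
```

Calling this with several values of `window`, and with `centered` both on and off, is one way to try the different windows suggested above.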

Your goal isn't precisely defined yet. It could be either one of the following:

1) I want an algorithm to detect the regions of the curve affected by switching currents. Suggest an algorithm. I'll try it and decide myself if it works. There doesn't have to be any statistical justification for it. This is not for a published paper or anything that needs academic scrutiny.

2) I want an algorithm that can stand academic scrutiny and not attract criticism if I write up what I'm doing as a report.

vibe3 · Apr 16, 2013

Option 1 would be fine for me.

mfb · Apr 16, 2013

Judging from the curve, you have very large differences between adjacent bins at the edges of those outliers. If you just plot ##|n_i-n_{i-1}|##, they should give two nice peaks. Use the moving average of a few bins instead of the original values if the dataset is too noisy.
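A minimal sketch of this adjacent-difference idea in Python with NumPy, assuming the series is an evenly sampled array. The function names and the `threshold` value are hypothetical, and the threshold has to be chosen by looking at the plot of ##|n_i-n_{i-1}|##.

```python
import numpy as np

def find_jump_edges(y, threshold):
    """Indices i where |y[i] - y[i-1]| exceeds threshold (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    jumps = np.abs(np.diff(y))          # jumps[k] = |y[k+1] - y[k]|
    return np.flatnonzero(jumps > threshold) + 1

def mask_between_edges(n, edges):
    """Flag samples between consecutive edge pairs (switch-on, switch-off).

    Assumes edges alternate on/off; a trailing unpaired edge is ignored.
    """
    bad = np.zeros(n, dtype=bool)
    for start, stop in zip(edges[0::2], edges[1::2]):
        bad[start:stop] = True
    return bad

# Synthetic example: slow trend with an offset switched on for 20 samples.
y = 0.01 * np.arange(100)
y[40:60] += 5.0
edges = find_jump_edges(y, threshold=2.0)
bad = mask_between_edges(y.size, edges)
clean = y[~bad]                          # series with the affected stretch removed
```

If the series is smoothed first, each step becomes a ramp spread over roughly one window length, so several neighbouring bins can exceed the threshold and would need to be grouped into a single edge before pairing.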

blue_raver22 · Apr 23, 2013

Hello there,

Thank you for sharing your data and question with us. Outlier detection is an important step in analyzing data, as it helps to identify and exclude any errors or anomalies that may affect the overall results.

There are several algorithms that can be used for outlier detection, and the choice will depend on the specific characteristics of your data. One commonly used approach is the Z-score method, which measures how many standard deviations each point lies from the mean and flags any points beyond a chosen threshold (commonly 3). This method can be effective in detecting outliers in a normally distributed data set.
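As a rough illustration of the Z-score method, here is a sketch in Python with NumPy. The function name and the default threshold of 3 are assumptions for the example. Because it uses the global mean and standard deviation, it may miss a sustained offset sitting on a slowly varying signal, which is the difficulty raised in the original question.

```python
import numpy as np

def zscore_outliers(y, threshold=3.0):
    """Boolean mask of points more than `threshold` standard deviations
    from the global mean (hypothetical helper for illustration)."""
    y = np.asarray(y, dtype=float)
    std = y.std()
    if std == 0:
        # Constant series: nothing can be an outlier.
        return np.zeros(y.shape, dtype=bool)
    z = (y - y.mean()) / std
    return np.abs(z) > threshold
```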

Another approach is the use of box plots, which visually display the distribution of the data and can help to identify any extreme values. You can also use statistical tests, such as Grubbs' test, to determine if any data points are significantly different from the rest of the data.
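The rule a box plot conventionally uses to draw points as outliers can also be applied numerically. The sketch below, in Python with NumPy, flags values more than 1.5 interquartile ranges outside the quartiles; the function name and the factor `k` are illustrative assumptions. Like the Z-score, it ignores time ordering.

```python
import numpy as np

def iqr_outliers(y, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]
    (hypothetical helper for illustration)."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    return (y < q1 - k * iqr) | (y > q3 + k * iqr)
```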

In your case, where the systematic error occurs at specific points in time, you may want to consider using a time series analysis approach. This involves modeling the data over time and identifying any deviations from the expected pattern. You can then exclude these points from your data set.
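One simple way to model the expected pattern is a rolling median used as a baseline, flagging points whose residual from it is large. This is only a sketch in Python with NumPy (it needs NumPy 1.20 or later for `sliding_window_view`). The function name, `window`, and `threshold` are hypothetical, and the threshold is in the same units as the data and chosen by eye.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_median_outliers(y, window, threshold):
    """Flag points that differ from a centered rolling median by more
    than `threshold` (hypothetical helper; `window` must be odd)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    y = np.asarray(y, dtype=float)
    half = window // 2
    # Repeat the end values so every sample has a full window.
    padded = np.pad(y, half, mode="edge")
    baseline = np.median(sliding_window_view(padded, window), axis=1)
    return np.abs(y - baseline) > threshold

# Synthetic example: linear trend with a 3-sample excursion.
y = 0.01 * np.arange(200)
y[100:103] += 5.0
bad = rolling_median_outliers(y, window=11, threshold=1.0)
```

The median baseline only ignores an excursion that occupies less than half of the window, so the window would need to be more than twice as long as the affected stretch.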

I would also recommend consulting with a statistician or data scientist for further guidance on selecting the most appropriate algorithm for your specific data set. Best of luck in your analysis!
