How to calculate max/min scales on a scatter plot

  • MHB
  • Thread starter expertalmost
In summary, the user is looking for a mathematical method to establish smooth maximum and minimum lines for 3 log scatter plots. They are currently using a moving average plus standard deviation method, but it does not work well due to data clumping. They are looking for a solution that is elegant, robust, and general enough for all three data sets. The user is not concerned with the data being truly log-normal, but rather wants to clip extremes and indicate them as such. The desired solution should be able to handle different data sets and provide a consistent and meaningful 0-1 scale. After considering a cubic polynomial approach suggested by another user, the user has found success with rank-type smoothing or moving quantiles with damping.
  • #1
expertalmost
Good morning!

I have 3 log scatter plots for which I want to establish smooth maximum and minimum lines. What is the usual mathematical method for doing that? (Image and Excel file links below.)

The black lines on the scatter plot images are hand drawn. The third scatter plot is especially tricky and not amenable to a moving average plus stddev because of the data clumping. Note: This is time series data so new data constantly comes in. In other words, I cannot just use the whole data population in one shot.

Any ideas would be greatly appreciated.

Excel File: https://dl.dropboxusercontent.com/u/44057708/Three%20Scatters.xls
Image at: https://dl.dropboxusercontent.com/u/44057708/ThreeScatters.jpg
 
  • #2
Can you give us a little more context? Here are some questions I have:

1. How is this data generated? What are you measuring?

2. Is it important that every single data point in one cluster lies between your smooth max and min lines? Or is it enough that the vast majority lie between the two lines?

3. What is the data rate of this data? That is, how fast is the data coming in?

4. Are there any other features you'd like to know about the data? Local peaks, for example?
 
  • #3
Thank you for your time and questions! I appreciate your efforts. Here are some brief answers to your questions.

1) These come from financial market analysis; the plotted values are log-normal transforms of the raw data. Whether the data is actually/truly log-normal is not really a concern, as extremes are clipped and indicated as such. Using mean/stddev analysis on the third series does not work well due to the data clumping. I am looking for a solution elegant/robust/general enough for all three data sets. And I have many groups of three data sets.

2) Not every point needs to lie between my max/min. I was targeting 80% on the minimum side, due to the paucity of points there and because zero is a less critical component, and 95% on the maximum side.

3) The data is coming in slowly. Only using daily analysis now.

4) In this case, not interested in local peaks other than how well they get smoothed in the final scaling.

Hope this helps define the problem more clearly :)

Thank you again for your interest.
 
  • #4
You say that the mean/std dev approach doesn't work. What if you computed a moving average on the basis of a lot more data points? For example:

1. Fit a cubic polynomial to the data. Excel will do this quite readily. Suppose the result to be $f(t)$.
2. Compute the maximum deviation from the cubic, and construct an envelope around $f(t)$ thus: $f(t) \pm \text{max dev}$. That would guarantee all the data would be in the envelope.

However, the envelope might not be tight enough. To help you more, I think I still need to know your design requirements better. By what criteria would you judge the "goodness" of the envelope?
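For illustration, a minimal Python sketch of the fit-and-envelope idea above, assuming NumPy; the function name `cubic_envelope` and the synthetic series are my own, not from the thread:

```python
import numpy as np

def cubic_envelope(t, y):
    """Fit a cubic f(t) to the data, then build an envelope
    f(t) +/- max_dev wide enough to contain every point."""
    coeffs = np.polyfit(t, y, 3)       # least-squares cubic fit
    f = np.polyval(coeffs, t)          # f(t) evaluated at each sample
    max_dev = np.max(np.abs(y - f))    # largest deviation from the fit
    return f + max_dev, f - max_dev    # upper and lower envelope lines

# Toy usage with synthetic data
np.random.seed(0)
t = np.linspace(0.0, 10.0, 200)
y = np.sin(t) + 0.1 * np.random.randn(200)
upper, lower = cubic_envelope(t, y)
```

By construction every point lies inside the envelope, which is exactly why, as noted above, it may be far too loose if a single outlier inflates the maximum deviation.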
 
  • #5
Thank you for your suggestions! You obviously know considerably more math than me and I appreciate your insights and experience. I will have to investigate cubic polynomials.

Right now I take the 25 largest/smallest of the last 100 elements and average them. I also add a stddev amount to the max (ad hoc... yes!). Then I smooth by damping (multiplying each change by 0.1 before applying it). I was hoping there was a more elegant/robust/general solution, as I have to tweak the stddev and damping factors for different data sets.

The issue I have with using a larger data sample is the lag it introduces.

The use is quite simple. I use previous data to establish stable max/min levels so I can scale new values as they come in. Gives me a 0-1 range that is consistent and meaningful across data sets. As far as goodness, again, ad hoc. No more than 10-15 percent of the values should be clipped above/below my max/min scale. So the upper black line is my 1 and the lower is my 0 and as new data comes in, it is scaled to the most recent 0-1 range. And then it is used to update the population sample. Standard time-series analysis (hopefully).

My apologies for the lengthy replies. Not having much formal training in this, I end up using more words than probably necessary. Thank you for your patience! Hope it helps clarify what I'm trying to do. :)
 
  • #6
I've generated a working solution and just wanted to post it for future generations ;)

Someone pointed out that because the data does not really follow a "standard error" model due to the gaps, medians +/- stddev would not really work. I have confirmed this with many days of attempts.

Therefore, the other typical solution, as far as I can tell, is called rank-type smoothing or moving quantiles; with damping of the end result. Basically, using a sample size of 100, I take an average of the 10 largest items for the maximum and an average of the 25 smallest items for the minimum. I then dampen the changes to 10% for smoothing. This gives me a smooth enough maximum and minimum line to use as a 0-1 scale.

Hope this helps!
 
  • #7
Glad you found something that worked! I can't say I understand it - this shows that you might not be so far below me in mathematical knowledge as you thought. ;)
 

FAQ: How to calculate max/min scales on a scatter plot

How do I determine the maximum and minimum scales for a scatter plot?

The maximum and minimum scales for a scatter plot are based on the range of values for the data being plotted. The maximum scale should be slightly larger than the largest data point, and the minimum scale should be slightly smaller than the smallest data point.

Can I use different scales for the x-axis and y-axis on a scatter plot?

Yes, it is common to use different scales for the x-axis and y-axis on a scatter plot. This allows for better visualization of the relationship between the two variables being plotted.

How do I adjust the scales on a scatter plot to fit all of my data points?

If your data points are not fitting within the current scales, you may need to adjust the maximum and minimum scales accordingly. You can do this by manually changing the scales or using software that automatically adjusts them based on the data.

How do I determine the appropriate scale for a scatter plot?

The appropriate scale for a scatter plot depends on the range and density of your data points. If your data is spread out over a wide range, you may need a larger scale to accurately represent the data. If your data is densely clustered, a smaller scale may be more appropriate.

Can I use a logarithmic scale on a scatter plot?

Yes, a logarithmic scale can be used on a scatter plot if the data being plotted has a large range of values. This can help to better visualize the relationship between the variables, especially if there is a large difference between the minimum and maximum values.
