Comparing data sets of different sizes

In summary: there are two data sets with different bin sizes, and the goal is to divide the y-values of one by those of the other. The x arrays have different spacings: one set has 4175 evenly spaced data points, while the other has only 1919 points whose x values are NOT evenly spaced. One suggestion is to zero-pad the smaller array to the size of the larger one, but whether that is valid depends on whether the order and positioning of the data points matter. Another is to sample the larger x array, extracting as many values as the smaller x array holds, and then generate y arrays of equal size.
  • #1
cepheid
I have two data sets, each having its own array of x values and its own corresponding array of y values. I want to divide the y-values of two data sets. The problem I am having is that the x arrays for the two sets have totally different spacings (bin sizes). One set has 4175 data points, evenly spaced. The other set has its x values NOT evenly spaced and there are only 1919 data points in it.

What would be the best way of going about modifying the second data set so that it might be compared to the first one? I could just interpolate, but then I am worried that I am basically just adding made up data points to the y values for the second set, and that I might destroy some features in it, or add spurious ones.
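For reference, the plain interpolation mentioned above might look like the sketch below. It is only an illustration under stated assumptions: the toy arrays are made up (not the real 4175- and 1919-point spectra), `interp_linear` is a hypothetical helper written for this example, and linear interpolation is just one possible choice. NumPy's `np.interp` does the same job on arrays.

```python
from bisect import bisect_right

def interp_linear(x_new, x, y):
    """Linearly interpolate the samples (x, y) at x_new.

    x must be strictly increasing. Returns None outside [x[0], x[-1]]
    instead of extrapolating.
    """
    if x_new < x[0] or x_new > x[-1]:
        return None
    if x_new == x[-1]:
        return y[-1]
    i = bisect_right(x, x_new) - 1           # x[i] <= x_new < x[i + 1]
    t = (x_new - x[i]) / (x[i + 1] - x[i])   # fractional position in the gap
    return y[i] + t * (y[i + 1] - y[i])

# Made-up toy spectra: set 1 evenly spaced, set 2 unevenly spaced.
x1 = [400.0 + 0.5 * k for k in range(21)]    # 400.0 .. 410.0
y1 = [2.0 + 0.1 * (w - 400.0) for w in x1]
x2 = [401.3, 402.1, 404.6, 405.2, 408.9]
y2 = [1.0, 1.2, 0.9, 1.1, 1.0]

# Put set 2 on set 1's grid and divide only where set 2 is defined.
ratio = []
for w, a in zip(x1, y1):
    b = interp_linear(w, x2, y2)
    if b is not None:
        ratio.append((w, a / b))
```

Each linearly interpolated value lies between its two neighbouring measurements, so this cannot create peaks or dips beyond the measured values. It also cannot recover any feature narrower than the spacing of set 2.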
 
  • #2
  • #3
cepheid said:
I have two data sets, each having its own array of x values and its own corresponding array of y values. I want to divide the y-values of two data sets. The problem I am having is that the x arrays for the two sets have totally different spacings (bin sizes). One set has 4175 data points, evenly spaced. The other set has its x values NOT evenly spaced and there are only 1919 data points in it.

What would be the best way of going about modifying the second data set so that it might be compared to the first one? I could just interpolate, but then I am worried that I am basically just adding made up data points to the y values for the second set, and that I might destroy some features in it, or add spurious ones.

A few questions:

Are the positions in the x arrays relevant? You could perhaps zero-pad the smaller array to make it the same size as the larger one. This is tricky to answer: it is certainly doable, but we'd need to know more about the data and whether the order and positioning of the data points are important.

So essentially you have 4 arrays? x1, y1, x2, y2? And the y arrays are derived from the x arrays?

Can you maybe sample the larger x array and extract the number of values equal to the smaller x array? Then generate y arrays that are of equal size?
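A minimal sketch of the sampling idea, assuming "sampling" means picking, for each x value of the smaller set, the nearest x value of the larger set. The toy arrays and the helper `nearest_index` are made up for illustration.

```python
from bisect import bisect_left

def nearest_index(x_sorted, target):
    """Index of the element of x_sorted (ascending) closest to target."""
    j = bisect_left(x_sorted, target)
    if j == 0:
        return 0
    if j == len(x_sorted):
        return len(x_sorted) - 1
    # Pick whichever neighbour is closer to the target.
    return j if x_sorted[j] - target < target - x_sorted[j - 1] else j - 1

# Made-up toy data: x1 evenly spaced (larger set), x2 unevenly spaced.
x1 = [400.0 + 0.5 * k for k in range(21)]
y1 = [2.0 + 0.1 * (w - 400.0) for w in x1]
x2 = [401.3, 402.1, 404.6, 405.2, 408.9]

idx = [nearest_index(x1, w) for w in x2]
y1_sampled = [y1[i] for i in idx]               # same length as x2
offsets = [x1[i] - w for i, w in zip(idx, x2)]  # leftover x mismatch
```

The y arrays now have equal length, but `offsets` shows that each pair is still taken at slightly different x values, by up to half the bin width of the larger set.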
 
  • #4
The positions in the x arrays are relevant. The x arrays are wavelengths. The y arrays are essentially intensities. So these are spectra. See the astronomy thread that I linked to for more details.

Yes, there are four arrays as you described. The y arrays are not derived from the x arrays. They are observed/measured intensities for each wavelength.

Sampling the larger x array may not be that useful, since the other data that I'm going to calibrate off these data are just as large. Also, what if none of the wavelengths in x1 exactly match those in x2?
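One way around the lack of exactly matching wavelengths, sketched here purely as an illustration, is to rebin the finely sampled spectrum onto the coarse wavelengths by averaging every fine sample that falls in a bin around each coarse wavelength. This leaves the coarse data untouched. The assumptions: the bin edges are placed at the midpoints between neighbouring coarse wavelengths, averaging (rather than summing) is the right operation for these intensities, and the toy arrays and the helper `rebin_mean` are made up.

```python
from bisect import bisect_left

def rebin_mean(x_fine, y_fine, x_coarse):
    """Average y_fine over bins centred roughly on each x_coarse value.

    Bin edges sit at the midpoints between neighbouring x_coarse values;
    the two outer bins are closed off symmetrically. Both x arrays must
    be ascending. An empty bin yields None.
    """
    n = len(x_coarse)
    mids = [(x_coarse[i] + x_coarse[i + 1]) / 2 for i in range(n - 1)]
    edges = [2 * x_coarse[0] - mids[0]] + mids + [2 * x_coarse[-1] - mids[-1]]
    out = []
    for i in range(n):
        lo = bisect_left(x_fine, edges[i])       # first sample >= left edge
        hi = bisect_left(x_fine, edges[i + 1])   # first sample >= right edge
        vals = y_fine[lo:hi]
        out.append(sum(vals) / len(vals) if vals else None)
    return out

# Made-up toy spectra; no wavelength in x2 appears in x1.
x1 = [400.0 + 0.5 * k for k in range(21)]
y1 = [2.0 + 0.1 * (w - 400.0) for w in x1]
x2 = [401.3, 402.1, 404.6, 405.2, 408.9]
y2 = [1.0, 1.2, 0.9, 1.1, 1.0]

shared = set(x1) & set(x2)                       # empty for this toy data
y1_on_x2 = rebin_mean(x1, y1, x2)
ratio = [a / b for a, b in zip(y1_on_x2, y2) if a is not None]
```

Whether a mean or a sum is appropriate depends on what the intensities represent (for example a flux density versus counts per bin), which the thread does not specify.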
 
  • #5
It seems to me you're stuck with interpolating. When the samples are taken, how is the filtering done? That is, how steep are the ramps of the high-pass and low-pass filters for each frequency range used in gathering the samples? The filters are in effect already acting as interpolators. When you say "evenly" spaced, is that linear, logarithmic, ...? How large are the range and domain of the sampled data set? Interpolation modeled somewhat after the filters might improve the results.
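As a rough sketch of what interpolation "modeled after the filters" could mean, the finely sampled spectrum can be resampled with a weighting kernel that stands in for the instrument response. Everything specific here is an assumption: the thread does not say what the filter shape is, so a Gaussian with a made-up width `sigma` is used, and `gaussian_resample` and the toy data are hypothetical.

```python
import math

def gaussian_resample(x_fine, y_fine, x_out, sigma):
    """Gaussian-weighted mean of y_fine around each value in x_out.

    sigma is the kernel width in the same units as x. The Gaussian is
    only a placeholder for the real (unknown) filter response.
    """
    out = []
    for c in x_out:
        weights = [math.exp(-0.5 * ((w - c) / sigma) ** 2) for w in x_fine]
        total = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, y_fine)) / total)
    return out

# Made-up toy spectrum: flat continuum with a one-sample spike at 405.0.
x1 = [400.0 + 0.5 * k for k in range(21)]
y1 = [1.0] * len(x1)
y1[10] = 2.0
x2 = [401.3, 402.1, 404.6, 405.2, 408.9]

smoothed = gaussian_resample(x1, y1, x2, sigma=0.5)
flat = gaussian_resample(x1, [3.0] * len(x1), x2, sigma=0.5)
```

A flat spectrum is returned unchanged, while the narrow spike is spread over the neighbouring output wavelengths at reduced height. That is the behaviour one would want if the coarse spectrum was measured through a broader response than the fine one.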
 
