- #1
Jarven
Hey, I have never taken a stats course, but I really need someone to check my answers to the questions below.
We have a dataset with 5 independent and dozens of observational dependent variables, including location. The independent and dependent variables are sampled asynchronously! (the variables are logs of activities, location, type of language being used, voice samples, and some survey data). Some datapoints are better than others - but we don't know which those are. Observations took place over 3 months at more or less regular intervals. If it were a continuous signal we'd find that the sampling rate was below the Nyquist rate.
1. What techniques would you use to determine whether this data set contains a signal or is all noise? Note that you are free to explore statistical approaches in the frequency (Fourier or other transform) domain as well.
The first technique I would use to determine whether the data set contains a signal is ensemble averaging. This technique assumes that the noise is completely random (zero mean and uncorrelated between repetitions) and that the source(s) of the signal produce consistent data points. If a sufficient number of data sets were collected over the 3-month period, the ensemble average would significantly reduce the noise and make the signal apparent, assuming a signal exists.
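To make the ensemble-averaging idea concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the sine "signal", the noise level, the number of repetitions, and the helper names `ensemble_average` and `rms` are made up, and it presumes the repetitions are already aligned on a common time grid (which the asynchronously sampled data in the question would first need).

```python
import numpy as np

def ensemble_average(trials):
    """Average repeated, time-aligned recordings (one per row) sample by sample."""
    trials = np.asarray(trials, dtype=float)
    return trials.mean(axis=0)

def rms(x):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(np.square(x))))

# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200, endpoint=False)
true_signal = np.sin(2 * np.pi * 3 * t)          # hypothetical repeatable signal
n_trials = 100
# Each row is one repetition: the same signal plus independent zero-mean noise.
trials = true_signal + rng.normal(scale=2.0, size=(n_trials, t.size))

averaged = ensemble_average(trials)
err_single = rms(trials[0] - true_signal)        # error of one noisy repetition
err_avg = rms(averaged - true_signal)            # error after averaging
```

With independent zero-mean noise, the residual error of the average should shrink roughly as one over the square root of the number of repetitions, so comparing `err_avg` with `err_single` gives a quick sanity check of the assumption.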
Secondly, computing a frequency spectrum of the data set with a Fourier transform should help identify white noise. If the amplitude is roughly equal across a range of frequencies, that range can be treated as noise. The remaining frequencies, which do not look like white noise, can be passed through an inverse Fourier transform to reconstruct the signal, which can then be processed further (for example, smoothed). If the spectrum is flat across the entire frequency range, we can assume that no signal is present.
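A rough way to check for "flatness" in the spectrum is to compare the largest periodogram value with the median value. The sketch below is only illustrative: the function name `peak_to_median_power` and the threshold of 50 are assumptions, the test data are synthetic, and the FFT presumes evenly spaced samples. For unevenly sampled data, a Lomb-Scargle periodogram is a common alternative.

```python
import numpy as np

def peak_to_median_power(x):
    """Ratio of the largest periodogram value to the median one.

    For white noise the ratio stays modest; a strong periodic
    component produces a bin that towers over the rest.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # remove the mean so DC does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2    # periodogram (unnormalized)
    power = power[1:]                      # drop the DC bin
    return float(power.max() / np.median(power))

# Synthetic comparison (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n = 1024
t = np.arange(n) / n
noise_only = rng.normal(size=n)
tone_plus_noise = np.sin(2 * np.pi * 50 * t) + rng.normal(size=n)

ratio_noise = peak_to_median_power(noise_only)
ratio_tone = peak_to_median_power(tone_plus_noise)

THRESHOLD = 50.0  # ad hoc cutoff chosen for this example, not a standard value
noise_looks_flat = ratio_noise < THRESHOLD
tone_detected = ratio_tone > THRESHOLD
```

A formal test would replace the ad hoc threshold with a significance level derived from the distribution of periodogram values under the white-noise hypothesis.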
2. How can you use the data even though it is sampled below the Nyquist rate?
Assuming the signal's bandwidth (its upper frequency minus its lower frequency) is smaller than its lower frequency, it may still be possible to use this data. The data does not need to be sampled at twice the upper frequency of the signal; it can be sampled at a rate as low as twice the bandwidth, provided the rate is chosen so that the aliased copies of the band do not overlap.
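The bandpass-sampling condition can be written out explicitly. For a signal confined to the band from f_low to f_high, a sampling rate fs avoids overlapping aliases when 2*f_high/n <= fs <= 2*f_low/(n-1) for some integer n between 1 and floor(f_high / bandwidth). The sketch below just enumerates those ranges; the function names and the 20 to 25 Hz band are hypothetical, chosen only for illustration.

```python
import math

def valid_bandpass_rates(f_low, f_high):
    """List (n, fs_min, fs_max) ranges of alias-free sampling rates
    for a signal confined to the band [f_low, f_high]."""
    if not 0 <= f_low < f_high:
        raise ValueError("need 0 <= f_low < f_high")
    bandwidth = f_high - f_low
    n_max = math.floor(f_high / bandwidth)
    ranges = []
    for n in range(1, n_max + 1):
        fs_min = 2.0 * f_high / n
        # n == 1 is ordinary Nyquist sampling: any rate >= 2 * f_high works.
        fs_max = math.inf if n == 1 else 2.0 * f_low / (n - 1)
        ranges.append((n, fs_min, fs_max))
    return ranges

def is_alias_free(fs, f_low, f_high):
    """True if sampling at rate fs keeps the band's aliases from overlapping."""
    return any(lo <= fs <= hi for _, lo, hi in valid_bandpass_rates(f_low, f_high))

# Hypothetical band: 20 Hz to 25 Hz (bandwidth 5 Hz, twice the bandwidth = 10 Hz).
ranges = valid_bandpass_rates(20.0, 25.0)
ok_at_10 = is_alias_free(10.0, 20.0, 25.0)   # exactly twice the bandwidth
ok_at_11 = is_alias_free(11.0, 20.0, 25.0)   # above twice the bandwidth, still aliases
ok_at_13 = is_alias_free(13.0, 20.0, 25.0)   # inside the n = 4 range
```

In this example a rate of 11 Hz is above twice the bandwidth yet falls between the allowed ranges, which is why the rate has to be picked from the valid intervals rather than simply set above 2 times the bandwidth.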
Is what I wrote right? Am I missing anything? Can you point me in the right direction?
I have never studied any of the topics covered by these questions, and so far my answers are based on what I have read on Wikipedia.