Generating data from trendline

  • Thread starter PixelDictator
  • Tags
    Data
In summary, a user is trying to generate sets of random data points with corresponding uncertainties that would give the same fitted line with the same uncertainties. They are at a loss for ways to achieve this and are asking for a method to make it happen. One suggestion is to generate numbers and transform them according to the equation of the line, then add "noise" from a normal distribution centered at zero. Another suggestion is to scale a set of values to have a desired mean and standard deviation. There is also a discussion about the ambiguity of the term "uncertainties" and the importance of specifying which variable is treated as independent in linear least squares regression.
  • #1
PixelDictator
Hello all,

I am trying to take a fitted line, with given standard error in slope and y-intercept, and generate sets of random data points (and corresponding uncertainties) which would give the same line with the same uncertainties.

I'm at a loss for ways to achieve this, and I'm not quite sure that it would be possible without trying to brute-force it with programming, or something equally ugly... Is there any method that would make this happen? We don't have any original data points, just the few numbers about the trendline.
 
  • #2
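Generate some numbers and transform them according to the equation of your line. Then just draw "noise" from a normal distribution centred at zero and add it to your data. In MATLAB, you would do something like this...

n = 100;                          % Number of data points
slope = 2; intercept = 1;         % Example line; substitute your fitted values
sigma = 1;                        % Standard deviation of the noise
x = rand(1, n);                   % Generate some x data, uniform on [0, 1]
noise = sigma * randn(1, n);      % Zero-mean normal noise (randn needs no toolbox)
y = slope*x + intercept + noise;  % Transform x according to the line, then add the noise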
 
  • #3
PixelDictator said:
... which would give the same line with the same uncertainties.

Do you mean exactly the same line with exactly the same standard deviation for the errors, so that someone fitting a line to the generated data would get exactly the same slope and intercept?

Or do you mean you want to do what Number Nine suggested, which is to assume your line is the correct deterministic part of the equation for the data and then generate the random errors? In that case, someone fitting a line to the generated data might not get exactly the same line as you began with.
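To illustrate the second case, here is a minimal sketch in Python (standard library only) that generates data the way post #2 describes and then refits a line by least squares. The seed, the sample size and the variable names are assumptions made for the illustration; the slope 2 and intercept 1 are the example values from post #2.

```python
import random

rng = random.Random(42)                 # fixed seed so the run is repeatable
n = 100
true_slope, true_intercept = 2.0, 1.0   # same example line as in post #2

# Points on the line plus zero-mean normal noise.
x = [rng.random() for _ in range(n)]
y = [true_slope * xi + true_intercept + rng.gauss(0.0, 1.0) for xi in x]

# Least-squares fit of y on x.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
fitted_slope = sxy / sxx
fitted_intercept = mean_y - fitted_slope * mean_x

print(fitted_slope, fitted_intercept)   # close to 2 and 1, but not exactly equal
```

The fitted values scatter around the line you started from, and a different seed gives a different fit.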
 
  • #4
Stephen,
I'm attempting to do the former. I've set up a program to do what Number Nine suggested, which works pretty well in the meantime, but it would be a lot better if I had a way to recreate the line and uncertainties perfectly.
 
  • #5
You can scale a set of values to have whatever mean and standard deviation you want by multiplying it by one constant and adding another. For example, generate a set of values E. Suppose it has mean mu and variance sigma_sq. For constants c and k, create scaled data by setting F = k E + c. The data F has mean = k mu + c and variance = k^2 (sigma_sq). You can solve for the values of k and c that produce the mean and variance that you want.

(In this post I'm talking about variances as "sample variances", which are computed with a denominator of n = the number of data points, not with a denominator of n-1, as in the unbiased estimator for population variance.)
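A minimal sketch of that scaling in Python (standard library only). The function name rescale and the target values are hypothetical, chosen for the illustration. Solving k^2 (sigma_sq) = target variance and k mu + c = target mean gives k = sqrt(target variance / sigma_sq) and c = target mean - k mu. The variance uses a denominator of n, as in the note above.

```python
import random
import statistics

def rescale(values, target_mean, target_var):
    """Return F = k*E + c with the requested sample mean and variance (denominator n)."""
    mu = statistics.fmean(values)
    sigma_sq = statistics.pvariance(values, mu)   # denominator n, not n - 1
    k = (target_var / sigma_sq) ** 0.5            # from k^2 * sigma_sq = target_var
    c = target_mean - k * mu                      # from k * mu + c = target_mean
    return [k * v + c for v in values]

rng = random.Random(0)
E = [rng.gauss(0.0, 1.0) for _ in range(50)]      # raw values with arbitrary mean and variance
F = rescale(E, target_mean=0.0, target_var=4.0)

print(statistics.fmean(F), statistics.pvariance(F))  # mean and variance match the targets up to rounding
```

Taking the positive square root for k keeps the ordering of the values; the negative root would satisfy the same two equations.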

You are using the ambiguous word "uncertainties", and I can't be sure what quantity or quantities you mean by that.

One interesting technicality about linear least squares regression is that if you fit a line to (x,y) data viewing x as the independent variable, you may get a different line than if you regard y as the independent variable. If you want "artificial" data so that the procedure for linear least squares regression produces a given line when applied to that data, then you must be careful to specify which variable is treated as independent.
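A small sketch of this point in Python (standard library only). The helper fit_line, the seed and the example line y = 2x + 1 with noise are assumptions made for the illustration. The same data are fitted twice, once with x as the independent variable and once with y, and the second fit is rewritten in the form y = m x + c so the two lines can be compared.

```python
import random

def fit_line(u, v):
    """Least-squares fit of v on u: returns (slope, intercept) of v = slope*u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    slope = suv / suu
    return slope, mean_v - slope * mean_u

rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(200)]
y = [2.0 * xi + 1.0 + rng.gauss(0.0, 3.0) for xi in x]

slope_yx, icpt_yx = fit_line(x, y)   # x independent: y = slope_yx*x + icpt_yx
slope_xy, icpt_xy = fit_line(y, x)   # y independent: x = slope_xy*y + icpt_xy

# Rewrite the second fit as y = m*x + c for a like-for-like comparison.
m_from_xy = 1.0 / slope_xy
c_from_xy = -icpt_xy / slope_xy

print("y on x:", slope_yx, icpt_yx)
print("x on y:", m_from_xy, c_from_xy)
```

The product of the two fitted slopes equals the squared sample correlation, so the two lines coincide only when the points are perfectly collinear.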

Assume x is the independent variable and the artificial data is (x, y) with y = A x + B + F where A and B are constants and the F are artificial "errors" from the trendline. The equations that must be satisfied in order for the linear regression to reproduce A and B when applied to the data are (as I recall):

A = ( cov(x,Ax + B + F))/ var(x)
B = mean of (A x + B + F) - (A)( mean of x).

where the means and variances involved are sample means and variances of the data.
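As a check on these two equations, here is a short expansion. It assumes the same sample means, variances and covariances as above, and the bar notation for a sample mean is introduced here rather than taken from the post. Because the sample covariance is linear in its second argument and a constant has zero covariance with x, both equations reduce to constraints on the errors F alone:

```latex
\begin{aligned}
\operatorname{cov}(x,\, Ax + B + F) &= A\,\operatorname{var}(x) + \operatorname{cov}(x, F)
  &&\Longrightarrow\quad A = A + \frac{\operatorname{cov}(x, F)}{\operatorname{var}(x)}
  &&\Longrightarrow\quad \operatorname{cov}(x, F) = 0, \\
\overline{Ax + B + F} - A\,\bar{x} &= B + \bar{F}
  &&\Longrightarrow\quad B = B + \bar{F}
  &&\Longrightarrow\quad \bar{F} = 0.
\end{aligned}
```

So the fit returns A and B exactly when the errors have zero sample mean and zero sample covariance with x. The overall size of the errors is left free by these two conditions.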

If I'm clear on what you are trying to do, then we can check whether I got those equations right and solve them for k and c.
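One possible way to carry this through is sketched below in Python (standard library only). It rests on assumptions the thread does not confirm: "uncertainties" is read as the usual least-squares standard errors of slope and intercept, with the residual variance computed on n - 2 degrees of freedom, and the names ols_fit and make_data are hypothetical. The two equations above hold exactly when F has zero sample mean and zero sample covariance with x, so the sketch removes both components from a set of random errors and then rescales what is left. Under the stated reading, the ratio (SE of intercept / SE of slope)^2 equals the mean of x^2, so the x values are rescaled to match it first.

```python
import math
import random

def ols_fit(x, y):
    """Least squares of y on x: slope, intercept and their standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                      # residual variance, n - 2 degrees of freedom
    return slope, intercept, math.sqrt(s2 / sxx), math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))

def make_data(A, B, se_A, se_B, n, rng):
    """Hypothetical helper: (x, y) whose fit returns A, B, se_A, se_B up to rounding."""
    # 1. Rescale random x values so that mean(x^2) = (se_B / se_A)^2.
    x0 = [rng.uniform(0.0, 1.0) for _ in range(n)]
    k = math.sqrt((se_B / se_A) ** 2 / (sum(v * v for v in x0) / n))
    x = [k * v for v in x0]
    mx = sum(x) / n
    sxx = sum((v - mx) ** 2 for v in x)
    # 2. Random errors with their mean and their component along x removed,
    #    so that mean(F) = 0 and cov(x, F) = 0.
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    me = sum(e) / n
    b = sum((xi - mx) * (ei - me) for xi, ei in zip(x, e)) / sxx
    f = [ei - me - b * (xi - mx) for xi, ei in zip(x, e)]
    # 3. Scale the errors so that the residual variance equals se_A^2 * sxx.
    scale = math.sqrt(se_A ** 2 * sxx * (n - 2) / sum(v * v for v in f))
    y = [A * xi + B + scale * fi for xi, fi in zip(x, f)]
    return x, y

x, y = make_data(A=2.0, B=1.0, se_A=0.1, se_B=0.3, n=30, rng=random.Random(1))
fit = ols_fit(x, y)
print(fit)   # slope, intercept, SE of slope, SE of intercept
```

Rescaling x is only one way to satisfy the constraint on the mean of x^2; shifting the x values would work as well. If "uncertainties" means something else, such as error bars on the individual points, the construction would need to change.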
 

FAQ: Generating data from trendline

How can I generate data from a trendline?

To generate data from a linear trendline, choose x-values and compute y = slope × x + intercept for each one. Points computed this way lie exactly on the line; to mimic real measurements, add random noise to each y-value.

What is the purpose of generating data from a trendline?

The purpose of generating data from a trendline is to make predictions or projections about future data points based on the existing trend. This can be useful in making decisions or planning for the future.

Can I generate data from any type of trendline?

Yes, you can generate data from any type of trendline, such as linear, exponential, or logarithmic. However, the accuracy of the generated data may vary depending on the type of trendline and the quality of the data.

Is it necessary to have a trendline to generate data?

No, it is not necessary to have a trendline to generate data. Any model of how y depends on x can supply the deterministic part, with a trendline being the simplest case. Note that regression analysis fits a model to existing data; it does not generate data itself.

Are there any limitations to generating data from a trendline?

Yes, there are some limitations to generating data from a trendline. This method assumes that the trend will continue in the same direction and at the same rate, which may not always be the case. It is also important to consider the quality and reliability of the data used to create the trendline.
