Confidence limits for the inverse of an estimated value

In summary, the time constant is the length of time it takes for something to settle following a disturbance.
  • #1
Calvadosser
39
0
I am aware that, in statistics, things get difficult as soon as they get nonlinear. And taking the reciprocal of a quantity is a nonlinear operation.

I have some data that would form a nice looking straight line, except for random error scattering it around the line. I have a total of about fifty points. If I fit a regresssion line to the data, I can find an estimate of the slope of the line. In my particular case, the slope of the line (if I knew it precisely) would give me the coefficient λ in a 1st order linear differential equation dC(t)/dt = -λC(t). Thus the regression analysis gives me an estimate for λ.

There is a standard formula for calculating confidence limits on the estimate of the slope of a line computed via a regression analysis. This formula gives me the upper and lower confidence limits λ[itex]_{lower}[/itex] and λ[itex]_{upper}[/itex] on my estimate of λ.

The solution for the differential equation is C(t) = C(0) exp(-t/T), where the time constant, T = 1/λ. It is the time constant T that is the thing of real interest because this will tell me how long a system takes to settle following a disturbance.

Here is my question.

(A) What is the "best", in some appropriate sense, estimate for T? Is it simply 1/(my regression estimate for λ)?

(B) If so, what are the confidence limits for my estimate of T? Are they simply the inverses, 1/λ[itex]_{lower}[/itex] and 1/ λ[itex]_{upper}[/itex]. of my confidence limits on λ?

Thank you for any help. I assume it is a simple and straightforward question but I have not succeeded in finding the answer nor in working it out myself.
 
Physics news on Phys.org
  • #2
Hi Calvadosser,

Calvadosser said:
Here is my question.

(A) What is the "best", in some appropriate sense, estimate for T? Is it simply 1/(my regression estimate for λ)?

To enter into a discussion on what means the "best" estimator would require a whole course in statistical inference, but for the sake of this problem let's say yes, [itex]\frac{1}{\hat\lambda}[/itex] is the "best".
Calvadosser said:
(B) If so, what are the confidence limits for my estimate of T? Are they simply the inverses, 1/λ[itex]_{lower}[/itex] and 1/ λ[itex]_{upper}[/itex]. of my confidence limits on λ?.

That's another yes; in statistics we make data go through all kind of transformations so that we can fit it into statistical models of our choice; that's actually what you are doing with your equation so that you are able to fit your data into a linear model. Once you have fitted the model you need to apply the inverse transformation to the results, including confidence intervals, so yeah, your procedure is correct, you're fine.
 
Last edited:
  • #3
To follow on with what viraltux said, it depends ultimately on how specific transformations preserve information about the probabilities and subsequent information.

With some things if you are given say x, and you need to find T = f(x), then you can apply the transformations to give new results which will conserve the probabilistic properties under that transformation.

But other times, they don't. One example is with a technique known as highest posterior density in Bayesian analysis which doesn't.

If you want to look into this problem in general, find out frameworks which deal with transformation of statistics, intervals, and other measures that conserve the probabilistic information under transformation.
 
  • #4
viraltux said:
yes, [itex]\frac{1}{\hat\lambda}[/itex] is the "best".

I'm not sure about that. Take a simple example: X is uniform on [0,1].
The average is 0.5, so the 'correct' time constant is 2.
If we take 2 samples of X, X1 and X2, and calculate the time constant as the inverse of their mean, what is the expected value of the result?
[itex]\int^{1}_{x_{1}=0}\int^{1}_{x_{2}=0}2/(x_1+x_2)dx_2.dx_{1} = 2\int^{1}_{x_{1}=0}[ln(x_{1}+x_2)]^{1}_{x_{2}=0}dx_{1}[/itex]
[itex] = 2\int^{1}_{x_{1}=0}(ln(x_{1}+1)-ln(x_{1}))dx_1[/itex]
[itex] = 2[(x_{1}+1)ln(x_{1}+1)-x_{1}-x_{1}ln(x_{1})+x_{1}]^{1}_{x_{1}=0}[/itex]
= 4 ln(2) =~ 2.77
 
  • #5
haruspex said:
I'm not sure about that.

There are a few things sure in this world; the Sun is hot, the water is wet, and haruspex will be there to keep you on track! hahaha :smile:

haruspex said:
Take a simple example: X is uniform on [0,1].The average is 0.5, so the 'correct' time constant is 2. If we take 2 samples of X, X1 and X2, and calculate the time constant as the inverse of their mean, what is the expected value of the result?
[itex]\int^{1}_{x_{1}=0}\int^{1}_{x_{2}=0}2/(x_1+x_2)dx_2.dx_{1} = 2\int^{1}_{x_{1}=0}[ln(x_{1}+x_2)]^{1}_{x_{2}=0}dx_{1}[/itex]
[itex] = 2\int^{1}_{x_{1}=0}(ln(x_{1}+1)-ln(x_{1}))dx_1[/itex]
[itex] = 2[(x_{1}+1)ln(x_{1}+1)-x_{1}-x_{1}ln(x_{1})+x_{1}]^{1}_{x_{1}=0}[/itex]
= 4 ln(2) =~ 2.77

What you are doing is [itex]\frac{\frac{2}{x_{1,1}+x_{2,1}} + \frac{2}{x_{1,2}+x_{2,2}}+ ... + \frac{2}{x_{1,n}+x_{2,n}}}{n}≈2.77[/itex] but what you should do is [itex]\frac{n}{\frac{x_{1,1}+x_{2,1}}{2} + \frac{x_{1,2}+x_{2,2}}{2}+ ... + \frac{x_{1,n}+x_{2,n}}{2}}≈2[/itex]

But anyway, it is true that inference is a whole world; you may get really seemingly crazy estimators once you squeeze a problem, but for the problem presented by the OP as such what he/she is doing is just OK.

PS: If I ever go into space I want you to check the rocket engines; I know you won't let anything pass :-p
 
Last edited:
  • #6
viraltux, I believe I have correctly modeled the consequence of using OP's procedure. Try it in a spreadsheet. Generate 100 pairs of samples from U(0,1); for each pair take the average, then the inverse. You will almost always get a result > 2, often > 3.
Of course, this is fairly extreme. More samples in each set would give a smaller error.
Calvadosser, can you take a look at the distribution of the lambdas? If we know more about that we might be able either to suggest a better procedure or to put bounds on the error.
 
  • #7
haruspex said:
viraltux, I believe I have correctly modeled the consequence of using OP's procedure. Try it in a spreadsheet. Generate 100 pairs of samples from U(0,1); for each pair take the average, then the inverse. You will almost always get a result > 2, often > 3.
Of course, this is fairly extreme. More samples in each set would give a smaller error.
Calvadosser, can you take a look at the distribution of the lambdas? If we know more about that we might be able either to suggest a better procedure or to put bounds on the error.

Hi haruspex,

The OP has a straight line set of measurements in its model and he estimates via linear regression its slope having the value [itex]\hat{λ}[/itex] and a standard error for the estimate based on a Gaussian distribution.

He does not have 100 slopes to work with which seems to be the way you are approaching the problem based on the example you post, so even if he wanted he could not do the inverse for every slope and see how the distribution of inverse slopes behaves (or the distribution of λ slopes for that matter).

But even so, let's change the problem and imaging that he can actually measure a set of, let's say, n lambdas ([itex]λ_{1..n}[/itex]), and he wants to estimate [itex]T=1/λ[/itex]

What you suggest is [itex]\hat{T}= \frac{1/λ_1 + 1/λ_2 + ... +1/λ_n}{n} [/itex] but this is wrong, this approach allows the function that calculates T to bias its own estimation.

In this situations we apply what is called in inference the invariance principle which basically states that if you have a function [itex]f[/itex] and you want to estimate [itex]f(θ)[/itex] then you do [itex]\widehat{f(θ)} = f(\hat{θ})[/itex], which in the OP case would be [itex]\hat{T}=1/\hat{λ}[/itex], and, by the way, this principle would hold for whatever distribution θ, λ might have.
 
Last edited:
  • #8
viraltux said:
Hi haruspex,

The OP has a straight line set of measurements in its model and he estimates via linear regression its slope having the value [itex]\hat{λ}[/itex] and a standard error for the estimate based on a Gaussian distribution.

He does not have 100 slopes to work with which seems to be the way you are approaching the problem based on the example you post, so even if he wanted he could not do the inverse for every slope and see how the distribution of inverse slopes behaves (or the distribution of λ slopes for that matter).

But even so, let's change the problem and imaging that he can actually measure a set of, let's say, n lambdas ([itex]λ_{1..n}[/itex]), and he wants to estimate [itex]T=1/λ[/itex]

What you suggest is [itex]\hat{T}= \frac{1/λ_1 + 1/λ_2 + ... +1/λ_n}{n} [/itex] but this is wrong, this approach allows the function that calculates T to bias its own estimation.

In this situations we apply what is called in inference the invariance principle which basically states that if you have a function [itex]f[/itex] and you want to estimate [itex]f(θ)[/itex] then you do [itex]\widehat{f(θ)} = f(\hat{θ})[/itex], which in the OP case would be [itex]\hat{T}=1/\hat{λ}[/itex], and, by the way, this principle would hold for whatever distribution θ, λ might have.
No, you misunderstand what I did.
To find an unbiased estimator for a statistic s, the following is standard procedure:
- construct some candidate function fs({xi}) of n observations
- compute E(fs) as a function of a presumed value s of the statistic
- see how E(fs) compares with s
E.g. with s being the mean and fmean({xi}) = (Ʃxi)/n gives E(fmean) = mean, but with s being the variance, we get fvar = (Ʃ(xi-Ʃxi)/n)2)/(n-1).
If L is a linear function and fs is an unbiased estimator for s then L(fs) is an unbiased estimator for L(s). But it does not work for nonlinear functions. E.g. for the standard deviation, √fvar is not an unbiased estimator for √var. (There are corrections that have been developed, but none are perfect.)
In my model for the OP procedure, I took just two observations, computed the mean, and inverted. There are two ways we can try to assess this as an estimator for 1/mean. In my first post I assessed it analytically. Since you thought the analysis flawed, I then assessed it numerically. To find E(f) I had to generate lots of pairs and take the average result.

Now, as I said, my model was rather extreme. It took a uniform distribution 'close' to the origin (i.e. the std dev is a large fraction of the mean) and only used one pair of observations to compute the mean. Changing either of those would reduce the error. To get a bound on the error in the OP procedure we need to know more about the distribution and the number of samples.
 
  • #9
haruspex said:
No, you misunderstand what I did.
To find an unbiased estimator for a statistic s, the following is standard procedure:
- construct some candidate function fs({xi}) of n observations
- compute E(fs) as a function of a presumed value s of the statistic
- see how E(fs) compares with s
E.g. with s being the mean and fmean({xi}) = (Ʃxi)/n gives E(fmean) = mean, but with s being the variance, we get fvar = (Ʃ(xi-Ʃxi)/n)2)/(n-1).
If L is a linear function and fs is an unbiased estimator for s then L(fs) is an unbiased estimator for L(s). But it does not work for nonlinear functions. E.g. for the standard deviation, √fvar is not an unbiased estimator for √var. (There are corrections that have been developed, but none are perfect.).

First, [itex]\hat{λ}[/itex] being a biased or unbiased estimator is absolutely irrelevant to the problem, the invariance principle holds for any estimator; biased, unbiased or otherwise.

Second, the fact that an estimator is biased does not make it bad, for instance, the maximum likelihood estimator for the variance of a Gaussian distribution is biased and yet is the one preferred in many areas of multivariate analysis.

haruspex said:
Now, as I said, my model was rather extreme. It took a uniform distribution 'close' to the origin (i.e. the std dev is a large fraction of the mean) and only used one pair of observations to compute the mean. Changing either of those would reduce the error. To get a bound on the error in the OP procedure we need to know more about the distribution and the number of samples.

OK, I think we are reaching the point of confusion here, seems to me that for some reason you think that a biased estimator is something bad that has to be fixed, right? that's why these examples showing how applying the invariance principle might return biased estimators, is this correct?

Well, so it seems that in order to fix this, which we don't have to, you are asking the OP about the distribution of λ when actually there is no such thing; we only have one unique value for λ which is estimated via MLE in a linear regression procedure.

What we get is the estimation of λ using this method (or whatever method the OP used for the linear regression) and the error associated with it, that is all we have; no distributions, no number of points... no nothing, that's it, and in this scenario you apply the invariance principle to estimate T which is what the OP is already doing.

And when we calculate the confidence interval for T we will see how the estimation of T is not centered in such interval, accounting this way for the bias that seems to be the issue in this discussion.
 
  • #10
viraltux said:
First, [itex]\hat{λ}[/itex] being a biased or unbiased estimator is absolutely irrelevant to the problem, the invariance principle holds for any estimator; biased, unbiased or otherwise.
I thought the invariance principle was a pragmatic one rather than mathematically proven. Do I have that wrong?
seems to me that for some reason you think that a biased estimator is something bad that has to be fixed, right?
No, I don't go that far. I'm saying it's a reason to stop and think about what you're going to do with the answer. The search for unbiased estimators is grounded, in my view, on the typical case that an error of +ε costs about the same as an error of -ε. In the present case, if taking the parameter to be 1/(λ+ε) (instead of 1/λ) has about the same cost as taking it to be 1/(λ-ε) then the OP procedure is fine. But if it's (1/λ)+ε and (1/λ)-ε are about equal in cost then it's not.
you are asking the OP about the distribution of λ when actually there is no such thing
That was not my intention, whatever the wording. I was after the scatter in the data.
 
  • #11
I too am interested with regards to the invariance principle, because although I know it exists for many probabilistic and statistical situations involving transformations, I've never actually never checked it out which is contributing to my ignorance.

Could you point out a reference or two for this viraltux (maybe even a book that covers it?)
 
  • #12
haruspex said:
I thought the invariance principle was a pragmatic one rather than mathematically proven. Do I have that wrong?

chiro said:
I too am interested with regards to the invariance principle, because although I know it exists for many probabilistic and statistical situations involving transformations, I've never actually never checked it out which is contributing to my ignorance.

Could you point out a reference or two for this viraltux (maybe even a book that covers it?)

Yes you have it wrong haruspex, it is mathematically sound, if given the trillion of times that I've used it, and the quadrillions of times that I've seen it in publications, if now it turned out to be mathematically flawed my heart would skip a beat. :-p

Hi chiro, I didn't find the paper I wanted but anyway, this one proves that applying functions to MLE estimators return the MLE estimator of the function (which would be the OP case) and has an example too:

http://www.stats.ox.ac.uk/~dlunn/b8_02/b8pdf_6.pdf

haruspex said:
No, I don't go that far. I'm saying it's a reason to stop and think about what you're going to do with the answer. The search for unbiased estimators is grounded, in my view, on the typical case that an error of +ε costs about the same as an error of -ε. In the present case, if taking the parameter to be 1/(λ+ε) (instead of 1/λ) has about the same cost as taking it to be 1/(λ-ε) then the OP procedure is fine. But if it's (1/λ)+ε and (1/λ)-ε are about equal in cost then it's not.

This "cost about the same" is too ambiguous but I can see where the problem is, you have 1/(λ+ε) and you reason something like "well, if ε>0 I'm fine but, hey, if ε<0 the estimation for 1/(λ+ε) might go way to high! so... how about if I just adjust the estimator a little bit, something like 1/(λ+ε+τ) with τ>0 to make the estimator unbiased so when the bad news ε<0 come I am in the safe side." Something in these lines is what you are thinking right?

OK, I am going to briefly (and dramatically) describe what happened back in the days when the "bias vs unbiased" lesson came up in my faculty.

- professor: "what is best, a biased or an unbiased estimator?"
- students: "Unbiased", "of course" "I agree", "What kind of question is that?"
- professor: OK, why?
- students: :rolleyes: :rolleyes: :rolleyes: :rolleyes: :rolleyes:
- professor: soooooooooooooo...
- students: Well, if you got it biased, and you know it is biased, and you can even calculate how biased it is, well, then you can take the bias away! why would anyone want to use a calculation that is known to be consistently higher or lower?
- professor: That's right, why would anyone do that?
- students: :rolleyes: :shy: :mad: :zzz:

I am sure this situation is a classic in every inference course.

Anyway, also a classic is to take the estimation of the Gaussian variance as an example, let's consider the following three estimators for the variance

[itex]S_{unbiased} = \frac{1}{n-1} \sum_{i=1}^n\left(x_i - \overline{x} \right)^ 2[/itex]
[itex]S_{MLE} = \frac{1}{n} \sum_{i=1}^n\left(x_i - \overline{x} \right)^ 2[/itex]
[itex]S_{LSE} = \frac{1}{n+1} \sum_{i=1}^n\left(x_i - \overline{x} \right)^ 2[/itex]

Well, turns out that among these three the unbiased estimator is the one with the highest error! (in terms of least square error). OK, this is hard to believe and very counter intuitive because, geeee, we know it is biased! fix it! right? Well, you do that, and you introduce error.

You know, I had the mathematical proof in front of my eyes and I still had to run a simulation to believe it, but it is true!

Then one student (in this case it was me) asked "But everyone in school, engineering, physicist... they all use the unbiased version to estimate the variance, why on Earth we don't all use the LSE version with the lowest error!?"

professor: "Not everything that shines is gold" and he went on with the class. :confused:

Oh well, I had to do all kind of guesses about my professor statement on why people don't widely use the LSE version, but I will not go on with this here, for now, suffice to say that what the OP is doing is OK, and that unbiased doesn't mean the "best".
 
Last edited:
  • #13
Thank you viraltux for that result, I'll have to remember that because it's going to be very useful. I'm sure it's probably in my introductory stats book, but it's definitely good to know that it holds.
 
  • #14
chiro said:
Thank you viraltux for that result, I'll have to remember that because it's going to be very useful. I'm sure it's probably in my introductory stats book, but it's definitely good to know that it holds.

Don't be surprised if it is not because I think this is one of these things most of the time courses take for granted and they simply state it, at least I don't remember myself the professors proved it in class... Glad to know the prove is there though! :-p
 
  • #15
My thanks for the replies - and for the discussion, some of which (but not all) is over my head. I had originally supposed that more or less the same question is asked very frequently.

You have very kindly:

- Reassured me that what I propose is, at very least, not a stupid thing to do.

- Shown me that the question is deeper than I had imagined.

I'll add it to me ever-growing list of interesting things to look into - when other things don't take up all the available time.
 
  • #16
Calvadosser said:
I have a total of about fifty points. If I fit a regresssion line to the data, I can find an estimate of the slope of the line. In my particular case, the slope of the line (if I knew it precisely) would give me the coefficient λ in a 1st order linear differential equation dC(t)/dt = -λC(t).

How are the data points measured? Do you have separate instruments to measure [itex] \frac{dC(t)}{dt} [/itex] and [itex] C(t) [/itex]? Or are you computing [itex] \frac{dC(t)}{dt} [/itex] from differences in the [itex] C(t) [/itex] data?

(I'm wondering why you chose to use regression instead of fitting a curve to the C(t) data.)
 
  • #17
viraltux said:
http://www.stats.ox.ac.uk/~dlunn/b8_02/b8pdf_6.pdf
Ok, I see. Invariance works for MLE (and for median-unbiased estimators, right?), but not for estimators in general.
This "cost about the same" is too ambiguous but I can see where the problem is, you have 1/(λ+ε) and you reason something like "well, if ε>0 I'm fine but, hey, if ε<0 the estimation for 1/(λ+ε) might go way to high! so... how about if I just adjust the estimator a little bit, something like 1/(λ+ε+τ) with τ>0 to make the estimator unbiased so when the bad news ε<0 come I am in the safe side." Something in these lines is what you are thinking right?
Yes, but let me put it more precisely. In many cases, the cost of an error in one direction is about the same as the cost of the same magnitude error in the other direction. If so, an unbiased estimator is likely to be good (particularly if the cost is as the square of the error). In the present case, the question is whether this is true of the error in estimating λ (in which case the OP procedure is fine) or of the error in estimating 1/λ (in which case the OP procedure could be awful, depending on details).
I consider this to be the answer to your prof's challenge. Choice of estimator should depend on the cost function.
a classic is to take the estimation of the Gaussian variance as an example, let's consider the following three estimators for the variance

[itex]S_{unbiased} = \frac{1}{n-1} \sum_{i=1}^n\left(x_i - \overline{x} \right)^ 2[/itex]
[itex]S_{MLE} = \frac{1}{n} \sum_{i=1}^n\left(x_i - \overline{x} \right)^ 2[/itex]
[itex]S_{LSE} = \frac{1}{n+1} \sum_{i=1}^n\left(x_i - \overline{x} \right)^ 2[/itex]

Well, turns out that among these three the unbiased estimator is the one with the highest [least square] error!
Fascinating - I wasn't aware of this, thanks.
However, in the absence of beliefs regarding cost function, I gather that the gold standard is not the estimator with the least MSE, but the unbiased estimator with least MSE (http://en.wikipedia.org/wiki/Mean_squared_error#Interpretation). OTOH, a cost function which grows faster than the square of the error might well indicate use of one of the biased estimators. Haven't tried to figure that out.
 
  • #18
haruspex said:
Ok, I see. Invariance works for MLE (and for median-unbiased estimators, right?), but not for estimators in general.

Considering the vast majority of estimators in general linear models, time series, etc use MLE estimators, or some sort LE estimators equivalent to MLE is not like we should worry much about it.

But let's say that the OP is using a Bayesian method to estimate λ, could we say that 1/λ is a Bayesian estimator? Maybe not but, could I adopt now a "in your face Bayesian boy" attitude just because the Bayesian estimator happens not to be invariant in a transformation? Well... tempting :biggrin: , but not really, we could call this transform estimator something like Bayesian induced estimator or something like that, and it would be as legitimate as the MLE one; it just would have other properties and that would be all.

It'd be nice to have a table of invariant properties for families of estimators though.

Yes, but let me put it more precisely. In many cases, the cost of an error in one direction is about the same as the cost of the same magnitude error in the other direction. If so, an unbiased estimator is likely to be good (particularly if the cost is as the square of the error)

No, if the cost is an squared of the error then the SLSE is the estimator we would choose to estimate the variance, yet, it turns out that SLSE is the one among the three I presented with the biggest bias! In this cost scenario the unbiased estimator would be the worst choice.


Fascinating - I wasn't aware of this, thanks.
However, in the absence of beliefs regarding cost function, I gather that the gold standard is not the estimator with the least MSE, but the unbiased estimator with least MSE (http://en.wikipedia.org/wiki/Mean_squared_error#Interpretation). OTOH, a cost function which grows faster than the square of the error might well indicate use of one of the biased estimators. Haven't tried to figure that out.

These estimators are called MVUE and they gather much attention in the inference world because they have very nice mathematical properties. When working with biased estimator the bias make the mathematics much harder, but this does not make them better than biased estimators in a broad sense.

So the answer to what estimator is the best is always problem dependent since every bit of information you have about the problem might make you jump from one to another, or even to force you to develop one Taylor-made estimator for the problem in hand.
 
  • #19
viraltux said:
No, if the cost is an squared of the error then the SLSE is the estimator we would choose to estimate the variance, yet, it turns out that SLSE is the one among the three I presented with the biggest bias!
No, here I'm referring to estimates of the mean, not the variance. It seemed to me that the average of the data should minimise the expected value of the square of the error in the estimate of the true mean. But I admit that was just a guess. As far as I can make out now, it is the case amongst unbiased estimates of the mean, but I can find nothing regarding biased estimators that might produce a lower MSE for the mean. All the discussions on minimising MSE appear to centre on estimating the variance.
I tried applying a constant factor to the 1/[itex]\hat{λ}[/itex] estimator for 1/λ in my simple model. As I should have realized, the integral is unbounded, so the MSE estimator amongst that subset is 0. I changed the uniform distributions to be (1,2) and the factor came out as about 1/1.03.
So I think we've arrived on the same page: the least cost estimator depends on a number of details of the specific problem at hand.
 
  • #20
haruspex said:
No, here I'm referring to estimates of the mean, not the variance. It seemed to me that the average of the data should minimise the expected value of the square of the error in the estimate of the true mean. But I admit that was just a guess. As far as I can make out now, it is the case amongst unbiased estimates of the mean, but I can find nothing regarding biased estimators that might produce a lower MSE for the mean. All the discussions on minimising MSE appear to centre on estimating the variance.
I tried applying a constant factor to the 1/[itex]\hat{λ}[/itex] estimator for 1/λ in my simple model. As I should have realized, the integral is unbounded, so the MSE estimator amongst that subset is 0. I changed the uniform distributions to be (1,2) and the factor came out as about 1/1.03.
So I think we've arrived on the same page: the least cost estimator depends on a number of details of the specific problem at hand.

Oh, but it does not matter about mean or variance you see, we are talking about estimators; what that estimator estimates is irrelevant. When we estimate something we always want to estimate the expected value of that something; so if we want to estimate the E(X) we may use the arithmetic mean... mmmm

OK, put it this way, imaging the random variable [itex]Y = (X-E(X))^2[/itex] and now I ask you to estimate the mean of Y, would it change anything that the mean of Y happens to be the variance of X? No. So if you want the LSE estimator for the mean of Y, that is E(Y), you end up with a biased estimator.
 
Last edited:
  • #21
viraltux said:
In this situations we apply what is called in inference the invariance principle which basically states that if you have a function [itex]f[/itex] and you want to estimate [itex]f(θ)[/itex] then you do [itex]\widehat{f(θ)} = f(\hat{θ})[/itex], which in the OP case would be [itex]\hat{T}=1/\hat{λ}[/itex], and, by the way, this principle would hold for whatever distribution θ, λ might have.
Let θ be a uniform distribution in [-1,1], then clearly [itex]\hat{θ}=0[/itex], but [itex]\widehat{θ^2} = \frac{1}{3} \neq 0 = (\hat{θ})^2[/itex]


Using 1/lambda (and similar for lower/upper limit) is a good approximation if the relative error is small. If it is large, it might be useful to transform the confidence limits more careful.
 
  • #22
mfb said:
Let θ be a uniform distribution in [-1,1], then clearly [itex]\hat{θ}=0[/itex], but [itex]\widehat{θ^2} = \frac{1}{3} \neq 0 = (\hat{θ})^2[/itex]

Using 1/lambda (and similar for lower/upper limit) is a good approximation if the relative error is small. If it is large, it might be useful to transform the confidence limits more careful.

The uniform distribution is a rather special case because it doesn't have an MLE or similar estimator: the common estimator that is used for uniform distributions are the MOM (Method of Moments) estimators.

Viraltux's link showed the invariance principle for MLE estimators, not MOM ones.
 
  • #23
The uniform distribution in [-1,1] is just an example. Any symmetric distribution in the real numbers (where the quantities are defined) satisfies the inequality, just with other values than 1/3.
The difference between the square of the expectation value and the squared expectation value is the variance. Therefore, any distribution with a well-defined variance will have a difference between both quantities.
 
  • #24
mfb said:
The uniform distribution in [-1,1] is just an example. Any symmetric distribution in the real numbers (where the quantities are defined) satisfies the inequality, just with other values than 1/3.
The difference between the square of the expectation value and the squared expectation value is the variance. Therefore, any distribution with a well-defined variance will have a difference between both quantities.

So are you saying that the invariance result for MLE estimators is wrong?

Again, be aware that the uniform distribution (any kind with arbitrary parameters) does not have a valid MLE estimator because the derivative is 0 everywhere, and thus you can't use the assumptions for invariance if you are only considering viraltux's result.
 
  • #25
mfb said:
The uniform distribution in [-1,1] is just an example. Any symmetric distribution in the real numbers (where the quantities are defined) satisfies the inequality, just with other values than 1/3.
The difference between the square of the expectation value and the squared expectation value is the variance. Therefore, any distribution with a well-defined variance will have a difference between both quantities.

chiro said:
So are you saying that the invariance result for MLE estimators is wrong?

Again, be aware that the uniform distribution (any kind with arbitrary parameters) does not have a valid MLE estimator because the derivative is 0 everywhere, and thus you can't use the assumptions for invariance if you are only considering viraltux's result.

:smile: I am thinking in the poor OP's soul, but by now it must be condemn. hahaha... :-p

Anyway, one day I would like to be a teacher, so if I cannot convince a group of undoubtedly extremely bright people of this then I'll have to think twice about it and move into professional poker player or something.

More on this in a while... :smile:
 
  • #26
chiro said:
So are you saying that the invariance result for MLE estimators is wrong?
The situation, as I've come to understand it from viraltux, is that if [itex]\hat{\lambda}[/itex] is an MLE for parameter [itex]\lambda[/itex] then (under some fairly unrestrictive conditions on f), [itex]f(\hat{\lambda})[/itex] is an MLE for [itex]f(\lambda)[/itex].
In the OP, the slope, [itex]\hat{\lambda}[/itex], is being obtained by standard linear regression. We don't know anything about the distribution of the errors, but I doubt this constitutes an MLE, so I remain sceptical of the validity of using [itex]1/\hat{\lambda}[/itex] for [itex]1/\lambda[/itex].
The other complication is that the choice of estimator should really be based on minimising a defined cost function.
 
  • #27
haruspex said:
In the OP, the slope, [itex]\hat{\lambda}[/itex], is being obtained by standard linear regression. We don't know anything about the distribution of the errors,.

The OP mentions:
There is a standard formula for calculating confidence limits on the estimate of the slope of a line computed via a regression analysis. This formula gives me the upper and lower confidence limits λ and λ on my estimate of λ.

Is this formula based on knowing the distribution of errors?
 
  • #28
Stephen Tashi said:
The OP mentions:
There is a standard formula for calculating confidence limits on the estimate of the slope of a line computed via a regression analysis. This formula gives me the upper and lower confidence limits λ and λ on my estimate of λ.​

Is this formula based on knowing the distribution of errors?
Good point. Most likely this is based on a presumed Gaussian distribution, in which case the slope will also be Gaussian.
 
  • #29
haruspex said:
The situation, as I've come to understand it from viraltux, is that if [itex]\hat{\lambda}[/itex] is an MLE for parameter [itex]\lambda[/itex] then (under some fairly unrestrictive conditions on f), [itex]f(\hat{\lambda})[/itex] is an MLE for [itex]f(\lambda)[/itex].
In the OP, the slope, [itex]\hat{\lambda}[/itex], is being obtained by standard linear regression. We don't know anything about the distribution of the errors, but I doubt this constitutes an MLE, so I remain sceptical of the validity of using [itex]1/\hat{\lambda}[/itex] for [itex]1/\lambda[/itex].
The other complication is that the choice of estimator should really be based on minimising a defined cost function.

If they are doing a normal linear regression, the residuals should by default be normally distributed with constant variance, and if this is shown to be false, the algorithm should flag the user if this is the case.

This is a key assumption in this regression model, and if it's violated, then the assumptions of the model will be violated as well leading to otherwise erroneous or highly misleading results.

When you say errors, are you talking about the residuals?
 
  • #30
chiro said:
If they are doing a normal linear regression, the residuals should by default be normally distributed with constant variance, and if this is shown to be false, the algorithm should flag the user if this is the case.
Which brings to mind my previous question: Why use linear regression? Why not simply fit an exponential curve to the C(t) data?
This is a key assumption in this regression model, and if it's violated, then the assumptions of the model will be violated as well leading to otherwise erroneous or highly misleading results.

And if the [itex] \frac{dC(t)}{dt} [/itex] data is computed from the [itex] C(t) [/itex] data, this introduces another complication.

When you say errors, are you talking about the residuals?

I can't speak for hauspex, but I usually say "errors" instead of "residuals" since I think of the linear regression as a prediction.

It's interesting how readily curve fitting software spits out confidence intervals for the parameters of the fit - seeing as there is only one value for each parameter! I have a hazy understanding of of this technique. Perhaps someone would care to refine my picture of it.

As I understand it, a parameter [itex] \alpha [/itex] (such as slope) determined by the "best" fit is a (perhaps non-linear) function of the data [itex] \{x_1,x_2, x_3,..x_n \} [/itex] so [itex] \alpha = f(x_1,x_2,x_3,...x_n) [/itex]. Form a linear approximation of [itex] f [/itex] near the value [itex] \alpha [/itex] and write

[itex] \alpha + \triangle \alpha \approx f(x_1,x_2,..x_n) + \frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2 + ...\frac{\partial x_n}{\partial x_n} dx_n [/itex]

Then we say that the [itex] dx_i [/itex] are independent, normally distributed random variables with a common standard deviation [itex] \sigma [/itex] and this makes the standard deviation of the random quantity [itex] \triangle \alpha [/itex] an approximately linear function of [itex] \sigma [/itex] and we call this the standard deviation of the distribution of the parameter.

That last part sounds too vague. There is probably a more dignified way of accomplishing the mission.
 
  • #31
Stephen Tashi said:
How are the data points measured? Do you have separate instruments to measure [itex] \frac{dC(t)}{dt} [/itex] and [itex] C(t) [/itex]? Or are you computing [itex] \frac{dC(t)}{dt} [/itex] from differences in the [itex] C(t) [/itex] data?

(I'm wondering why you chose to use regression instead of fitting a curve to the C(t) data.)

First, thanks for all the responses to my original question.

I agree completely that, if one had observations of C(t), where C(t) satisfies dC(t)/dt = -λ C(t), starting from C(0), then you'd simply find the exponential that fitted the observed points.

I tried to include sufficient information to ask my question but I tried to avoid including a load of detail that I thought would simply be tedious and irrelevant to readers.

I am trying to understand a very simple model representing the dynamics of atmospheric concentration of CO2 that someone sent me in a paper they had written. I am trying as a first step to reproduce their work. I think that they have greatly over-estimated the accuracy of the estimate for 1/λ, so I want to be doubly careful with this aspect.

I'm still simplifying things here but I hope this won't result in confusion.

Here are my symbols:

t = time after 1960 (yr)
C(t) = excess atmospheric CO2 over the equilbrium level, at time t (Gt).
Fa(t) = rate of emission of anthropogenic CO2 (Gt/yr)
λ = coefficient relating C(t) and the resulting additional absorption rate over the equilibrium absorption rate. (This assumes linearity and only one sink for CO2, hence a 1st order diff eqtn.)

So I have an equation dC(t)/dt = -λ C(t) + Fa(t) .. .. .. .. (1)

As data, I have:

Values of C(0), C(1),... observed values for excess CO2, derived from data published by the Mauna Loa observatory.

Values of Fa(0),Fa(1),... from published estimates of annual CO2 release from fossil fuel burning and land use change.

If I change equation (1) around, I have dC(t)/dt - Fa(t) = -λ C(t). .. .. .. (2)

Using the approximation dC(t)/dt = [C(t) - C(t-1)]/1 and the estimates for Fa(t), I get 50 (rather scattered) points on the left of (2) and I have 50 points on the right. This gives me my scatter plot. A regression analysis gives me an estimate for λ.

From the estimate for λ, I can compute various things of interest, including T = 1/λ, the time to reach equilibrium, in the hypothetical case of anthopogenic CO2 releases ceasing.

I hope this gives the background and explains why I could not just fit a curve to C(t). If there is a better way of doing things, I'd like to know about it.

Thank you again to all for the comments.
 
  • #32
I would use the approximation dC/dt = 1/2 (C(t+1)-C(t-1)), but chances are good that it does not change much (something you can add as cross-check).

[tex]\lambda = \frac{\frac{dC}{dt}-Fa}{C(t)}[/tex]
This gives you a direct estimate for λ, and if you take the inverse value you get a direct estimate for T in each year. If your values do not have individual (and different) errors, the arithmetric average should be the best way to estimate your value.
 
  • #33
haruspex said:
The situation, as I've come to understand it from viraltux, is that if [itex]\hat{\lambda}[/itex] is an MLE for parameter [itex]\lambda[/itex] then (under some fairly unrestrictive conditions on f), [itex]f(\hat{\lambda})[/itex] is an MLE for [itex]f(\lambda)[/itex].
In the OP, the slope, [itex]\hat{\lambda}[/itex], is being obtained by standard linear regression. We don't know anything about the distribution of the errors, but I doubt this constitutes an MLE, so I remain sceptical of the validity of using [itex]1/\hat{\lambda}[/itex] for [itex]1/\lambda[/itex].
The other complication is that the choice of estimator should really be based on minimising a defined cost function.

The MLE is invariant under reparametrizations but it is only asymptotically efficient and is also only asymptotically unbiased.
On the other side, a minimum variance unbiased estimator for finite sample size is usually not invariant under reparameterizations.
 
  • #34
Even when one uses MLE estimates the transformed confidence intervals are not identical to the confidence intervals for 1/ parameter, especially if the CI's are the shortest possible for one parameter they transformed CI's are not the shortest for 1/parameter.
 
  • #35
DrDu said:
Even when one uses MLE estimates the transformed confidence intervals are not identical to the confidence intervals for 1/ parameter, especially if the CI's are the shortest possible for one parameter they transformed CI's are not the shortest for 1/parameter.

I don't know if this thread is helping the original poster Calvadosser, but it is very informative!

In that spirit, I'll ask about the technical definition of a "shortest possible" confidence interval. Suppose we have some algorithm that takes the sample data and computes an interval and the algorithm has the property that there is a 95% chance that the population parameter we are estimating is in that interval. There is no requirement that all the intervals the algorithm produces have the same length. Only If we take particular sample data do we get an interval of a particular length and (from the frequentist point of view) we don't know that there is a 95% chance that the population parameter is in that particular interval. If I am comparing two such algorithms, I can compare the expected lengths of the intervals they produce. As I interpret your statement, it refers to expected lengths of confidence intervals. Is that correct?
 

Similar threads

Back
Top