Which Estimator Minimizes Expected Square Error for E[g(X,Y)]?

In summary, the conversation discusses the problem of estimating the true value of a random variable Z, which is a function of two other random variables X and Y. Two potential estimators are considered: Z=E[g(X,Y)] and Z=g(E[X],E[Y]). The question of which estimator is "better" is brought up, but it is noted that the concept of "best" needs to be defined in statistical terms. Various approaches for finding the "best" estimator are discussed, such as Maximum Likelihood Estimation and finding the highest probability density. The importance of distinguishing between the mean of a distribution, the mean of a sample, and an estimator for the mean is emphasized.
  • #1
Apteronotus
Suppose X and Y are r.v.
Suppose also that we get N samples of a r.v. Z which depends on X and Y. That is Z=g(X,Y).

Which is a better estimate of the true value of Z?

[itex]Z=E[g(X,Y)][/itex]
or
[itex]Z=g(E[X],E[Y])[/itex]
 
  • #2
Apteronotus said:
Suppose X and Y are r.v.
Suppose also that we get N samples of a r.v. Z which depends on X and Y. That is Z=g(X,Y).

Which is a better estimate of the true value of Z?

[itex]Z=E[g(X,Y)][/itex]
or
[itex]Z=g(E[X],E[Y])[/itex]

Hey Apteronotus.

You can't actually estimate Z since Z is a random variable and not a parameter: you need to be careful about using estimation in this context.

Z is a random variable, so if you wanted to get the population mean of Z then you calculate E[Z] = E[g(X,Y)].

Remember that estimation concerns estimating something that is essentially fixed, like mu, sigma, or lambda; in statistical theory we then derive, either exactly or approximately, the distribution of that particular estimator.
 
  • #3
Hi Chiro,

Thank you for your reply.

The situation is that my z is in fact fixed. Its value depends on two other variables x and y.
I have a model/function which calculates the true value of z. That is
z=g(x,y)

Now, the problem is that I have added noise to my x & y variables:
X=x+noise
Y=y+noise

Using these noisy inputs, I get a noisy output Z=g(X,Y)
Since the noise has zero mean
E[X]=x, and
E[Y]=y

I was wondering whether g(E[X],E[Y]) or E[g(X,Y)] would bring me closer to the actual value z?
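For concreteness, here is a small Monte Carlo sketch in Python of this exact setup (the particular g, the true values x and y, and the Gaussian zero-mean noise level are arbitrary illustrative assumptions, not anything fixed in the thread):

[code]
import numpy as np

# Hypothetical setup: fixed true inputs and an arbitrary nonlinear g.
def g(x, y):
    return x**2 + x * y

x_true, y_true = 2.0, 3.0
z_true = g(x_true, y_true)               # the fixed quantity of interest

rng = np.random.default_rng(0)
sigma = 0.5                              # assumed noise standard deviation
N = 100_000
X = x_true + rng.normal(0.0, sigma, N)   # X = x + noise, zero-mean noise
Y = y_true + rng.normal(0.0, sigma, N)   # Y = y + noise, zero-mean noise

# Candidate 1: average g over the noisy samples (approximates E[g(X,Y)]).
cand1 = np.mean(g(X, Y))
# Candidate 2: plug the sample means into g (approximates g(E[X],E[Y])).
cand2 = g(np.mean(X), np.mean(Y))

print(f"true z             : {z_true:.4f}")
print(f"E[g(X,Y)] estimate : {cand1:.4f}")  # offset by +sigma^2 from the x^2 term
print(f"g(E[X],E[Y]) est.  : {cand2:.4f}")  # close to z once the noise averages out
[/code]

With a nonlinear g the first candidate settles on a value offset from z, while the second converges to z itself, which is the distinction the later posts pin down.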
 
  • #4
Apteronotus said:
I was wondering whether g(E[X],E[Y]) or E[g(X,Y)] would bring me closer to the actual value z?

I think that if you manage to state your question precisely, the answer will be E[g(X,Y)], but you haven't defined the meaning of "best" in your original post, and to say a random result is "closer" to something has no specific meaning. A random variable has no deterministic "closeness" to anything unless the "closeness" is defined in statistical terms, and there are different ways of doing that.

Perhaps you want to minimize the expected value of the square of the difference between an estimator of the mean value of g(x,y) and the actual mean value of g(x,y).

In estimation theory, there are "least squares" estimators, "maximum likelihood" estimators, "minimum variance" estimators, etc. Each is "best" according to a different criterion.
 
  • #5
Stephen. Thank you for taking the time.

I'm hoping that the meaning of "best" becomes apparent from my second post.

I guess I would define it as follows:
Which quantity is smaller

[itex]
\left\{E[g(X,Y)]-g(x,y)\right\}^2
[/itex]
or

[itex]
\left\{g(E[X],E[Y])-g(x,y)\right\}^2
[/itex]

where
[itex]
X=x+noise \qquad \mbox{and} \qquad Y=y+noise
[/itex]
 
  • #6
Clearly the second quantity [itex]\left\{g(E[X],E[Y])-g(x,y)\right\}^2=0[/itex], as

[itex]E[X]=E[x+noise]=x \qquad \mbox{and}\qquad E[Y]=E[y+noise]=y[/itex]

Since the first quantity, [itex]\left\{E[g(X,Y)]-g(x,y)\right\}^2\ge0[/itex], can only equal or exceed that, I guess to answer my question, the second one must be "Better".
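To see why the first quantity is strictly positive for a nonlinear g, here is a one-line worked example (added for concreteness; the quadratic g is an arbitrary choice). Take [itex]g(x)=x^2[/itex] and [itex]X = x + \epsilon[/itex] with [itex]E[\epsilon]=0[/itex] and [itex]Var(\epsilon)=\sigma^2[/itex]. Then

[itex]E[g(X)] = E[(x+\epsilon)^2] = x^2 + \sigma^2, \qquad g(E[X]) = x^2,[/itex]

so the first squared difference is [itex]\sigma^4 > 0[/itex] while the second is exactly zero.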
 
  • #7
Have you tried estimation schemes that take the value of Z at which its probability density is maximal (the way Maximum Likelihood Estimation finds point estimates for parameters)?

Another technique is to find a highest probability density (HPD) region: for a given probability value, take the region of Z values whose total probability equals that value and whose size is as small as possible. For example, if p = 0.1, the HPD corresponds to the region that captures probability 0.1 while the region itself is minimized (equivalently, where the density is highest); a sample-based sketch of this idea follows below.

So the above are two approaches: one is a point-estimate approach and the other is an interval/region approach.
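For a one-dimensional, unimodal Z, the HPD region reduces to the shortest interval holding the required probability mass. Below is a minimal Python sketch of that sample-based version (the gamma-distributed samples are just an arbitrary stand-in for draws of Z):

[code]
import numpy as np

def hpd_interval(samples, p=0.9):
    """Shortest interval containing a fraction p of the samples.

    For a unimodal distribution this approximates the HPD region:
    probability mass p, with the size of the region minimized.
    """
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = max(int(np.ceil(p * n)), 2)      # number of points the interval must cover
    widths = s[k - 1:] - s[:n - k + 1]   # width of every candidate interval
    i = int(np.argmin(widths))           # the shortest candidate wins
    return s[i], s[i + k - 1]

# Illustration with a skewed distribution of Z values.
rng = np.random.default_rng(1)
z_samples = rng.gamma(shape=2.0, scale=1.0, size=10_000)
lo, hi = hpd_interval(z_samples, p=0.9)
print(f"90% HPD interval: [{lo:.3f}, {hi:.3f}]")
[/code]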
 
  • #8
Apteronotus said:
I guess I would define it as follows:
Which quantity is smaller

[itex]
\left\{E[g(X,Y)]-g(x,y)\right\}^2
[/itex]
or

[itex]
\left\{g(E[X],E[Y])-g(x,y)\right\}^2
[/itex]

You can only answer those questions if you know E(X), E(Y) and E(g(X,Y)). If you already know those quantities, what statistical problem are you trying to solve?

If lower case "x" and "y" denote random variables in those expressions, the expressions themselves take on random values, so you can't claim anything about which one of them is smaller.

In your next post, you seem to say E[X] = x. That would imply x is a constant. So what is "x"? Is it a constant or is it a random variable?

A typical scenario in statistics would be that we are trying to estimate E[g(X,Y)] from a sample. We define some function W of the sample data; this function is an "estimator". I think you want to ask which is the "best" estimator for E[g(X,Y)]. Is it [itex]W_1 = \frac{1}{N}\sum_{i=1}^{N} g(x_i,y_i)[/itex], the mean value of g taken over all data points [itex](x_i,y_i)[/itex]? Or is it [itex]W_2 = g(\bar{x},\bar{y})[/itex], where [itex]\bar{x}[/itex] and [itex]\bar{y}[/itex] are the means of the samples [itex]x_1,x_2,\dots[/itex] and [itex]y_1,y_2,\dots[/itex] respectively?

One way to define a "best" estimator [itex]W(x_1,\dots,x_N;\, y_1,\dots,y_N)[/itex] is to say that it minimizes the expected square error.

I.e. it minimizes [itex]E\left[\left(W(x_1,\dots,x_N;\, y_1,\dots,y_N) - E[g(X,Y)]\right)^2\right][/itex]

Note that you have to have two expectations in this expression. If you leave off the one on the left, the expression is a random quantity which varies with the data.

I think the language you are using in your thoughts is failing to distinguish among the following different concepts.

1) The mean of a distribution
2) The mean of sample that is drawn from that distribution
3) An estimator for the mean of the distribution

Similar distinctions hold for other statistical quantities, such as the standard deviation, the variance, the mode etc.
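A short simulation makes the [itex]W_1[/itex] versus [itex]W_2[/itex] comparison concrete. The Python sketch below (an illustration under assumed choices, a quadratic g and normal data, not anything prescribed in the thread) estimates the expected square error of each estimator of E[g(X,Y)] by repeating the experiment many times:

[code]
import numpy as np

# Assumed concrete model, purely for illustration.
def g(x, y):
    return x**2 + y

rng = np.random.default_rng(2)
mu_x, mu_y, sigma = 1.0, 2.0, 1.0
N = 50                        # sample size in each experiment
trials = 20_000               # repeated experiments to estimate the MSE

# For X ~ N(mu_x, sigma^2):  E[g(X,Y)] = E[X^2] + E[Y] = mu_x^2 + sigma^2 + mu_y
target = mu_x**2 + sigma**2 + mu_y

X = rng.normal(mu_x, sigma, (trials, N))
Y = rng.normal(mu_y, sigma, (trials, N))

W1 = g(X, Y).mean(axis=1)                # mean of g over the data points
W2 = g(X.mean(axis=1), Y.mean(axis=1))   # g of the sample means

print(f"MSE of W1: {np.mean((W1 - target) ** 2):.5f}")
print(f"MSE of W2: {np.mean((W2 - target) ** 2):.5f}")
[/code]

With a nonlinear g, [itex]W_1[/itex] is unbiased for E[g(X,Y)] while [itex]W_2[/itex] converges to g(E[X],E[Y]) instead, so [itex]W_1[/itex] typically wins under this criterion; for a linear g the two estimators coincide.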
 

FAQ: Which Estimator Minimizes Expected Square Error for E[g(X,Y)]?

1. What is the meaning of "g(E[X],E[Y])"?

"g(E[X],E[Y])" denotes the function g evaluated at the expected values (means) of the random variables X and Y. In simpler terms, you first average X and Y separately, then plug those two averages into g; the result is a single fixed number, not an average of the function itself.

2. How is "g(E[X],E[Y])" calculated?

To calculate "g(E[X],E[Y])", first determine the expected values of X and Y. Then substitute these values into the function g; the resulting value is the value of g at the means.

3. Can "g(E[X],E[Y])" be negative?

Yes, "g(E[X],E[Y])" can be negative. It is simply the value of g at the point (E[X],E[Y]), so it is negative whenever g takes a negative value there.

4. What is the significance of "E[g(X,Y)]"?

"E[g(X,Y)]" is the expected value of the random variable g(X,Y): the average of g over the joint distribution of X and Y. In general this is a different number from "g(E[X],E[Y])".

5. How is "E[g(X,Y)]" different from "g(E[X],E[Y])"?

"E[g(X,Y)]" averages the function over all outcomes of X and Y, while "g(E[X],E[Y])" evaluates the function once, at the average outcome. The two coincide when g is linear; in general they differ, and by Jensen's inequality E[g(X)] ≥ g(E[X]) whenever g is convex. Additionally, "E[g(X,Y)]" may fail to exist for heavy-tailed distributions, while "g(E[X],E[Y])" is defined whenever E[X] and E[Y] exist and g is defined at that point.
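A quick numeric check of this distinction (the correlated-normal model below is an arbitrary illustrative choice):

[code]
import numpy as np

# g(x, y) = x * y with correlated X and Y: the two quantities differ.
rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, 1_000_000)
Y = X + rng.normal(0.0, 1.0, 1_000_000)   # Y depends on X, so E[XY] = 1

print(np.mean(X * Y))             # approximates E[g(X,Y)] = 1
print(np.mean(X) * np.mean(Y))    # approximates g(E[X],E[Y]) = 0 * 0 = 0
[/code]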
