Subtracting out a random variable

mXSCNT · Jan 27, 2009

"Subtracting out" a random variable

let X be a discrete R.V. and let Y = f(X) for some function f. I wish to find a function g, such that Y and Z = g(X) are independent, and also such that the uncertainty H(Z) is maximized. For example, suppose X is uniformly distributed over {0,1,2,3,4,5,6,7} and f(x) = 0 if x < 4, f(x) = 1 otherwise. Then if we let g(x) = x mod 4, g satisfies the requirements in this example. One could interpret Z as the distribution X from which the distribution Y has been "subtracted out." Encoding the values of X in binary as 000,001,010,011,100,101,110,111, we see that f extracts the left bit of X, and g extracts the remaining two bits.

However, this is not always possible; for example suppose X is uniformly distributed over {0,1,2} and f(x) = 1 if x == 2, f(x) = 0 otherwise. Then the only functions g such that g(X) is independent of f(X), are functions that map all of X to a single value, which does not capture the idea of "subtracting out" f(X). For one thing, one would like to be able to deduce the value of X by observing the values of f(X) and g(X), and that is not possible here.

As a compromise one could instead seek a function g such that if W is the joint distribution of f(X) and g(X), then H(X|W) = 0, and the mutual information I(f(X);g(X)) is minimized. But in general then, f(X) and g(X) would not be independent.

Any help would be appreciated--especially a pointer to other material that deals with "subtracting out" a random variable in a similar manner to this!

mXSCNT · Jan 28, 2009

In an extension of the problem, allow g(X,W) instead of simply g(X), for some discrete random variable W, independent from X. Then, seek g and W together, such that g(X,W) and f(X) are independent, and such that H(X|g(X,W), f(X)) = 0. If X is nonzero on only a finite number of values, this can always be done.

In the second example from above, we can solve it by letting W be uniformly distributed over {0,1}, and letting

[tex]g(x,w) = [x = 0] * 0 + [x = 1] * 1 + [x = 2] * w[/tex]

where [] is the Iverson bracket: [P] = 1 if the proposition P is true, [P] = 0 if P is false.

From here on denote the level set [tex]\{x : f(x) = y\}[/tex] by h(y). Note that the condition g(X,W) and f(X) are independent is equivalent to saying that for every [tex]y_1,y_2[/tex] in the image of f, and for every z in the image of g(h(Y),W), [tex]P(g(X,W) = z | X \in h(y_1)) = P(g(X,W) = z | X \in h(y_2))[/tex]. Also, the condition H(X|g(X,W),f(X)) = 0 is equivalent to saying for each y in the image of f, for a fixed z, there is exactly one value of X such that [tex]X \in h(y)[/tex] and [tex]P(g(X,W) = z) \neq 0[/tex].

This can be visualized as follows. Imagine that each value x of X corresponds to a rectangle of width 1 and height proportional to P(X=x). Partition the rectangles into groups h(y) for each y. Then we seek to "color" the rectangles in each group such that each color appears within exactly one rectangle for each group (it does not need to cover the entire rectangle) and such that the area of each color within a group, divided by the total area of its group, is constant over all the groups. Each color represents a possible value of g(X,W). Here is what such a diagram would look like for the preceding example:

attachment.php?attachmentid=17342&stc=1&d=1233184615.png

Let's take a slightly more complicated example. Suppose that X is uniformly distributed over {0,1,2,3,4} and f(X) = [X > 2]. (that is, f(X) = 1 if X > 2, f(X) = 0 otherwise). One can solve the problem in this case by letting W be uniformly distributed over {0,1,2,3,4,5} and letting g(x,w) = [x <= 2] * (2*x + w mod 2) + [x > 2] * (3*(x-3) + w mod 3). This is illustrated below.

attachment.php?attachmentid=17343&stc=1&d=1233184615.png

You may be able to see a pattern between these two examples. Now, somewhat more generally, suppose X is uniformly distributed over {1,2,...,n}, and the range of f(X) is {1,2,...,m}. Then let L = [tex]lcm_{y \in {1...m}} |h(y)|[/tex]. Let W be uniformly distributed over {0,1,...,L-1}. Let [tex]r(x) = i[/tex] iff x is the i'th smallest element of h(f(x)) (beginning at i=0). Now, define g(x, w) = r(x) * L / |h(f(x))| + (w mod |h(f(x))|); this solves the problem for the case when X is uniformly distributed.

However, it doesn't work for the following case: let X be distributed over {0,1,2,3} such that [tex]P(X=x) = \frac{1}{\sqrt{2} + 3}([x = 0] * \sqrt{2} + [x > 0])[/tex], and f(x) = [x > 1]. This case can be solved by letting W be distributed over {0,1,2,3}, P(W=w) = [tex]\frac{1}{2(\sqrt{2} + 1)} ([w <= 1] * \sqrt{2} + [w > 1])[/tex], and letting g(x, w) = [x <= 1] * (2x + [w > 1])) + [x > 1] * (x - 2 + 2 * (w mod 2)), as illustrated below.

attachment.php?attachmentid=17344&stc=1&d=1233184615.png

Using the intuition from the preceding example, let's again suppose that X is only nonzero over {1, 2, ..., n}, and that the range of f(X) is {1,2,...m}, but relax the restriction that X must be uniformly distributed. Let W_y be independently distributed over h(y) such that [tex]P(W_y = x) = P(X=x | X \in h(y))[/tex]. Let [tex]W \in \mathbb{Z}^m, W = W_1 \times W_2 \times ... \times W_m[/tex]. Let [tex]W(i,x) \in \mathbb{Z}^m[/tex] be W with its i'th coordinate replaced by x. Then, [tex]g(x, w) = W(f(x),x)[/tex].Another question is, can this be extended to the case when X is nonzero on arbitrarily many values? I suspect that in this case W might have to be continuous.

The technique of using the extra random variable W to satisfy the conditions H(X|g(X,W), f(X)) = 0, I(g(X,W);f(X)) = 0, comes at a cost: the uncertainty in g(X,W) may become relatively large. It would be ideal if H(g(X,W)) + H(f(X)) = H(X), but much of the time this is not nearly the case.

Any commentary? I am especially looking for a place where similar material (about the subtraction of the effect of one random variable from another) has been treated elsewhere.

Subtracting out a random variable

Attachments

FAQ: Subtracting out a random variable

What is the purpose of subtracting out a random variable in scientific research?

How is subtracting out a random variable different from controlling for a variable?

When is it necessary to subtract out a random variable in scientific research?

What are some examples of random variables that may need to be subtracted out in scientific research?

What are the potential limitations of subtracting out a random variable in scientific research?

Similar threads

Hot Threads

Recent Insights