Optimizing Machine Learning with Irreducible Factorizations of Random Variables

  • #1
mXSCNT
"Factorizing" random variables

Suppose we have a (discrete) random variable X. Call a random variable Y "equivalent" to X if there are functions f, g such that X = f(Y) and Y = g(X). Among other things, this implies H(Y) = H(X).

Y = Y1 x Y2 x ... x Yn, where x is the cartesian product, is an "independent factorization" of X if Y is equivalent to X and the Yi are mutually independent random variables (the factors).

So for example, if X is uniformly distributed over {0,1,2,3}, then a factorization is Y = B1 x B2 where the Bi are uniformly distributed over {0,1} and mutually independent. f(x) = (x mod 2, floor(x / 2)), and g((b1,b2)) = b1 + 2*b2.
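
A quick Python sanity check of this example (a minimal sketch; f and g are exactly the maps above):
Code:
from itertools import product

# X uniform over {0,1,2,3}, factored into two independent bits.
def f(x):
    return (x % 2, x // 2)   # X -> (B1, B2)

def g(b1, b2):
    return b1 + 2 * b2       # (B1, B2) -> X

# f and g are mutually inverse on the support of X ...
assert all(g(*f(x)) == x for x in range(4))

# ... and B1, B2 are independent: P(B1=u, B2=v) = P(B1=u) * P(B2=v).
joint = {f(x): 0.25 for x in range(4)}
pb1 = {u: sum(p for (b1, _), p in joint.items() if b1 == u) for u in (0, 1)}
pb2 = {v: sum(p for (_, b2), p in joint.items() if b2 == v) for v in (0, 1)}
assert all(abs(joint[(u, v)] - pb1[u] * pb2[v]) < 1e-12
           for u, v in product((0, 1), repeat=2))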

In this context, a random variable Y is "one" if H(Y) = 0, that is if Y takes one value with probability 1. A random variable Y is irreducible if it has no factors other than one and itself.

A factorization of a random variable is irreducible if all factors in the factorization are irreducible, and are not one.

The problem is then, given a discrete random variable, what are its factorizations? Are irreducible factorizations of discrete random variables unique up to order of the factors and equivalence (as defined above) of the factors? Are irreducible elements also prime?
 
  • #2


Note you can define a "divides" relation; A divides B (written A | B) if there is a factorization B ~ AxC.

The usual approach in this sort of situation is to define two kinds of elements: irreducible elements, and prime elements.

C is irreducible if and only if A | C implies A ~ C or A ~ 1.
C is prime if and only if AxB | C implies A | C or B | C.

Typically, one can easily show that irreducible factorizations exist*, and that prime factorizations are unique. If you can show that irreducible objects are actually prime, then happiness ensues.


*: This may require some sort of finiteness condition
 
  • #3


Well, it's clear that irreducible factorizations exist. Let |W| be the cardinality of the domain of a random variable W. We know
[tex]
|X| = |Y| = \prod_{i=1}^n |Y_i|
[/tex]
So you can't keep factorizing the factors forever: each |Yi| is a whole number, and it strictly decreases with each nontrivial factorization.

The definition of primes is A | BC --> A | B or A | C. I think that not all irreducible R.V.'s are prime, because if we multiply (by cartesian product) two arbitrary discrete random variables, they might not be independent. I'm having a hard time thinking of a counterexample at the moment, though; I need some paper.
 
  • #4


The simplest case for factorization is where the random variable is uniformly distributed. In that case it factorizes like the integers, into factors whose domains have prime cardinality. If the random variable is not uniformly distributed, this isn't necessarily true; for example suppose X is distributed over {0,1,2,3} where P(X=x) = 1/5 if x <= 2, and P(X=3) = 2/5. Then X is irreducible despite the fact that |X| = 4.
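
To check the irreducibility claim concretely: a nontrivial factorization of a 4-element variable would have to be 2x2, which means arranging the four probabilities into an outer product p_i * q_j, and any 2x2 outer product satisfies p00*p11 = p01*p10. A brute-force sketch:
Code:
from fractions import Fraction
from itertools import permutations

# P(X) = (1/5, 1/5, 1/5, 2/5).  A factorization into two binary factors
# would arrange these four probabilities into a rank-1 2x2 grid
# (an outer product), which requires p00 * p11 == p01 * p10.
probs = [Fraction(1, 5), Fraction(1, 5), Fraction(1, 5), Fraction(2, 5)]
rank_one = any(p00 * p11 == p01 * p10
               for p00, p01, p10, p11 in permutations(probs))
print(rank_one)  # False: no 2x2 arrangement works, so X is irreducible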

The simplest case for multiplication (cartesian product) is where both random variables are independent. If they aren't then you need their joint distribution to compute the product.
 
  • #5


mXSCNT said:
The definition of primes is A | BC --> A | B or A | C.
Blah, you're right. I always find that definition annoying, because different ways of looking at the situation often reverse the direction of the relation. :frown:
 
  • #6


mXSCNT said:
The simplest case for multiplication (cartesian product) is where both random variables are independent. If they aren't then you need their joint distribution to compute the product.
But then, it isn't an independent factorization -- I think maybe you want to restrict attention to independent products, for this problem? (Or, at least, consider that first)
 
  • #7


Okay, I have a counterexample.
Let X be distributed over {0,1}, let Y be distributed over {0,1,2,3}. The joint distribution of X and Y is
Code:
      | Y=0   Y=1   Y=2   Y=3
------+------------------------
X = 0 | 1/8   1/8   1/4   0
X = 1 | 1/8   1/8   0     1/4
Let f((x,y)) = y when y <= 1, and let f((x,y)) = 2 when y >= 2. Let g((x,y)) = x. Then let A = f(X x Y) and B = g(X x Y). Then X x Y can be factored as A x B.

|A| = 3 so A is irreducible, but since |X| and |Y| are not divisible by 3, A cannot be a factor of X or Y. Thus, there are R.V.'s that are irreducible but not prime.
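
A quick check of the counterexample (a sketch; the dictionaries just encode the table and the maps f, g above):
Code:
from fractions import Fraction as F

# Joint distribution of (X, Y), zero-probability cells omitted.
joint = {(0, 0): F(1, 8), (0, 1): F(1, 8), (0, 2): F(1, 4),
         (1, 0): F(1, 8), (1, 1): F(1, 8), (1, 3): F(1, 4)}

f = lambda x, y: y if y <= 1 else 2   # A = f(X, Y)
g = lambda x, y: x                    # B = g(X, Y)

# Distribution of (A, B) and its marginals.
ab = {}
for (x, y), p in joint.items():
    key = (f(x, y), g(x, y))
    ab[key] = ab.get(key, F(0)) + p
pa = {a: sum(p for (i, _), p in ab.items() if i == a) for a in (0, 1, 2)}
pb = {b: sum(p for (_, j), p in ab.items() if j == b) for b in (0, 1)}

# A and B are independent ...
assert all(ab[(a, b)] == pa[a] * pb[b] for (a, b) in ab)
# ... and (A, B) determines (X, Y): the 6 positive-probability outcomes
# map to 6 distinct (a, b) pairs, so A x B ~ X x Y.
assert len(ab) == len(joint)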

Hurkyl said:
But then, it isn't an independent factorization -- I think maybe you want to restrict attention to independent products, for this problem? (Or, at least, consider that first)
In an independent factorization of X, the factors are only independent from each other; they are not independent from X. Anyway, X isn't independent from itself, so this is unavoidable. In fact, for all X, X x X ~ X.
 
  • #8


In an application (e.g. machine learning, or perhaps statistical analysis), suppose we are interested in splitting up a random variable into independent parts that we can then analyze separately. In an empirically measured data set, I think it's unlikely that a random variable will exactly factorize into independent random variables; I would expect most empirical random variables to be irreducible, since slight departures from the uniform distribution can spoil a factorization. So one might consider a "near" factorization of a random variable, as follows.

Before, I had X ~ Y if there are functions f, g such that f(X) = Y and g(Y) = X. Now let's introduce a "less than or equal" relationship: X <= Y if there is a function f such that f(Y) = X. This has the desired property that X <= Y and Y <= X implies X ~ Y.

A "hidden variables" independent factorization of a random variable X is then a product Y = Y1 x Y2 x ... x Yn where the Yi are mutually independent, and X <= Y. The quality of the factorization can be measured by H(Y), which we'd like to minimize. In the best case, H(Y) = H(X) and the near factorization is an exact factorization. I say "hidden variables" because the excess uncertainty in Y means Yi is not only a function of X (as it is in an exact factorization); Yi = fi(X,Wi) where fi is some function and Wi is independent from X. The Wi are the hidden variables.

A "nearly independent" factorization of a random variable X is again a product Y = Y1 x Y2 x ... x Yn where X ~ Y and the Yi are not necessarily mutually independent. The quality of a nearly independent factorization should be measured by how close to being independent the Yi are. One way to measure this is by total correlation; let the quantity to minimize be [tex]C(Y) = \sum_{i=1}^n H(Y_i) - H(Y)[/tex].
 
  • #9


I still don't know if exact irreducible independent factorizations are unique.

Per your suggestion, one could extend the concept of a prime to an "independent prime." Say that A i| B (where i| means independently divides) if there is a random variable C, independent from A, where AC ~ B. A random variable A is an independent prime if for every X and Y, where X and Y are independent, A i| XY implies that A i| X or A i| Y. Perhaps if a random variable is independently irreducible, it is also an independent prime.
 
  • #10


I have a counterexample to unique factorization.

Choose two positive real numbers a and b satisfying

[tex]\frac{a^6 - b^6}{a - b} = a^4[/tex]

(This ensures that the probabilities of each of the variables below sum to 1.)
(e.g. a = 0.4 and b ~ 0.249400)
(one exact solution is a = 1/63 and b = 2/63)


Let S and T be independent random variables satisfying:

[tex]P(S = 0) = \frac{a^3}{a^3 + b^3}[/tex]

[tex]P(S = 1) = \frac{b^3}{a^3 + b^3}[/tex]

[tex]P(T = 0) = (a^3 + b^3)\frac{1}{a^2}[/tex]

[tex]P(T = 1) = (a^3 + b^3)\frac{b}{a^3}[/tex]

[tex]P(T = 2) = (a^3 + b^3)\frac{b^2}{a^4}[/tex]

The frequencies of the 6 outcomes of SxT are, in no particular order,

[tex]a, b, \frac{b^2}{a}, \frac{b^3}{a^2}, \frac{b^4}{a^3}, \frac{b^5}{a^4}[/tex]

Now, consider independent random variables U and V given by

[tex]P(U = 0) = \frac{a}{a + b}[/tex]

[tex]P(U = 1) = \frac{b}{a + b}[/tex]

[tex]P(V = 0) = (a + b)[/tex]

[tex]P(V = 1) = (a + b) \frac{b^2}{a^2}[/tex]

[tex]P(V = 2) = (a + b) \frac{b^4}{a^4}[/tex]

The 6 frequencies of the outcomes of UxV are, in no particular order,

[tex]a, b, \frac{b^2}{a}, \frac{b^3}{a^2}, \frac{b^4}{a^3}, \frac{b^5}{a^4}[/tex]



Both frequency lists are the same; so we should be able to arrange things so that SxT and UxV are equivalent random variables. However, the factorizations are distinct, because S and U are not equivalent. (And S is clearly not equivalent to V)
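
Numerically, with the exact solution a = 1/63, b = 2/63, the whole check goes through (a sketch):
Code:
from fractions import Fraction as F

a, b = F(1, 63), F(2, 63)
assert (a**6 - b**6) / (a - b) == a**4   # the constraint holds exactly

S = [a**3 / (a**3 + b**3), b**3 / (a**3 + b**3)]
T = [(a**3 + b**3) / a**2, (a**3 + b**3) * b / a**3, (a**3 + b**3) * b**2 / a**4]
U = [a / (a + b), b / (a + b)]
V = [a + b, (a + b) * b**2 / a**2, (a + b) * b**4 / a**4]
assert sum(S) == sum(T) == sum(U) == sum(V) == 1

st = sorted(s * t for s in S for t in T)
uv = sorted(u * v for u in U for v in V)
assert st == uv                  # same six frequencies, so SxT ~ UxV
assert sorted(S) != sorted(U)    # but S is not equivalent to U
print(st)  # [1/63, 2/63, 4/63, 8/63, 16/63, 32/63]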
 
  • #11


Nicely done. The usefulness of factorizing random variables, in my estimation, lies in machine learning. Suppose that we have a random variable X which is sampled according to some machine learning algorithm, such that the long-term probability of visiting a given state [tex]x \in X[/tex] increases monotonically with a fitness function f(x). This is the case, for example, in a Boltzmann machine, where the fitness function is inversely related to the energy of a state of the machine. It is likely also roughly true for other hill climbing algorithms, such as genetic algorithms or gradient ascent. We seek to maximize f, which we can do by finding the maximum likelihood value of X.

Now, if we can decompose X = (Y1,Y2) where Y1 and Y2 are independent of each other, we have effectively broken the problem down into two easier (smaller) subproblems. If we can find maximum likelihood values [tex]y_1 \in Y_1, y_2 \in Y_2[/tex], then the maximum likelihood value of X, and hence the maximum of f, occurs at (y1, y2).
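
A toy sketch of why this helps (the distributions here are arbitrary): with independent factors, the joint mode is just the pair of marginal modes, so two small searches replace one big one.
Code:
# If P(X = (y1, y2)) = P1(y1) * P2(y2), then
# argmax P(X) = (argmax P1, argmax P2).
p1 = {0: 0.2, 1: 0.5, 2: 0.3}
p2 = {0: 0.6, 1: 0.4}

joint_mode = max(((y1, y2) for y1 in p1 for y2 in p2),
                 key=lambda t: p1[t[0]] * p2[t[1]])
factor_mode = (max(p1, key=p1.get), max(p2, key=p2.get))
assert joint_mode == factor_mode == (1, 0)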

Under real-life conditions, an exact factorization is probably impossible in most cases, but an approximate factorization should have much the same effect of splitting the problem into subproblems that can be optimized individually to get "in the ballpark," then recombined and fine-tuned as a whole.

So what I really want is an algorithm to find approximate independent factorizations of random variables whose form is unknown, assuming only that we can sample from them.

Concrete example: suppose we want to use a genetic algorithm to develop a set of rules for steering a toy car to avoid obstacles. Thus, we have a fitness function (ability to avoid obstacles), a random variable over a set of states (the set of rules), and a means of sampling the random variable such that the probability of a state increases with the fitness function for that state (the genetic algorithm). We represent a given set of rules as a string of n bits. The string of bits can be viewed as a factorization of the set of rules into a cartesian product of n random variables, each of 1 bit. This is not an independent factorization. We seek a way to represent the rules as a string of bits such that different regions in the string are nearly independent, meaning that changes in one part of the string do not force changes in other parts. If we can achieve this, learning will become much faster.
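
One possible starting point, sketched below: estimate the pairwise mutual information between bit positions from samples, and group strongly coupled positions into the same factor. (Pairwise independence does not guarantee joint independence, so this is only a heuristic; the toy sampler at the bottom is made up.)
Code:
import math
import random
from collections import Counter

def pairwise_mi(samples, i, j):
    # Estimated mutual information (bits) between bit positions i and j.
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    return sum((c / n) * math.log2(c * n / (pi[u] * pj[v]))
               for (u, v), c in pij.items())

# Toy sampler: bits 0 and 1 copy each other; bit 2 is independent.
samples = []
for _ in range(10000):
    x = random.randint(0, 1)
    samples.append((x, x, random.randint(0, 1)))
print(pairwise_mi(samples, 0, 1))  # close to 1 bit: same factor
print(pairwise_mi(samples, 0, 2))  # close to 0: separable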
 

