elpidiovaldez5
I am trying to understand how visually salient features could be discovered by unsupervised learning. I do not want to assume that we already have edge detectors or convolutional neural networks; rather, I am trying to imagine how these could be discovered by observing the world.
Imagine that a camera captures greyscale images, and that these are thresholded so that half the pixels are 0 (black) and half are 1 (white). Now look at each pixel position and extract the 9 pixels in the 3x3 grid centred at that position. There are 2^9 = 512 possible patterns, so if all pixels were independent, the probability of observing any particular pattern would be 1/512. We could easily estimate the real-world probability by counting occurrences of each pattern in a large number of real-world images.
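To make the counting step concrete, here is a minimal sketch in Python with NumPy. The function names (`threshold_at_median`, `patch_codes`, `pattern_frequencies`) are my own, and thresholding at the median is just one assumed way to get a roughly half-black, half-white image.

```python
import numpy as np

def threshold_at_median(grey):
    """Binarise so that roughly half the pixels are 0 and half are 1."""
    return (grey > np.median(grey)).astype(np.uint8)

def patch_codes(binary):
    """Encode the 3x3 neighbourhood of every interior pixel as an integer in 0..511."""
    h, w = binary.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    bit = 0
    for dy in range(3):
        for dx in range(3):
            # Each of the 9 neighbourhood positions contributes one bit to the code.
            codes += binary[dy:dy + h - 2, dx:dx + w - 2].astype(np.int64) << bit
            bit += 1
    return codes

def pattern_frequencies(binary_images):
    """Empirical probability of each of the 512 patterns over a set of binary images."""
    counts = np.zeros(512, dtype=np.int64)
    for img in binary_images:
        counts += np.bincount(patch_codes(img).ravel(), minlength=512)
    return counts / counts.sum()
```

On independent noise every entry should come out close to 1/512; on real images I would expect the empirical frequencies to depart from that, and the size of the departure per pattern is the quantity of interest.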
Now consider a single image in which a horizontal line of 20 consecutive white pixels is present. We can calculate the probability of this occurring either under the assumption that the pixels are independent or from real-world data. Either way it seems very unlikely to occur by chance, but we have to be quite careful, because ANY specific pattern of 20 pixels is very unlikely. What is special about a LINE of pixels? (It is analogous to the unintuitive fact that a winning lottery combination of 9,999,999,999 seems very unlikely, but is exactly as likely as 6,936,125,674.) It seems to me that although this arrangement is just as unlikely as any other specific arrangement, the pixels in a line have lower entropy. The size of the image is also significant: it might be unlikely to find a line of 20 pixels in a 640x480 image, but much less so in a 4096x2048 image.
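The image-size effect can be put into numbers under the independence assumption. The sketch below (the function name and parameters are my own) computes the expected number of horizontal windows of a given length that are entirely white. It counts overlapping windows, so a run of 21 white pixels counts as two windows, and it is an expectation rather than the probability of seeing at least one line.

```python
def expected_white_windows(width, height, run_length=20, p_white=0.5):
    """Expected number of all-white horizontal windows of run_length pixels,
    assuming every pixel is independently white with probability p_white."""
    # Number of places a window of run_length pixels fits in one row, times rows.
    positions = height * max(width - run_length + 1, 0)
    return positions * p_white ** run_length

small = expected_white_windows(640, 480)
large = expected_white_windows(4096, 2048)
print(small, large)
```

With p_white = 0.5 this gives roughly 0.28 for a 640x480 image and roughly 7.96 for a 4096x2048 image, so under pure noise a 20-pixel white run would be fairly rare in the small image and expected several times over in the large one.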
This is where my poorly remembered maths and physics let me down. Is there any mathematical technique which would detect the line as a surprising anomaly? I am thinking of hypothesis testing, or some kind of entropy calculation. In entropic terms it seems that I need to distinguish 'microstates' (the individual pixel patterns) from 'macrostates' (a CONTIGUOUS sequence of the same state extending over a certain range). So a macrostate would define exactly which class of pixel patterns is being predicted.
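One way the hypothesis-testing idea might be set up, as a rough sketch rather than an established method: fix a statistic in advance that plays the role of the macrostate (here, the longest horizontal run of white pixels), and compare its observed value with its distribution under the null hypothesis of independent pixels, estimated by simulation. The function names, the choice of statistic, and the Monte Carlo settings are all my own assumptions.

```python
import numpy as np

def longest_white_run(binary):
    """Length of the longest horizontal run of 1s anywhere in a 0/1 image."""
    h, w = binary.shape
    padded = np.zeros((h, w + 2), dtype=np.int8)
    padded[:, 1:-1] = binary
    diff = np.diff(padded, axis=1)      # +1 where a run starts, -1 just after it ends
    starts = np.argwhere(diff == 1)
    ends = np.argwhere(diff == -1)
    if len(starts) == 0:
        return 0
    # argwhere is row-major, so the k-th start pairs with the k-th end.
    return int((ends[:, 1] - starts[:, 1]).max())

def run_length_p_value(binary, n_sim=200, p_white=0.5, seed=0):
    """Monte Carlo p-value: how often does independent noise of the same size
    produce a longest run at least as long as the one observed?"""
    rng = np.random.default_rng(seed)
    observed = longest_white_run(binary)
    hits = 0
    for _ in range(n_sim):
        noise = rng.random(binary.shape) < p_white
        if longest_white_run(noise) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)     # add-one correction avoids reporting exactly 0

# Example: independent noise with a 20-pixel white line drawn into it.
rng = np.random.default_rng(1)
image = (rng.random((64, 64)) < 0.5).astype(np.uint8)
image[10, 5:25] = 1
print(longest_white_run(image), run_length_p_value(image))
```

Because the statistic is chosen before looking at the image, the "any specific pattern is unlikely" objection does not apply in the same way: the test asks how surprising the macrostate is, not the exact microstate. The null distribution depends on the image shape, so the image-size effect is accounted for automatically.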
Am I missing something here? I'd like to see some mathematical rigour brought to the idea. If it makes sense for lines, the same principles could be brought to bear to discover other visual features.