The Blessings of Dimensionality and Hilbert's 6th Problem

In summary, the conversation revolves around the "blessing of dimensionality" and its implications for fields such as statistical physics, machine learning, and artificial intelligence. It covers the concentration of measure phenomenon and its role in turning the curse of dimensionality into a blessing, the recently discovered stochastic separation theorems, and the relevance of Hilbert's sixth problem to modern data analysis and signal processing, and it touches on Ramsey theory and its connection to the curse of dimensionality in learning and modelling.
  • #1
Jarvis323
I came across an interesting paper and am curious to hear some discussion about the topic among experts.
Blessing of dimensionality: mathematical foundations of the statistical physics of data

Abstract

The concentrations of measure phenomena were discovered as the mathematical background to statistical mechanics at the end of the nineteenth/beginning of the twentieth century and have been explored in mathematics ever since. At the beginning of the twenty-first century, it became clear that the proper utilization of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarizes recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median-level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher’s discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and determine a non-iterative (one-shot) procedure for their construction.
This article is part of the theme issue ‘Hilbert’s sixth problem’.

https://royalsocietypublishing.org/doi/full/10.1098/rsta.2017.0237

https://web.archive.org/web/2019030...e051/8861658c0ae9ba85d2f767499e4ec11b251e.pdf

Hilbert's 6th Problem

6. Mathematical Treatment of the Axioms of Physics. The investigations on the foundations of geometry suggest the problem: To treat in the same manner, by means of axioms, those physical sciences in which already today mathematics plays an important part; in the first rank are the theory of probabilities and mechanics.

https://en.wikipedia.org/wiki/Hilbert's_sixth_problem

I'm also curious about the results in this paper in light of the other.

Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing

Abstract

We review connections between phase transitions in high-dimensional combinatorial geometry and phase transitions occurring in modern high-dimensional data analysis and signal processing. In data analysis, such transitions arise as abrupt breakdown of linear model selection, robust data fitting or compressed sensing reconstructions, when the complexity of the model or the number of outliers increases beyond a threshold. In combinatorial geometry, these transitions appear as abrupt changes in the properties of face counts of convex polytopes when the dimensions are varied. The thresholds in these very different problems appear in the same critical locations after appropriate calibration of variables. These thresholds are important in each subject area: for linear modelling, they place hard limits on the degree to which the now ubiquitous high-throughput data analysis can be successful; for robustness, they place hard limits on the degree to which standard robust fitting methods can tolerate outliers before breaking down; for compressed sensing, they define the sharp boundary of the undersampling/sparsity trade-off curve in undersampling theorems. Existing derivations of phase transitions in combinatorial geometry assume that the underlying matrices have independent and identically distributed Gaussian elements. In applications, however, it often seems that Gaussianity is not required. We conducted an extensive computational experiment and formal inferential analysis to test the hypothesis that these phase transitions are universal across a range of underlying matrix ensembles. We ran millions of linear programs using random matrices spanning several matrix ensembles and problem sizes; visually, the empirical phase transitions do not depend on the ensemble, and they agree extremely well with the asymptotic theory assuming Gaussianity. Careful statistical analysis reveals discrepancies that can be explained as transient terms, decaying with problem size. The experimental results are thus consistent with an asymptotic large-n universality across matrix ensembles; finite-sample universality can be rejected.

https://royalsocietypublishing.org/doi/10.1098/rsta.2009.0152
https://arxiv.org/abs/0906.2530
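
Just to make the experimental setup in the second abstract concrete, here is a miniature version of the kind of experiment it describes. This is my own toy sketch (assuming numpy and scipy, with all sizes and thresholds chosen arbitrarily), not the authors' code: recover a sparse vector from undersampled random measurements by ℓ1 minimization written as a linear program, and watch the success rate collapse once the sparsity crosses a threshold. Running the same loop with ±1 matrices in place of Gaussian ones echoes the universality question.

```python
# Toy version of the phase-transition experiment (my sketch, not the paper's code).
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, y):
    """min ||x||_1 subject to A x = y, via the split x = u - v with u, v >= 0."""
    N = A.shape[1]
    res = linprog(c=np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * N), method="highs")
    return res.x[:N] - res.x[N:]

def success_rate(k, n=40, N=100, trials=20, ensemble="gauss", seed=0):
    """Fraction of trials in which a k-sparse vector is recovered exactly."""
    rng = np.random.default_rng(seed)
    ok = 0
    for _ in range(trials):
        if ensemble == "gauss":
            A = rng.standard_normal((n, N))
        else:                                        # +/-1 (Bernoulli) entries
            A = rng.choice([-1.0, 1.0], size=(n, N))
        x0 = np.zeros(N)
        x0[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
        ok += np.allclose(l1_recover(A, A @ x0), x0, atol=1e-6)
    return ok / trials

for k in (5, 10, 15, 20, 25):
    print(k, success_rate(k, ensemble="gauss"), success_rate(k, ensemble="pm1"))
```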
 
  • #2
I only looked at the first paper. It's confusing. I believe the general claim that random high dimensional sets are linearly separable with high probability. It didn't make a super convincing case that this is an important new result for artificial intelligence. I also don't really know why half the paper talks about Hilbert's sixth problem. I also don't really know why anyone cares about it; my rough understanding is that it made more sense before physics became wildly more expansive, and now we still wish we had a grand unified theory of everything, but nobody is champing at the bit to argue that the real problem in physics is that we haven't systematized it sufficiently with mathematical axioms.

I tried to look up the first author to see what else they have done, and their Wikipedia article is full of impressive-sounding stuff, including proving the universal approximation theorem for neural nets, but no one else in the world seems to agree that they are the person to credit for this.

What about the paper are you interested in? There's a lot of fluffy description of interesting things, and like two pages of math that I suspect is not what drew you to it (or maybe it is, given a cursory glance at the second one).
 
  • #4
Office_Shredder said:
I only looked at the first paper. It's confusing. I believe the general claim that random high dimensional sets are linearly separable with high probability.

Yes, this is the main claim of the paper, but only for certain classes of high dimensional distributions.
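
For concreteness, here is a small Monte Carlo check of the flavour of that claim. It is my own sketch (assuming numpy), not the paper's construction; the uniform-ball distribution and the threshold (1 − ε)‖x_i‖² are chosen just for illustration. The hyperplane ⟨x, x_i⟩ = (1 − ε)‖x_i‖² almost never cuts a chosen point off from the rest of the sample in low dimension, and almost always does in high dimension, even for fairly large samples.

```python
# Rough numerical check of linear separability of random points (my own sketch).
import numpy as np

def sample_unit_ball(n, d, rng):
    """n points drawn uniformly from the d-dimensional unit ball."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # uniform direction on the sphere
    r = rng.random(n) ** (1.0 / d)                  # radius with the correct law
    return g * r[:, None]

def fraction_separable(n, d, eps=0.1, trials=50, seed=0):
    """How often the first point is cut off from all others by <x, x_i> = (1-eps)*||x_i||^2."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = sample_unit_ball(n, d, rng)
        xi = x[0]
        hits += np.all(x[1:] @ xi < (1 - eps) * (xi @ xi))
    return hits / trials

for d in (2, 10, 50, 200):
    print(d, fraction_separable(n=1000, d=d))
```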

Ramsey theory is a more general topic, which is relevant I think.

https://en.wikipedia.org/wiki/Ramsey_theory

Office_Shredder said:
It didn't make a super convincing case that this is an important new result for artificial intelligence.

There is a mystery, which the paper discusses, about how the human brain is able to generalize so well from very few examples, drawing on many different sources and kinds of information.

The curse of dimensionality is a primary antagonist to learning and modelling across many domains, from geometry to physics to data analysis and AI. Take density modelling for multivariate distributions as an example. The volume of the space grows exponentially with the number of dimensions, so if the number of sample points stays the same but you increase the number of dimensions, the sample becomes exponentially more sparse (covering less and less of the space), which leaves huge holes. If the underlying manifold is complicated enough, you wouldn't be able to reconstruct it from the data points, because those holes are gaps in crucial information. So the big questions are: when can we extrapolate to fill those gaps from the points we have, and how much of the missing information is derivable from the information we do have? Real-world systems tend to generate distributions with common properties that can help address these questions in certain cases, where it would otherwise be hopeless. (A toy illustration of the sparsity problem is sketched below.)
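
Here is that toy illustration (my own sketch, assuming numpy, with the sample sizes picked arbitrarily): hold the number of sample points fixed and watch how far a fresh query point typically is from its nearest sample point as the dimension grows. Relative to the size of the space, the nearest "neighbour" drifts away, which is the "huge holes" effect in miniature.

```python
# Fixed sample size, growing dimension: the sample becomes hopelessly sparse.
import numpy as np

def mean_nn_distance(n=500, d=2, queries=100, seed=0):
    """Average distance from a fresh query point to its nearest sample point."""
    rng = np.random.default_rng(seed)
    data = rng.random((n, d))                     # n sample points in [0, 1]^d
    q = rng.random((queries, d))                  # fresh query points
    dists = np.linalg.norm(q[:, None, :] - data[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

for d in (2, 10, 50, 100):
    # normalise by the cube diagonal sqrt(d) so dimensions are comparable
    print(d, mean_nn_distance(d=d) / np.sqrt(d))
```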

The paper claims that for some classes of high dimensional distributions, the points live on a thin layer near the median level set of a Lipschitz function. And the separation theorem the paper discusses describes the structure of these layers. The author then claims that this result can explain how human intelligence is so effective at making sense of the world when there are so many variables and so few examples. That, I think, is a speculation of the authors, but an interesting one.
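
The "thin layer" statement can also be seen in a one-line experiment (my own sketch, assuming numpy, and using the Euclidean norm of Gaussian points as the Lipschitz function rather than anything from the paper): the norms pile up in a shell around √d whose relative width shrinks like 1/√d.

```python
# Concentration of a Lipschitz function (the norm) of high-dimensional Gaussian points.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000, 10000):
    x = rng.standard_normal((1000, d))            # 1000 random points in R^d
    norms = np.linalg.norm(x, axis=1)
    rel_spread = norms.std() / norms.mean()       # relative thickness of the shell
    print(f"d={d:6d}  mean norm={norms.mean():8.2f}  relative spread={rel_spread:.4f}")
```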

Then the author explains how the separation theorem could enable efficient continuous learning without retraining, as well as improved knowledge transfer between different models, and proposes a framework for doing that. Since this is not implemented and tested, I'm not sure yet exactly how it will change things in practice; in particular, I'm not clear about the limitations.

Office_Shredder said:
I also don't really know why half the paper talks about Hilbert's sixth problem.

The whole theme issue is about Hilbert's 6th problem. The reason the paper also talks about it so much is that Hilbert's 6th problem grew out of Hilbert's work on axiomatizing geometry, which the author claims led to, or at least was in the same spirit as, the axiomatization of probability theory, and then concentration of measure (the basis of the results in the paper), statistical mechanics, quantum probability, and so forth.

Office_Shredder said:
I also don't really know why anyone cares about it,

The way I read it, the interesting thing is the connection between all of these things, which might all boil down to the same structures that arise as mathematical consequences of a set of axioms. In the context of this paper, those structures have to do with randomness and correlation, and can perhaps tell us something important about intelligence.

Office_Shredder said:
my rough understanding is that it made more sense before physics became wildly more expansive, and now we still wish we had a grand unified theory of everything, but nobody is champing at the bit to argue that the real problem in physics is that we haven't systematized it sufficiently with mathematical axioms.

Actually, I think a lot of people are chomping at the bit on this, and one reason, I think, is that some people think that if we have a common language of physics between branches, it will help in the effort to unify different branches of physics, or something like that. Some people also just think that some branches are stifled because people are building from the wrong foundations and that new foundations would change that. Those ideas are usually controversial.

Office_Shredder said:
I tried to look up the first author to see what else they have done, and their Wikipedia article is full of impressive-sounding stuff, including proving the universal approximation theorem for neural nets, but no one else in the world seems to agree that they are the person to credit for this.

The journal is the world's oldest scientific journal, and one of the most prestigious. The author is one of the world's top researchers in his field. I'm not sure why you feel you need to go after the reputation of the author, or why it matters whether the author is someone special or not.

Office_Shredder said:
What about the paper are you interested in? There's a lot of fluffy description of interesting things, and like two pages of math that I suspect is not what drew you to it (or maybe it is, given a cursory glance at the second one).

I don't know what you're saying here.

The second paper is interesting to me for the same reason as the first: it describes a claimed mathematical truth that has wide real-world implications. My main interest is in complexity science. I guess you would probably consider almost all research in complexity science to be fluffy, and I'll admit it is sometimes challenging to find solid ground in the field.
 
  • #5
Some additional explanation of my interests concerning these papers.

There is a question about neural networks: whether they are capable of learning concepts and then applying them to synthesize results outside the realm of their training data (or, as some researchers put it, outside its convex hull), or whether they work simply by interpolating in a very high dimensional space, generating results that are just mixtures of the results they've seen. The question is asked about human intelligence as well, and about the source of creativity. Does creativity come from nowhere? Is it just interpolation in a high dimensional space, maybe with some random mutation? E.g. maybe when you come up with a unique new piece of art, you've thrown in some inspiration from other art, along with some emotion, some ideas about physics, how your coffee tastes, the motion of a bird, the texture of sand, or whatever, and then synthesized something which is an interpolation of all of these seemingly unrelated things. Or maybe you invent something completely new, a jump well outside the convex hull of the information you've seen, which couldn't be represented as an interpolation of the data points you've seen.

Some researchers now are suggesting that it's all just interpolation, at least for the artificial networks we use today. As neural networks are closed-form functions of the inputs, they aren't considered to be algorithmic; they don't really compute their outputs. But I would say that human beings do both. Anyway, this is a topic of interest to me. It has to do with how ideas evolve, and art, and how progress is made in mathematics and physics, and what the limitations of non-algorithmic models like neural networks are.
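
The convex-hull framing above can be made concrete with a small experiment (my own sketch, assuming numpy and scipy; the sizes are arbitrary). Whether a query point q lies in the convex hull of training points X is a linear-programming feasibility question: do weights w ≥ 0 with Σw = 1 and Xᵀw = q exist? With a fixed number of samples, the fraction of fresh points that land inside the hull drops off quickly as the dimension grows, so "interpolation" in this literal convex-hull sense becomes rare in high dimension.

```python
# Convex-hull membership as an LP feasibility test (illustrative sketch only).
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(q, X):
    """True if q is a convex combination of the rows of X."""
    n = X.shape[0]
    A_eq = np.vstack([X.T, np.ones((1, n))])      # X^T w = q  and  sum(w) = 1
    b_eq = np.append(q, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
for d in (2, 5, 10, 20):
    X = rng.standard_normal((200, d))             # 200 fixed "training" points
    inside = sum(in_convex_hull(rng.standard_normal(d), X) for _ in range(100))
    print(f"d={d:3d}  fraction of new points inside the hull: {inside / 100:.2f}")
```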

Now consider computer programs. They are finite. They start with little information, and can fill out huge spaces, and diverge off into infinity. These are algorithms. They don't just interpolate information they've seen, they generate information based on rules and iterative applications of them from initial configurations. Some people have tried to consider the universe as a computational system, where the information out there has been generated by processes, or continuous updating of the configuration based on some set of rules. Those rules, for physics, maybe also with the initial conditions, would be like axioms. These systems can certainly do things which are beyond the limits of non-algorithmic models like neural networks, at least in theory.

Now, even though a program, or the axioms of a universe, might be simple, its outputs can be highly complex and unpredictable. So in the end, we can't understand everything we see in terms of those axioms or rules. Instead, we see things we have to describe in terms of distributions and their properties. We observe different aspects of the results, categorize them, classify them, label them, and try to understand them individually, and we try to relate these objects to one another. But this results in enormously high dimensional sets of information (even if they might be reducible to a much lower dimensional manifold), and basic theory suggests we'd need to look at far more examples than we ever could before we could succeed at modelling it. But that's only if you ignore that the information doesn't come from nowhere: it could come from axioms, or rules, and physics, and this can make the high dimensional distributions we see take on the sorts of structure we can rely on to simplify things. If we can axiomatize, then we have a starting point from which to work in trying to figure out what that structure means. But we don't really need to do that; we can still just observe commonalities in nature, like the tendency for distributions to be Gaussian, and so even though ideas about axiomatization are part of the story, they're not necessarily the whole story, or a necessary condition to advance it.

Now in physics, even if we have the axioms, we can't just simulate the universe; it's too big and complex. So even assuming we had the axioms, unless they obviously told us everything we can expect to see, which they wouldn't, we would need to rely on observation and study the emergent properties in some other way, for example using statistics. But there is still something of high interest here: the relationship between systems that generate information and the complexity, or properties, of the information they generate.

In the context of computation, a lot of the time we rely on randomized algorithms and try to prove things about computability or complexity for certain problems, using such algorithms, in terms of probability. These are often problems related to graphs or geometry, and thus combinatorics, and the relationship between combinatorics, randomness and geometric structure becomes the point of interest. And we need to try to prove properties of things as the sizes of the sets become large, etc. This relates to the general issue of modelling, since separability is a property that can be looked at through this lens, and optimization deals with geometric properties as well.

So in my view, these ideas, as fluffy as they are, are central to the pursuit of knowledge in general. And really the goal, a lot of the time, is to make them less fluffy: to narrow things down from observation, recognition of patterns, and mental models to more concrete mathematical statements supported by experiment. This is basically what these two papers claim to offer, I think. There is actually a somewhat meta relationship here, concerning the possible necessity of fluffiness, or apparent fluffiness, of ideas before they can become concrete.
 

FAQ: The Blessings of Dimensionality and Hilbert's 6th Problem

What is the concept of "The Blessings of Dimensionality"?

The blessing of dimensionality refers to the phenomenon in which certain mathematical and computational problems become easier as the dimension of the problem space increases. The reason is concentration of measure: in very high dimension, random quantities behave in surprisingly regular ways, so tasks such as separating a given sample from the rest of a data set with a simple linear classifier become easy rather than hard.

What is Hilbert's 6th Problem?

Hilbert's 6th Problem, proposed by mathematician David Hilbert in 1900, is one of the 23 problems he presented at the International Congress of Mathematicians. It asks for a mathematical, axiomatic treatment of the physical sciences in which mathematics already plays an important part, above all the theory of probabilities and mechanics.

How are "The Blessings of Dimensionality" and Hilbert's 6th Problem related?

The first paper discussed in this thread appears in a theme issue devoted to Hilbert's sixth problem. The connection is that the axiomatization of probability theory, which grew out of Hilbert's programme, underlies the concentration of measure phenomena of statistical mechanics, and those same phenomena are what turn the curse of dimensionality into a blessing in high-dimensional data analysis.

What are some real-world applications of "The Blessings of Dimensionality" and Hilbert's 6th Problem?

The blessing of dimensionality has been applied in fields such as machine learning, data mining, and optimization, for example in the one-shot correction of errors in artificial intelligence systems. Hilbert's 6th Problem has shaped the axiomatic foundations of probability theory, statistical mechanics, and kinetic theory, which in turn underpin much of modern data analysis and signal processing.

Are there any limitations to "The Blessings of Dimensionality" and Hilbert's 6th Problem?

While the blessing of dimensionality makes certain problems easier to solve, increasing the dimension can also make other problems harder, which is the familiar curse of dimensionality, and the simplifying results hold only for certain classes of distributions. Hilbert's 6th Problem is also not considered fully solved: probability theory has been axiomatized, but a complete axiomatic treatment of physics remains open.
