Master1022
TL;DR Summary: Are there any good resources to understand Mercer's theorem in the context of Bayesian statistics?
Hi,
Question(s):
1. Are there any good resources that explain, at a very simple level, how Mercer's theorem is related to valid covariance functions for Gaussian processes? (Or would anyone be willing to explain it?)
2. What is the intuition behind this condition for valid covariance functions?
Context:
I recently took a course on Bayesian statistics and came across Mercer's theorem in the context of the question: "What types of functions ##k## can I use as a covariance function of a Gaussian process?"
The lecture said:
For any inputs ##x_1, x_2, ..., x_n## (that contain no duplicates), we require that:
[tex] C_n := \left( \left( k(x_i, x_j) \right) \right)_{i, j = 1, \ldots, n} [/tex]
is a positive definite matrix, i.e.
[tex] \forall v \in \mathbb{R}^n \setminus \{0\} : \langle v, C_n v \rangle > 0 [/tex]
This holds for so-called positive definite kernel functions ##k## that are:
1. Symmetric
2. and for which we have "Mercer's condition":
[tex] \int_{\chi} \int_{\chi} f(x) \, k(x, x') \, f(x') \, dx \, dx' > 0 \quad \forall f \in L_{2}(\chi) [/tex]
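To make the finite-dimensional condition concrete, here is a minimal numerical sketch (not from the lecture; it assumes NumPy and uses a squared-exponential kernel as an example of a valid ##k##). It builds the matrix ##C_n## for arbitrary inputs and checks that its eigenvalues, and hence the quadratic form ##\langle v, C_n v \rangle##, come out non-negative:

[code]
# Minimal sketch (assumptions: NumPy, squared-exponential kernel as an example of a valid k)
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # k(x, x') = variance * exp( -(x - x')^2 / (2 * lengthscale^2) )
    return variance * np.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=10)         # arbitrary inputs x_1, ..., x_n (distinct with probability 1)
C = rbf_kernel(x[:, None], x[None, :])      # Gram matrix C_n with entries k(x_i, x_j)

eigvals = np.linalg.eigvalsh(C)             # C_n is symmetric, so its eigenvalues are real
print("smallest eigenvalue:", eigvals.min())    # non-negative, up to floating-point error

v = rng.normal(size=x.size)                 # the quadratic form <v, C_n v> is likewise non-negative
print("v^T C_n v =", v @ C @ v)
[/code]

(The same check would flag an invalid choice of ##k##: for some set of inputs the Gram matrix would have a negative eigenvalue.)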
This was all presented quite quickly, and unfortunately I don't have a background in real analysis, so I am not familiar with topics such as Hilbert spaces. I am therefore trying to gain an understanding as efficiently as possible without learning unnecessary material.
I have already tried the relevant chapter of "C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006", but didn't find it very accessible.
Any help would be greatly appreciated!