Hello,
I am studying the theory underlying Bayesian optimization with a Gaussian process and the EI acquisition function.
I would like to lay out what I think I understand and ask you to correct me if I'm wrong.
The aim is to find the best parameters ##\theta## for a parametric function ##f(x, \theta)## (the objective function) whose analytical form is not known.
Bayes' theorem is used to approximate ##f## with the posterior, and then the best parameter set is the one that maximizes the posterior.
A normal distribution is used as the likelihood and a Gaussian process as the prior:
$$\pi = \frac{1}{(2\pi)^{k/2}|\mathbf{\Sigma}|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{Y}-\boldsymbol{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{Y}-\boldsymbol{\mu})\right] $$
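To make this concrete for myself, here is a minimal sketch in Python/NumPy of how I understand the posterior is obtained by conditioning the GP prior on the data. The squared-exponential kernel `rbf` and the helper names are just assumptions for illustration, not any particular library's API:

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 length^2));
    # a and b are 1-D arrays of input locations.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_star, noise=1e-6):
    # Condition the GP prior on the data D_t = (x_train, y_train):
    #   mu_*    = K_*^T (K + noise I)^{-1} y
    #   Sigma_* = K_** - K_*^T (K + noise I)^{-1} K_*
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_star)
    K_ss = rbf(x_star, x_star)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_train
    cov = K_ss - K_s.T @ K_inv @ K_s
    return mu, cov
```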
Everything happens iteratively, point by point. The points at which the posterior is evaluated are chosen by the acquisition function, and the observations collected so far form a dataset ##\mathcal{D}_t##. Then the improvement is defined as
$$
I(x) = \begin{cases}
0 & \text{for}\; h_{t+1}(x) \le f(x^+) \\
h_{t+1}(x)-f(x^+) & \text{for}\; h_{t+1}(x) > f(x^+)
\end{cases}
$$
where ##h_{t+1}(x)## is the posterior function evaluated at step ##t+1## and ##f(x^+)## is the maximum value reached so far. From this one can determine the expected improvement
$$
\alpha_{\rm EI}(x^*|\mathcal{D}_t) = \mathbb{E}[I(h)] = \int I(h)\,\pi(h)\,{\rm d}h
$$
That is, the expected improvement depends on the Gaussian process (the prior).
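To check this, here is a minimal sketch that estimates this integral by Monte Carlo at a single candidate point, sampling ##h## from the Gaussian posterior (assuming the `gp_posterior` helper above supplies `mu` and `var`):

```python
def expected_improvement_mc(mu, var, f_best, n_samples=100_000, rng=None):
    # Estimate E[I(h)] = ∫ I(h) π(h) dh by sampling h ~ N(mu, var)
    # and averaging I = max(0, h - f_best), with f_best = f(x+).
    rng = np.random.default_rng() if rng is None else rng
    h = rng.normal(mu, np.sqrt(var), size=n_samples)
    return np.maximum(0.0, h - f_best).mean()
```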
Therefore, at each step, the posterior is calculated at the point ##x_{max}## defined as
$$x_{max} = {\rm argmax}_x \alpha_{\rm EI}(x|\mathcal{D}_{t-1})$$
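As far as I understand, because ##h(x)## is Gaussian under the posterior, the integral also has a closed form, ##\alpha_{\rm EI}(x) = (\mu(x)-f(x^+))\Phi(z) + \sigma(x)\phi(z)## with ##z = (\mu(x)-f(x^+))/\sigma(x)##. A minimal sketch of this selection rule over a grid of candidate points, again assuming the helpers above (`scipy.stats.norm` supplies ##\Phi## and ##\phi##):

```python
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # Closed-form EI for Gaussian h(x): (mu - f+) Phi(z) + sigma phi(z)
    sigma = np.maximum(sigma, 1e-12)          # guard against zero variance
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def next_point(x_train, y_train, x_cand):
    # x_max = argmax_x alpha_EI(x | D_t), maximized over a candidate grid
    mu, cov = gp_posterior(x_train, y_train, x_cand)
    sigma = np.sqrt(np.maximum(np.diag(cov), 0.0))
    ei = expected_improvement(mu, sigma, y_train.max())
    return x_cand[np.argmax(ei)]
```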
I don't know if what I wrote is correct. I'm a little confused...
If I got something wrong, could someone set me straight? Could you tell me where to find a complete explanation of this topic? On the net I only find sketchy explanations.
Thanks!