How are hyperparameters determined in Bayesian optimization?

  • #1
BRN
Hello,
I am studying the theory behind Bayesian optimization with a Gaussian process and the expected improvement (EI) acquisition function.
I would like to lay out what I think I understand and ask you to correct me if I'm wrong.

The aim is to find the best parameters ##\theta## of a parametric function ##f(x, \theta)## (the objective function) whose analytical form is not known.
Bayes' theorem is used to approximate ##f## with the posterior, and the best parameter set is then the one that maximizes the posterior.
A normal distribution is used as the likelihood and a Gaussian process as the prior:

$$\pi = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{Y}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{Y}-\mathbf{\mu})\right] $$
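
For concreteness, here is a minimal numerical check of this density (the numbers are a toy example of my own, not from any real GP): the formula written out by hand should match scipy's built-in multivariate normal.

```python
# Check the multivariate normal density above against scipy.
# mu, Sigma, and Y are arbitrary toy values, not from any real GP.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])                 # mean vector
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])            # covariance matrix
Y = np.array([0.5, 0.5])                  # evaluation point

k = len(Y)
diff = Y - mu
explicit = (np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
            / ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(Sigma))))

print(explicit)                               # formula by hand
print(multivariate_normal(mu, Sigma).pdf(Y))  # same value from scipy
```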

Everything proceeds iteratively, point by point. The points at which the posterior is evaluated are chosen by the acquisition function from the dataset ##D_t##. The improvement is then defined as

$$
I = \left\{ \begin{matrix}
0 & \text{for}\;h_{t+1}(x)\le f(x^+) \\
h_{t+1}(x)-f(x^+) & \text{for}\;h_{t+1}(x)> f(x^+)
\end{matrix}\right.
$$

where ##h_{t+1}(x)## is the posterior function evaluated at step ##t+1## and ##f(x^+)## is the maximum value reached so far.

From this, one can determine the expected improvement

$$
\alpha_{\rm EI}(x^*|\mathcal{D}_t) = \mathbb{E}[I(h)] = \int I(h) \pi {\rm d}h
$$

That is, the expected improvement depends on the Gaussian process (the prior).
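
If it helps, the integral can be checked with a minimal Monte Carlo sketch (all numbers here are assumptions of my own): at a fixed ##x## the posterior of ##h## is Gaussian, so one can sample it, apply the improvement, and average.

```python
# Monte Carlo estimate of alpha_EI at a single candidate point x.
# mu, sigma, and f_best are made-up values standing in for the GP
# posterior mean/std at x and the incumbent f(x^+).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.4, 0.3       # assumed posterior mean and std of h at x
f_best = 1.2               # assumed best observed value f(x^+)

h_samples = rng.normal(mu, sigma, size=100_000)    # h ~ posterior at x
improvement = np.maximum(h_samples - f_best, 0.0)  # I(h) from the definition above
print(improvement.mean())                          # ≈ alpha_EI(x)
```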
Therefore, at each step, the posterior is calculated at the point ##x_{max}## defined as

$$x_{max} = {\rm argmax}_x \alpha_{\rm EI}(x|\mathcal{D}_{t-1})$$
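
In practice the integral is rarely evaluated numerically: under a Gaussian posterior, EI has the standard closed form ##\alpha_{\rm EI}(x) = (\mu(x) - f(x^+))\Phi(z) + \sigma(x)\phi(z)## with ##z = (\mu(x) - f(x^+))/\sigma(x)## (Jones et al., 1998), so the argmax can be taken over a candidate grid. A minimal sketch of one such acquisition step, using scikit-learn's GP as the surrogate (the data and grid are toy values of my own):

```python
# One acquisition step: fit a GP, evaluate closed-form EI on a grid,
# and pick the argmax. Toy data; not any particular real problem.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, f_best):
    z = (mu - f_best) / np.maximum(sigma, 1e-12)  # guard against sigma = 0
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

X = np.array([[0.1], [0.4], [0.9]])   # observed inputs
y = np.array([0.2, 0.8, 0.5])         # observed objective values

gp = GaussianProcessRegressor().fit(X, y)           # posterior h_{t+1}

x_grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)  # candidate points
mu, sigma = gp.predict(x_grid, return_std=True)

alpha = expected_improvement(mu, sigma, y.max())
x_max = x_grid[np.argmax(alpha)]                    # argmax_x of alpha_EI
print(x_max)
```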

I don't know if what I wrote is correct. I'm a little confused ...

If I am wrong, can someone explain it to me better? Could you tell me where to find a complete explanation of this topic? Online I find only sketchy explanations.

Thanks!
 
  • #2
BRN said:
Could you tell me where to find a complete explanation of this topic? Online I find only sketchy explanations.

Maybe this is a good resource?

http://www.gaussianprocess.org/
 
  • #3
OK, I understand a little more now, but I still have some doubts ...

To summarize schematically:
  1. Start by randomly sampling a dataset ##D_t## from all the available data;
  2. with ##D_t##, evaluate the black-box function for the first time;
  3. On the black-box function evaluations a surrogate model is built: Bayes' theorem is applied $$P(f|D_t, \theta )\propto P(D_t|f, \theta)P(f)$$where the posterior is the function that approximates the black-box function, the likelihood is a normal distribution, and the prior is a Gaussian process whose covariance depends on the data and the hyperparameters.
  4. The EI acquisition function, which depends on the posterior, intelligently selects a new sample, among all those not yet used, to be added to the dataset ##D_t##;
  5. Steps 3 and 4 are repeated until convergence or until a given number of iterations is completed (the whole loop is sketched below).
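
Under those assumptions, here is a minimal end-to-end sketch of steps 1-5 on a toy 1-D problem (the black-box function, grid, and budget are all inventions of my own, and the closed-form EI from post #1 is reused). Note, regarding step 3: scikit-learn's `.fit()` re-optimizes the kernel hyperparameters by maximizing the log marginal likelihood of the data each time it is called.

```python
# Toy Bayesian optimization loop following steps 1-5 above.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def black_box(x):                      # stand-in for the unknown objective
    return -(x - 0.6) ** 2 + 0.1 * np.sin(20 * x)

def expected_improvement(mu, sigma, f_best):
    z = (mu - f_best) / np.maximum(sigma, 1e-12)
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 500)

# Steps 1-2: random initial dataset D_t and first black-box evaluations
X = rng.choice(x_grid, size=3, replace=False).reshape(-1, 1)
y = black_box(X).ravel()

for t in range(10):                    # step 5: repeat for a fixed budget
    # Step 3: surrogate model; .fit() maximizes the log marginal
    # likelihood over the Matern kernel's hyperparameters
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  n_restarts_optimizer=3).fit(X, y)
    # Step 4: EI selects the next point to add to the dataset
    mu, sigma = gp.predict(x_grid.reshape(-1, 1), return_std=True)
    x_next = x_grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, [[x_next]]])
    y = np.append(y, black_box(x_next))

print(X[np.argmax(y)], y.max())        # best point found so far
```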
What I don't understand is where the hyperparameters are optimized.
Are they found by maximum likelihood at step 3, or by the acquisition function?
 