Covariance of Posterior Predictive Distribution

In summary: SL found an error in a paper on Bayesian neural networks regarding the expression for the covariance of the posterior predictive. SL provided their own calculation and asked for a seasoned Bayesian's opinion. The conversation concludes with the expert agreeing with SL's findings and recommending reaching out to the authors of the paper to inform them of the error.
  • #1
SchroedingersLion
TL;DR Summary
Check my calculation
Greetings!

I believe I found an error in a paper on Bayesian neural networks. I think the expression for the covariance of the posterior predictive is wrong, and I wrote down my own calculation. It would be great if a seasoned Bayesian could take a look.

Imagine a regression scenario. We want to learn a function ##f_{\theta}: \mathbb{R}^m \rightarrow \mathbb{R}^n## that fits the data set ##X=\{ ( x_i, y_i)| i=1,...,N \}## as well as possible, where ##x_i## is the model input and ##y_i## the corresponding target. The data points are i.i.d. The function is parametrized by ##\theta \in \mathbb{R}^d##. Given ##\theta##, we expect the target ##y## for a given input ##x## to be normally distributed around ##f_{\theta}(x)##. This gives the likelihood function
$$
p(y|x,\theta) = \mathcal{N}(f_{\theta}(x),\Sigma).
$$
Suppose now that we wrote down the posterior ##p(\theta| X)## and, by whatever means, we obtained samples from it,
$$
S=\{ \theta_i | \theta_i \sim p(\theta| X), i=1,...,M \}.
$$

The posterior predictive distribution (typically evaluated at an input ##x## that was not seen during training) is now given by
$$
p(y | x, X) = \int p(y|x, \theta) p(\theta | X) \text{d} \theta.
$$
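
To make the structure of this integral concrete, here is a minimal numpy sketch (all names, dimensions, and the toy model ##f## are hypothetical stand-ins, not taken from the paper): drawing ##\theta_i## from the posterior and then ##y \sim \mathcal{N}(f_{\theta_i}(x), \Sigma)## produces exact samples from ##p(y|x,X)##.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins for the quantities above: a 5-dimensional theta,
# a 2-dimensional output y, and pre-drawn posterior samples standing in for S.
n = 2                                           # output dimension
Sigma = 0.1 * np.eye(n)                         # likelihood covariance
posterior_samples = rng.normal(size=(1000, 5))  # M = 1000 draws of theta ~ p(theta | X)

def f(theta, x):
    # Hypothetical parametric model f_theta(x); a real f would be a neural network.
    return np.tanh(theta[:2] + theta[2] * x[:2])

# Sampling from the posterior predictive p(y | x, X): draw theta_i from the
# posterior, then y ~ N(f_theta_i(x), Sigma). The integral above is exactly
# this mixture over posterior draws.
x = np.ones(5)
predictive_draws = np.array(
    [rng.multivariate_normal(f(theta, x), Sigma) for theta in posterior_samples]
)
```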

The final prediction of our model can now be written as an average over the predictions given by the posterior samples, i.e.
$$
\begin{align}
\hat{y} & = \mathbb{E}_{y|x,X}(y) \\
& = \int y p(y | x, X) \text{d}y \\
& = \int y \int p(y|x, \theta) p(\theta | X) \text{d} \theta \text{d}y \\
& = \int \mathbb{E}_{y|x,\theta}(y) p(\theta | X) \text{d} \theta \\
& = \mathbb{E}_{\theta | X} \Big( \mathbb{E}_{y|x,\theta}(y) \Big) \\
& = \mathbb{E}_{\theta | X} \Big( f_{\theta}(x) \Big) \\
& \approx \frac{1}{|S|} \sum_{\theta_{i} \in S} f_{\theta_{i}}(x),
\end{align}
$$
where in the penultimate line we used the likelihood function from above, and the last line is the standard Monte Carlo estimator of the expectation over the posterior.
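
As a quick sanity check of the last line, a minimal sketch reusing the hypothetical `f`, `x`, and `posterior_samples` from the snippet above:

```python
# Monte Carlo estimate of the predictive mean: average the model outputs
# f_theta_i(x) over the posterior samples.
preds = np.array([f(theta, x) for theta in posterior_samples])  # shape (M, n)
y_hat = preds.mean(axis=0)
```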

To quantify the uncertainty of our prediction ##\hat{y}##, one can use the covariance matrix ##\Sigma_{y|x,X}##. In the paper, the authors give the formula (without calculation)
$$
\Sigma_{y|x,X} \approx \frac{1}{|S|-1} \sum_{\theta_{i} \in S} (f_{\theta_{i}}(x) - \hat{y})(f_{\theta_{i}}(x) - \hat{y})^{T}.
$$
I think this is wrong. The covariance is
$$
\Sigma_{y|x,X}=\mathbb{E}_{y|x,X}\Big((y-\hat{y})(y-\hat{y})^{T}\Big).
$$
Repeating the computation of ##\mathbb{E}_{y|x,X}(y)## from above, with ##(y-\hat{y})(y-\hat{y})^{T}## in place of ##y##, one arrives at
$$
\Sigma_{y|x,X} = \mathbb{E}_{\theta | X} \Bigg( \mathbb{E}_{y|x,\theta}\Big((y-\hat{y})(y-\hat{y})^{T}\Big) \Bigg).
$$
It looks like they simply set ##\mathbb{E}_{y|x,\theta}\Big((y-\hat{y})(y-\hat{y})^{T}\Big) = (\mathbb{E}_{y|x,\theta}(y)-\hat{y})(\mathbb{E}_{y|x,\theta}(y)-\hat{y})^{T}##, which would then reduce to their expression. But the term inside the expectation is quadratic in ##y##, so pulling the expectation inside is not allowed: in general ##\big(E(V)\big)^2\neq E(V^2)##. For example, if ##V## takes the values ##\pm 1## with equal probability, then ##\big(E(V)\big)^2 = 0## while ##E(V^2) = 1##.
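
For what it's worth, the law of total covariance (conditioning on ##\theta##) already shows what the correct answer must look like: since ##\text{Cov}(y|x,\theta) = \Sigma## for every ##\theta##,
$$
\Sigma_{y|x,X} = \mathbb{E}_{\theta|X}\Big(\text{Cov}(y|x,\theta)\Big) + \text{Cov}_{\theta|X}\Big(\mathbb{E}_{y|x,\theta}(y)\Big) = \Sigma + \text{Cov}_{\theta|X}\Big(f_{\theta}(x)\Big).
$$
The paper's sample-covariance formula estimates only the second (parameter-uncertainty) term, so it drops the likelihood covariance ##\Sigma##.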

I tried to derive the correct expression on my own.
We use the alternative form of the covariance,
$$
\begin{align}
\Sigma_{y|x,X} &= \mathbb{E}_{y|x,X} (yy^T) - \hat{y}\hat{y}^{T} \\
&= \mathbb{E}_{\theta | X} \Big( \mathbb{E}_{y|x,\theta}(yy^T) \Big) - \hat{y}\hat{y}^{T} \\
& \approx \frac{1}{|S|} \sum_{\theta_{i} \in S}\mathbb{E}_{y|x,\theta_{i}}(yy^T) - \hat{y}\hat{y}^{T} \\
&= \frac{1}{|S|} \sum_{\theta_{i} \in S}\Big( \Sigma + f_{\theta_{i}}(x)f_{\theta_{i}}(x)^{T}\Big) - \hat{y}\hat{y}^{T},
\end{align}
$$
where the last line follows from the definition of the likelihood ##p(y|x,\theta)## and the identity ##\Sigma = \mathbb{E}_{y|x,\theta}(yy^{T})-\mathbb{E}_{y|x,\theta}(y) \mathbb{E}_{y|x,\theta}(y^T)##. Pulling ##\Sigma## out of the sum, this is ##\Sigma## plus the (uncorrected) sample covariance of the ##f_{\theta_i}(x)##, so, up to the normalization, the paper's expression is missing exactly the ##\Sigma## term.
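
Translating the final line into code, a minimal sketch continuing the hypothetical toy setup from above (`preds` and `y_hat` as in the previous snippet); the `np.cov` line is just a consistency check of the algebra:

```python
# Corrected predictive covariance: likelihood covariance Sigma (the noise term
# the paper's formula is missing) plus the spread of the posterior predictions.
M = len(preds)
Sigma_pred = Sigma + preds.T @ preds / M - np.outer(y_hat, y_hat)

# Equivalently, the last two terms are the (uncorrected) sample covariance
# of the predictions f_theta_i(x):
Sigma_pred_alt = Sigma + np.cov(preds, rowvar=False, bias=True)
np.testing.assert_allclose(Sigma_pred, Sigma_pred_alt, atol=1e-10)
```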

Do you agree with all of this?

Cheers!
SL
 
  • #2
Dear SL,

Thank you for bringing this to our attention. I have reviewed your calculations and it seems that you are correct. The expression for the covariance of the posterior predictive in the paper does appear to be incorrect. Your derivation of the correct expression is also sound.

I would recommend reaching out to the authors of the paper to inform them of this error. It is important for scientific papers to have accurate and precise calculations, and your contribution will help improve the overall quality of the paper.

Thank you for your diligence and attention to detail. As scientists, it is our responsibility to ensure the accuracy of our work and to constantly strive for improvement.
 

FAQ: Covariance of Posterior Predictive Distribution

What is the covariance of the posterior predictive distribution?

The covariance of the posterior predictive distribution, ##\Sigma_{y|x,X}## in the thread above, is the covariance matrix of the predictive distribution ##p(y|x,X)##. It quantifies the total uncertainty in the prediction ##\hat{y}## for a new input ##x##, combining the noise assumed in the likelihood with the remaining uncertainty about the parameters ##\theta##.

How is the covariance of the posterior predictive distribution calculated?

It is defined as ##\Sigma_{y|x,X} = \mathbb{E}_{y|x,X}\big((y-\hat{y})(y-\hat{y})^{T}\big)##. With a Gaussian likelihood ##\mathcal{N}(f_{\theta}(x),\Sigma)## and samples from the posterior, it can be estimated as ##\Sigma## plus the sample covariance of the predictions ##f_{\theta_i}(x)##, as derived in the thread.

What does a positive covariance of the posterior predictive distribution indicate?

Since ##\Sigma_{y|x,X}## is a matrix, the sign is read entry-wise: a positive off-diagonal entry indicates that the corresponding pair of output components tend to deviate from their predicted means in the same direction, while a negative entry indicates opposite directions. The diagonal entries are predictive variances and are always non-negative.

How can the covariance of the posterior predictive distribution be interpreted?

It decomposes into two parts: the likelihood covariance ##\Sigma## (aleatoric uncertainty, i.e. irreducible noise in the data) and the covariance of ##f_{\theta}(x)## over the posterior (epistemic uncertainty, which shrinks as more data are observed). Large diagonal entries signal predictions that the model itself regards as uncertain.

How is the covariance of the posterior predictive distribution useful in data analysis?

It provides error bars and credible regions for the model's predictions. Inputs far from the training data typically produce a large epistemic contribution, so the predictive covariance can flag predictions that should not be trusted; this is useful for model assessment, active learning, and downstream decision-making.
