Malamala
Hello! I am using a mixture density network (MDN) to make some predictions. My model is very simple, with only one hidden layer of 10 nodes (the details of the network shouldn't matter for my question, but I can provide more if needed). My MDN also has only one Gaussian component, which basically means that for each input it predicts the mean and standard deviation of a Gaussian from which to sample the output. During training I am minimizing the negative log-likelihood of the expected output under the predicted Gaussian (up to an additive constant):
$$\log(\sigma(x_{in})) + \frac{(y_{real}-\mu(x_{in}))^2}{2\sigma(x_{in})^2}$$
where ##\sigma(x_{in})## and ##\mu(x_{in})## are predicted by the network and are functions of the input. The network seems to be training well, i.e. the loss goes down. I am attaching below two histograms I obtained after training the network and testing it on new data. The first one is a histogram of ##\frac{dy}{\mu(x_{in})}##, where ##dy = y_{real}-\mu(x_{in})##. The second histogram shows ##\frac{dy}{\sigma(x_{in})}##. Based on these, it seems like the network is doing pretty well (the data has Gaussian noise added to it). However, when I try to compute the weighted mean of ##dy## and the error on that mean, I get:
$$\frac{\sum_i{\frac{dy_i}{\sigma_i^2}}}{\sum_i{1/\sigma_i^2}} = -0.000172 $$
and
$$\sqrt{\frac{1}{\sum_i{1/\sigma_i^2}}} = 0.000003$$
where the sum is over all the data points I test the MDN on. This means that my predictions are biased: the weighted mean of ##dy## is -0.000172, which is many standard errors away from zero. However, I am not sure why that is the case, as the MDN should easily notice the offset and shift all the ##\mu## predictions by that amount. I tried training several MDNs with lots of different parameters and I get the same result, i.e. the result is biased (not always by the same amount or in the same direction). Am I missing something or misinterpreting the results? Shouldn't the mean of my errors be consistent with zero, and shouldn't simply correcting for that offset (0.000172 in magnitude in this case) solve the issue? Any insight would be really appreciated.
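To make the quantities above concrete, here is a minimal NumPy sketch of the per-sample loss and of the weighted mean and its error. It is not my actual training code: the function names `gaussian_nll` and `weighted_mean_and_error` are made up for illustration, and the arrays `y_real`, `mu` and `sigma` are synthetic stand-ins for the test targets and the network outputs.

```python
import numpy as np


def gaussian_nll(y_real, mu, sigma):
    """Per-sample loss from the post: log(sigma) + (y - mu)^2 / (2 sigma^2)."""
    return np.log(sigma) + (y_real - mu) ** 2 / (2.0 * sigma ** 2)


def weighted_mean_and_error(dy, sigma):
    """Inverse-variance weighted mean of dy and the error on that mean."""
    w = 1.0 / sigma ** 2              # weights 1 / sigma_i^2
    mean = np.sum(w * dy) / np.sum(w)
    err = np.sqrt(1.0 / np.sum(w))    # assumes the sigma_i are the true noise levels
    return mean, err


# Synthetic stand-in data (hypothetical, not the data from the post):
# the "network" predicts mu and sigma exactly, and y_real has Gaussian noise.
rng = np.random.default_rng(0)
n = 10_000
sigma = rng.uniform(0.5, 2.0, size=n)
mu = rng.normal(size=n)
y_real = mu + sigma * rng.normal(size=n)

dy = y_real - mu
mean, err = weighted_mean_and_error(dy, sigma)
print("mean loss:", gaussian_nll(y_real, mu, sigma).mean())
print("weighted mean of dy:", mean)
print("error on the mean:", err)
print("mean / error:", mean / err)
```

The ratio `mean / err` is the number I am worried about: for unbiased predictions with correctly estimated ##\sigma## it should be of order one, while for the values quoted above it is roughly -57. Note that the error formula assumes the predicted ##\sigma_i## match the true noise and that the data points are independent.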