Computational methods in Bayesian statistics
This paper by Tua and Adami was first posted on arXiv last Wednesday, with a corrected version posted today. Despite its very generic title, its focus is quite restricted, since it compares variational Bayes with nested sampling on two examples. The description of both methods is fairly standard, even though I find the part on the variational Bayes approximation slightly confusing, with a graph presenting “the Evidence” (it should be the log-evidence) as the sum of a Kullback-Leibler divergence and a lower bound, while the log-evidence may be a negative number, which cancels the appeal of the decomposition… The paper concludes on the higher speed of the variational Bayes approximation, which is hardly a major step forward, considering that speed is the very reason for using this approximation! The authors use an “Occam factor” without providing a definition, although it sounds like the difference
$\log m(x) - \log L(\hat\theta|x)$, where $\hat\theta$ is the mle,
and it could be computed for both methods, despite the authors’ claim to the contrary (if I understand correctly what “the Likelihood” stands for). The sentence “when calculating the Evidence the higher Likelihood values are multiplied by smaller weights resulting in a lower Evidence value over all” shows a poor understanding of the nested sampling method, since using a large enough number of particles leads to a proper approximation of the evidence, as shown for instance in our paper with Nicolas Chopin. Perhaps paradoxically, it is interesting to see through this paper how far (numerically) the lower bound provided by the variational Bayes approximation stands from the evidence approximated by nested sampling, even though both appear to peak at the same value for the mixture problem in the specific experiment run by the authors.
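For the record, and in generic notation not borrowed from the paper, the identity behind the variational bound reads

\[
\log m(x) \;=\; \underbrace{\int q(\theta)\,\log\frac{p(x,\theta)}{q(\theta)}\,\mathrm{d}\theta}_{\text{lower bound }\mathcal{L}(q)} \;+\; \underbrace{\int q(\theta)\,\log\frac{q(\theta)}{\pi(\theta|x)}\,\mathrm{d}\theta}_{\text{Kullback-Leibler divergence}\;\ge\;0}
\]

so that $\mathcal{L}(q)\le\log m(x)$ always holds, but both quantities may well be negative, which is what defeats a graph stacking “the Evidence” as two positive-looking components.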
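As for the quoted sentence, a minimal sketch of the textbook nested sampling quadrature may help; this is my own toy code, not the authors’ implementation, and the one-dimensional problem, the function names and the settings are purely illustrative (the constrained prior draws are exact here, which of course dodges the hard part of the method):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy problem: prior Uniform(-1/2, 1/2), likelihood L(t) = exp(-t^2/(2 s^2));
# the evidence Z = \int L(t) dt over the prior is known in closed form
s = 0.01

def loglik(t):
    return -t ** 2 / (2 * s ** 2)

def run_ns(n_live):
    n_iter = 15 * n_live                   # enough compression for this toy target
    live = rng.uniform(-0.5, 0.5, n_live)  # live points drawn from the prior
    logL = loglik(live)
    logZ = -np.inf                         # running log-evidence
    logX = 0.0                             # log prior volume, starting at log(1)
    for i in range(n_iter):
        worst = np.argmin(logL)
        logL_star = logL[worst]
        # deterministic shrinkage X_i = exp(-i/N) and weight w_i = X_{i-1} - X_i
        logX_new = -(i + 1) / n_live
        logw = logX + np.log1p(-np.exp(logX_new - logX))
        logZ = np.logaddexp(logZ, logw + logL_star)
        logX = logX_new
        # replace the worst point by a prior draw constrained to L > L*;
        # for this likelihood the constrained region is simply an interval
        bound = np.sqrt(-2 * s ** 2 * logL_star)
        live[worst] = rng.uniform(-bound, bound)
        logL[worst] = loglik(live[worst])
    return logZ

print("true log-evidence:", np.log(s * np.sqrt(2 * np.pi)))  # erf term is ~1
for n_live in (10, 100, 1000):
    print(n_live, "live points:", run_ns(n_live))
```

The higher likelihood values are indeed multiplied by smaller weights $w_i\approx X_{i-1}-X_i$, but this is exactly the quadrature of $\int_0^1 L(X)\,\mathrm{d}X$ rather than a downward bias, and increasing the number of live points only reduces the noise of the resulting estimate.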