**A**s I was still musing about the posts of last week around infinite variance importance sampling and its potential corrections, I wondered at whether or not there was a fundamental difference between “just” having a [finite] variance and “just” having none. In conjunction with Aki’s post. To get a better feeling, I ran a quick experiment with Exp(1) as the target and Exp(a) as the importance distribution. When estimating **E[X]**=1, the above graph opposes a=1.95 to a=2.05 (variance versus no variance, bright yellow versus wheat), a=2.95 to a=3.05 (third moment versus none, bright yellow versus wheat), and a=3.95 to a=4.05 (fourth moment versus none, bright yellow versus wheat). The graph below is the same for the estimation of **E[**exp(**X**/2)**]**=2, which has an integrand that is not square integrable under the target. Hence seems to require higher moments for the importance weight. Hard to derive universal theories from those two graphs, however they show that protection against sudden drifts in the estimation sequence. As an aside [not really!], apart from our rather confidential Confidence bands for Brownian motion and applications to Monte Carlo simulation with Wilfrid Kendall and Jean-Michel Marin, I do not know of many studies that consider the sequence of averages time-wise rather than across realisations at a given time and still think this is a more relevant perspective for simulation purposes.

## Archive for Monte Carlo experiment

## borderline infinite variance in importance sampling

Posted in Books, Kids, Statistics with tags continuity, importance sampling, infinite variance estimators, moments, Monte Carlo experiment, Monte Carlo Statistical Methods on November 23, 2015 by xi'an## self-healing umbrella sampling

Posted in Kids, pictures, Statistics, University life with tags acceleration of MCMC algorithms, adaptive MCMC methods, Monte Carlo experiment, multimodality, Tintin, umbrella sampling, Wang-Landau algorithm, well-tempered algorithm on November 5, 2014 by xi'an**T**en days ago, Gersende Fort, Benjamin Jourdain, Tony Lelièvre, and Gabriel Stoltz arXived a study about an adaptive umbrella sampler that can be re-interpreted as a Wang-Landau algorithm, if not the most efficient version of the latter. This reminded me very much of the workshop we had all together in Edinburgh last June. And even more of the focus of the molecular dynamics talks in this same ICMS workshop about accelerating the MCMC exploration of multimodal targets. The self-healing aspect of the sampler is to adapt to the multimodal structure thanks to a partition that defines a biased sampling scheme spending time in each set of the partition in a frequency proportional to weights. While the optimal weights are the weights of the sets against the target distribution (are they truly optimal?! I would have thought lifting low density regions, i.e., marshes, could improve the mixing of the chain for a given proposal), those are unknown and they need to be estimated by an adaptive scheme that makes staying in a given set the less desirable the more one has visited it. By increasing the inverse weight of a given set by a factor each time it is visited. Which sounds indeed like Wang-Landau. The plus side of the self-healing umbrella sampler is that it only depends on a scale γ (and on the partition). Besides converging to the right weights of course. The downside is that it does not reach the most efficient convergence, since the adaptivity weight decreases in 1/n rather than 1/√n.

Note that the paper contains a massive experimental side where the authors checked the impact of various parameters by Monte Carlo studies of estimators involving more than a billion iterations. Apparently repeated a large number of times.

The next step in adaptivity should be about the adaptive determination of the partition, hoping for a robustness against the dimension of the space. Which may be unreachable if I judge by the apparent deceleration of the method when the number of terms in the partition increases.

## improved approximate-Bayesian model-choice method for estimating shared evolutionary history

Posted in Books, Statistics, University life with tags ABC, ABC model choice, Bayesian calibration, conjugate priors, Dirichlet process, Kingman's coalescent, Monte Carlo experiment, philogenetic trees, philogeny, prior calibration, refereeing, SMC on May 14, 2014 by xi'an

“An appealing approach would be a comparative, Bayesian model-choice method for inferring theprobability of competing divergence histories while integrating over uncertainty in mutational and ancestral processes via models of nucleotide substitution and lineage coalescence.” (p.2)

**J**amies Oaks arXived (a few months ago now) a rather extensive Monte-Carlo study on the impact of prior modelling on the model-choice performances of ABC model choice. (Of which I only became aware recently.) As in the earlier paper I commented on the Óg, the issue here has much more to do with prior assessment and calibration than with ABC implementation *per se*. For instance, the above quote recaps the whole point of conducting Bayesian model choice. (As missed by Templeton.)

“This causes divergence models with more divergence-time parameters to integrate over a much greater parameter space with low likelihood yet high prior density, resulting in small marginal likelihoods relative to models with fewer divergence-time parameters.” (p.2)

**T**his second quote is essentially stressing the point with Occam’s razor argument. Which I deem [to be] a rather positive feature of Bayesian model choice. A reflection on the determination of the prior distribution, getting away from uniform priors, thus sounds most timely! The current paper takes place within a rather extensive exchange between Oak’s group and Hickerson’s group on what makes Bayesian model choice (and the associated software msBayes) pick or not the correct model. Oak and coauthors objected to the use of “narrow, empirically informed uniform priors”, arguing that this leads to a bias towards models with less parameters, a “statistical issue” in their words, while Hickerson et al. (2014) think this is due to msBayes way of selecting models and their parameters at random. However it refrains from reproducing earlier criticisms of or replies to Hickerson et al.

**T**he current paper claims to have reached a satisfactory prior modelling with ¨improved robustness, accuracy, and power” (p.3). If I understand correctly, the changes are in replacing a uniform distribution with a Gamma or a Dirichlet prior. Which means introducing a seriously large and potentially crippling number of hyperparameters into the picture. Having a lot of flexibility in the prior also means a lot of variability in the resulting inference… In other words, with more flexibility comes more responsibility, to paraphrase Voltaire.

“I have introduced a new approximate-Bayesian model choice method.” (p.21)

**T**he ABC part is rather standard, except for the strange feature that the divergence times are used to construct summary statistics (p.10). Strange because these times are not observed for the actual data. So I must be missing something. (And I object to the above quote and to the title of the paper since there is no new ABC technique there, simply a different form of prior.)

“ABC methods in general are known to be biased for model choice.” (p.21)

**I** do not understand much the part about (reshuffling) introducing bias as detailed on p.11: every approximate method gives a “biased” answer in the sense this answer is *not* the true and proper posterior distribution. Using a different (re-ordered) vector of statistics provides a different ABC outcome, hence a different approximate posterior, for which it seems truly impossible to check whether or not it increases the discrepancy from the true posterior, compared with the other version. I must admit I always find annoying to see the word *bias* used in a vague meaning and esp. within a Bayesian setting. All Bayesian methods are biased. End of the story. Quoting our PNAS paper as concluding that ABC model choice is biased is equally misleading: the intended warning represented by the paper was that Bayes factors and posterior probabilities could be quite unrelated with those based on the whole dataset. That the proper choice of summary statistics leads to a consistent model choice shows ABC model choice is not necessarily “biased”… Furthermore, I also fail to understand why the posterior probability of model i should be distributed as a uniform (“If the method is unbiased, the points should fall near the identity line”) when the data is from model i: this is not a p-value but a posterior probability and the posterior probability is not the frequentist coverage…

**M**y overall problem is that, all in all, this is a single if elaborate Monte Carlo study and, as such, it does not carry enough weight to validate an approach that remains highly subjective in the selection of its hyperparameters. Without raising any doubt about an hypothetical “fixing” of those hyperparameters, I think this remains a controlled experiment with simulated data where the true parameters are know and the prior is “true”. This obviously helps in getting better performances.

“With improving numerical methods (…), advances in Monte Carlo techniques and increasing efficiency of likelihood calculations, analyzing rich comparative phylo-geographical models in a full-likelihood Bayesian framework is becoming computationally feasible.” (p.21)

**T**his conclusion of the paper sounds over-optimistic and rather premature. I do not know of any significant advance in computing the observed likelihood for the population genetics models ABC is currently handling. (The SMC algorithm of Bouchard-Côté, Sankaraman and Jordan, 2012, does not apply to Kingman’s coalescent, as far as I can tell.) This is certainly a goal worth pursuing and borrowing strength from multiple techniques cannot hurt, but it remains so far a lofty goal, still beyond our reach… I thus think the major message of the paper is to reinforce our own and earlier calls for caution when interpreting the output of an ABC model choice (p.20), or even of a regular Bayesian analysis, agreeing that we should aim at seeing “a large amount of posterior uncertainty” rather than posterior probability values close to 0 and 1.