## marginal likelihood with large amounts of missing data

Posted in Books, pictures, Statistics with tags , , , , , , , , on October 20, 2020 by xi'an

In 2018, Panayiota Touloupou, research fellow at Warwick, and her co-authors published a paper in Bayesian analysis that somehow escaped my radar, despite standing in my first circle of topics of interest! They construct an importance sampling approach to the approximation of the marginal likelihood, the importance function being approximated from a preliminary MCMC run, and consider the special case when the sampling density (i.e., the likelihood) can be represented as the marginal of a joint density. While this demarginalisation perspective is rather usual, the central point they make is that it is more efficient to estimate the sampling density based on the auxiliary or latent variables than to consider the joint posterior distribution of parameter and latent in the importance sampler. This induces a considerable reduction in dimension and hence explains (in part) why the approach should prove more efficient. Even though the approximation itself is costly, at about 5 seconds per marginal likelihood. But a nice feature of the paper is to include the above graph that includes both computing time and variability for different methods (the blue range corresponding to the marginal importance solution, the red range to RJMCMC and the green range to Chib’s estimate). Note that bridge sampling does not appear on the picture but returns a variability that is similar to the proposed methodology.

## wrapped Normal distribution

Posted in Books, R, Statistics with tags , , , , , on April 14, 2020 by xi'an

One version of the wrapped Normal distribution on (0,1) is expressed as a sum of Normal distributions with means shifted by all relative integers

$\psi(x;\mu,\sigma)=\sum_{k\in\mathbb Z}\varphi(x;\mu+k,\sigma)\mathbb I_{(0,1)}(x)$

which, while a parameterised density, has imho no particular statistical appeal over the use of other series. It was nonetheless the centre of a series of questions on X validated in the past weeks. Curiously used as the basis of a random walk type move over the unit cube along with a uniform component. Simulating from this distribution is easily done when seeing it as an infinite mixture of truncated Normal distributions, since the weights are easily computed

$\sum_{k\in\mathbb Z}\overbrace{[\Phi_\sigma(1-\mu-k)-\Phi_\sigma(-\mu-k)]}^{p_k(\mu,\sigma)}\times$

$\dfrac{\varphi_\sigma(x-\mu-k)\mathbb I_{(0,1)}(y)}{\Phi_\sigma(1-\mu-k)-\Phi_\sigma(-\mu-k)}$

Hence coding simulations as

wrap<-function(x, mu, sig){
ter = trunc(5*sig + 1)
return(sum(dnorm(x + (-ter):ter, mu, sig)))}
siw = function(N=1e4,beta=.5,mu,sig){
unz = (runif(N)<beta)
ter = trunc(5*sig + 1)
qrbz = diff(prbz<-pnorm(-mu + (-ter):ter, sd=sig))
ndx = sample((-ter+1):ter,N,rep=TRUE,pr=qrbz)+ter
z = sig*qnorm(prbz[ndx]+runif(N)*qrbz[ndx])-ndx+mu+ter+1
return(c(runif(sum(unz)),z[!unz]))}


and checking that the harmonic mean estimator was functioning for this density, predictably since it is lower bounded on (0,1). The prolix originator of the question was also wondering at the mean of the wrapped Normal distribution, which I derived as (predictably)

$\mu+\sum_{k\in\mathbb Z} kp_k(x,\mu,\sigma)$

but could not simplify any further except for x=0,½,1, when it is ½. A simulated evaluation of the mean as a function of μ shows a vaguely sinusoidal pattern, also predictably periodic and unsurprisingly antisymmetric, and apparently independent of the scale parameter σ…

## an arithmetic mean identity

Posted in Books, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , on December 19, 2019 by xi'an

A 2017 paper by Ana Pajor published in Bayesian Analysis addresses my favourite problem [of computing the marginal likelihood] and which I discussed on the ‘Og, linking with another paper by Lenk published in 2012 in JCGS. That I already discussed here last year. Lenk’s (2009) paper is actually using a technique related to the harmonic mean correction based on HPD regions Darren Wraith and myself proposed at MaxEnt 2009. And which Jean-Michel and I presented at Frontiers of statistical decision making and Bayesian analysis in 2010. As I had only vague memories about the arithmetic mean version, we discussed the paper together with graduate students in Paris Dauphine.

The arithmetic mean solution, representing the marginal likelihood as the prior average of the likelihood, is a well-known approach used as well as the basis for nested sampling. With the improvement consisting in restricting the simulation to a set Ð with sufficiently high posterior probability. I am quite uneasy about P(Ð|y) estimated by 1 as the shape of the set containing all posterior simulations is completely arbitrary, parameterisation dependent, and very random since based on the extremes of this posterior sample. Plus, the set Ð converges to the entire parameter space with the number of posterior simulations. An alternative that we advocated in our earlier paper is to take Ð as the HPD region or a variational Bayes version . But the central issue with the HPD regions is how to construct these from an MCMC output and how to compute both P(Ð) and P(Ð|y). It does not seem like a good idea to set P(Ð|x) to the intended α level for the HPD coverage. Using a non-parametric version for estimating Ð could be in the end the only reasonable solution.

As a test, I reran the example of a conjugate normal model used in the paper, based on (exact) simulations from both the prior and  the posterior, and obtained approximations that were all close from the true marginal. With Chib’s being exact in that case (of course!), and an arithmetic mean surprisingly close without an importance correction:

> print(c(hame,chme,came,chib))
[1] -107.6821 -106.5968 -115.5950 -115.3610


Both harmonic versions are of the right order but not trustworthy, the truncation to such a set Ð as the one chosen in this paper having little impact.

## 19 dubious ways to compute the marginal likelihood

Posted in Books, Statistics with tags , , , , , , , , , , on December 11, 2018 by xi'an

A recent arXival on nineteen different [and not necessarily dubious!] ways to approximate the marginal likelihood of a given topology of a philogeny tree that reminded me of our San Antonio survey with Jean-Michel Marin. This includes a version of the Laplace approximation called Laplus (!), accounting for the fact that branch lengths on the tree are positive but may have a MAP at zero. Using a Beta, Gamma, or log-Normal distribution instead of a Normal. For importance sampling, the proposals are derived from either the Laplus (!) approximate distributions or from the variational Bayes solution (based on an Normal product). Harmonic means are still used here despite the obvious danger, along with a defensive version that mixes prior and posterior. Naïve Monte Carlo means simulating from the prior, while bridge sampling seems to use samples from prior and posterior distributions. Path and modified path sampling versions are those proposed in 2008 by Nial Friel and Tony Pettitt (QUT). Stepping stone sampling appears like another version of path sampling, also based on a telescopic product of ratios of normalising constants, the generalised version relying on a normalising reference distribution that need be calibrated. CPO and PPD in the above table are two versions based on posterior predictive density estimates.

When running the comparison between so many contenders, the ground truth is selected as the values returned by MrBayes in a massive MCMC experiment amounting to 7.5 billions generations. For five different datasets. The above picture describes mean square errors for the probabilities of split, over ten replicates [when meaningful], the worst case being naïve Monte Carlo, with nested sampling and harmonic mean solutions close by. Similar assessments proceed from a comparison of Kullback-Leibler divergences. With the (predicatble?) note that “the methods do a better job approximating the marginal likelihood of more probable trees than less probable trees”. And massive variability for the poorest methods:

The comparison above does not account for time and since some methods are deterministic (and fast) there is little to do about this. The stepping steps solutions are very costly, while on the middle range bridge sampling outdoes path sampling. The assessment of nested sampling found in the conclusion is that it “would appear to be an unwise choice for estimating the marginal likelihoods of topologies, as it produces poor approximate posteriors” (p.12). Concluding at the Gamma Laplus approximation being the winner across all categories! (There is no ABC solution studied in this paper as the model likelihood can be computed in this setup, contrary to our own setting.)

## calibrating approximate credible sets

Posted in Books, Statistics with tags , , , , , , , on October 26, 2018 by xi'an

Earlier this week, Jeong Eun Lee, Geoff Nicholls, and Robin Ryder arXived a paper on the calibration of approximate Bayesian credible intervals. (Warning: all three authors are good friends of mine!) They start from the core observation that dates back to Monahan and Boos (1992) of exchangeability between θ being generated from the prior and φ being generated from the posterior associated with one observation generated from the prior predictive. (There is no name for this distribution, other than the prior, that is!) A setting amenable to ABC considerations! Actually, Prangle et al. (2014) relies on this property for assessing the ABC error, while pointing out that the test for exchangeability is not fool-proof since it works equally for two generations from the prior.

“The diagnostic tools we have described cannot be “fooled” in quite the same way checks based on the exchangeability can be.”

The paper thus proposes methods for computing the coverage [under the true posterior] of a credible set computed using an approximate posterior. (I had to fire up a few neurons to realise this was the right perspective, rather than the reverse!) A first solution to approximate the exact coverage of the approximate credible set is to use logistic regression, instead of the exact coverage, based on some summary statistics [not necessarily in an ABC framework]. And a simulation outcome that the parameter [simulated from the prior] at the source of the simulated data is within the credible set. Another approach is to use importance sampling when simulating from the pseudo-posterior. However this sounds dangerously close to resorting to an harmonic mean estimate, since the importance weight is the inverse of the approximate likelihood function. Not that anything unseemly transpires from the simulations.