## scalable Langevin exact algorithm [Read Paper]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , on June 23, 2020 by xi'an

Murray Pollock, Paul Fearnhead, Adam M. Johansen and Gareth O. Roberts (CoI: all with whom I have strong professional and personal connections!) have a Read Paper discussion happening tomorrow [under relaxed lockdown conditions in the UK, except for the absurd quatorzine on all travelers|, but still in a virtual format] that we discussed together [from our respective homes] at Paris Dauphine. And which I already discussed on this blog when it first came out.

Here are quotes I spotted during this virtual Dauphine discussion but we did not come up with enough material to build a significant discussion, although wondering at the potential for solving the O(n) bottleneck, handling doubly intractable cases like the Ising model. And noticing the nice features of the log target being estimable by unbiased estimators. And of using control variates, for once well-justified in a non-trivial environment.

“However, in practice this simple idea is unlikely to work. We can see this most clearly with the rejection sampler, as the probability of survival will decrease exponentially with t—and thus the rejection probability will often be prohibitively large.”

“This can be viewed as a rejection sampler to simulate from μ(x,t), the distribution of the Brownian motion at time  t conditional on its surviving to time t. Any realization that has been killed is ‘rejected’ and a realization that is not killed is a draw from μ(x,t). It is easy to construct an importance sampling version of this rejection sampler.”

## Monte Carlo fusion

Posted in Statistics with tags , , , , , , , , , on January 18, 2019 by xi'an

Hongsheng Dai, Murray Pollock (University of Warwick), and Gareth Roberts (University of Warwick) just arXived a paper we discussed together last year while I was at Warwick. Where fusion means bringing different parts of the target distribution

f(x)∝f¹(x)f²(x)…

together, once simulation from each part has been done. In the same spirit as in Scott et al. (2016) consensus Monte Carlo. Where for instance the components of the target cannot be computed simultaneously, either because of the size of the dataset, or because of privacy issues.The idea in this paper is to target an augmented density with the above marginal, using for each component of f, an auxiliary variable x¹,x²,…, and a target that is the product of the squared component, f¹(x¹)², f²(x²)², … by a transition density keeping f¹(.)²,f²(.)²,… invariant:

$f^c(x^c)^2 p_c(y|x^c) / f_c(y)$

as for instance the transition density of a Langevin diffusion. The marginal of

$\prod_c f^c(x^c)^2 p_c(y|x^c) / f_c(y)$

as a function of y is then the targeted original product. Simulating from this new extended target can be achieved by rejection sampling. (Any impact of the number of auxiliary variables on the convergence?) The practical implementation actually implies using the path-space rejection sampling methods in the Read Paper of Beskos et al. (2006). (An extreme case of the algorithm is actually an (exact) ABC version where the simulations x¹,x²,… from all components have to be identical and equal to y. The opposite extreme is the consensus Monte Carlo Algorithm, which explains why this algorithm is not an efficient solution.) An alternative is based on an Ornstein-Uhlenbeck bridge. While the paper remains at a theoretical level with toy examples, I heard from the same sources that applications to more realistic problems and implementation on parallel processors is under way.

Posted in Books, Mountains, pictures, Statistics, University life with tags , , , , , on May 26, 2015 by xi'an

[Here are comments made by Matt Graham that I thought would be more readable in a post format. The beautiful picture of the Alps above is his as well. I do not truly understand what Matt’s point is, as I did not cover continuous time processes in my discussion…]

In terms of interpretation of the diffusion with non-reversible drift component, I think this can be generalised from the Gaussian invariant density case by

dx = [ – (∂E/∂x) dt + √2 dw ] + S’ (∂E/∂x) dt

where ∂E/∂x is the usual gradient of the negative log (unnormalised) density / energy and S=-S’ is a skew symmetric matrix. In this form it seems the dynamic can be interpreted as the composition of an energy and volume conserving (non-canonical) Hamiltonian dynamic

dx/dt = S’ ∂E/∂x

and a (non-preconditioned) Langevin diffusion

dx = – (∂E/∂x) dt + √2 dw

As an alternative to discretising the combined dynamic, it might be interesting to compare to sequential alternation between ‘Hamiltonian’ steps either using a simple Euler discretisation

x’ = x + h S’ ∂E/∂x

or a symplectic method like implicit midpoint to maintain reversibility / volume preservation and then a standard MALA step

x’ = x – h (∂E/∂x) + √2 h w, w ~ N(0, I)

plus MH accept. If only one final MH accept step is done this overall dynamic will be reversible, however if a an intermediary MH accept was done after each Hamiltonian step (flipping the sign / transposing S on a rejection to maintain reversibility), the composed dynamic would in general be non-longer reversible and it would be interesting to compare performance in this case to that using a non-reversible MH acceptance on the combined dynamic (this alternative sidestepping the issues with finding an appropriate scale ε to maintain the non-negativity condition on the sum of the vorticity density and joint density on a proposed and current state).

With regards to your point on the positivity of g(x,y)+π(y)q(y,x), I’m not sure if I have fully understood your notation correctly or not, but I think you may have misread the definition of g(x,y) for the discretised Ornstein-Uhlenbeck case (apologies if instead the misinterpretation is mine!). The vorticity density is defined as the skew symmetric component of the density f of F(dx, dy) = µ(dx) Q(x, dy) with respect to the Lebesgue measure, where µ(dx) is the true invariant distribution of the Euler-Maruyama discretised diffusion based proposal density Q(x, dy) rather than g(x, y) being defined in terms of the skew-symmetric component of π(dx) Q(x, dy) which in general would lead to a vorticity density which does not meet the zero integral requirement as the target density π is not invariant in general with respect to Q.

## ABC with indirect summary statistics

Posted in Statistics, University life with tags , , , , , , , on February 3, 2014 by xi'an

After reading Drovandi’s and Pettitt’s Bayesian Indirect Inference, I checked (in the plane to Birmingham) the earlier Gleim’s and Pigorsch’s Approximate Bayesian Computation with indirect summary statistics. The setting is indeed quite similar to the above, with a description of three ways of connecting indirect inference with ABC, albeit with a different range of illustrations. This preprint states most clearly its assumption that the generating model is a particular case of the auxiliary model, which sounds anticlimactic since the auxiliary model is precisely used because the original one is mostly out of reach! This certainly was the original motivation for using indirect inference.

The part of the paper that I find the most intriguing is the argument that the indirect approach leads to sufficient summary statistics, in the sense that they “are sufficient for the parameters of the auxiliary model and (…) sufficiency carries over to the model of interest” (p.31). Looking at the details in the Appendix, I found that the argument is lacking, because the likelihood as a functional is shown to be a (sufficient) statistic, which seems both a tautology and irrelevant because this is different from the likelihood considered at the (auxiliary) MLE, which is the summary statistic used in fine.

“…we expand the square root of an innovation density h in a Hermite expansion and truncate the in finite polynomial at some integer K which, together with other tuning parameters of the SNP density, has to be determined through a model selection criterion (such as BIC). Now we take the leading term of the Hermite expansion to follow a Gaussian GARCH model.”

As in Drovandi and Pettitt, the performances of the ABC-I schemes are tested on a toy example, which is a very basic exponential iid sample with a conjugate prior. With a gamma model as auxiliary. The authors use a standard ABC based on the first two moments as their benchmark, however they do not calibrate those moments in the distance and end up with poor performances of ABC (in a setting where there is a sufficient statistic!). The best choice in this experiment appears as the solution based on the score, but the variances of the distances are not included in the comparison tables. The second implementation considered in the paper is a rather daunting continuous-time non-Gaussian Ornstein-Uhlenbeck stochastic volatility model à la Barndorf -Nielsen and Shephard (2001). The construction of the semi-nonparametric (why not semi-parametric?) auxiliary model is quite involved as well, as illustrated by the quote above. The approach provides an answer, with posterior ABC-IS distributions on all parameters of the original model, which kindles the question of the validation of this answer in terms of the original posterior. Handling simultaneously several approximation processes would help in this regard.