Archive for thinning

generalised Poisson difference autoregressive processes

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , on February 14, 2020 by xi'an

Yesterday, Giulia Carallo arXived the paper on generalised Poisson difference autoregressive processes that is a component of her Ph.D. thesis at Ca’ Foscari Universita di Venezia and to which I contributed while visiting Venezia last Spring. The stochastic process under study is integer valued as a difference of two generalised Poisson variates, made dependent by an INGARCH process that expresses the mean as a regression over past values of the process and past means. Which can be easily simulated as a difference of (correlated) Poisson variates. These two variates can in their turn be (re)defined through a thinning operator that I find most compelling, namely as a sum of Poisson variates with a number of terms being a (quasi-) Binomial variate depending on the previous value. This representation proves useful in establishing stationarity conditions on the process. Beyond establishing various properties of the process, the paper also examines how to conduct Bayesian inference in this context, with specialised Gibbs samplers in action. And comparing models on real datasets via Geyer‘s (1994) logistic approximation to Bayes factors.

scalable Metropolis-Hastings

Posted in Books, Statistics, Travel with tags , , , , , , , , , on February 12, 2019 by xi'an

Among the flury of arXived papers of last week (414!), including a fair chunk of papers submitted to ICML 2019, I spotted one entry by Cornish et al. on scalable Metropolis-Hastings, which Arnaud Doucet had mentioned to me yesterday when in Oxford. The paper builds on the delayed acceptance paper we wrote with Marco Banterlé, Clara Grazian and Anthony Lee, itself relying on a factorisation decomposition of the likelihood, combined with control variate accelerating techniques. The factorisation of both the target and the proposal allows for a (less efficient) Metropolis-Hastings acceptance ratio that is the product

\prod_{i=1}^m \alpha_i(\theta,\theta')

of individual Metropolis-Hastings acceptance ratios, but which allows for quicker rejection if one of the probabilities in the product is small, because the corresponding Bernoulli draw is zero with high probability. One advance made in Michel et al. (2017) [which I doubly missed] is that subsampling is achievable by thinning (as in PDMPs, where these authors have been quite active) through an algorithm of Shantikumar (1985) [described in Devroye’s bible]. Provided each Metropolis-Hastings probability can be lower bounded:

\alpha_i(\theta,\theta') \ge \exp\{-\psi_i \phi(\theta,\theta')\}

by a term where the transition φ does not depend on the index i in the product. The computing cost of the thinning process thus depends on the efficiency of the subsampling, namely whether or not the (Poisson) number of terms is much smaller than m, number of terms in the product. A neat trick in the current paper that extends the the Fukui-Todo procedure is to switch to the original Metropolis-Hastings when the overall lower bound is too small, recovering the geometric ergodicity of this original if it holds (Theorem 2.1). Another neat remark is that when using the naïve factorisation as the product of the n individual likelihoods, the resulting algorithm is sort of doomed as n grows, even with an optimal scaling of the proposals. To achieve scalability, the authors introduce a Taylor (i.e., Gaussian) approximation to each local target in the product and start the acceptance decomposition by using the resulting overall Gaussian approximation. Meaning that the remaining product is now made of ratios of targets over their local Taylor approximations, hence most likely close to one. And potentially lower-bounded by the remainder term in the Taylor expansion. Leading to the conclusion that, when everything goes well, meaning that the Taylor expansions can be conducted and the bounds derived for the appropriate expansion, the order of the Poisson scale is O(1/√n)..! The proposal for the Metropolis-Hastings move is actually tuned to the Gaussian approximation, appearing as a variant of the Langevin move or more exactly a discretization of an Hamiltonian move. Obviously, I cannot judge of the complexity in implementing this new scheme from just reading the paper, but this development on the split target is definitely an exciting prospect for handling huge datasets and their friends!

thinning a Markov chain, statistically

Posted in Books, pictures, R, Statistics with tags , , , , , , on June 13, 2017 by xi'an

Art Owen has arXived a new version of his thinning MCMC paper, where he studies how thinning or subsampling can improve computing time in MCMC chains. I remember quite well the message set by Mark Berliner and Steve MacEachern in an early 1990’s paper that subsampling was always increasing the variance of the resulting estimators. We actually have this result in our Monte Carlo Statistical Methods book. Now, there are other perspectives on this, as for instance cases when thinning can be hard-wired by simulating directly a k-step move, delaying rejection or acceptance, prefetching, or simulating directly the accepted values as in our vanilla Rao-Blackwellisation approach. Here, Art considers the case when there is a cost θ of computing a transform of the simulation [when the transition cost a unit] and when those transforms are positively correlated with correlation ρ. Somewhat unsurprisingly, when θ is large enough, thinning becomes worth implementing. But requires extra computations in evaluating the correlation ρ and the cost θ, which is rarely comparable with the cost of computing the likelihood itself, a requirement for the Metropolis-Hastings or Hamiltonian Monte Carlo step(s).  Subsampling while keeping the right target (which is a hard constraint!) should thus have a much more effective impact on computing budgets.