**Y**esterday, Giulia Carallo arXived the paper on generalised Poisson difference autoregressive processes that is a component of her Ph.D. thesis at Ca’ Foscari Universita di Venezia and to which I contributed while visiting Venezia last Spring. The stochastic process under study is integer valued as a difference of two generalised Poisson variates, made dependent by an INGARCH process that expresses the mean as a regression over past values of the process and past means. Which can be easily simulated as a difference of (correlated) Poisson variates. These two variates can in their turn be (re)defined through a thinning operator that I find most compelling, namely as a sum of Poisson variates with a number of terms being a (quasi-) Binomial variate depending on the previous value. This representation proves useful in establishing stationarity conditions on the process. Beyond establishing various properties of the process, the paper also examines how to conduct Bayesian inference in this context, with specialised Gibbs samplers in action. And comparing models on real datasets via Geyer‘s (1994) logistic approximation to Bayes factors.

## Archive for Bayes factors

## generalised Poisson difference autoregressive processes

Posted in pictures, Statistics, Travel, University life with tags autoregressive model, Bayes factors, Ca' Foscari University, Charlie Geyer, PhD thesis, Poisson distribution, stationarity, thinning, Venezia on February 14, 2020 by xi'an## Siem Reap conference

Posted in Kids, pictures, Travel, University life with tags Arkhangelsk, Bayes factors, Bayesian model choice, Bayesian model comparison, Cambodia, conference, CREST, Data Science and Finance conference, geometric ergodicity, group picture, Hyvärinen score, India, krama, Langevin MCMC algorithm, NGO, Pakistan, Sala Baï school, Siem Reap, Wasserstein distance on March 8, 2019 by xi'an**A**s I returned from the conference in Siem Reap. on a flight avoiding India and Pakistan and their [brittle and bristling!] boundary on the way back, instead flying far far north, near Arkhangelsk (but with nothing to show for it, as the flight back was fully in the dark), I reflected how enjoyable this conference had been, within a highly friendly atmosphere, meeting again with many old friends (some met prior to the creation of CREST) and new ones, a pleasure not hindered by the fabulous location near Angkor of course. (The above picture is the “last hour” group picture, missing a major part of the participants, already gone!)

Among the many talks, Stéphane Shao gave a great presentation on a paper [to appear in JASA] jointly written with Pierre Jacob, Jie Ding, and Vahid Tarokh on the Hyvärinen score and its use for Bayesian model choice, with a highly intuitive representation of this divergence function (which I first met in Padua when Phil Dawid gave a talk on this approach to Bayesian model comparison). Which is based on the use of a divergence function based on the squared error difference between the gradients of the true log-score and of the model log-score functions. Providing an alternative to the Bayes factor that can be shown to be consistent, even for some non-iid data, with some gains in the experiments represented by the above graph.

Arnak Dalalyan (CREST) presented a paper written with Lionel Riou-Durand on the convergence of non-Metropolised Langevin Monte Carlo methods, with a new discretization which leads to a substantial improvement of the upper bound on the sampling error rate measured in Wasserstein distance. Moving from p/ε to √p/√ε in the requested number of steps when p is the dimension and ε the target precision, for smooth and strongly log-concave targets.

This post gives me the opportunity to advertise for the NGO Sala Baï hostelry school, which the whole conference visited for lunch and which trains youths from underprivileged backgrounds towards jobs in hostelery, supported by donations, companies (like Krama Krama), or visiting the Sala Baï restaurant and/or hotel while in Siem Reap.

## leave Bayes factors where they once belonged

Posted in Statistics with tags Bayes factors, Bayesian Analysis, Bayesian decision theory, cross validated, prior comparison, prior predictive, prior selection, The Bayesian Choice, The Beatles, using the data twice, xkcd on February 19, 2019 by xi'an**I**n the past weeks I have received and read several papers (and X validated entries)where the Bayes factor is used to compare priors. Which does not look right to me, not on the basis of my general dislike of Bayes factors!, but simply because this seems to clash with the (my?) concept of Bayesian model choice and also because data should not play a role in that situation, from being used to select a *prior*, hence at least twice to run the inference, to resort to a *single* parameter value (namely the one behind the data) to decide between two distributions, to having no asymptotic justification, to eventually favouring the prior concentrated on the maximum likelihood estimator. And more. But I fear that this reticence to test for prior adequacy also extends to the prior predictive, or Box’s p-value, namely the probability under this prior predictive to observe something “more extreme” than the current observation, to quote from David Spiegelhalter.

## a question from McGill about The Bayesian Choice

Posted in Books, pictures, Running, Statistics, Travel, University life with tags Bayes factors, Bayesian hypothesis testing, Canada, cross validated, improper prior, McGill University, Montréal, posterior probability on December 26, 2018 by xi'an**I** received an email from a group of McGill students working on Bayesian statistics and using The Bayesian Choice (although the exercise pictured below is not in the book, the closest being exercise 1.53 inspired from Raiffa and Shlaiffer, 1961, and exercise 5.10 as mentioned in the email):

There was a question that some of us cannot seem to decide what is the correct answer. Here are the issues,

Some people believe that the answer to both is ½, while others believe it is 1. The reasoning for ½ is that since Beta is a continuous distribution, we never could have θ exactly equal to ½. Thus regardless of α, the probability that θ=½ in that case is 0. Hence it is ½. I found a related stack exchange question that seems to indicate this as well.

The other side is that by Markov property and mean of Beta(a,a), as α goes to infinity , we will approach ½ with probability 1. And hence the limit as α goes to infinity for both (a) and (b) is 1. I think this also could make sense in another context, as if you use the Bayes factor representation. This is similar I believe to the questions in the Bayesian Choice, 5.10, and 5.11.

As it happens, the answer is ½ in the first case (a) because π(H⁰) is ½ regardless of α and 1 in the second case (b) because the evidence against H⁰ goes to zero as α goes to zero *(watch out!)*, along with the mass of the prior on any compact of (0,1) since Γ(2α)/Γ(α)². (The limit does not correspond to a proper prior and hence is somewhat meaningless.) However, when α goes to infinity, the evidence against H⁰ goes to infinity and the posterior probability of ½ goes to zero, despite the prior under the alternative being more and more concentrated around ½!

## a come-back of the harmonic mean estimator

Posted in Statistics with tags Alan Gelfand, Bayes factors, Bayesian computing, harmonic mean estimator, Max Planck Institute, München, Werner-Heisenberg-Institut on September 6, 2018 by xi'an**A**re we in for a return of the harmonic mean estimator?! Allen Caldwell and co-authors arXived a new document that Allen also sent me, following a technique that offers similarities with our earlier approach with Darren Wraith, the difference being in the more careful and practical construct of the partition set and use of multiple hypercubes, which is the smart thing. I visited Allen’s group at the Max Planck Institut für Physik (Heisenberg) in München (Garching) in 2015 and we confronted our perspectives on harmonic means at that time. The approach followed in the paper starts from what I would call the canonical Gelfand and Dey (1995) representation with a uniform prior, namely that the integral of an arbitrary non-negative function [or unnormalised density] ƒ can be connected with the integral of the said function ƒ over a smaller set Δ with a finite measure measure [or volume]. And therefore to simulations from the density ƒ restricted to this set Δ. Which can be recycled by the harmonic mean identity towards producing an estimate of the integral of ƒ over the set Δ. When considering a partition, these integrals sum up to the integral of interest but this is not necessarily the only exploitation one can make of the fundamental identity. The most novel part stands in constructing an adaptive partition based on the sample, made of hypercubes obtained after whitening of the sample. Only keeping points with large enough density and sufficient separation to avoid overlap. (I am unsure a genuine partition is needed.) In order to avoid selection biases the original sample is separated into two groups, used independently. Integrals that stand too much away from the others are removed as well. This construction may sound a bit daunting in the number of steps it involves and in the poor adequation of a Normal to an hypercube or conversely, but it seems to shy away from the number one issue with the basic harmonic mean estimator, the almost certain infinite variance. Although it would be nice to be completely certain this doom is avoided. I still wonder at the degenerateness of the approximation of the integral with the dimension, as well as at other ways of exploiting this always fascinating [if fraught with dangers] representation. And comparing variances.