**I**n the past weeks I have received and read several papers (and X validated entries)where the Bayes factor is used to compare priors. Which does not look right to me, not on the basis of my general dislike of Bayes factors!, but simply because this seems to clash with the (my?) concept of Bayesian model choice and also because data should not play a role in that situation, from being used to select a *prior*, hence at least twice to run the inference, to resort to a *single* parameter value (namely the one behind the data) to decide between two distributions, to having no asymptotic justification, to eventually favouring the prior concentrated on the maximum likelihood estimator. And more. But I fear that this reticence to test for prior adequacy also extends to the prior predictive, or Box’s p-value, namely the probability under this prior predictive to observe something “more extreme” than the current observation, to quote from David Spiegelhalter.

## Archive for prior predictive

## leave Bayes factors where they once belonged

Posted in Statistics with tags Bayes factors, Bayesian Analysis, Bayesian decision theory, cross validated, prior comparison, prior predictive, prior selection, The Bayesian Choice, The Beatles, using the data twice, xkcd on February 19, 2019 by xi'an## back to the Bayesian Choice

Posted in Books, Kids, Statistics, University life with tags autoregressive model, Bayesian decision theory, Book, exercises, improper posteriors, improper prior, inverse Gamma distribution, prior predictive, The Bayesian Choice on October 17, 2018 by xi'an**S**urprisingly (or not?!), I received two requests about some exercises from The Bayesian Choice, one from a group of students from McGill having difficulties solving the above, wondering about the properness of the posterior (but missing the integration of x), to whom I sent back this correction. And another one from the Czech Republic about a difficulty with the term “evaluation” by which I meant (pardon my French!) estimation.

## agent-based models

Posted in Books, pictures, Statistics with tags agent-based models, climate change, data, drug users, Nature, Philadelphia, prior predictive, Sims, simulation model on October 2, 2018 by xi'an**A**n August issue of Nature I recently browsed [on my NUS trip] contained a news feature on agent- based models applied to understanding the opioid crisis in US. (With a rather sordid picture of a drug injection in Philadelphia, hence my own picture.)

To create an agent-based model, researchers first ‘build’ a virtual town or region, sometimes based on a real place, including buildings such as schools and food shops. They then populate it with agents, using census data to give each one its own characteristics, such as age, race and income, and to distribute the agents throughout the virtual town.The agents are autonomous but operate within pre-programmed routines — going to work five times a week, for instance. Some behaviours may be more random, such as a 5% chance per day of skipping work, or a 50% chance of meeting a certain person in the agent’s network. Once the system is as realistic as possible, the researchers introduce a variable such as a flu virus, with a rate and pattern of spread based on its real-life characteristics. They then run the simulation to test how the agents’ behaviour shifts when a school is closed or a vaccination campaign is started, repeating it thousands of times to determine the likelihood of different outcomes.

While I am obviously supportive of simulation based solutions, I cannot but express some reservation at the outcome, given that it is the product of the assumptions in the model. In Bayesian terms, this is purely prior predictive rather than posterior predictive. There is no hard data to create “realism”, apart from the census data. (The article also mixes the outcome of the simulation with real data. Or epidemiological data, not yet available according to the authors.)

In response to the opioid epidemic, Bobashev’s group has constructed Pain Town — a generic city complete with 10,000 people suffering from chronic pain, 70 drug dealers, 30 doctors, 10 emergency rooms and 10 pharmacies. The researchers run the model over five simulated years, recording how the situation changes each virtual day.

This is not to criticise the use of such tools to experiment with social, medical or political interventions, which practically and ethically cannot be tested in real life and working with such targeted versions of the Sims game can paradoxically be more convincing when dealing with policy makers. If they do not object at the artificiality of the outcome, as they often do for climate change models. Just from reading this general public article, I thus wonder at whether model selection and validation tools are implemented in conjunction with agent-based models…

## JSM 2018 [#3]

Posted in Mountains, Statistics, Travel, University life with tags ABC, Approximate Bayesian computation, Bayesian network, Bayesian p-values, British Columbia, Canada, curse of dimensionality, JSM 2018, prior predictive, pseudo-marginal MCMC, spectral analysis, spike-and-slab prior, stochastic gradient descent, Vancouver, variational Bayes methods on August 1, 2018 by xi'an**A**s I skipped day #2 for climbing, here I am on day #3, attending JSM 2018, with a [fully Canadian!] session on (conditional) copula (where Bruno Rémillard talked of copulas for mixed data, with unknown atoms, which sounded like an impossible target!), and another on four highlights from Bayesian Analysis, (the journal), with Maria Terres defending the (often ill-considered!) spectral approach within Bayesian analysis, modelling spectral densities (Fourier transforms of correlations functions, not probability densities), an advantage compared with MCAR modelling being the automated derivation of dependence graphs. While the spectral ghost did not completely dissipate for me, the use of DIC that she mentioned at the very end seems to call for investigation as I do not know of well-studied cases of complex dependent data with clearly specified DICs. Then Chris Drobandi was speaking of ABC being used for prior choice, an idea I vaguely remember seeing quite a while ago as a referee (or another paper!), paper in BA that I missed (and obviously did not referee). Using the same reference table works (for simple ABC) with different datasets but also different priors. I did not get first the notion that the reference table also produces an evaluation of the marginal distribution but indeed the entire simulation from prior x generative model gives a Monte Carlo representation of the marginal, hence the evidence at the observed data. Borrowing from Evans’ fringe Bayesian approach to model choice by prior predictive check for prior-model conflict. I remain sceptic or at least agnostic on the notion of using data to compare priors. And here on using ABC in tractable settings.

The afternoon session was [a mostly Australian] Advanced Bayesian computational methods, with Robert Kohn on variational Bayes, with an interesting comparison of (exact) MCMC and (approximative) variational Bayes results for some species intensity and the remark that forecasting may be much more tolerant to the approximation than estimation. Making me wonder at a possibility of assessing VB on the marginals manageable by MCMC. Unless I miss a complexity such that the decomposition is impossible. And Antonietta Mira on estimating time-evolving networks estimated by ABC (which Anto first showed me in Orly airport, waiting for her plane!). With a possibility of a zero distance. Next talk by Nadja Klein on impicit copulas, linked with shrinkage properties I was unaware of, including the case of spike & slab copulas. Michael Smith also spoke of copulas with discrete margins, mentioning a version with continuous latent variables (as I thought could be done during the first session of the day), then moving to variational Bayes which sounds quite popular at JSM 2018. And David Gunawan made a presentation of a paper mixing pseudo-marginal Metropolis with particle Gibbs sampling, written with Chris Carter and Robert Kohn, making me wonder at their feature of using the white noise as an auxiliary variable in the estimation of the likelihood, which is quite clever but seems to get against the validation of the pseudo-marginal principle. *(Warning: I have been known to be wrong!)*

## the Hyvärinen score is back

Posted in pictures, Statistics, Travel with tags Bayes factor, Bayesian model comparison, Bayesian model selection, consistency, Harvard University, Hyvärinen score, Lévy diffusion process, logarithmic score, Padova, penalisation, prior predictive, sequential Monte Carlo, SMC, SMC² on November 21, 2017 by xi'an**S**téphane Shao, Pierre Jacob and co-authors from Harvard have just posted on arXiv a new paper on Bayesian model comparison using the Hyvärinen score

which thus uses the Laplacian as a natural and normalisation-free penalisation for the score test. (Score that I first met in Padova, a few weeks before moving from X to IX.) Which brings a decision-theoretic alternative to the Bayes factor and which delivers a coherent answer when using improper priors. Thus a very appealing proposal in my (biased) opinion! The paper is mostly computational in that it proposes SMC and SMC² solutions to handle the estimation of the Hyvärinen score for models with tractable likelihoods and tractable completed likelihoods, respectively. (Reminding me that Pierre worked on SMC² algorithms quite early during his Ph.D. thesis.)

A most interesting remark in the paper is to recall that the Hyvärinen score associated with a generic model on a series must be the prequential (predictive) version

rather than the version on the joint marginal density of the whole series. (Followed by a remark within the remark that the logarithm scoring rule does not make for this distinction. And I had to write down the cascading representation

to convince myself that this unnatural decomposition, where the posterior on θ varies on each terms, is true!) For consistency reasons.

This prequential decomposition is however a plus in terms of computation when resorting to sequential Monte Carlo. Since each time step produces an evaluation of the associated marginal. In the case of state space models, another decomposition of the authors, based on measurement densities and partial conditional expectations of the latent states allows for another (SMC²) approximation. The paper also establishes that for non-nested models, the Hyvärinen score as a model selection tool asymptotically selects the closest model to the data generating process. For the divergence induced by the score. Even for state-space models, under some technical assumptions. From this asymptotic perspective, the paper exhibits an example where the Bayes factor and the Hyvärinen factor disagree, even asymptotically in the number of observations, about which mis-specified model to select. And last but not least the authors propose and assess a discrete alternative relying on finite differences instead of derivatives. Which remains a proper scoring rule.

I am quite excited by this work (call me biased!) and I hope it can induce following works as a viable alternative to Bayes factors, if only for being more robust to the [unspecified] impact of the prior tails. As in the above picture where some realisations of the SMC² output and of the sequential decision process see the wrong model being almost acceptable for quite a long while…

## ABC for repulsive point processes

Posted in Books, pictures, Statistics, University life with tags ABC, ABC model choice, Bayesian model comparison, determinantal point process, Gibbs point process, point processes, prior predictive, summary statistics on May 5, 2016 by xi'an**S**hinichiro Shirota and Alan Gelfand arXived a paper on the use of ABC for analysing some repulsive point processes, more exactly the Gibbs point processes, for which ABC requires a perfect sampler to operate, unless one is okay with stopping an MCMC chain before it converges, and determinantal point processes studied by Lavancier et al. (2015) [a paper I wanted to review and could not find time to!]. Detrimental point processes have an intensity function that is the determinant of a covariance kernel, hence repulsive. Simulation of a determinantal process itself is not straightforward and involves approximations. But the likelihood itself is unavailable and Lavancier et al. (2015) use approximate versions by fast Fourier transforms, which means MCMC is challenging even with those approximate steps.

“The main computational cost of our algorithm is simulation of x for each iteration of the ABC-MCMC.”

The authors propose here to use ABC instead. With an extra approximative step for simulating the determinantal process itself. Interestingly, the Gibbs point process allows for a sufficient statistic, the number of R-closed points, although I fail to see how the radius R is determined by the model, while the determinantal process does not. The summary statistics end up being a collection of frequencies within various spheres of different radii. However, these statistics are then processed by Fearnhead’s and Prangle’s proposal, namely to come up as an approximation of E[θ|y] as the natural summary. Obtained by regression over the original summaries. Another layer of complexity stems from using an ABC-MCMC approach. And including a Lasso step in the regression towards excluding less relevant radii. The paper also considers Bayesian model validation for such point processes, implementing prior predictive tests with a ranked probability score, rather than a Bayes factor.

As point processes have always been somewhat mysterious to me, I do not have any intuition about the strength of the distributional assumptions there and the relevance of picking a determinantal process against, say, a Strauss process. The model comparisons operated in the paper are not strongly supporting one repulsive model versus the others, with the authors concluding at the need for many points towards a discrimination between models. I also wonder at the possibility of including other summaries than Ripley’s K-functions, which somewhat imply a discretisation of the space, by concentric rings. Maybe using other point processes for deriving summary statistics as MLEs or Bayes estimators for those models would help. (Or maybe not.)

## ABC model choice via random forests [expanded]

Posted in Statistics, University life with tags 1000 Genomes Project, ABC, ABC model choice, admixture, bootstrap, DIYABC, Human evolution, logistic regression, out-of -Africa scenario, posterior predictive, prior predictive, random forests on October 1, 2014 by xi'an**T**oday, we arXived a second version of our paper on ABC model choice with random forests. Or maybe [A]BC model choice with random forests. Since the random forest is built on a simulation from the prior predictive and no further approximation is used in the process. Except for the computation of the posterior [predictive] error rate. The update wrt the earlier version is that we ran massive simulations throughout the summer, on existing and new datasets. In particular, we have included a Human dataset extracted from the 1000 Genomes Project. Made of 51,250 SNP loci. While this dataset is not used to test new evolution scenarios, we compared six out-of-Africa scenarios, with a possible admixture for Americans of African ancestry. The scenario selected by a random forest procedure posits a single out-of-Africa colonization event with a secondary split into a European and an East Asian population lineages, and a recent genetic admixture between African and European lineages, for Americans of African origin. The procedure reported a high level of confidence since the estimated posterior error rate is equal to zero. The SNP loci were carefully selected using the following criteria: (i) all individuals have a genotype characterized by a quality score (GQ)>10, (ii) polymorphism is present in at least one of the individuals in order to fit the SNP simulation algorithm of Hudson (2002) used in DIYABC V2 (Cornuet et al., 2014), (iii) the minimum distance between two consecutive SNPs is 1 kb in order to minimize linkage disequilibrium between SNP, and (iv) SNP loci showing significant deviation from Hardy-Weinberg equilibrium at a 1% threshold in at least one of the four populations have been removed.

In terms of random forests, we optimised the size of the bootstrap subsamples for all of our datasets. While this optimisation requires extra computing time, it is negligible when compared with the enormous time taken by a logistic regression, which is [yet] the standard ABC model choice approach. Now the data has been gathered, it is only a matter of days before we can send the paper to a journal