Archive for prior predictive

JSM 2018 [#3]

Posted in Mountains, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on August 1, 2018 by xi'an

As I skipped day #2 for climbing, here I am on day #3, attending JSM 2018, with a [fully Canadian!] session on (conditional) copula (where Bruno Rémillard talked of copulas for mixed data, with unknown atoms, which sounded like an impossible target!), and another on four highlights from Bayesian Analysis, (the journal), with Maria Terres defending the (often ill-considered!) spectral approach within Bayesian analysis, modelling spectral densities (Fourier transforms of correlations functions, not probability densities), an advantage compared with MCAR modelling being the automated derivation of dependence graphs. While the spectral ghost did not completely dissipate for me, the use of DIC that she mentioned at the very end seems to call for investigation as I do not know of well-studied cases of complex dependent data with clearly specified DICs. Then Chris Drobandi was speaking of ABC being used for prior choice, an idea I vaguely remember seeing quite a while ago as a referee (or another paper!), paper in BA that I missed (and obviously did not referee). Using the same reference table works (for simple ABC) with different datasets but also different priors. I did not get first the notion that the reference table also produces an evaluation of the marginal distribution but indeed the entire simulation from prior x generative model gives a Monte Carlo representation of the marginal, hence the evidence at the observed data. Borrowing from Evans’ fringe Bayesian approach to model choice by prior predictive check for prior-model conflict. I remain sceptic or at least agnostic on the notion of using data to compare priors. And here on using ABC in tractable settings.

The afternoon session was [a mostly Australian] Advanced Bayesian computational methods,  with Robert Kohn on variational Bayes, with an interesting comparison of (exact) MCMC and (approximative) variational Bayes results for some species intensity and the remark that forecasting may be much more tolerant to the approximation than estimation. Making me wonder at a possibility of assessing VB on the marginals manageable by MCMC. Unless I miss a complexity such that the decomposition is impossible. And Antonietta Mira on estimating time-evolving networks estimated by ABC (which Anto first showed me in Orly airport, waiting for her plane!). With a possibility of a zero distance. Next talk by Nadja Klein on impicit copulas, linked with shrinkage properties I was unaware of, including the case of spike & slab copulas. Michael Smith also spoke of copulas with discrete margins, mentioning a version with continuous latent variables (as I thought could be done during the first session of the day), then moving to variational Bayes which sounds quite popular at JSM 2018. And David Gunawan made a presentation of a paper mixing pseudo-marginal Metropolis with particle Gibbs sampling, written with Chris Carter and Robert Kohn, making me wonder at their feature of using the white noise as an auxiliary variable in the estimation of the likelihood, which is quite clever but seems to get against the validation of the pseudo-marginal principle. (Warning: I have been known to be wrong!)

the Hyvärinen score is back

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , , , , on November 21, 2017 by xi'an

Stéphane Shao, Pierre Jacob and co-authors from Harvard have just posted on arXiv a new paper on Bayesian model comparison using the Hyvärinen score

\mathcal{H}(y, p) = 2\Delta_y \log p(y) + ||\nabla_y \log p(y)||^2

which thus uses the Laplacian as a natural and normalisation-free penalisation for the score test. (Score that I first met in Padova, a few weeks before moving from X to IX.) Which brings a decision-theoretic alternative to the Bayes factor and which delivers a coherent answer when using improper priors. Thus a very appealing proposal in my (biased) opinion! The paper is mostly computational in that it proposes SMC and SMC² solutions to handle the estimation of the Hyvärinen score for models with tractable likelihoods and tractable completed likelihoods, respectively. (Reminding me that Pierre worked on SMC² algorithms quite early during his Ph.D. thesis.)

A most interesting remark in the paper is to recall that the Hyvärinen score associated with a generic model on a series must be the prequential (predictive) version

\mathcal{H}_T (M) = \sum_{t=1}^T \mathcal{H}(y_t; p_M(dy_t|y_{1:(t-1)}))

rather than the version on the joint marginal density of the whole series. (Followed by a remark within the remark that the logarithm scoring rule does not make for this distinction. And I had to write down the cascading representation

\log p(y_{1:T})=\sum_{t=1}^T \log p(y_t|y_{1:t-1})

to convince myself that this unnatural decomposition, where the posterior on θ varies on each terms, is true!) For consistency reasons.

This prequential decomposition is however a plus in terms of computation when resorting to sequential Monte Carlo. Since each time step produces an evaluation of the associated marginal. In the case of state space models, another decomposition of the authors, based on measurement densities and partial conditional expectations of the latent states allows for another (SMC²) approximation. The paper also establishes that for non-nested models, the Hyvärinen score as a model selection tool asymptotically selects the closest model to the data generating process. For the divergence induced by the score. Even for state-space models, under some technical assumptions.  From this asymptotic perspective, the paper exhibits an example where the Bayes factor and the Hyvärinen factor disagree, even asymptotically in the number of observations, about which mis-specified model to select. And last but not least the authors propose and assess a discrete alternative relying on finite differences instead of derivatives. Which remains a proper scoring rule.

I am quite excited by this work (call me biased!) and I hope it can induce following works as a viable alternative to Bayes factors, if only for being more robust to the [unspecified] impact of the prior tails. As in the above picture where some realisations of the SMC² output and of the sequential decision process see the wrong model being almost acceptable for quite a long while…

ABC for repulsive point processes

Posted in Books, pictures, Statistics, University life with tags , , , , , , , on May 5, 2016 by xi'an

garden tree, Jan. 12, 2012Shinichiro Shirota and Alan Gelfand arXived a paper on the use of ABC for analysing some repulsive point processes, more exactly the Gibbs point processes, for which ABC requires a perfect sampler to operate, unless one is okay with stopping an MCMC chain before it converges, and determinantal point processes studied by Lavancier et al. (2015) [a paper I wanted to review and could not find time to!]. Detrimental point processes have an intensity function that is the determinant of a covariance kernel, hence repulsive. Simulation of a determinantal process itself is not straightforward and involves approximations. But the likelihood itself is unavailable and Lavancier et al. (2015) use approximate versions by fast Fourier transforms, which means MCMC is challenging even with those approximate steps.

“The main computational cost of our algorithm is simulation of x for each iteration of the ABC-MCMC.”

The authors propose here to use ABC instead. With an extra approximative step for simulating the determinantal process itself. Interestingly, the Gibbs point process allows for a sufficient statistic, the number of R-closed points, although I fail to see how the radius R is determined by the model, while the determinantal process does not. The summary statistics end up being a collection of frequencies within various spheres of different radii. However, these statistics are then processed by Fearnhead’s and Prangle’s proposal, namely to come up as an approximation of E[θ|y] as the natural summary. Obtained by regression over the original summaries. Another layer of complexity stems from using an ABC-MCMC approach. And including a Lasso step in the regression towards excluding less relevant radii. The paper also considers Bayesian model validation for such point processes, implementing prior predictive tests with a ranked probability score, rather than a Bayes factor.

As point processes have always been somewhat mysterious to me, I do not have any intuition about the strength of the distributional assumptions there and the relevance of picking a determinantal process against, say, a Strauss process. The model comparisons operated in the paper are not strongly supporting one repulsive model versus the others, with the authors concluding at the need for many points towards a discrimination between models. I also wonder at the possibility of including other summaries than Ripley’s K-functions, which somewhat imply a discretisation of the space, by concentric rings. Maybe using other point processes for deriving summary statistics as MLEs or Bayes estimators for those models would help. (Or maybe not.)

ABC model choice via random forests [expanded]

Posted in Statistics, University life with tags , , , , , , , , , , , on October 1, 2014 by xi'an

outofAfToday, we arXived a second version of our paper on ABC model choice with random forests. Or maybe [A]BC model choice with random forests. Since the random forest is built on a simulation from the prior predictive and no further approximation is used in the process. Except for the computation of the posterior [predictive] error rate. The update wrt the earlier version is that we ran massive simulations throughout the summer, on existing and new datasets. In particular, we have included a Human dataset extracted from the 1000 Genomes Project. Made of 51,250 SNP loci. While this dataset is not used to test new evolution scenarios, we compared six out-of-Africa scenarios, with a possible admixture for Americans of African ancestry. The scenario selected by a random forest procedure posits a single out-of-Africa colonization event with a secondary split into a European and an East Asian population lineages, and a recent genetic admixture between African and European lineages, for Americans of African origin. The procedure reported a high level of confidence since the estimated posterior error rate is equal to zero. The SNP loci were carefully selected using the following criteria: (i) all individuals have a genotype characterized by a quality score (GQ)>10, (ii) polymorphism is present in at least one of the individuals in order to fit the SNP simulation algorithm of Hudson (2002) used in DIYABC V2 (Cornuet et al., 2014), (iii) the minimum distance between two consecutive SNPs is 1 kb in order to minimize linkage disequilibrium between SNP, and (iv) SNP loci showing significant deviation from Hardy-Weinberg equilibrium at a 1% threshold in at least one of the four populations have been removed.

In terms of random forests, we optimised the size of the bootstrap subsamples for all of our datasets. While this optimisation requires extra computing time, it is negligible when compared with the enormous time taken by a logistic regression, which is [yet] the standard ABC model choice approach. Now the data has been gathered, it is only a matter of days before we can send the paper to a journal

%d bloggers like this: