**W**hile reading Confidence, Likelihood, Probability, by Tore Schweder and Nils Hjort, on the train from Oxford to Warwick, I came upon this unexpected property shown by Lindqvist and Taraldsen (Biometrika, 2005) that to simulate a sample **y** conditional on the realisation of a sufficient statistic, T(**y**)=t⁰, it is sufficient (!!!) to simulate the components of **y** as y=G(u,θ), with u a random variable with fixed distribution, e.g., a U(0,1), and to solve in θ the fixed point equation T(**y**)=t⁰. Assuming there exists a single solution. Brilliant (like an aurora borealis)! To borrow a simple example from the authors, take an exponential sample to be simulated given the sum statistic. As is well-known, the conditional distribution is then a (rescaled) Beta and the proposed algorithm ends up being a standard Beta generator. For the method to work in general, T(**y**) must factorise through a function of the u’s, a so-called pivotal condition which brings us back to my post title. If this condition does not hold, the authors once again brilliantly introduce a pseudo-prior distribution on the parameter θ to make it independent from the u’s conditional on T(**y**)=t⁰. And discuss the choice of the Jeffreys prior as optimal in this setting even when this prior is improper. While the setting is necessarily one of exponential families and of sufficient conditioning statistics, I find it amazing that this property is not better known [at least by me!]. And wonder if there is an equivalent outside exponential families, for instance for simulating a **t** sample conditional on the average of this sample.
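A minimal R sketch of the exponential example may help fix ideas. It assumes a rate parameterisation, y=G(u,θ)=-log(u)/θ, so that the equation T(**y**)=t⁰ on the sum has the single root θ=-Σlog(uᵢ)/t⁰. The function name `cond_exp_sample` is hypothetical and only meant as an illustration, not as the authors' code.

```r
# Simulate an exponential sample conditional on sum(y) = t0
# (illustrative sketch; rate parameterisation assumed)
cond_exp_sample <- function(n, t0) {
  u <- runif(n)          # variates with a fixed distribution, here U(0,1)
  e <- -log(u)           # y_i = G(u_i, theta) = e_i / theta
  theta <- sum(e) / t0   # unique solution of sum(e) / theta = t0
  e / theta              # the sample, whose sum equals t0
}

set.seed(1)
y <- cond_exp_sample(5, t0 = 3)
sum(y)  # equal to t0, up to floating point error
```

In this case the solution in θ is available in closed form, which is why the algorithm reduces to normalising standard exponential variates by their sum.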

## Archive for the Books Category

## fiducial simulation

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags book review, conditional density, English train, fiducial statistics, Jeffreys prior, Monte Carlo Statistical Methods, Oxford, pseudo-random generator, simulation, Student's t distribution, sufficient statistics, University of Warwick on April 19, 2018 by xi'an

## Le Monde puzzle [#1049]

Posted in Books, Kids, R with tags Le Monde, mathematical puzzle, R, recursive function on April 18, 2018 by xi'an

**A**n algorithmic Le Monde mathematical puzzle with a direct

Alice and Bob play a game by alternately picking one of the remaining digits between 1 and 10 and putting it in either one of two available stacks, 1 or 2. Their respective gains are the products of the piles (1 for Alice and 2 for Bob).

The problem is manageable by a recursive function

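    facten=factorial(10) #product of both stacks once all digits are placed
    pick=function(play=1,remz=matrix(0,2,5)){
      #remz: one row per stack, zeros denote empty slots
      if ((min(remz[1,])>0)||(min(remz[2,])>0)){#finale
        #one stack is full: remaining digits fill the other stack
        remz[remz==0]=(1:10)[!(1:10)%in%remz]
        return(prod(remz[play,]))
      }else{
        gainz=0
        #try every remaining digit on stack 1
        for (i in (1:10)[!(1:10)%in%remz]){
          propz=rbind(c(remz[1,remz[1,]>0],i,
            rep(0,sum(remz[1,]==0)-1)),remz[2,])
          #the gain is 10! over the optimal gain of the other player
          gainz=max(gainz,facten/pick(3-play,remz=propz))}
        #try every remaining digit on stack 2
        for (i in (1:10)[!(1:10)%in%remz]){
          propz=rbind(remz[1,],c(remz[2,remz[2,]>0],i,
            rep(0,sum(remz[2,]==0)-1)))
          gainz=max(gainz,facten/pick(3-play,remz=propz))}
        return(gainz)}}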

that shows the optimal gain for Alice is 3360=2×5×6×7×8, versus Bob getting 1080=1×3×4×9×10. The moves ensuring the gain are 2-10-…
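As a quick sanity check on these figures, and on the reason the recursion can recover one player's gain as 10! divided by the other's, the two final stacks can be multiplied out directly. The variable names below are only illustrative.

```r
# Final stacks reported above
stack_alice <- c(2, 5, 6, 7, 8)
stack_bob <- c(1, 3, 4, 9, 10)

gain_alice <- prod(stack_alice)  # 3360
gain_bob <- prod(stack_bob)      # 1080

# The two stacks use each digit from 1 to 10 exactly once,
# hence the product of both gains is 10!
gain_alice * gain_bob == prod(1:10)  # TRUE
```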

## bitcoin and cryptography for statistical inference and AI

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags AI, anonymised data, bitcoin, Britain, cryptography, encryption, Gregynog Hall, Gregynog Statistical Conference, information, Navy, Powys, Tregynon, Wales on April 16, 2018 by xi'an

**A** recent news editorial in Nature (15 March issue) reminded me of the lectures Louis Aslett gave at the Gregynog Statistical Conference last week, on the advanced use of cryptography tools to analyse sensitive and private data. Lectures that reminded me of a graduate course I took on cryptography and coding, in Paris 6, and which led me to visit a lab at the Université de Limoges during my conscripted year in the French Navy. With no research outcome. Now, the notion of using encrypted data towards statistical analysis is fascinating in that it may allow for efficient inference and personal data protection at the same time. As opposed to earlier solutions of anonymisation that introduced noise and data degradation, not always providing sufficient protection of privacy. Encryption that is also the notion at the basis of the Nature editorial. An issue completely missing from the paper, while stressed by Louis, is that this encryption (like Bitcoin) is costly, in order to deter hacking, and hence energy inefficient. Or limiting the amount of data that can be used in such studies, which would turn the idea into a stillborn notion.

## accelerating MCMC

Posted in Books, Statistics, University life with tags acceleration of MCMC algorithms, algorithms, arXiv, cross validated, MCMC, Monte Carlo Statistical Methods, referee, simulation, Telecom Lille, typology, Université Paris Dauphine, University of Warwick, WIREs on April 11, 2018 by xi'an

**A**s forecasted a rather long while ago (!), I wrote a short and incomplete survey on some approaches to accelerating MCMC. With the massive help of Victor Elvira (Lille), Nick Tawn (Warwick) and Changye Wu (Dauphine). A survey whose current version just got arXived and which has now been accepted by WIREs Computational Statistics. The typology (and even the range of methods) adopted here is certainly mostly arbitrary, with suggestions for different divisions made by a very involved and helpful reviewer. While we achieved a quick conclusion to the review process, suggestions and comments are most welcome! Even if we cannot include every possible suggestion, just like those already made on X validated. (WIREs stands for Wiley Interdisciplinary Reviews and its dozen topics cover several fields, from computational stats to biology, to medicine, to engineering.)

## Bayesian goodness of fit

Posted in Statistics, University life, Books, pictures with tags ABC, Bayesian foundations, exchange algorithm, goodness of fit, harmonic mean estimator, image analysis, Ising model, Persi Diaconis, Stanford University, thermodynamic integration on April 10, 2018 by xi'an

**P**ersi Diaconis and Guanyang Wang have just arXived an interesting reflection on the notion of Bayesian goodness of fit tests. Which is a notion that has always bothered me, in a rather positive sense (!), as

“I also have to confess at the outset to the zeal of a convert, a born again believer in stochastic methods. Last week, Dave Wright reminded me of the advice I had given a graduate student during my algebraic geometry days in the 70’s: ‘Good Grief, don’t waste your time studying statistics. It’s all cookbook nonsense.’ I take it back! …” David Mumford

The paper starts with a reference to David Mumford, whose paper with Wu and Zhou on exponential “maximum entropy” synthetic distributions is at the source (?) of this paper, and whose name appears in its very title: “A conversation for David Mumford”…, about his conversion from pure (algebraic) maths to applied maths. The issue of (Bayesian) goodness of fit is addressed, with card shuffling examples, the null hypothesis being that the permutation resulting from the shuffling is uniformly distributed if shuffling takes enough time. Interestingly, while the parameter space is compact as a distribution on a finite set, Lindley’s paradox still occurs, namely that the null (the permutation comes from a Uniform) is always accepted provided there is no repetition under a “flat prior”, which is the Dirichlet D(1,…,1) over all permutations. (In this finite setting an improper prior is definitely improper as it does not get proper after accounting for observations. Although I do not understand why the Jeffreys prior is not the Dirichlet(½,…,½) in this case…) When resorting to the exponential family of distributions entertained by Zhou, Wu and Mumford, including the uniform distribution as one of its members, Diaconis and Wang advocate the use of a conjugate prior (exponential family, right?!) to compute a Bayes factor that simplifies into a ratio of two intractable normalising constants. For which the authors suggest using importance sampling, thermodynamic integration, or the exchange algorithm. Except that they rely on the (dreaded) harmonic mean estimator for computing the Bayes factor in the following illustrative section! Due to the finite nature of the space, I presume this estimator still has a finite variance. (Remark 1 calls for convergence results on exchange algorithms, which can be found I think in the just as recent arXival by Christophe Andrieu and co-authors.) 
An interesting if rare feature of the example processed in the paper is that the sufficient statistic used for the permutation model can be directly simulated from a Multinomial distribution. This is rare as seen when considering the benchmark of Ising models, for which the summary and sufficient statistic cannot be directly simulated. (If only…!) In fine, while I enjoyed the paper a lot, I remain uncertain as to its bearings, since defining an objective alternative for the goodness-of-fit test becomes quickly challenging outside simple enough models.
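The Lindley-type behaviour mentioned above under the flat Dirichlet D(1,…,1) prior can be checked numerically on a toy case. The sketch below is only my reading of the setting, not the authors' code: it assumes n i.i.d. observations over K possible permutations, with the Uniform as the null and the Dirichlet-multinomial marginal as the alternative, and the function name `log_bf01` is hypothetical. With no repetition among the observations, the Bayes factor in favour of the null reduces to the product of (K+j)/K for j=0,…,n-1, which is always at least one.

```r
# Log Bayes factor of the Uniform null against a Dirichlet(1,...,1)
# alternative over K categories (illustrative sketch)
# counts: number of occurrences of each observed category
log_bf01 <- function(counts, K) {
  n <- sum(counts)
  log_m0 <- -n * log(K)                        # Uniform marginal likelihood
  log_m1 <- lgamma(K) - lgamma(K + n) +
    sum(lgamma(counts + 1))                    # Dirichlet-multinomial marginal
  log_m0 - log_m1
}

K <- factorial(5)                 # 120 permutations of five cards
log_bf01(rep(1, 10), K)           # ten distinct permutations: positive
sum(log1p((0:9) / K))             # closed form for the same quantity
log_bf01(10, K)                   # same permutation ten times: negative
```

A small K is used on purpose: for a full deck, K=52! makes the difference of the two `lgamma` terms numerically meaningless in double precision, and the closed form should be used instead.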

## Masterclass in Bayesian Statistics in Marseilles next Fall

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, Travel, University life with tags Aalto Science Institute, applied Bayesian analysis, Bayesian statistics, calanques, CIRM, CNRS, France, INLA, Luminy, Marseille, masterclass, Méditerranée, Provence, QUT, R, SMF, STAN on April 9, 2018 by xi'an

**T**his post is to announce a second occurrence of the exciting “masterclass in Bayesian Statistics” that we organised in 2016, near Marseilles. It will take place on 22-26 October 2018 once more at CIRM (Centre International de Rencontres Mathématiques, Luminy, Marseilles, France). The targeted audience includes all scientists interested in learning how Bayesian inference may be used to tackle the practical problems they face in their own research. In particular PhD students and post-docs should benefit most directly from this masterclass. Among the invited speakers, Kerrie Mengersen from QUT, Brisbane, visiting Marseilles this Fall, will deliver a series of lectures on the interface between Bayesian statistics and applied modelling, Håvard Rue from KAUST will talk on computing with INLA, and Aki Vehtari from Aalto U, Helsinki, will give a course on Bayesian model assessment and model choice. There will be two tutorials on R and on Stan.

All interested participants in this masterclass should pre-register as early as possible, given that the total attendance is limited to roughly 90 participants. Some specific funding for local expenses (i.e., food + accommodation on-site at CIRM) is available (thanks to CIRM, and potentially to Fondation Jacques Hadamard, to be confirmed); this funding will be attributed by the scientific committee, with high priority to PhD students and post-docs.