## fiducial simulation

Posted in Books, Kids, pictures, Statistics, Travel, University life on April 19, 2018 by xi'an

While reading Confidence, Likelihood, Probability, by Tore Schweder and Nils Hjort, in the train from Oxford to Warwick, I came upon this unexpected property shown by Lindqvist and Taraldsen (Biometrika, 2005) that to simulate a sample y conditional on the realisation of a sufficient statistic, T(y)=t⁰, it is sufficient (!!!) to simulate the components of y as y=G(u,θ), with u a random variable with fixed distribution, e.g., a U(0,1), and to solve in θ the fixed point equation T(y)=t⁰. Assuming there exists a single solution. Brilliant (like an aurora borealis)! To borrow a simple example from the authors, take an exponential sample to be simulated given the sum statistic. As is well known, the conditional distribution is then a (rescaled) Beta and the proposed algorithm ends up being a standard Beta generator. For the method to work in general, T(y) must factorise through a function of the u’s, a so-called pivotal condition which brings us back to my post title. If this condition does not hold, the authors once again brilliantly introduce a pseudo-prior distribution on the parameter θ to make it independent from the u’s conditional on T(y)=t⁰. And discuss the choice of the Jeffreys prior as optimal in this setting even when this prior is improper. While the setting is necessarily one of exponential families and of sufficient conditioning statistics, I find it amazing that this property is not better known [at least by me!]. And wonder if there is an equivalent outside exponential families, for instance for simulating a t sample conditional on the average of this sample.
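For the exponential example, the recipe can be sketched in a few lines of Python. This is only an illustration under my own assumptions: G(u,θ)=-log(u)/θ is taken as the inverse-cdf representation of an Exp(θ) variate, T(y) is the sum, and the function name `conditional_exponential_sample` is hypothetical rather than taken from the paper.

```python
import math
import random


def conditional_exponential_sample(n, t0, rng=random):
    """Simulate an exponential sample y_1..y_n conditional on sum(y) == t0."""
    # u_i ~ U(0,1); 1 - random() lies in (0, 1], which avoids log(0)
    u = [1.0 - rng.random() for _ in range(n)]
    # With y_i = G(u_i, theta) = -log(u_i) / theta, keep the theta-free part
    e = [-math.log(ui) for ui in u]
    # Solve T(y) = sum(e) / theta = t0 in theta (a single solution here)
    theta = sum(e) / t0
    return [ei / theta for ei in e]


random.seed(1)
y = conditional_exponential_sample(5, t0=3.0)
```

In this case the fixed point equation has a closed-form solution, so no numerical root finding is needed, and the returned sample sums to t⁰ up to floating-point error.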

## accelerating MCMC

Posted in Books, Statistics, University life on April 11, 2018 by xi'an

As forecast a rather long while ago (!), I wrote a short and incomplete survey on some approaches to accelerating MCMC. With the massive help of Victor Elvira (Lille), Nick Tawn (Warwick) and Changye Wu (Dauphine). A survey whose current version just got arXived and which has now been accepted by WIREs Computational Statistics. The typology (and even the range of methods) adopted here is certainly mostly arbitrary, with suggestions for different divisions made by a very involved and helpful reviewer. While we achieved a quick conclusion to the review process, suggestions and comments are most welcome! Even if we cannot include every possible suggestion, just like those already made on X validated. (WIREs stands for Wiley Interdisciplinary Reviews and its dozen topics cover several fields, from computational stats to biology, to medicine, to engineering.)

## MCMC with multiple tries

Posted in Books, pictures, Statistics, University life on April 5, 2018 by xi'an

Earlier this year, Luca Martino wrote and arXived a review on multiple try MCMC. As its name suggests, the starting point of this algorithm is to propose N potential moves simultaneously instead of one, possibly according to N different proposal (conditional) densities, and to select one by a normalised importance sampling weight. The move is accepted by a Metropolis-Hastings step based on the ratio of the normalisation constants [at the current and at the one-before-current stages]. Besides the cost of computing the summation and generating the different variates, this method also faces the drawback of requiring N-1 supplementary simulations that are only used for achieving detailed balance and computing a backward summation of importance weights. (A first section of the review is dedicated to independent Metropolis-Hastings proposals, q(θ), which make life simpler, but are less realistic in my opinion since some prior knowledge or experimentation is necessary to build a relevant distribution q(θ).) An alternative covered in the survey is ensemble Monte Carlo (Neal, 2011), which produces a whole sample at each iteration, with target the product of the initial targets. This reminded me of our pinball sampler, which aimed at producing a spread-out sample while keeping the marginal correct. Although the motivation sounds closer to a particle sampler. Especially with this associated notion of an empirical approximation of the target. The next part of the review is about delayed rejection, which is a natural alternative approach to speeding up MCMC by considering several possibilities, albeit sequentially. Started in Antonietta Mira’s 1999 PhD thesis. The difficulty with this approach is that the acceptance probability gets increasingly complex as the number of delays grows, which may annihilate its appeal relative to simultaneous multiple tries.
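As an illustration of the basic multiple-try step described above, here is a minimal Python sketch. It assumes a symmetric Gaussian random-walk proposal, in which case the importance weight of a candidate can be taken equal to the (unnormalised) target at that candidate. The N(0,1) target, the number of tries, the scale and all function names are hypothetical choices of mine, not taken from the review.

```python
import math
import random


def target(x):
    """Unnormalised N(0,1) density, used as a toy target (an assumption)."""
    return math.exp(-0.5 * x * x)


def mtm_step(x, n_tries, scale, rng):
    # Propose n_tries moves at once from a symmetric random walk around x
    ys = [rng.gauss(x, scale) for _ in range(n_tries)]
    wy = [target(y) for y in ys]  # importance weights for a symmetric proposal
    # Select one candidate with probability proportional to its weight
    y = rng.choices(ys, weights=wy, k=1)[0]
    # Backward set: n_tries - 1 supplementary draws around y, plus the current x
    xs = [rng.gauss(y, scale) for _ in range(n_tries - 1)] + [x]
    wx = [target(z) for z in xs]
    # Accept with the ratio of the forward and backward sums of weights
    if rng.random() < min(1.0, sum(wy) / sum(wx)):
        return y
    return x


def mtm_chain(n_iter, n_tries=5, scale=2.0, seed=0):
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n_iter):
        x = mtm_step(x, n_tries, scale, rng)
        chain.append(x)
    return chain


chain = mtm_chain(5000)
```

The N-1 supplementary draws in `xs` are exactly the simulations that serve no purpose other than balancing the acceptance ratio, which makes the extra cost per iteration visible in the code.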

## X entropy for optimisation

Posted in Books, pictures, Statistics, Travel, University life on March 29, 2018 by xi'an

At Gregynog, with mounds of snow still visible in the surrounding hills, not to be confused with the many sheep dotting the fields(!), Owen Jones gave a three-hour lecture on simulation for optimisation, which is a less travelled path when compared with simulation for integration. His second lecture covered cross entropy for optimisation purposes. (I had forgotten that Reuven Rubinstein and Dirk Kroese had put forward this aspect of their technique in the very title of their book. As “A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning”.) The X entropy approach pushes for simulations restricted to top values of the target function, iterating to find the best parameter in the parametric family used for the simulation. (Best to be understood in the Kullback sense.) Now, this is a wee bit like simulated annealing, where lots of artificial entities have to be calibrated in the algorithm, due to the original problem being unrelated to a specific stochastic framework. X entropy facilitates concentration on the highest values of the target, but requires a family of probability distributions that puts weight on the top region. This may be a damning issue in large dimensions. Owen illustrated the approach in the case of the travelling salesman problem, where the parameterised distribution is a Markov chain on the state space of city sequences. Further, if the optimal value of the target is unknown, avoiding getting stuck in a local optimum may be tricky. (Owen presented a proof of convergence for a temperature going to zero slowly enough, which is equivalent to a sure exploration of the entire state space, in a discrete setting, and which does not provide a reassurance in this respect, as the corresponding algorithm cannot be implemented.) This method falls into the range of methods that are doubly stochastic in that they rely on Monte Carlo approximations at each iteration of the exploration algorithm.
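To make the iteration concrete, here is a toy Python sketch of cross entropy for maximisation. It uses a one-dimensional Gaussian family, for which the Kullback-best parameter given the top (elite) simulations is simply their mean and standard deviation. The target function, sample sizes and function names are hypothetical choices of mine, not taken from the lecture or the book.

```python
import random
import statistics


def cross_entropy_maximise(f, mu=0.0, sigma=5.0, n_sample=200, n_elite=20,
                           n_iter=30, seed=0):
    """Toy cross-entropy search for the maximiser of f over the real line."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        # Simulate from the current member of the parametric family
        xs = [rng.gauss(mu, sigma) for _ in range(n_sample)]
        # Keep only the simulations with the top values of the target
        elite = sorted(xs, key=f, reverse=True)[:n_elite]
        # Refit the Gaussian to the elite sample (maximum likelihood update)
        mu = statistics.mean(elite)
        sigma = statistics.pstdev(elite)
    return mu, sigma


def toy_target(x):
    # Hypothetical unimodal target with maximum at x = 2
    return -(x - 2.0) ** 2


mu_hat, sigma_hat = cross_entropy_maximise(toy_target)
```

Even in this toy case the calibration choices (family, sample size, elite fraction, number of iterations) are all arbitrary, and with a multimodal target the shrinking standard deviation can lock the search onto a local optimum.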

During a later talk, I tried to recycle one of my earlier R codes on simulated annealing for sudokus, but could not find a useful family of proposal distributions to reach the (unique) solution. Using a mere product of distributions on each of the free positions in the sudoku grid only led me to a penalty of 13 errors…

1    2    8    5    9    7    4    9    3
7    3    5    1    2    4    6    2    8
4    6    9    6    3    8    5    7    1
2    7    5    3    1    6    9    4    8
8    1    4    7    8    9    7    6    2
6    9    3    8    4    2    1    3    5
3    8    6    4    7    5    2    1    9
1    4    2    9    6    3    8    5    7
9    5    7    2    1    8    3    4    6


It is hard to consider a distribution on the space of permutations, 𝔖⁸¹.

## Bayes Comp 2018 [call for guest posts]

Posted in Mountains, pictures, Statistics, Travel, University life on March 26, 2018 by xi'an

As the next MCMski conference, now called Bayes Comp, is starting in Barcelona, Spain, March 26-29, I welcome all guest posts covering the conference, since I am not going to be there! Enjoy!

## an interesting identity

Posted in Books, pictures, Statistics, University life on March 1, 2018 by xi'an

Another interesting X validated question, another remembrance of past discussions on that issue. Discussions that took place in the Institut d’Astrophysique de Paris, near this painting of Laplace, when working on our cosmostats project. Namely the potential appeal of recycling multidimensional simulations by permuting the individual components in nearly independent settings. As shown by the variance decomposition in my answer, when opposing N iid pairs (X,Y) to the N combinations of √N simulations of X and √N simulations of Y, the comparison

$\text{var}(\hat{\mathfrak{h}}^2_N)=\text{var}(\hat{\mathfrak{h}}^1_N)+\frac{mn(n-1)}{N^2}\,\text{var}^Y\left\{\mathbb{E}^{X}\left\{\mathfrak{h}(X,Y)\right\}\right\}+\frac{m(m-1)n}{N^2}\,\text{var}^X\left\{\mathbb{E}^Y\left\{\mathfrak{h}(X,Y)\right\}\right\}$

unsurprisingly gives the upper hand to the iid sequence. A sort of converse to Rao-Blackwellisation… Unless the production of N simulations gets much more costly when compared with the N function evaluations. No wonder we never see this proposal in Monte Carlo textbooks!
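A quick numerical check of this comparison, as a sketch under assumptions of my own: X and Y are independent standard normals, h(x,y)=x+y, and m=n=√N=10. In that toy case the iid estimator has variance 2/N=0.02, while the recycled estimator has variance 1/m+1/n=0.2. The function names are hypothetical.

```python
import random
import statistics


def h(x, y):
    # Hypothetical integrand, chosen so that both variances are known exactly
    return x + y


def iid_estimate(n_pairs, rng):
    # Average of h over N independent pairs (X, Y)
    return sum(h(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_pairs)) / n_pairs


def recycled_estimate(m, n, rng):
    # Average of h over all m * n combinations of m X's and n Y's
    xs = [rng.gauss(0, 1) for _ in range(m)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    return sum(h(x, y) for x in xs for y in ys) / (m * n)


rng = random.Random(0)
m = n = 10
N = m * n
reps = 2000
var_iid = statistics.variance([iid_estimate(N, rng) for _ in range(reps)])
var_recycled = statistics.variance([recycled_estimate(m, n, rng) for _ in range(reps)])
```

Both estimators use N evaluations of h, but the recycled one only requires 2√N simulations against 2N for the iid one, which is the sole situation where it could pay off.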

## amazing appendix

Posted in Books, Statistics, Travel, University life on February 13, 2018 by xi'an

In the first appendix of the 1995 Statistical Science paper of Besag, Green, Higdon and Mengersen, on MCMC, “Bayesian Computation and Stochastic Systems”, stands a fairly neat result I was not aware of (and which Arnaud Doucet, with his unrivalled knowledge of the literature!, pointed out to me in Oxford, sparing me the tedium of trying to prove it afresco!). I remember well reading a version of the paper in Fort Collins, Colorado, in 1993 (I think!) but nothing about this result.

It goes as follows: when running a Metropolis-within-Gibbs sampler for component x¹ of a collection of variates x¹,x²,…, thus aiming at simulating from the full conditional of x¹ given x⁻¹ by making a proposal q(x|x¹,x⁻¹), it is perfectly acceptable to use a proposal that depends on a parameter α (no surprise so far!) and to generate this parameter α anew at each iteration (still unsurprising as α can be taken as an auxiliary variable) and to have the distribution of this parameter α depending on the other variates x²,…, i.e., x⁻¹. This is the surprising part, as adding α as an auxiliary variable would mess up the update of x⁻¹. But the proof as found in the 1995 paper [page 35] does not require considering α as such, as it establishes global balance directly. (Or maybe still detailed balance when writing the whole Gibbs sampler as a cycle of Metropolis steps.) Terrific! And a whiff mysterious..!
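A toy illustration of the setting, as a minimal sketch rather than the construction of the 1995 paper: the target is a bivariate normal with correlation ρ=0.5, x¹ is updated by a random-walk Metropolis step whose scale α is drawn anew at each iteration from an exponential distribution with mean 1+|x²|, and x² is updated from its exact full conditional. The target, the distribution of α and all names are assumptions of mine.

```python
import math
import random

RHO = 0.5  # hypothetical correlation of the bivariate normal target


def log_cond_x1(x1, x2):
    # Log full conditional of x1 given x2, up to a constant: N(RHO*x2, 1-RHO^2)
    return -0.5 * (x1 - RHO * x2) ** 2 / (1.0 - RHO ** 2)


def metropolis_within_gibbs(n_iter, seed=0):
    rng = random.Random(seed)
    x1 = x2 = 0.0
    draws = []
    for _ in range(n_iter):
        # Scale alpha generated anew, with a distribution depending on x2 only
        alpha = rng.expovariate(1.0 / (1.0 + abs(x2)))
        prop = rng.gauss(x1, alpha)  # symmetric proposal given alpha
        log_ratio = log_cond_x1(prop, x2) - log_cond_x1(x1, x2)
        if math.log(1.0 - rng.random()) < log_ratio:
            x1 = prop
        # Exact Gibbs update of x2 from its full conditional given x1
        x2 = rng.gauss(RHO * x1, math.sqrt(1.0 - RHO ** 2))
        draws.append((x1, x2))
    return draws


draws = metropolis_within_gibbs(20000)
x1s = [d[0] for d in draws]
```

Since α is discarded after each update of x¹ and never enters the update of x², the empirical mean and variance of `x1s` can be compared with the N(0,1) marginal as a sanity check.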