**I** was (exceptionally) working in (and for) my garden when my daughter shouted down from her window that John Nash had just died. I thus completed my tree trimming and went to check about this sad item of news. What I read made the news even sadder as he and his wife had died in a taxi crash in New Jersey, apparently for not wearing seat-belts, a strategy you would think far from minimax… Since Nash was in Norway a few days earlier to receive the 2015 Abel Prize, it may even be that the couple was on its way home back from the airport. A senseless death for a Beautiful Mind.

## Archive for the Books Category

## an even more senseless taxi-ride

Posted in Books, Kids, pictures, Travel, University life with tags A Beautiful Mind, Abel Prize, car accident, John Nash, New Jersey, Nobel Prize, Norway, Princeton University, seat-belt, taxi on May 24, 2015 by xi'an## Bruce Lindsay (March 7, 1947 — May 5, 2015)

Posted in Books, Running, Statistics, Travel, University life with tags American statisticians, Bruce Lindsay, composite likelihood, Edinburgh, ICMS, Italia, marathon, mixture estimation, mixtures of distributions, Penn State University, unknown number of components, Venezia on May 22, 2015 by xi'an**W**hen early registering for Seattle (JSM 2015) today, I discovered on the ASA webpage the very sad news that Bruce Lindsay had passed away on May 5. While Bruce was not a very close friend, we had met and interacted enough times for me to feel quite strongly about his most untimely death. Bruce was indeed “Mister mixtures” in many ways and I have always admired the unusual and innovative ways he had found for analysing mixtures. Including algebraic ones through the rank of associated matrices. Which is why I first met him—besides a few words at the 1989 Gertrude Cox (first) scholarship race in Washington DC—at the workshop I organised with Gilles Celeux and Mike West in Aussois, French Alps, in 1995. After this meeting, we met twice in Edinburgh at ICMS workshops on mixtures, organised with Mike Titterington. I remember sitting next to Bruce at one workshop dinner (at Blonde) and him talking about his childhood in Oregon and his father being a journalist and how this induced him to become an academic. He also contributed a chapter on estimating the number of components [of a mixture] to the Wiley book we edited out of this workshop. Obviously, his work extended beyond mixtures to a general neo-Fisherian theory of likelihood inference. (Bruce was certainly *not* a Bayesian!) Last time, I met him, it was in Italia, at a likelihood workshop in Venezia, October 2012, mixing Bayesian nonparametrics, intractable likelihoods, and pseudo-likelihoods. He gave a survey talk about composite likelihood, telling me about his extended stay in Italy (Padua?) around that time… So, Bruce, I hope you are now running great marathons in a place so full of mixtures that you can always keep ahead of the pack! Fare well!

## non-reversible MCMC

Posted in Books, Statistics, University life with tags arXiv, MCMC algorithms, Monte Carlo Statistical Methods, Ornstein-Uhlenbeck model, reversibility, Université Paris Dauphine, University of Warwick on May 21, 2015 by xi'an**W**hile visiting Dauphine, Natesh Pillai and Aaron Smith pointed out this interesting paper of Joris Bierkens (Warwick) that had escaped my arXiv watch/monitoring. The paper is about turning Metropolis-Hastings algorithms into non-reversible versions, towards improving mixing.

In a discrete setting, a way to produce a non-reversible move is to mix the proposal kernel Q with its time-reversed version Q’ and use an acceptance probability of the form

where ε is any weight. This construction is generalised in the paper to any vorticity (skew-symmetric with zero sum rows) matrix Γ, with the acceptance probability

where ε is small enough to ensure all numerator values are non-negative. This is a rather annoying assumption in that, except for the special case derived from the time-reversed kernel, it has to be checked over all pairs (x,y). (I first thought it also implied the normalising constant of π but everything can be set in terms of the unormalised version of π, Γ or ε included.) The paper establishes that the new acceptance probability preserves π as its stationary distribution. An alternative construction is to make the proposal change from Q in H such that H(x,y)=Q(x,y)+εΓ(x,y)/π(x). Which seems more pertinent as not changing the proposal cannot improve that much the mixing behaviour of the chain. Still, the move to the non-reversible versions has the noticeable plus of decreasing the asymptotic variance of the Monte Carlo estimate for any integrable function. Any. (Those results are found in the physics literature of the 2000’s.)

The extension to the continuous case is a wee bit more delicate. One needs to find an anti-symmetric vortex function g with zero integral [equivalent to the row sums being zero] such that g(x,y)+π(y)q(y,x)>0 and with same support as π(x)q(x,y) so that the acceptance probability of g(x,y)+π(y)q(y,x)/π(x)q(x,y) leads to π being the stationary distribution. Once again g(x,y)=ε(π(y)q(y,x)-π(x)q(x,y)) is a natural candidate but it is unclear to me why it should work. As the paper only contains one illustration for the discretised Ornstein-Uhlenbeck model, with the above choice of g for a small enough ε (a point I fail to understand since any ε<1 should provide a positive g(x,y)+π(y)q(y,x)), it is also unclear to me that this modification (i) is widely applicable and (ii) is relevant for genuine MCMC settings.

## speed seminar-ing

Posted in Books, pictures, Statistics, Travel, University life, Wines with tags ABC, Bayes factor, Bayesian model choice, Bayesian testing, French cheese, French wines, Languedoc wines, Livarot, Mas Bruguière, Montpellier, Pic Saint Loup on May 20, 2015 by xi'an**Y**esterday, I made a quick afternoon trip to Montpellier as replacement of a seminar speaker who had cancelled at the last minute. Most obviously, I gave a talk about our “testing as mixture” proposal. And as previously, the talk generated a fair amount of discussion and feedback from the audience. Providing me with additional aspects to include in a revision of the paper. Whether or not the current submission is rejected, new points made and received during those seminars will have to get in a revised version as they definitely add to the appeal to the perspective. In that seminar, most of the discussion concentrated on the connection with *decisions* based on such a tool as the posterior distribution of the mixture weight(s). My argument for sticking with the posterior rather than providing a hard decision rule was that the message is indeed in arguing hard rules that end up mimicking the p- or b-values. And the catastrophic consequences of fishing for significance and the like. Producing instead a validation by simulating under each model pseudo-samples shows what to expect for each model under comparison. The argument did not really convince Jean-Michel Marin, I am afraid! Another point he raised was that we could instead use a distribution on α with support {0,1}, to avoid the encompassing model he felt was too far from the original models. However, this leads back to the Bayes factor as the weights in 0 and 1 are the marginal likelihoods, nothing more. However, this perspective on the classical approach has at least the appeal of completely validating the use of improper priors on common (nuisance or not) parameters. Pierre Pudlo also wondered why we could not conduct an analysis on the mixture of the likelihoods. Instead of the likelihood of the mixture. My first answer was that there was not enough information in the data for estimating the weight(s). A few more seconds of reflection led me to the further argument that the posterior on α with support (0,1) would then be a mixture of Be(2,1) and Be(1,2) with weights the marginal likelihoods, again (under a uniform prior on α). So indeed not much to gain. A last point we discussed was the case of the evolution trees we analyse with population geneticists from the neighbourhood (and with ABC). Jean-Michel’s argument was that the scenari under comparison were not compatible with a mixture, the models being exclusive. My reply involved an admixture model that contained all scenarios as special cases. After a longer pondering, I think his objection was more about the non iid nature of the data. But the admixture construction remains valid. And makes a very strong case in favour of our approach, I believe.

After the seminar, Christian Lavergne and Jean-Michel had organised a doubly exceptional wine-and-cheese party: first because it is not usually the case there is such a post-seminar party and second because they had chosen a terrific series of wines from the Mas Bruguière (Pic Saint-Loup) vineyards. Ending up with a great 2007 L’Arbouse. Perfect ending for an exciting day. (I am not even mentioning a special Livarot from close to my home-town!)

## Cauchy Distribution: Evil or Angel?

Posted in Books, pictures, Running, Statistics, Travel, University life, Wines with tags Boston, Box-Muller algorithm, Cauchy distribution, champagne, correlation, Harvard University, JSM 2014, Mathias Drton, Monte Carlo Statistical Methods, Mystic river, Natesh Pillai, Sommerville, Xiao-Li Meng on May 19, 2015 by xi'an**N**atesh Pillai and Xiao-Li Meng just arXived a short paper that solves the Cauchy conjecture of Drton and Xiao [I mentioned last year at JSM], namely that, when considering two normal vectors with generic variance matrix S, a weighted average of the ratios X/Y remains Cauchy(0,1), just as in the iid S=I case. Even when the weights are random. The fascinating side of this now resolved (!) conjecture is that the correlation between the terms does not seem to matter. Pushing the correlation to one [assuming it is meaningful, which is a suspension of belief!, since there is no standard correlation for Cauchy variates] leads to a paradox: all terms are equal and yet… it works: we recover a single term, which again is Cauchy(0,1). All that remains thus to prove is that it stays Cauchy(0,1) between those two extremes, a weird kind of intermediary values theorem!

Actually, Natesh and XL further prove an inverse χ² theorem: the inverse of the normal vector, renormalised into a quadratic form is an inverse χ² no matter what its covariance matrix. The proof of this amazing theorem relies on a spherical representation of the bivariate Gaussian (also underlying the Box-Müller algorithm). The angles are then jointly distributed as

and from there follows the argument that conditional on the differences between the θ’s, all ratios are Cauchy distributed. Hence the conclusion!

A question that stems from reading this version of the paper is whether this property extends to other formats of non-independent Cauchy variates. Somewhat connected to my recent post about generating correlated variates from arbitrary distributions: using the inverse cdf transform of a Gaussian copula shows this is possibly the case: the following code is meaningless in that the empirical correlation has no connection with a “true” correlation, but nonetheless the experiment seems of interest…

> ro=.999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2) > cor(x[,1]/x[,2],y[,1]/y[,2]) [1] -0.1351967 > ro=.99999999;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2) > cor(x[,1]/x[,2],y[,1]/y[,2]) [1] 0.8622714 > ro=1-1e-5;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2) > z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y))) > cor(x=z,y=w) [1] 0.9999732 > ks.test((z+w)/2,"pcauchy") One-sample Kolmogorov-Smirnov test data: (z + w)/2 D = 0.0068, p-value = 0.3203 alternative hypothesis: two-sided > ro=1-1e-3;x=matrix(rnorm(2e4),ncol=2);y=ro*x+sqrt(1-ro^2)*matrix(rnorm(2e4),ncol=2) > z=qcauchy(pnorm(as.vector(x)));w=qcauchy(pnorm(as.vector(y))) > cor(x=z,y=w) [1] 0.9920858 > ks.test((z+w)/2,"pcauchy") One-sample Kolmogorov-Smirnov test data: (z + w)/2 D = 0.0036, p-value = 0.9574 alternative hypothesis: two-sided

## the Flatland paradox [reply from the author]

Posted in Books, Statistics, University life with tags Abbot, flat prior, Flatland, Gaussian random walk, improper prior, marginalisation paradoxes, Mervyn Stone on May 15, 2015 by xi'an*[Here is a reply by Pierre Druihlet to my comments on his paper.]*

**T**here are several goals in the paper, the last one being the most important one.

The first one is to insist that considering θ as a parameter is not appropriate. We are in complete agreement on that point, but I prefer considering l(θ) as the parameter rather than N, mainly because it is much simpler. Knowing N, the law of l(θ) is given by the law of a random walk with 0 as reflexive boundary (Jaynes in his book, explores this link). So for a given prior on N, we can derive a prior on l(θ). Since the random process that generate N is completely unknown, except that N is probably large, the true law of l(θ) is completely unknown, so we may consider l(θ).

The second one is to state explicitly that a flat prior on θ implies an exponentially increasing prior on l(θ). As an anecdote, Stone, in 1972, warned against this kind of prior for Gaussian models. Another interesting anecdote is that he cited the novel by Abbot “Flatland : a romance of many dimension” who described a world where the dimension is changed. This is exactly the case in the FP since θ has to be seen in two dimensions rather than in one dimension.

The third one is to make a distinction between randomness of the parameter and prior distribution, each one having its own rule. This point is extensively discussed in Section 2.3.

– In the intuitive reasoning, the probability of no annihilation involves the true joint distribution on (θ, x) and therefore the true unknown distribution of θ,.

– In the Bayesian reasoning, the posterior probability of no annihilation is derived from the prior distribution which is improper. The underlying idea is that a prior distribution does not obey probability rules but belongs to a projective space of measure. This is especially true if the prior does not represent an accurate knowledge. In that case, there is no discontinuity between proper and improper priors and therefore the impropriety of the distribution is not a key point. In that context, the joint and marginal distributions are irrelevant, not because the prior is improper, but because it is a prior and not a true law. If the prior were the true probability law of θ,, then the flat distribution could not be considered as a limit of probability distributions.

For most applications, the distinction between prior and probability law is not necessary and even pedantic, but it may appear essential in some situations. For example, in the Jeffreys-Lindley paradox, we may note that the construction of the prior is not compatible with the projective space structure.