## sudoku break

Posted in pictures, R, Statistics on December 13, 2013 by xi'an

While in Warwick last week, one evening after having exhausted my laptop battery, I tried the following Sudoku (from Libération):

+-------+-------+-------+
| 4   6 |   2   | 3   9 |
|   3   |       |   2   |
| 7   2 |       | 5   6 |
+-------+-------+-------+
|       | 9 4 5 |       |
| 5     | 7 6 2 |     1 |
|       | 3 1 8 |       |
+-------+-------+-------+
| 6   9 |       | 1   3 |
|   7   |       |   9   |
| 3   1 |   9   | 4   7 |
+-------+-------+-------+

and could not even start. As it happened, this was a setting with no deterministic move, i.e. all free/empty entries had multiple possible values. So after trying for a while and following trees to no obvious contradiction (!), I decided to give up and, on the next day (with power), to call my “old” sudoku solver (built while at SAMSI), based on simulated annealing, which got the result after a few thousand iterations. The detail of the exploration is represented above, the two colours coding for two different moves on the Sudoku table, leading to the solution

+-------+-------+-------+
| 4 8 6 | 5 2 1 | 3 7 9 |
| 1 3 5 | 6 7 9 | 8 2 4 |
| 7 9 2 | 8 3 4 | 5 1 6 |
+-------+-------+-------+
| 2 1 3 | 9 4 5 | 7 6 8 |
| 5 4 8 | 7 6 2 | 9 3 1 |
| 9 6 7 | 3 1 8 | 2 4 5 |
+-------+-------+-------+
| 6 2 9 | 4 8 7 | 1 5 3 |
| 8 7 4 | 1 5 3 | 6 9 2 |
| 3 5 1 | 2 9 6 | 4 8 7 |
+-------+-------+-------+

I then tried a variant with more move proposals (hence more colours) at each iteration, which ended up stuck at a penalty of 4 (instead of 0) over the final thousand iterations. Although this is a single-occurrence experiment, I find it interesting that having more move proposals may get the algorithm stuck faster in a local minimum. Nothing very deep there, of course..!
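
For the curious, here is a minimal R sketch of this kind of annealing solver (a reconstruction under my own design choices, not the original SAMSI code): each block is completed with its missing digits and kept valid by construction, the penalty counts duplicates over rows and columns, and a move swaps two free cells within a random block.

```r
puzzle <- matrix(c(
  4,0,6, 0,2,0, 3,0,9,
  0,3,0, 0,0,0, 0,2,0,
  7,0,2, 0,0,0, 5,0,6,
  0,0,0, 9,4,5, 0,0,0,
  5,0,0, 7,6,2, 0,0,1,
  0,0,0, 3,1,8, 0,0,0,
  6,0,9, 0,0,0, 1,0,3,
  0,7,0, 0,0,0, 0,9,0,
  3,0,1, 0,9,0, 4,0,7), 9, 9, byrow = TRUE)

fixed <- puzzle > 0
block <- function(b)              # row/col indices of block b in 1..9
  list(r = ((b - 1) %/% 3) * 3 + 1:3,
       c = ((b - 1) %% 3) * 3 + 1:3)

## initial state: complete each block with its missing digits, at random
set.seed(13)
s <- puzzle
for (b in 1:9) {
  i <- block(b)
  miss <- setdiff(1:9, puzzle[i$r, i$c])
  s[i$r, i$c][!fixed[i$r, i$c]] <- miss[sample.int(length(miss))]
}

penalty <- function(s)            # number of duplicates over rows and columns
  sum(9 - apply(s, 1, function(x) length(unique(x)))) +
  sum(9 - apply(s, 2, function(x) length(unique(x))))

cur <- penalty(s); temp <- 1
for (t in 1:1e5) {
  i <- block(sample.int(9, 1))
  free <- which(!fixed[i$r, i$c])
  if (length(free) < 2) next      # nothing to swap in this block
  sw <- sample(free, 2)
  prop <- s
  prop[i$r, i$c][sw] <- prop[i$r, i$c][rev(sw)]
  new <- penalty(prop)
  if (runif(1) < exp((cur - new) / temp)) { s <- prop; cur <- new }
  temp <- max(0.05, temp * 0.9999)  # slow geometric cooling
  if (cur == 0) break               # solved
}
cur                                 # final penalty (0 if solved)
s
```

Keeping the blocks valid by construction restricts the penalty to row and column duplicates, which makes the energy landscape easier to explore than moving single digits freely.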

## comment j’ai détesté les maths [teaser]

Posted in Kids, Statistics, University life on December 12, 2013 by xi'an

A (French) documentary film about maths just came out on French screens this week; here is the preview/teaser (with English translation or subtitles):

I have not seen *comment j’ai détesté les maths* (and do not plan to!) as this movie/documentary seems to centre on a few exotic characters like Cédric Villani and to blame the subprime crisis on the mathematical modelling used in constructing complex financial products, so I cannot see how it could improve the vision outsiders have of mathematics, rather than of mathematicians. And I have always hated the joke on the film poster (“Find X. Here it is!”), a joke that adorns too many office doors in maths departments all over the world…

## ABC with composite score functions

Posted in Books, pictures, Statistics, University life on December 12, 2013 by xi'an

My friends Erlis Ruli, Nicola Sartori and Laura Ventura from Università degli Studi di Padova have just arXived a new paper entitled Approximate Bayesian Computation with composite score functions. While the paper provides a survey of composite likelihood methods, its core idea is to use the score function (of the composite likelihood) as the summary statistic,

$\dfrac{\partial\,c\ell(\theta;y)}{\partial\,\theta},$

when evaluated at the composite maximum likelihood estimate for the observed data. In the specific (but unrealistic) case of an exponential family, ABC based on this score is asymptotically (i.e., as the tolerance ε goes to zero) exact. The choice of the composite likelihood thus induces a natural summary statistic and, as in our empirical likelihood paper, where we also use the score of a composite likelihood, only a few composite likelihoods are usually available for computation, thus leading to a quasi-automated choice of summary statistic.
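
To make the idea concrete, here is a small toy of my own (not from the paper), in R, in the exact exponential-family setting mentioned above, with the full normal likelihood standing in for the composite likelihood: the summary of a simulated dataset is its score evaluated at the observed MLE, whose observed value is zero by construction.

```r
set.seed(42)
n <- 50
yobs <- rnorm(n, mean = 2)                   # observed sample, true theta = 2
that <- mean(yobs)                           # (composite = full) MLE on yobs

score <- function(y, theta) sum(y - theta)   # score of the normal log-likelihood

N <- 1e5
theta <- rnorm(N, 0, 10)                     # draws from a diffuse N(0, 100) prior
dist <- sapply(theta, function(th) abs(score(rnorm(n, mean = th), that)))
keep <- dist < quantile(dist, 0.001)         # keep the 0.1% closest simulations
mean(theta[keep])                            # ABC posterior mean
100 * n * that / (100 * n + 1)               # exact posterior mean, as a check
```

Since the score here is n(ȳ − θ̂), matching it is equivalent to matching the sufficient statistic ȳ, which is why the ABC output recovers the exact posterior as the tolerance shrinks.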

An interesting (and common) feature of most examples in this paper is that comparisons are made between ABC using the (truly) sufficient statistic and ABC based on the pairwise score function, which essentially relies on the very same statistics. So the difference, when there is one, pertains to the choice of a different combination of the summary statistics or, somewhat equivalently, to the choice of a different distance function. One of the examples starts from our MA(2) toy example in the 2012 survey in Statistics and Computing. The composite likelihood is then based on the consecutive triplet marginal densities. As shown by the picture below, the composite version improves to some extent upon the original ABC solution using three autocorrelations.
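
For reference, the summaries in that original solution are plain sample autocorrelations, along the lines of (standard R, my own snippet, with an arbitrary parameter value):

```r
## simulate an MA(2) series and return its first three sample autocorrelations
ma2.sim <- function(theta, n = 100) arima.sim(list(ma = theta), n = n)
acf3 <- function(y) acf(y, lag.max = 3, plot = FALSE)$acf[2:4]   # lags 1 to 3
acf3(ma2.sim(c(0.6, 0.2)))
```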

A suggestion I would make towards refining the proposed method concerns the distance used in the paper, namely the sum of the absolute differences between the statistics. Indeed, this sum is not scaled at all, neither for regular ABC nor for composite ABC, while the composite likelihood perspective provides, in addition to the score, a natural metric through the matrix A(θ) [defined on page 12]. So I would suggest comparing the performances of the methods under this rescaling since, in my opinion and in contrast with a remark on page 13, it is relevant in some (many?) settings where the amount of information brought by the composite model varies widely from one parameter to the next.
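
In code, this rescaling amounts to a Mahalanobis-type distance on the score summaries; a sketch, under my own assumption that A(θ) can be replaced by the empirical covariance of the simulated scores:

```r
## Mahalanobis-type distance on the score summaries, S being an N x p matrix
## of scores computed on the N simulated datasets (the observed score is zero)
maha.dist <- function(S) {
  A <- cov(S)                          # empirical stand-in for A(theta)
  sqrt(rowSums((S %*% solve(A)) * S))  # s' A^{-1} s, row by row
}
```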

## beyond Kingman’s coalescent

Posted in Statistics on December 11, 2013 by xi'an

Three of my colleagues at Warwick, Jere Koskela, Paul Jenkins, and Dario Spanò, just arXived a paper entitled computational inference beyond Kingman’s coalescent. The paper is rather technical (for me) but its essence lies in extending Kingman’s coalescent, used to model the genealogy of a population evolving from a common ancestor, and in proposing importance samplers that apply to those extensions and compare (favourably) with the reference importance sampler of Stephens and Donnelly (2000, JRSS Series B, Read Paper). The processes under study are called the Λ-coalescent (“which allows for multiple mergers but only permits one merger at a time”) and the Ξ-coalescent (“which permits any number of simultaneous, multiple mergers”). As in Stephens and Donnelly (2000), the authors derive optimal conditional (importance) sampling distributions, which are then approximated to achieve manageable proposals. The resulting importance sampler performs better than Stephens and Donnelly’s (2000) when the model is not a Kingman coalescent. There is also a comparison with an alternative approach based on products of approximate conditionals (PAC), which approximate the MLEs rather well, if not the likelihood functions themselves, and hence can be used as calibration tools. I obviously wonder what a comparison with ABC (and the use of PAC proposals in an empirical likelihood version) would produce in this case.
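
For readers (like me) more comfortable with toy versions, the generic device behind those comparisons (emphatically not the samplers of the paper!) is an importance-sampling estimate of a likelihood over latent structures, L(θ) = E_q[p(y, h; θ)/q(h)], with the genealogy playing the part of the latent h; a toy R illustration where the exact answer is available:

```r
set.seed(27)
theta <- 1; y <- 2.5; N <- 1e4
h <- rnorm(N, mean = y)                           # approximate conditional proposal q = N(y, 1)
w <- dnorm(y, h) * dnorm(h, theta) / dnorm(h, y)  # p(y | h) p(h; theta) / q(h)
mean(w)                                           # IS estimate of L(theta)
dnorm(y, theta, sqrt(2))                          # exact marginal N(theta, 2), as a check
```

The closer q gets to the optimal conditional p(h | y; θ), the lower the variance of the weights, which is the whole game in the coalescent setting as well.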

Besides the appeal of studying new importance samplers in this setting, an additional feature of the paper is that Jere Koskela worked on this project as part of his MASDOC (MSc) training at Warwick, which demonstrates the excellence of this elite (maths and stats) Masters programme.

## invariant conjugate analysis for exponential families

Posted in Books, Statistics, University life on December 10, 2013 by xi'an

Here is a paper from Bayesian Analysis that I somehow missed and only became aware of thanks to a (more) recent paper by the first author: in 2012, Pierre Druilhet and Denis Pommeret published invariant conjugate analysis for exponential families. The authors define a new class of conjugate families, called Jeffreys’ conjugate priors (JCP), by using Jeffreys’ prior as the reference density (rather than the uniform measure, as in regular conjugate families), following from the earlier proposal of Druilhet and Marin (2007, BA). Both families of course coincide in the case of quadratic variance exponential families. The motivation for using those new conjugate priors is that the family is invariant under a change of parametrisation, and that it includes Jeffreys’ prior as a special case of conjugate prior. In the special case of the inverse Gaussian distribution, this approach leads to the conjugacy of the inverse normal distribution, a feature I noticed in 1991 when working on an astronomy project. There are two obvious drawbacks to those new conjugate families: one is that the priors are no longer always proper; the other is that the computations associated with those new priors are more involved, which may explain why the authors propose the MAP as their default estimator, since posterior expectations of the mean (in the natural representation [in x] of the exponential family) are no longer linear in x.
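
To fix ideas, here is my reading of the construction (a paraphrase in my own notations, not verbatim from the paper): for a natural exponential family with density f(x;θ) = exp{θᵀx − k(θ)}h(x), the Jeffreys’ conjugate prior takes the form

$\pi(\theta\mid s,\lambda) \propto J(\theta)\,\exp\{\theta^{\mathsf{T}}s - \lambda\,k(\theta)\},$

where J(θ) ∝ |∇²k(θ)|¹ᐟ² is Jeffreys’ prior; replacing J(θ) with a constant recovers the standard conjugate family, while taking s = 0 and λ = 0 returns Jeffreys’ prior itself.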