## the Kouign-Amann experiment

Posted in Kids, pictures, Travel on June 10, 2019 by xi'an

Having found a recipe for Kouign-Amanns, these excessive cookies from Brittany that are essentially cooked salted butter!, I had a first try that ended up in disaster (including a deep cut on the remaining thumb) and a second try that went better as far as both food and body parts are concerned. (The name means cake of butter in Breton.)

The underlying dough is pretty standard up to the moment it starts being profusely buttered and layered, again and again, until it becomes sufficiently feuilleté to put in the oven. The buttery nature of the product, clearly visible on the first picture, implies the cookies must be kept in containers like these muffin pans to preserve their shape and keep the boiling butter from inundating the oven, two aspects I had not foreseen on the first attempt.

The other if minor drawback of these cookies is that they do not keep well as they contain so much butter. Bringing enough calorie input for a hearty breakfast (and reminding me of those I ate in Cambridge while last visiting Pierre).

## position at Harvard

Posted in pictures, Running, University life on October 27, 2018 by xi'an

This is to point out an opening for a tenure track position in statistics and probability at Harvard University, with deadline December 1. More specifically, for a candidate in any field of statistics and probability as well as in any interdisciplinary areas where innovative and principled use of statistics and/or probability is of vital importance.

## controlled sequential Monte Carlo [BiPS seminar]

Posted in Statistics on June 5, 2018 by xi'an

The last BiPS seminar of the semester will be given by Jeremy Heng (Harvard) on Monday 11 June at 2pm, in room 3001, ENSAE, Paris-Saclay about his Controlled sequential Monte Carlo paper:

Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques to approximate high-dimensional probability distributions and their normalizing constants. They have found numerous applications in statistics and related fields as they can be applied to perform state estimation for non-linear non-Gaussian state space models and Bayesian inference for complex static models. Like many Monte Carlo sampling schemes, they rely on proposal distributions which have a crucial impact on their performance. We introduce here a class of controlled sequential Monte Carlo algorithms, where the proposal distributions are determined by approximating the solution to an associated optimal control problem using an iterative scheme. We provide theoretical analysis of our proposed methodology and demonstrate significant gains over state-of-the-art methods at a fixed computational complexity on a variety of applications.

## the Hyvärinen score is back

Posted in pictures, Statistics, Travel on November 21, 2017 by xi'an

Stéphane Shao, Pierre Jacob and co-authors from Harvard have just posted on arXiv a new paper on Bayesian model comparison using the Hyvärinen score

$\mathcal{H}(y, p) = 2\Delta_y \log p(y) + ||\nabla_y \log p(y)||^2$

which thus uses the Laplacian as a natural and normalisation-free penalisation for the score test. (Score that I first met in Padova, a few weeks before moving from X to IX.) Which brings a decision-theoretic alternative to the Bayes factor and which delivers a coherent answer when using improper priors. Thus a very appealing proposal in my (biased) opinion! The paper is mostly computational in that it proposes SMC and SMC² solutions to handle the estimation of the Hyvärinen score for models with tractable likelihoods and tractable completed likelihoods, respectively. (Reminding me that Pierre worked on SMC² algorithms quite early during his Ph.D. thesis.)
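To make the normalisation-free claim concrete, here is a minimal sketch for a univariate Gaussian, where the score above reduces to $-2/\sigma^2 + (y-\mu)^2/\sigma^4$. The function names, the test point and the additive constant are illustrative assumptions of mine, not anything taken from the paper, and the central differences below are only a numerical check of the derivatives (they are unrelated to the discrete scoring rule the authors propose).

```python
import math

def hyvarinen_score_gaussian(y, mean, var):
    """Closed-form H(y, p) = 2 * Laplacian + squared gradient for N(mean, var)."""
    grad = -(y - mean) / var   # d/dy log p(y)
    lap = -1.0 / var           # d^2/dy^2 log p(y)
    return 2.0 * lap + grad ** 2

def hyvarinen_score_numeric(log_density, y, h=1e-3):
    """Same score by central differences, for a univariate log density
    that only needs to be known up to an additive constant."""
    grad = (log_density(y + h) - log_density(y - h)) / (2 * h)
    lap = (log_density(y + h) - 2 * log_density(y) + log_density(y - h)) / h ** 2
    return 2.0 * lap + grad ** 2

def unnormalised_log_gaussian(y, mean=1.0, var=2.0, const=123.0):
    """Gaussian log kernel plus an arbitrary constant (hypothetical value)."""
    return -0.5 * (y - mean) ** 2 / var + const

exact = hyvarinen_score_gaussian(0.3, 1.0, 2.0)
approx = hyvarinen_score_numeric(unnormalised_log_gaussian, 0.3)
print(exact, approx)
```

Both values agree up to rounding whatever the constant, since only derivatives of the log density enter the score.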

A most interesting remark in the paper is to recall that the Hyvärinen score associated with a generic model on a series must be the prequential (predictive) version

$\mathcal{H}_T (M) = \sum_{t=1}^T \mathcal{H}(y_t; p_M(dy_t|y_{1:(t-1)}))$

rather than the version on the joint marginal density of the whole series. (Followed by a remark within the remark that the logarithm scoring rule does not make this distinction. And I had to write down the cascading representation

$\log p(y_{1:T})=\sum_{t=1}^T \log p(y_t|y_{1:t-1})$

to convince myself that this unnatural decomposition, where the posterior on θ varies with each term, is true!) For consistency reasons.
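The cascading representation can also be checked numerically. The sketch below assumes a toy conjugate model of my own choosing (observations $y_t|\theta\sim N(\theta,1)$ with prior $\theta\sim N(0,\tau^2)$), for which both the joint marginal and each one-step predictive are available in closed form; none of the names or values come from the paper.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_marginal_joint(ys, tau2):
    """log p(y_{1:T}) from the joint N(0, I + tau2 * 11') marginal."""
    T = len(ys)
    s = sum(ys)
    ss = sum(y * y for y in ys)
    # Quadratic form y' (I + tau2 * 11')^{-1} y via the Sherman-Morrison formula
    quad = ss - tau2 * s * s / (1 + T * tau2)
    return -0.5 * (T * math.log(2 * math.pi) + math.log(1 + T * tau2) + quad)

def log_marginal_prequential(ys, tau2):
    """Sum over t of log p(y_t | y_{1:t-1}), updating the posterior on theta."""
    total, running_sum = 0.0, 0.0
    for t, y in enumerate(ys):
        precision = 1.0 / tau2 + t          # posterior precision after t observations
        post_mean = running_sum / precision
        post_var = 1.0 / precision
        # One-step predictive is N(post_mean, post_var + 1)
        total += log_normal_pdf(y, post_mean, post_var + 1.0)
        running_sum += y
    return total

rng = random.Random(1)
ys = [rng.gauss(0.7, 1.0) for _ in range(25)]
joint = log_marginal_joint(ys, tau2=4.0)
prequential = log_marginal_prequential(ys, tau2=4.0)
print(joint, prequential)
```

The two numbers coincide up to floating-point error, even though the posterior on θ used in each predictive term changes with t.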

This prequential decomposition is however a plus in terms of computation when resorting to sequential Monte Carlo, since each time step produces an evaluation of the associated marginal. In the case of state space models, another decomposition of the authors, based on measurement densities and partial conditional expectations of the latent states, allows for another (SMC²) approximation. The paper also establishes that, for non-nested models, the Hyvärinen score as a model selection tool asymptotically selects the model closest to the data generating process, for the divergence induced by the score. Even for state-space models, under some technical assumptions. From this asymptotic perspective, the paper exhibits an example where the Bayes factor and the Hyvärinen factor disagree, even asymptotically in the number of observations, about which mis-specified model to select. And last but not least the authors propose and assess a discrete alternative relying on finite differences instead of derivatives. Which remains a proper scoring rule.

I am quite excited by this work (call me biased!) and I hope it can induce follow-up work as a viable alternative to Bayes factors, if only for being more robust to the [unspecified] impact of the prior tails. As in the above picture where some realisations of the SMC² output and of the sequential decision process see the wrong model being almost acceptable for quite a long while…

## positions in North-East America

Posted in Kids, pictures, Statistics, Travel, University life on September 14, 2017 by xi'an

• Professor in Statistics, Biostatistics or Data Science at U de M, deadline October 30th, 2017, a requirement being proficiency in the French language;
• Tenure-Track Professorship in Statistics at Harvard University, Department of Statistics, details there.

## fiducial on a string

Posted in Books, pictures, Statistics, Travel, University life on June 26, 2017 by xi'an

A very short note on arXiv today by Gunnar Taraldsen and Bo Henry Lindqvist (NTNU, Norway). With the above title. I find the note close to unreadable, I must say, as the notations are not all or well-defined. The problem starts from Teddy Seidenfeld [whom I met at Harvard around Dutch book arguments] arguing about the lack of uniqueness of fiducial distributions in a relatively simple setting. Actually the note is also inspired by Bayes, Fiducial and Frequentist, and comments from Teddy, a talk I apparently missed by taking a flight back home too early!

What I find surprising in this note is that the “fiducial on a string” is a conditional distribution on the parameter space restricted to a curve, derived from the original fiducial distribution by a conditioning argument. Except that since the conditioning is on a set of measure zero, this conditional is not only non-unique, but it is completely undefined and arbitrary, since changing it does not modify the properties of the joint distribution.

## ACDC versus ABC

Posted in Books, Kids, pictures, Statistics, Travel on June 12, 2017 by xi'an

At the Bayes, Fiducial and Frequentist workshop last month, I discussed with the authors of this newly arXived paper, Approximate confidence distribution computing, Suzanne Thornton and Min-ge Xie. Which they abbreviate as ACC and not as ACDC. While I have discussed the notion of confidence distribution in some earlier posts, this paper aims at producing proper frequentist coverage within a likelihood-free setting. Given the proximity with our recent paper on the asymptotics of ABC, as well as with Li and Fearnhead's (2016) parallel endeavour, it is difficult (for me) to spot the actual distinction between ACC and ABC, given that we also achieve (asymptotically) proper coverage when the limiting ABC distribution is Gaussian, which is the case for a tolerance decreasing quickly enough to zero (in the sample size).

“Inference from the ABC posterior will always be difficult to justify within a Bayesian framework.”

Indeed the ACC setting is eerily similar to ABC apart from the potential of the generating distribution to be data dependent. (Which is fine when considering that the confidence distributions have no Bayesian motivation but are a tool to ensure proper frequentist coverage.) That it is “able to offer theoretical support for ABC” (p.5) is unclear to me, given both this data dependence and the constraints it imposes on the [sampling and algorithmic] setting. Similarly, I do not understand how the authors “are not committing the error of doubly using the data” (p.5) and why they should be concerned about it, standing outside the Bayesian framework. If the prior involves the data as in the Cauchy location example, it literally uses the data [once], followed by an ABC comparison between simulated and actual data, that uses the data [a second time].

“Rather than engaging in a pursuit to define a moving target such as [a range of posterior distributions], ACC maintains a consistently clear frequentist interpretation (…) and thereby offers a consistently cohesive interpretation of likelihood-free methods.”

The frequentist coverage guarantee comes from a bootstrap-like assumption that [with tolerance equal to zero] the distribution of the ABC/ACC/ACDC random parameter around an estimate of the parameter given the summary statistic is identical to the [frequentist] distribution of this estimate around the true parameter [given the true parameter, although this conditioning makes no sense outside a Bayesian framework]. (There must be a typo in the paper when the authors define [p.10] the estimator as minimising the derivative of the density of the summary statistic, while still calling it an MLE.) That this bootstrap-like assumption holds is established (in Theorem 1) under a CLT on this MLE and assumptions on the data-dependent proposal that connect it to the density of the summary statistic. Connections that seem to imply a data-dependence as well as a certain knowledge about this density. What I find most surprising in this derivation is the total absence of conditions or even discussion on the tolerance level which, as we have shown, is paramount to the validation or invalidation of ABC inference. It sounds like the authors of Approximate confidence distribution computing are setting ε equal to zero for those theoretical derivations. While in practice they apply rules [for choosing ε] they do not spell out, but which result in very different acceptance rates for the ACC version they contrast with an ABC version. (In all illustrations, it seems that ε=0.1, which does not make much sense.) All in all, I am thus rather skeptical about the practical implications of the paper in that it seems to achieve confidence guarantees by first assuming proper if implicit choices of summary statistics and parameter generating distribution.
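To illustrate why the tolerance matters so much in practice, here is a minimal sketch of a plain ABC rejection sampler, not the ACC algorithm of the paper. Everything in it is an assumption made for illustration: a Normal location model with unit variance, a uniform prior on (-5, 5), the sample mean as summary statistic, and the two values of ε.

```python
import random

def simulate_abc_draws(obs_summary, n_obs, n_sims, seed=0, prior_range=(-5.0, 5.0)):
    """Return (theta, distance) pairs, simulated once so that several
    tolerances can be compared on the same pseudo-data."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior_range)                    # draw from the prior
        sim = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # pseudo-data given theta
        sim_summary = sum(sim) / n_obs                       # summary: sample mean
        draws.append((theta, abs(sim_summary - obs_summary)))
    return draws

def accept(draws, eps):
    """Keep the parameter values whose summary falls within eps of the observed one."""
    return [theta for theta, dist in draws if dist <= eps]

draws = simulate_abc_draws(obs_summary=1.0, n_obs=20, n_sims=5000)
loose = accept(draws, eps=0.5)
tight = accept(draws, eps=0.05)
rate_loose = len(loose) / len(draws)
rate_tight = len(tight) / len(draws)
print(rate_loose, rate_tight)
```

On the same simulations, shrinking ε can only lower the acceptance rate, while it concentrates the accepted values around the observed summary, so comparing two samplers run at different (and unreported) tolerances says little about either.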