[Here is an editorial (my take on a Google translation) from Le Monde about the implementation last week of a fixed fine of €200 for drug possession. Introduced in 2018 by the French Parliament, it is presented by the French government as a way to fight drug trafficking (and its far-reaching consequences in the (de)structuration of some suburbs) by turning consumers into *de facto* accomplices. Which I find counterproductive and irrational, as prohibition never works and ultimately benefits criminals. Drug legalisation, or at least drug decriminalisation, as adopted in many other countries, would be much more beneficial. Disclaimer #1: I am not supporting the use of drugs, except tea of course. Disclaimer #2: I do not agree with the entirety of the editorial below.]

## Archive for penalisation

## Drogue : sortir du tout-répressif [reposted]

Posted in Books, Kids, Travel, Wines with tags cannabis, depenalisation, drug dealers, drug users, editorial, French police, French politics, Google translation, Le Monde, legalisation, Libé, Liberation, penalisation, tea dealer, tribune on September 13, 2020 by xi'an

## the Hyvärinen score is back

Posted in pictures, Statistics, Travel with tags Bayes factor, Bayesian model comparison, Bayesian model selection, consistency, Harvard University, Hyvärinen score, Lévy diffusion process, logarithmic score, Padova, penalisation, prior predictive, sequential Monte Carlo, SMC, SMC² on November 21, 2017 by xi'an

**S**téphane Shao, Pierre Jacob and co-authors from Harvard have just posted on arXiv a new paper on Bayesian model comparison using the Hyvärinen score

which thus uses the Laplacian as a natural and normalisation-free penalisation for the score test. (Score that I first met in Padova, a few weeks before moving from X to IX.) Which brings a decision-theoretic alternative to the Bayes factor and which delivers a coherent answer when using improper priors. Thus a very appealing proposal in my (biased) opinion! The paper is mostly computational in that it proposes SMC and SMC² solutions to handle the estimation of the Hyvärinen score for models with tractable likelihoods and tractable completed likelihoods, respectively. (Reminding me that Pierre worked on SMC² algorithms quite early during his Ph.D. thesis.)
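For reference, here is the standard form of the Hyvärinen score of a density p at an observation y, written in my own notation and under the convention (which I assume is the paper's) that smaller values are better:

```latex
% Hyvärinen score of a (possibly unnormalised) density p at the point y;
% \nabla_y and \Delta_y denote the gradient and the Laplacian in y.
\mathcal{H}(y, p) \;=\; 2\,\Delta_y \log p(y) \;+\; \bigl\| \nabla_y \log p(y) \bigr\|^2
```

Since only derivatives of log p in y appear, multiplying p by a constant leaves the score unchanged, which is why a marginal likelihood defined up to an arbitrary constant (as under an improper prior) still produces a well-defined value.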

A most interesting remark in the paper is to recall that the Hyvärinen score associated with a generic model on a series must be the prequential (predictive) version

rather than the version on the joint marginal density of the whole series. (Followed by a remark within the remark that the logarithmic scoring rule does not make this distinction. And I had to write down the cascading representation

to convince myself that this unnatural decomposition, where the posterior on θ varies with each term, is true!) For consistency reasons.
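As a reminder of what the two objects in this remark look like, here is my own reconstruction (the notation is an assumption and need not match the paper's) of the prequential score and of the cascading identity behind the logarithmic score:

```latex
% Prequential (predictive) Hyvärinen score of a model M on y_{1:T},
% as opposed to the score \mathcal{H}(y_{1:T}, p_M) of the joint marginal:
\mathcal{H}_T(M) \;=\; \sum_{t=1}^{T} \mathcal{H}\bigl(y_t,\; p_M(\cdot \mid y_{1:t-1})\bigr)

% Cascading representation of the marginal likelihood, where the
% distribution on \theta is the posterior given y_{1:t-1}, hence changes
% with each term (the prior \pi(\theta) when t = 1):
\int \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, \theta)\,\pi(\theta)\,\mathrm{d}\theta
\;=\; \prod_{t=1}^{T} \int p(y_t \mid y_{1:t-1}, \theta)\,
      \pi(\theta \mid y_{1:t-1})\,\mathrm{d}\theta
```

Taking logarithms of the second identity shows that the logarithmic score of the joint marginal is the sum of the one-step-ahead logarithmic scores, while no such telescoping holds for the Hyvärinen score.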

This prequential decomposition is however a plus in terms of computation when resorting to sequential Monte Carlo. Since each time step produces an evaluation of the associated marginal. In the case of state space models, another decomposition of the authors, based on measurement densities and partial conditional expectations of the latent states allows for another (SMC²) approximation. The paper also establishes that for non-nested models, the Hyvärinen score as a model selection tool asymptotically selects the closest model to the data generating process. For the divergence induced by the score. Even for state-space models, under some technical assumptions. From this asymptotic perspective, the paper exhibits an example where the Bayes factor and the Hyvärinen factor disagree, even asymptotically in the number of observations, about which mis-specified model to select. And last but not least the authors propose and assess a discrete alternative relying on finite differences instead of derivatives. Which remains a proper scoring rule.
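To see the prequential accumulation at work without any SMC machinery, here is a minimal sketch in a toy conjugate model of my own choosing (not an example from the paper): y_t ~ N(θ, σ²) with θ ~ N(m0, s0²), so that every one-step-ahead predictive is Gaussian and available in closed form. All function names are hypothetical. The sketch accumulates the score one time step at a time and compares it with the score of the joint marginal.

```python
import numpy as np

def gaussian_hyvarinen(y, mean, var):
    """Hyvarinen score 2*Laplacian(log p) + |grad log p|^2 of N(mean, var) at y."""
    return -2.0 / var + (y - mean) ** 2 / var ** 2

def gaussian_logpdf(y, mean, var):
    """Log-density of N(mean, var) at y."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def prequential_scores(y, m0, s0sq, sigsq):
    """Sum one-step-ahead scores for y_t ~ N(theta, sigsq), theta ~ N(m0, s0sq)."""
    m, v = m0, s0sq          # current posterior mean and variance of theta
    hyv, logscore = 0.0, 0.0
    for yt in y:
        pred_var = v + sigsq  # predictive variance of y_t given the past
        hyv += gaussian_hyvarinen(yt, m, pred_var)
        logscore += gaussian_logpdf(yt, m, pred_var)
        # conjugate update of the posterior on theta after observing y_t
        post_prec = 1.0 / v + 1.0 / sigsq
        m = (m / v + yt / sigsq) / post_prec
        v = 1.0 / post_prec
    return hyv, logscore

def joint_scores(y, m0, s0sq, sigsq):
    """Scores of the joint marginal N(m0 * 1, sigsq * I + s0sq * 11^T) at y_{1:T}."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    cov = sigsq * np.eye(T) + s0sq * np.ones((T, T))
    resid = y - m0
    grad = -np.linalg.solve(cov, resid)           # gradient of log p in y
    laplacian = -np.trace(np.linalg.inv(cov))     # Laplacian of log p in y
    hyv = 2.0 * laplacian + grad @ grad
    _, logdet = np.linalg.slogdet(cov)
    logscore = -0.5 * (T * np.log(2.0 * np.pi) + logdet - resid @ grad)
    return hyv, logscore

if __name__ == "__main__":
    y = np.array([0.3, -1.2, 0.8, 2.0])
    print("prequential (Hyvarinen, log):", prequential_scores(y, 0.0, 1.0, 1.0))
    print("joint       (Hyvarinen, log):", joint_scores(y, 0.0, 1.0, 1.0))
```

In this toy case the two logarithmic scores agree, as the chain rule dictates, while the two Hyvärinen scores differ, which is the distinction the remark is about. With an intractable model, the closed-form predictive would be replaced by an SMC estimate at each step.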

I am quite excited by this work (call me biased!) and I hope it can prompt follow-up work as a viable alternative to Bayes factors, if only for being more robust to the [unspecified] impact of the prior tails. As in the above picture where some realisations of the SMC² output and of the sequential decision process see the wrong model being almost acceptable for quite a long while…

## a day for comments

Posted in Mountains, Statistics, Travel, University life with tags AISTATS 2014, Bayesian variable selection, Brad Carlin, Cuillin ridge, Gaussian mixture, Gibbs sampler, hierarchical models, Iceland, ICML, Langevin MCMC algorithm, MCMC, Metropolis-Hastings algorithms, mixtures, model complexity, penalisation, reference priors, Reykjavik, RJMCMC, Russian doll, Scotland, sequential Monte Carlo, Sid Chib, Skye, speedup, spike-and-slab prior, variable dimension models on April 21, 2014 by xi'an

**A**s I was flying over Skye (with [maybe] a first if hazy perspective on the Cuillin ridge!) to Iceland, four long sets of replies to some of my posts appeared on the ‘Og:

- Dan Simpson replied to my comments of last Tuesday about his PC construction;
- Arnaud Doucet spelled out some issues about his adaptive subsampling paper;
- Amandine Schreck clarified why I had missed some points in her Bayesian variable selection paper;
- Randal Douc defended the efficiency of using the Carlin and Chib (1995) method for mixture simulation.

Thanks to them for taking the time to answer my musings…

## Dan Simpson’s seminar at CREST

Posted in Kids, Mountains, Statistics, Travel, University life with tags Banff, BiPS, CREST, hierarchical models, model complexity, Paris, penalisation, reference priors, Russian doll, Russian roulette on April 18, 2014 by xi'an

**D**aniel Simpson gave a seminar at CREST yesterday on his recently arXived paper, “Penalising model component complexity: A principled, practical approach to constructing priors”, written with Thiago Martins, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. Paper that he should also have given in Banff last month had he not lost his passport in København airport… I have already commented at length on this exciting paper (hopefully to become a discussion paper in a top journal!), so I am just pointing out two things that came to my mind during the energetic talk delivered by Dan to our group. The first thing is that those penalised complexity (PC) priors of theirs rely on some choices in the ordering of the relevance, complexity, nuisance level, etc., of the parameters, just like reference priors. While Dan already wrote a paper on Russian roulette, there is also a Russian doll principle at work behind (or within) PC priors. Each shell of the Russian doll corresponds to a further level of complexity whose order needs to be decided by the modeller… Not very realistic in a hierarchical model with several types of parameters having only local meaning.

**M**y second point is that the construction of those “politically correct” (PC) priors reflects another Russian doll structure, namely one of embedded models, hence would and should lead to a natural multiple testing methodology. Except that Dan rejected this notion during his talk, by being opposed to testing *per se*. (A good topic for one of my summer projects, if nothing more, then!)

## penalising model component complexity

Posted in Books, Mountains, pictures, Statistics, University life with tags Banff, default prior, Fisher information, ISBA, Jeffreys priors, Kullback-Leibler divergence, model complexity, noninformative priors, O'Bayes, penalisation, Riemann manifold on April 1, 2014 by xi'an

*“Prior selection is the fundamental issue in Bayesian statistics. Priors are the Bayesian’s greatest tool, but they are also the greatest point for criticism: the arbitrariness of prior selection procedures and the lack of realistic sensitivity analysis (…) are a serious argument against current Bayesian practice.” (p.23)*

**A** paper that I first read and annotated in the very early hours of the morning in Banff, when temperatures were down in the mid minus 20’s, has now appeared on arXiv: “Penalising model component complexity: A principled, practical approach to constructing priors” by Thiago Martins, Dan Simpson, Andrea Riebler, Håvard Rue, and Sigrunn Sørbye. It is a highly timely and pertinent paper on the selection of default priors! Which shows that the field of “objective” Bayes is still full of open problems and significant advances, and makes a great argument for the future president [that I am] of the O’Bayes section of ISBA to encourage young Bayesian researchers to consider this branch of the field.

“On the other end of the hunt for the holy grail, “objective” priors are data-dependent and are not uniformly accepted among Bayesians on philosophical grounds.” (p.2)

**A**part from the above quote, as objective priors are *not* data-dependent! (this is presumably a typo, used instead of *model-dependent*), I very much like the introduction (appreciating the reference to the very recent Kamary (2014) that just got rejected by TAS for quoting my blog post way too much… and that we jointly resubmitted to Statistics and Computing). Maybe missing the alternative solution of going hierarchical as far as needed and ending up with default priors [at the top of the ladder]. And not discussing the difficulty in specifying the sensitivity of weakly informative priors.

“Most model components can be naturally regarded as a flexible version of a base model.” (p.3)

**T**he starting point for the modelling is the *base model*. How easy is it to define this base model? Does it [always?] translate into a null hypothesis formulation? Is there an automated derivation? I assume this somewhat follows from the “block” idea that I do like but how generic is model construction by blocks?

“Occam’s razor is the principle of parsimony, for which simpler model formulations should be preferred until there is enough support for a more complex model.” (p.4)

**I** also like this idea of putting a prior on the distance from the base! Even more because it is parameterisation invariant (at least at the hyperparameter level). (This vaguely reminded me of a paper we wrote with George a while ago replacing tests with distance evaluations.) And because it gives a definitive meaning to Occam’s razor. However, unless the hyperparameter ξ is one-dimensional, this does not define a prior on ξ per se. I equally like Eqn (2) as it shows how the base constraint takes one away from Jeffreys’ prior. Plus, if one takes the Kullback as an intrinsic loss function, this also sounds related to Holmes’s and Walker’s substitute loss pseudopriors, no? Now, Eqn (2) does not sound right in the general case. Unless one implicitly takes a uniform prior on the Kullback sphere of radius d? There is a feeling of one-d-ness in the description of the paper (at least till page 6) and I wanted to see how it extends to models with many (≥2) hyperparameters. Until I reached Section 6 where the authors state exactly that! There is also a potential difficulty in that d(ξ) cannot be computed in a general setting. (Assuming that d(ξ) has a non-vanishing Jacobian as on page 19 sounds rather unrealistic.) Still about Section 6, handling reference priors on correlation matrices is a major endeavour, which should produce a steady flow of followers..!
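To make the one-dimensional construction concrete, here is a minimal sketch of the case I recall as the paper's running example, namely the precision τ of a Gaussian random effect with base model τ = ∞. There the distance from the base is proportional to the standard deviation σ = τ^(-1/2), so an exponential prior on the distance amounts to σ ~ Exp(λ) and, by change of variables, to a type-2 Gumbel density on τ. The function names are mine, and the tail-probability calibration P(σ > U) = α is how I understand the authors set λ.

```python
import numpy as np

def pc_prior_precision(tau, lam):
    """Density on tau induced by sigma = tau**-0.5 ~ Exp(lam):
    (lam / 2) * tau**(-3/2) * exp(-lam * tau**(-1/2))."""
    tau = np.asarray(tau, dtype=float)
    return 0.5 * lam * tau ** -1.5 * np.exp(-lam / np.sqrt(tau))

def pc_cdf_precision(tau, lam):
    """P(precision <= tau) = P(sigma >= tau**-0.5) = exp(-lam * tau**-0.5)."""
    tau = np.asarray(tau, dtype=float)
    return np.exp(-lam / np.sqrt(tau))

def rate_from_tail(upper, alpha):
    """Rate lam such that P(sigma > upper) = alpha under sigma ~ Exp(lam)."""
    return -np.log(alpha) / upper

def sample_pc_precision(n, lam, rng):
    """Draw precisions by simulating the distance (here sigma) and mapping back."""
    sigma = rng.exponential(scale=1.0 / lam, size=n)
    return 1.0 / sigma ** 2

if __name__ == "__main__":
    lam = rate_from_tail(upper=1.0, alpha=0.01)   # "sigma > 1 is unlikely"
    rng = np.random.default_rng(0)
    draws = sample_pc_precision(10_000, lam, rng)
    print("rate lam:", lam)
    print("share of draws with sigma > 1:", np.mean(draws < 1.0))
```

The density has its mode of mass towards the base model (large τ, small σ) and decays at a constant rate in the distance, which is the Occam's razor reading of the construction. Nothing in this sketch addresses the multi-hyperparameter case.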

“The current practice of prior specification is, to be honest, not in a good shape. While there has been a strong growth of Bayesian analysis in science, the research field of “practical prior specification” has been left behind.” (p.23)

**T**here are still quantities to specify and calibrate in the PC priors, which may actually be deemed a good thing by Bayesians (and some modellers). But overall I think this paper and its message constitute a terrific step for Bayesian statistics and I hope the paper can make it to a major journal.