**A** recent news editorial in Nature (15 March issue) reminded me of the lectures Louis Aslett gave at the Gregynog Statistical Conference last week, on the advanced use of cryptography tools to analyse sensitive and private data. Lectures that reminded me of a graduate course I took on cryptography and coding, in Paris 6, and which led me to visit a lab at the Université de Limoges during my conscripted year in the French Navy. With no research outcome. Now, the notion of using encrypted data towards statistical analysis is fascinating in that it may allow for efficient inference and personal data protection at the same time. As opposed to earlier solutions of anonymisation that introduced noise and data degradation, not always providing sufficient protection of privacy. Encryption that is also the notion at the basis of the Nature editorial. An issue completely missing from the paper, while stressed by Louis, is that this encryption (like Bitcoin) is costly, in order to deter hacking, and hence energy inefficient. Or limiting the amount of data that can be used in such studies, which would turn the idea into a stillborn notion.

## Archive for Powys

## bitcoin and cryptography for statistical inference and AI

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags AI, anonymised data, bitcoin, Britain, cryptography, encryption, Gregynog Hall, Gregynog Statistical Conference, information, Navy, Powys, Tregynon, Wales on April 16, 2018 by xi'an

## Gregynog Hall ciplun [jatp]

Posted in Mountains, pictures, Running, Travel, University life with tags Cymru, Gregynog Hall, jatp, Powys, Wales on March 25, 2018 by xi'an

## Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life with tags all models are wrong, Bayesian foundations, cross validated, Cymru, Gregynog, misspecified model, Powys, Spring, statistical inference, Wales on May 9, 2017 by xi'an

**G**uillaume Dehaene posted the above question on X validated last Friday. Here is an excerpt from it:

> However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.
>
> My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.
>
> Indeed, according to Kleijn, v.d Vaart (2012), in the misspecified case, the posterior distribution converges as n→∞ to a Dirac distribution centred at the MLE but does not have the correct variance (unless two values just happen to be same) in order to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)
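To make the variance mismatch in the excerpt concrete, here is a minimal simulation sketch, not taken from the question or from Kleijn and van der Vaart. It assumes a toy setting of my own choosing: data drawn from N(0, true_sd²) but analysed under the model N(θ, assumed_sd²) with a flat prior, so that the posterior is N(x̄, assumed_sd²/n). The function name and its default arguments are hypothetical.

```python
import random


def credible_interval_coverage(true_sd, assumed_sd, n=50, reps=2000, seed=0):
    """Frequentist coverage of the 95% credible interval for a normal mean.

    Assumption (illustrative only): data come from N(0, true_sd^2) while the
    working model is N(theta, assumed_sd^2) with a flat prior on theta, so
    the posterior is N(xbar, assumed_sd^2 / n).
    """
    rng = random.Random(seed)
    # Half-width of the 95% credible interval under the working model.
    half_width = 1.96 * assumed_sd / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(0.0, true_sd) for _ in range(n)) / n
        # The pseudo-true value of theta is 0, the mean of the data.
        if abs(xbar) <= half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("well specified:", credible_interval_coverage(1.0, 1.0))
    print("misspecified:  ", credible_interval_coverage(2.0, 1.0))
```

When the assumed and true standard deviations agree, the coverage should sit near the nominal 95%. When the true standard deviation is twice the assumed one, the interval is too narrow by a factor of two and the coverage drops well below nominal, even though the posterior still concentrates at the right point.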

A few thoughts about that meme that *all models are wrong* (resonating from last week's discussion):

- While the hypothetical model is indeed almost invariably and irremediably *wrong*, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
- There exist Bayesian approaches that can do *without the model*, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
- In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
- And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a down-graded version expressed as a power of the original likelihood.
- The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a circumvoluted manner (and I added some comments on my blog).
- In a sense, Bayesians should be *the least concerned* among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.
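As an illustration of what "a power of the original likelihood" does to a posterior, here is a minimal sketch in a conjugate normal model. This is not Grünwald's SafeBayes algorithm, which learns the power from the data. Here the power `eta` is fixed by hand, and the function name, the known-variance model, and the normal prior are all assumptions made for the example.

```python
def tempered_normal_posterior(xs, sigma2, prior_mean, prior_var, eta):
    """Posterior for a normal mean when the likelihood is raised to the power eta.

    Assumed model (illustrative): x_i ~ N(theta, sigma2) with sigma2 known and
    theta ~ N(prior_mean, prior_var). Raising the likelihood to the power eta
    amounts to multiplying the data precision n / sigma2 by eta.
    """
    n = len(xs)
    xbar = sum(xs) / n
    prior_prec = 1.0 / prior_var
    data_prec = eta * n / sigma2  # eta = 1 gives the usual Bayes update
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    return post_mean, post_var


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(tempered_normal_posterior(data, 1.0, 0.0, 1.0, eta=1.0))
    print(tempered_normal_posterior(data, 1.0, 0.0, 1.0, eta=0.5))
```

With `eta` below one the data count for less: the posterior variance grows and the posterior mean moves back towards the prior mean, which is the sense in which the likelihood is down-graded.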