## free and graphic session at RSS 2018 in Cardiff

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , on July 11, 2018 by xi'an

Reposting an email I received from the Royal Statistical Society, this is to announce a discussion session on three papers on Data visualization in Cardiff City Hall next September 5, as a free part of the RSS annual conference. (But the conference team must be told in advance.)

Paper:             ‘Visualizing spatiotemporal models with virtual reality: from fully immersive environments to applications in stereoscopic view

Authors:         Stefano Castruccio (University of Notre Dame, USA) and Marc G. Genton and Ying Sun (King Abdullah University of Science and Technology, Thuwal)

Paper:             Visualization in Bayesian workflow’

Authors:            Jonah Gabry (Columbia University, New York), Daniel Simpson (University of Toronto), Aki Vehtari (Aalto University, Espoo), Michael Betancourt (Columbia University, New York, and Symplectomorphic, New York) and Andrew Gelman (Columbia University, New York)

Paper:             ‘Graphics for uncertainty’

Authors:         Adrian W. Bowman (University of Glasgow)

PDFs and supplementary files of these papers from StatsLife and the RSS website. As usual, contributions can be sent in writing, with a deadline of September 19.

## bitcoin and cryptography for statistical inference and AI

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , on April 16, 2018 by xi'an

A recent news editorial in Nature (15 March issue) reminded me of the lectures Louis Aslett gave at the Gregynog Statistical Conference last week, on the advanced use of cryptography tools to analyse sensitive and private data. Lectures that reminded me of a graduate course I took on cryptography and coding, in Paris 6, and which led me to visit a lab at the Université de Limoges during my conscripted year in the French Navy. With no research outcome. Now, the notion of using encrypted data towards statistical analysis is fascinating in that it may allow for efficient inference and personal data protection at the same time. As opposed to earlier solutions of anonymisation that introduced noise and data degradation, not always providing sufficient protection of privacy. Encryption that is also the notion at the basis of the Nature editorial. An issue completely missing from the paper, while stressed by Louis, is that this encryption (like Bitcoin) is costly, in order to deter hacking, and hence energy inefficient. Or limiting the amount of data that can be used in such studies, which would turn the idea into a stillborn notion.

## X entropy for optimisation

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on March 29, 2018 by xi'an

At Gregynog, with mounds of snow still visible in the surrounding hills, not to be confused with the many sheep dotting the fields(!), Owen Jones gave a three hour lecture on simulation for optimisation, which is a less travelled path when compared with simulation for integration. His second lecture covered cross entropy for optimisation purposes. (I had forgotten that Reuven Rubinstein and Dirk Kroese had put forward this aspect of their technique in the very title of their book. As “A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning”.) The X entropy approaches pushes for simulations restricted to top values of the target function, iterating to find the best parameter in the parametric family used for the simulation. (Best to be understood in the Kullback sense.) Now, this is a wee bit like simulated annealing, where lots of artificial entities have to be calibrated in the algorithm, due to the original problem being unrelated to an specific stochastic framework. X entropy facilitates concentration on the highest values of the target, but requires a family of probability distributions that puts weight on the top region. This may be a damning issue in large dimensions. Owen illustrated the approach in the case of the travelling salesman problem, where the parameterised distribution is a Markov chain on the state space of city sequences. Further, if the optimal value of the target is unknown, avoiding getting stuck in a local optimum may be tricky. (Owen presented a proof of convergence for a temperature going to zero slowly enough that is equivalent to a sure exploration of the entire state space, in a discrete setting, which does not provide a reassurance in this respect, as the corresponding algorithm cannot be implemented.) This method falls into the range of methods that are doubly stochastic in that they rely on Monte Carlo approximations at each iteration of the exploration algorithm.

During a later talk, I tried to recycle one of my earlier R codes on simulated annealing for sudokus, but could not find a useful family of proposal distributions to reach the (unique) solution. Using a mere product of distributions on each of the free positions in the sudoku grid only led me to a penalty of 13 errors…

1    2    8    5    9    7    4    9    3
7    3    5    1    2    4    6    2    8
4    6    9    6    3    8    5    7    1
2    7    5    3    1    6    9    4    8
8    1    4    7    8    9    7    6    2
6    9    3    8    4    2    1    3    5
3    8    6    4    7    5    2    1    9
1    4    2    9    6    3    8    5    7
9    5    7    2    1    8    3    4    6


It is hard to consider a distribution on the space of permutations, 𝔖⁸¹.

## Gregynog Hall ciplun [jatp]

Posted in Mountains, pictures, Running, Travel, University life with tags , , , , on March 25, 2018 by xi'an

## back to Wales [54th Gregynog Statistical Conference]

Posted in Mountains, pictures, Running, Travel, University life with tags , , , , , , , , on March 23, 2018 by xi'an

Today, provided the Air France strike let me fly to Birmingham airport!, I am back at Gregynog Hall, Wales, for the weekend conference organised there every year by some Welsh and English statistics departments, including Warwick. Looking forward to the relaxed gathering in the glorious Welsh countryside (and hoping that my knee will have sufficiently recovered for some trail running around Gregynog Hall…!) Here are the slides of the talk I will present tomorrow:

## the invasion of the stochastic gradients

Posted in Statistics with tags , , , , , , , , , on May 10, 2017 by xi'an

Within the same day, I spotted three submissions to arXiv involving stochastic gradient descent, that I briefly browsed on my trip back from Wales:

1. Stochastic Gradient Descent as Approximate Bayesian inference, by Mandt, Hoffman, and Blei, where this technique is used as a type of variational Bayes method, where the minimum Kullback-Leibler distance to the true posterior can be achieved. Rephrasing the [scalable] MCMC algorithm of Welling and Teh (2011) as such an approximation.
2. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, by Arnak Dalalyan, which establishes a convergence of the uncorrected Langevin algorithm to the right target distribution in the sense of the Wasserstein distance. (Uncorrected in the sense that there is no Metropolis step, meaning this is a Euler approximation.) With an extension to the noisy version, when the gradient is approximated eg by subsampling. The connection with stochastic gradient descent is thus tenuous, but Arnak explains the somewhat disappointing rate of convergence as being in agreement with optimisation rates.
3. Stein variational adaptive importance sampling, by Jun Han and Qiang Liu, which relates to our population Monte Carlo algorithm, but as a non-parametric version, using RKHS to represent the transforms of the particles at each iteration. The sampling method follows two threads of particles, one that is used to estimate the transform by a stochastic gradient update, and another one that is used for estimation purposes as in a regular population Monte Carlo approach. Deconstructing into those threads allows for conditional independence that makes convergence easier to establish. (A problem we also hit when working on the AMIS algorithm.)

## Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on May 9, 2017 by xi'an

Guillaume Dehaene posted the above question on X validated last Friday. Here is an except from it:

However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.

My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.

Indeed, according to Kleijn, v.d Vaart (2012), in the misspecified case, the posterior distribution converges as nto a Dirac distribution centred at the MLE but does not have the correct variance (unless two values just happen to be same) in order to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)

A few thoughts about that meme that all models are wrong: (resonating from last week discussion):

1. While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
2. There exist Bayesian approaches that can do without the model, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
3. In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
4. And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a down-graded version expressed as a power of the original likelihood.
5. The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a circumvoluted manner (and I added some comments on my blog).
6. In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.