Archive for Royal Statistical Society

a statistic with consequences

Posted in pictures, Statistics on July 18, 2019 by xi'an

In the latest Significance, there was a flyer with some member updates, an important one being that Sylvia Richardson had been elected the next president of the Royal Statistical Society. Congratulations to my friend Sylvia! Another item was that the publication of the 2018 RSS Statistic of the Year has led an Australian water company to switch from plastic to aluminum. Hmm, what about switching to nothing and supporting a use-your-own-bottle approach? While it is correct that aluminum cans can be made of 100% recycled aluminum, this water company does not appear to make any concerted effort to ensure its cans are made of recycled aluminum, or to push the recycling rate for aluminum in Australia towards the rates achieved in Brazil (92%) or Japan (86%). (Another shocking statistic that could have been added to the 90.5% non-recycled plastic waste [in the World?] is that producing a water bottle consumes the equivalent of one-fourth of its contents in oil.) Another US water company still promotes water bottles as "one of the most effective and inert carbon capture & sequestration methods"..! There is no limit to green-washing.

O’Bayes 19/2

Posted in Books, pictures, Running, Travel, University life on July 1, 2019 by xi'an

One talk on Day 2 of O'Bayes 2019 was by Ryan Martin on data-dependent priors (or "priors"). Which I have already discussed on this blog. Including the notion of a Gibbs posterior for quantities that "are not always defined through a model" [which is debatable if one sees them as part of a semi-parametric model]. A Gibbs posterior that is built through a pseudo-likelihood constructed from the empirical risk, which reminds me of Bissiri, Holmes and Walker. Although requiring a prior on this quantity that is not part of a model. And it is not necessarily a true posterior, nor necessarily with the same concentration rate as a true posterior. Constructing a data-dependent distribution on the parameter does not necessarily lead to interesting inference and, to keep up with the theme of the conference, carries no automatic claim to [more] "objectivity".

And after calling a prior both Beauty and The Beast!, Erlis Ruli argued for a "bias-reduction" prior, where the prior is the solution to a differential equation related to some cumulants, connected with an earlier work of David Firth (Warwick). An interesting conundrum is how to create an MCMC algorithm when the prior is that intractable, with possible help from PDMP techniques like the Zig-Zag sampler.

While Peter Orbanz' talk was centred on a central limit theorem under group invariance, further penalised by being the last of the (sun) day, Peter did a magnificent job of presenting the result and motivating each term. It reminded me of the work Jim Bondar was doing in Ottawa in the 1980s on Haar measures for Bayesian inference. Including the notion of amenability [a term due to von Neumann], which I had not met since then. (Neither have I met Jim since the last summer I spent at Carleton.) The CLT and associated LLN are remarkable in that the average is not over observations but over shifts of the same observation under elements of a sub-group of transformations. I wondered as well about the potential connection with the 2003 Read Paper of Kong et al. on the use of group averaging for Monte Carlo integration [connection apart from the fact that both discussants, Michael Evans and myself, are present at this conference].

RSS tribute

Posted in Statistics, University life on November 4, 2018 by xi'an

visual effects

Posted in Books, pictures, Statistics on November 2, 2018 by xi'an

As advertised and re-discussed by Dan Simpson on the Statistical Modeling, &tc. blog he shares with Andrew and a few others, the paper Visualization in Bayesian workflow he wrote with Jonah Gabry, Aki Vehtari, Michael Betancourt and Andrew Gelman was one of three discussed at the RSS conference in Cardiff, last month, as a Read Paper for Series A. I had stored the paper when it came out, intending to read and discuss it, but as often this good intention came to no concrete end. [Except concrete as in concrete shoes…] Hence a few notes rather than a discussion in Series A.

Exploratory data analysis goes beyond just plotting the data, which should sound reasonable to all modeling readers.

Fake data [not fake news!] can be almost [more!] as valuable as real data for building your model: oh yes!, this is the message I am always trying to convey to my first-year students when arguing about the connection between models and simulation, as well as a defense of ABC methods. And, more globally, of the very idea of statistical modelling. While indeed "Bayesian models with proper priors are generative models", I am not particularly a fan of using the prior predictive [or the evidence] to assess the prior, as it may end up endorsing more or less all but the most terrible priors, meaning those giving very little weight to neighbourhoods of high likelihood values. Still, in a discussion of a TAS paper by Seaman et al. on the role of the prior, Kaniav Kamary and I produced prior assessments that were similar to the comparison illustrated in Figure 4. (And this makes me wonder which point we missed in this discussion, according to Dan.) Unhappy am I with the weakly informative prior illustration (and concept), as the amount of fudging and calibrating needed to move from the immensely vague choice of N(0,100) to the fairly tight choice of N(0,1) or N(1,1) is not provided. The paper reads as if these priors were the obvious and first choice of the authors. I completely agree with the warning that "the utility of the prior predictive distribution to evaluate the model does not extend to utility in selecting between models".
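To make the prior predictive contrast concrete, here is a minimal Python sketch on an entirely made-up toy model y ~ N(θ, 1) [not the authors' example], simulating fake datasets under the vague N(0,100²) prior versus the tight N(0,1) one:

```python
import numpy as np

rng = np.random.default_rng(1)

def prior_predictive(n_sims=1000, n_obs=50, prior_sd=100.0):
    """Simulate datasets from the prior predictive of the toy model
    y ~ N(theta, 1) with theta ~ N(0, prior_sd^2)."""
    theta = rng.normal(0.0, prior_sd, size=n_sims)                 # draw from the prior
    return rng.normal(theta[:, None], 1.0, size=(n_sims, n_obs))   # then from the likelihood

# under the vague N(0, 100^2) prior, fake data ranges over wildly implausible scales...
vague = prior_predictive(prior_sd=100.0)
# ...while the tighter N(0, 1) prior keeps the fake data on a plausible scale
tight = prior_predictive(prior_sd=1.0)
print(vague.std(), tight.std())
```

The simulation says nothing about which prior is "right", only about which scales of data each one deems plausible, which is precisely where the unreported calibration work hides.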

MCMC diagnostics, beyond trace plots: yes again, but this recommendation sounds a wee bit outdated. (As does our 1998 review!) Figure 5(b) links different parameters of the model with lines, which does not clearly relate to a better understanding of convergence. Figure 5(a) does not tell much either, since the green (divergent) dots stand within the black dots, at least in the projected 2D plot (and how can one reach beyond 2D?). Feels like I need to rtfm..!
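For what a beyond-trace-plot diagnostic amounts to in practice, here is a back-of-the-envelope Python version of the split-R̂ statistic [a crude sketch of the standard diagnostic, not the paper's code], which flags chains that individually look stationary but disagree with one another:

```python
import numpy as np

def split_rhat(chains):
    """Crude split-R-hat: chains is an (n_chains, n_draws) array; each
    chain is split in half and between/within variances are compared."""
    half = chains.shape[1] // 2
    splits = np.vstack([chains[:, :half], chains[:, half:2 * half]])  # (2m, n/2) half-chains
    m, n = splits.shape
    W = splits.var(axis=1, ddof=1).mean()         # average within-chain variance
    B = n * splits.mean(axis=1).var(ddof=1)       # between-chain variance of the means
    var_hat = (n - 1) / n * W + B / n             # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))         # four well-mixed chains: R-hat near 1
stuck = mixed + np.arange(4)[:, None]      # four chains stuck at different levels: R-hat well above 1.1
print(split_rhat(mixed), split_rhat(stuck))
```

No amount of staring at one chain's trace plot would catch the second case, which is the point of the recommendation.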

“Posterior predictive checks are vital for model evaluation”, to wit, I find Figure 6 much more to my liking and closer to my practice. There could have been a reference to Ratmann et al. for ABC, where graphical measures of discrepancy were used in conjunction with ABC output as direct tools for model assessment and comparison. Essentially predicting a zero error with the ABC posterior predictive. And of course “posterior predictive checking makes use of the data twice, once for the fitting and once for the checking.” Which means one should either resort to loo solutions (as mentioned in the paper) or call for calibration of the double use by re-simulating pseudo-datasets from the posterior predictive. I find the suggestion that “it is a good idea to choose statistics that are orthogonal to the model parameters” somewhat antiquated, in that it sounds like rephrasing the primeval call to ancillary statistics for model assessment (Kiefer, 1975), while being pretty hard to implement in modern complex models.
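The double use of the data is easy to exhibit on a toy conjugate model: the posterior predictive p-value of a statistic the fit has already absorbed hovers near ½ regardless of the data, hence the call for calibration. A minimal Python sketch [toy N(θ,1) model with a N(0,10²) prior, purely illustrative and not the authors' example]:

```python
import numpy as np

rng = np.random.default_rng(2)

def ppc_pvalue(y, stat=np.mean, n_draws=2000, prior_sd=10.0):
    """Posterior predictive p-value of stat for the toy model y ~ N(theta, 1)
    with theta ~ N(0, prior_sd^2), whose posterior is available in closed form."""
    n = len(y)
    post_var = 1.0 / (n + 1.0 / prior_sd**2)
    post_mean = post_var * y.sum()
    theta = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)   # posterior draws
    y_rep = rng.normal(theta[:, None], 1.0, size=(n_draws, n))       # replicated datasets
    return (stat(y_rep, axis=1) >= stat(y)).mean()

# double use: checking the mean on the very data that fitted it
y = rng.normal(3.0, 1.0, size=100)
p = ppc_pvalue(y)
print(p)   # close to 1/2 by construction, uninformative without calibration
```

Re-simulating pseudo-datasets from the model and recomputing such p-values is one way to calibrate what "close to ½" actually means.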

free and graphic session at RSS 2018 in Cardiff

Posted in pictures, Statistics, Travel, University life on July 11, 2018 by xi'an

Reposting an email I received from the Royal Statistical Society, this is to announce a discussion session on three papers on Data visualization in Cardiff City Hall next September 5, as a free part of the RSS annual conference. (But the conference team must be told in advance.)

Paper:             ‘Visualizing spatiotemporal models with virtual reality: from fully immersive environments to applications in stereoscopic view’

Authors:         Stefano Castruccio (University of Notre Dame, USA) and Marc G. Genton and Ying Sun (King Abdullah University of Science and Technology, Thuwal)

Paper:             ‘Visualization in Bayesian workflow’

Authors:            Jonah Gabry (Columbia University, New York), Daniel Simpson (University of Toronto), Aki Vehtari (Aalto University, Espoo), Michael Betancourt (Columbia University, New York, and Symplectomorphic, New York) and Andrew Gelman (Columbia University, New York)

Paper:             ‘Graphics for uncertainty’

Author:           Adrian W. Bowman (University of Glasgow)

PDFs and supplementary files of these papers are available from StatsLife and the RSS website. As usual, contributions can be sent in writing, with a deadline of September 19.

statistics: a data science for the 21st century

Posted in Statistics on May 15, 2018 by xi'an


the DeepMind debacle

Posted in Books, Statistics, Travel on August 19, 2017 by xi'an

“I hope for a world where data is at the heart of understanding and decision making. To achieve this we need better public dialogue.” Hetan Shah

As I was reading one of the Nature issues I brought on vacations, while the rain was falling on an aborted hiking day on the fringes of Monte Rosa, I came across a 20 July tribune by Hetan Shah, executive director of the RSS. A rare occurrence of a statistician's perspective in Nature. The event prompting this column is the ruling against the Royal Free London hospital group for providing patient data to DeepMind for predicting kidney failure. Without the patients' agreement. And with enough information to identify the patients. The issues raised by Hetan Shah are that data transfers should become open, that they should be commensurate in volume and detail with the intended goals, and that public approval should be sought. While I know nothing about this specific case, I find the article overly critical of DeepMind, whose interest in health-related problems is certainly not pure and disinterested but can nonetheless contribute advances in (personalised) care and prevention through its expertise in machine learning. (Disclaimer: I have neither connection nor conflict with the company!) And I do not see exactly how public approval or dialogue can help in making progress in handling data, unless I am mistaken in my understanding of "the public". The article mentions the launch of a UK project on data ethics, involving several [public] institutions like the RSS: this is certainly commendable and may improve how personal data is handled by companies, but I would not call this conglomerate representative of the public, which most likely does not really trust these institutions either…