Archive for galaxy formation

limited shelf validity

Posted in Books, pictures, Statistics, University life on December 11, 2019 by xi'an

A great article by Steve Stigler in the new, multi-scaled, and so exciting Harvard Data Science Review, magisterially operated by Xiao-Li Meng, on the limitations of old datasets. Illustrated by three famous datasets used by three equally famous statisticians, Quetelet, Bortkiewicz, and Gosset, none of whom were fundamentally interested in the data for their own sake. First, Quetelet’s data was (wrongly) reconstructed, and he missed the opportunity to beat Galton at discovering correlation. Second, Bortkiewicz went looking (or even cherry-picking!) for rare events in yearly tables of mortality minutely divided between causes such as military horse kicks. The third dataset is not Guinness’s, but a test between two sleeping pills, conducted rather crudely on inmates of a psychiatric institution in Kalamazoo, with further mishandling by Gosset himself. Manipulations that turn the data into dead data, as Steve puts it (and illustrates with the above skull collection picture, while warning against attempts at resuscitating dead data into what could be called “zombie data”).

“Successful resurrection is only slightly more common than in Christian theology.”

His global perspective on dead data is that they should stop being used rather than have their (shelf) life extended, turning into benchmarks recycled over and over as proofs of concept. If only (my two cents) because it leads to calibrating (and choosing) methods that do well on these benchmarks. Another example that could have been added to the skulls above is the Galaxy Velocity Dataset that makes frequent appearances in works estimating Gaussian mixtures, which Radford Neal signaled at the 2001 ICMS workshop on mixture estimation as an inappropriate use of the dataset, since astrophysical arguments weighed against mixture modelling.

“…the role of context in shaping data selection and form—context in temporal, political, and social as well as scientific terms—has been shown to be a powerful and interesting phenomenon.”

The potential for “dead-er” data (my neologism!) increases with the epoch, in that the careful sleuthing Steve (and others) conducted on these historical datasets is absolutely impossible with the current massive data sets. Massive and proprietary. And presumably discarded once the associated neural net is designed and sold, leaving the burden of unmasking the potential (or highly probable?) biases to others. Most interestingly, this intersects with a “comment” in Nature of 17 October by Sabina Leonelli on the transformation of data from a national treasure to a commodity whose “ownership can confer and signal power”. But her call for openness and governance of research data seems as illusory as other attempts to sever the GAFAs from their extra-territorial privileges…

JSM 2015 [day #3]

Posted in Books, Statistics, University life on August 12, 2015 by xi'an

My first morning session was about inference for phylogenies. While I was expecting developments around the Kingman coalescent models my coauthors needed and developed ABC for, I was surprised to see models producing closed-form (or close enough to) likelihoods, thanks to strong restrictions on the population sizes and migration possibilities, as explained to me later by Vladimir Minin. No need for ABC there, since MCMC was working on the species trees, with Vladimir Minin making use of [the Savage Award winner] Vinayak Rao’s approach on trees that differ from the coalescent. And enough structure to even consider and demonstrate tree identifiability in Laura Kubatko’s case.

I then stopped by the astrostatistics session, as the first talk, by Gwendolyn Eadie, was about galaxy mass estimation, a problem I may actually be working on in the Fall. But it ended up being a completely different problem, and I was further surprised that the issue of whether or not the data was missing at random was not considered by the authors.

Christening a session Unifying foundation(s) may be calling for trouble, at least from me! In this spirit, Xiao-Li Meng gave a talk attempting a sort of unification of the frequentist, Bayesian, and fiducial paradigms by introducing the notion of personalized inference, a notion I had vaguely thought of in the past: how much, or how far, do you condition upon? However, I have never thought of this as justifying fiducial inference in any way, and Xiao-Li’s lively arguments during and after the session were not enough to convince me of the opposite: prior-free does not translate into (arbitrary) choice-free. In the earlier talk about confidence distributions by Regina Liu and Minge Xie, which I partly missed for Galactic reasons, I entered the room at the very moment ABC was briefly described as a confidence distribution because it does not produce a convergent approximation to the exact posterior, a logic that escapes me (unless those confidence distributions are described so loosely as to include about any method of inference). Dongchu Sun also gave us a crash course on reference priors, with a notion of random posteriors I had not heard of before… As well as constructive posteriors… (They seemed to mean constructible matching priors, as far as I understood.)

The final talk in this session, by Chuanhai Liu, on a new approach (modestly!) called inferential models, was incomprehensible, with the speaker repeatedly stating that the principles were too hard to explain in five minutes and required an incoming book… I later took a brief look at an associated paper, which relates to fiducial inference and to Dempster’s belief functions. For me, it has the same Münchhausen feeling of creating a probability out of nothing, putting a distribution on the parameter while ignoring the fact that the fiducial equation x=a(θ,u) modifies the distribution of u once x is observed.
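The Münchhausen objection is easiest to see in the simplest location case (my own illustration, not an example from the talk):

```latex
x = \theta + u, \quad u \sim \mathcal{N}(0,1)
\;\Longrightarrow\; \theta = x - u
\;\Longrightarrow\; \theta \mid x \sim \mathcal{N}(x, 1).
```

The contested step is the last implication: solving the fiducial equation for the parameter and carrying the N(0,1) law of u over unchanged produces a distribution on θ, but once x is observed, u is tied to θ through u = x − θ, so treating it as still following its marginal law is precisely what conjures a distribution out of nothing.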

big Bayes stories

Posted in Books, Statistics, University life on July 29, 2013 by xi'an

(The following is our preface to the incoming “Big Bayes stories” special issue of Statistical Science, edited by Sharon McGrayne, Kerrie Mengersen and myself.)

Bayesian statistics is now endemic in many areas of scientific, business and social research. Founded a quarter of a millennium ago, the enabling theory, models and computational tools have expanded exponentially in the past thirty years. So what is it that makes this approach so popular in practice? Now that Bayesian statistics has “grown up”, what has it got to show for itself? In particular, what real-life problems has it really solved? A number of events motivated us to ask these questions: a conference in honour of Adrian Smith, one of the founders of modern Bayesian statistics, which showcased a range of research emanating from his seminal work in the field, and the impressive book by Sharon McGrayne, the theory that would not die. At a café in Paris in 2011, we conceived the idea of gathering a similar collection of “Big Bayes stories” that would demonstrate the appeal of adopting a Bayesian modelling approach in practice. That is, we wanted to collect real cases in which a Bayesian approach had made a significant difference, either in addressing problems that could not be analysed otherwise, or in generating a new or deeper understanding of the data and the associated real-life problem.

After submitting this proposal to Jon Wellner, editor of Statistical Science, and obtaining his encouragement and support, we made a call for proposals. We received around 30 submissions (for which the authors are to be warmly thanked!) and, after a regular review process by both Bayesian and non-Bayesian referees (who are also deeply thanked), we ended up with 17 papers that reflected the type of stories we had hoped to hear. Sharon McGrayne then read each paper with the utmost attention and provided helpful and encouraging comments on all. Sharon became part of the editorial team in acknowledgement of this substantial editing contribution, which has made the stories much more enjoyable. In addition, referees who handled several submissions were asked to contribute discussions about the stories, and some of them managed to find additional time for this task, providing yet another perspective on the stories.

Bayesian Estimation of Population-Level Trends in Measures of Health Status, by Mariel M. Finucane, Christopher J. Paciorek, Goodarz Danaei, and Majid Ezzati
Galaxy Formation: Bayesian History Matching for the Observable Universe, by Ian Vernon, Michael Goldstein, and Richard G. Bower
Estimating the Distribution of Dietary Consumption Patterns, by Raymond James Carroll
Bayesian Population Projections for the United Nations, by Adrian E. Raftery, Leontine Alkema, and Patrick Gerland
From Science to Management: Using Bayesian Networks to Learn about Lyngbya, by Sandra Johnson, Eva Abal, Kathleen Ahern, and Grant Hamilton
Search for the Wreckage of Air France Flight AF 447, by Lawrence D. Stone, Colleen M. Keller, Thomas M. Kratzke, and Johan P. Strumpfer
Finding the most distant quasars using Bayesian selection methods, by Daniel Mortlock
Estimation of HIV burden through Bayesian evidence synthesis, by Daniela De Angelis, Anne M. Presanis, Stefano Conti, and A. E. Ades
Experiences in Bayesian Inference in Baltic Salmon Management, by Sakari Kuikka, Jarno Vanhatalo, Henni Pulkkinen, Samu Mäntyniemi, and Jukka Corander

As can be gathered from the table of contents, the spectrum of applications ranges across astronomy, epidemiology, ecology and demography, with the special case of the Air France wreckage story also reported in the paperback edition of the theory that would not die. What made those cases so well suited for a Bayesian solution? In some situations, the prior or the expert opinion was crucial; in others, the complexity of the data model called for a hierarchical decomposition naturally provided in a Bayesian framework; and others involved many actors, perspectives and data sources that only Bayesian networks could aggregate.

Now, before or (better) after reading those stories, one may wonder whether or not the “plus” brought by the Bayesian paradigm was truly significant. We think it was, at one level or another of the statistical analysis, while we acknowledge that in several cases other statistical perspectives or even other disciplines could have provided another solution, but presumably at a higher cost. We think this collection of papers constitutes a worthy tribute to the maturity of the Bayesian paradigm, appropriate for commemorating the 250th anniversary of the publication of Bayes’ Essay towards solving a Problem in the Doctrine of Chances. We thus hope you will enjoy those stories, whether or not Bayesiana is your statistical republic.