Jonathan Harrison and Ruth Baker (Oxford University) arXived this morning a paper on the optimal combination of summaries for ABC in the sense of deriving the proper weights in an Euclidean distance involving all the available summaries. The idea is to find the weights that lead to the maximal distance between prior and posterior, in a way reminiscent of Bernardo’s (1979) maximal information principle. Plus a sparsity penalty à la Lasso. The associated algorithm is sequential in that the weights are updated at each iteration. The paper does not get into theoretical justifications but considers instead several examples with limited numbers of both parameters and summary statistics. Which may highlight the limitations of the approach in that handling (and eliminating) a large number of parameters may prove impossible this way, when compared with optimisation methods like random forests. Or summary-free distances between empirical distributions like the Wasserstein distance.
Archive for University of Oxford
As I was teaching my introduction to Bayesian Statistics this morning, ending up with the chapter on tests of hypotheses, I found reflecting [out loud] on the relative nature of posterior quantities. Just like when I introduced the role of priors in Bayesian analysis the day before, I stressed the relativity of quantities coming out of the BBB [Big Bayesian Black Box], namely that whatever happens as a Bayesian procedure is to be understood, scaled, and relativised against the prior equivalent, i.e., that the reference measure or gauge is the prior. This is sort of obvious, clearly, but bringing the argument forward from the start avoids all sorts of misunderstanding and disagreement, in that it excludes the claims of absolute and certainty that may come with the production of a posterior distribution. It also removes the endless debate about the determination of the prior, by making each prior a reference on its own. With an additional possibility of calibration by simulation under the assumed model. Or an alternative. Again nothing new there, but I got rather excited by this presentation choice, as it seems to clarify the path to Bayesian modelling and avoid misapprehensions.
Further, the curious case of the Bayes factor (or of the posterior probability) could possibly be resolved most satisfactorily in this framework, as the [dreaded] dependence on the model prior probabilities then becomes a matter of relativity! Those posterior probabilities depend directly and almost linearly on the prior probabilities, but they should not be interpreted in an absolute sense as the ultimate and unique probability of the hypothesis (which anyway does not mean anything in terms of the observed experiment). In other words, this posterior probability does not need to be scaled against a U(0,1) distribution. Or against the p-value if anyone wishes to do so. By the end of the lecture, I was even wondering [not so loudly] whether or not this perspective was allowing for a resolution of the Lindley-Jeffreys paradox, as the resulting number could be set relative to the choice of the [arbitrary] normalising constant. Continue reading
As in the previous years, I am back in Oxford (England) for my short Bayesian Statistics course in the joint Oxford-Warwick PhD programme, OxWaSP. For some unclear reason, presumably related to the Internet connection from Oxford, I have not been able to upload my slides to Slideshare, so here the [99.9% identical] older version:
Lawrence Murray, Sumeet Singh, Pierre Jacob, and Anthony Lee (Warwick) recently arXived a paper on Anytime Monte Carlo. (The earlier post on this topic is no coincidence, as Lawrence had told me about this problem when he visited Paris last Spring. Including a forced extension when his passport got stolen.) The difficulty with anytime algorithms for MCMC is the lack of exchangeability of the MCMC sequence (except for formal settings where regeneration can be used).
When accounting for duration of computation between steps of an MCMC generation, the Markov chain turns into a Markov jump process, whose stationary distribution α is biased by the average delivery time. Unless it is constant. The authors manage this difficulty by interlocking the original chain with a secondary chain so that even- and odd-index chains are independent. The secondary chain is then discarded. This provides a way to run an anytime MCMC. The principle can be extended to K+1 chains, run one after the other, since only one of those chains need be discarded. It also applies to SMC and SMC². The appeal of anytime simulation in this particle setting is that resampling is no longer a bottleneck. Hence easily distributed among processors. One aspect I do not fully understand is how the computing budget is handled, since allocating the same real time to each iteration of SMC seems to envision each target in the sequence as requiring the same amount of time. (An interesting side remark made in this paper is the lack of exchangeability resulting from elaborate resampling mechanisms, lack I had not thought of before.)
[Here is a call for a two-year postdoc in Oxford sent to me by Arnaud Doucet. For those worried about moving to Britain, I think that, given the current pace—or lack thereof—of the negotiations with the EU, it is very likely that Britain will not have Brexited two years from now.]
Numerous medical problems ranging from screening to diagnosis to treatment of chronic diseases to management of care in hospitals requires the development of novel statistical models and methods. These models and methods need to address the unique characteristics of medical data such as sampling bias, heterogeneity, non-stationarity, informative censoring etc. Existing state-of-the-art machine learning and statistics techniques often fail to exploit those characteristics. Additionally, the focus needs to be on probabilistic models which are
interpretable by the clinicians so that the inference results can be integrated within the medical-decision making.
We have access to unique datasets for clinical deterioration of patients in the hospital, for cancer screening, and for treatment of chronic diseases. Preliminary work has been tested and implemented at UCLA Medical Center, resulting in significantly management care in this hospital.
The successful applicant will be expected to develop new probabilistic models and learning methods inspired by these applications. The focus will be primarily on methodological and theoretical developments, and involve collaborating with Oxford researchers in machine learning, computational statistics and medicine to bring these developments to practice.
The post-doctoral researcher will be jointly supervised by Prof. Mihaela van der Schaar and Prof. Arnaud Doucet. Both of them have a strong track-record in advising PhD students and post-doctoral researchers who subsequently became successful academics in statistics, engineering sciences, computer science and economics. The position is for 2 years.
Seth Flaxman (Oxford), Dougal J. Sutherland (UCL), Yu-Xiang Wang (CMU), and Yee Whye Teh (Oxford), published on arXiv this morning an analysis of the US election, in what they called most appropriately a post-mortem. Using ecological inference already employed after Obama’s re-election. And producing graphs like the following one: