Archive for University of Oxford

annual visit to Oxford

Posted in Kids, pictures, Statistics, Travel, University life on February 1, 2018 by xi'an

As in every year since 2014, I am spending a few days in Oxford to teach a module on Bayesian Statistics to our Oxford-Warwick PhD students. This time I was a wee bit under the weather due to a mild case of food poisoning and I can only hope that my more than sedate delivery did not turn the students away from Bayesian pursuits for good!

The above picture is at St. Hugh’s College, where I was staying. Or should it be Saint Hughes, since this 12th century bishop was a pre-Brexit European worker from Avalon, France… (This college was created in 1886 for young women of poorer background. And only opened to male students a century later. The 1924 rules posted in one corridor show how these women were considered to be so “dangerous” by the institution that they had to be kept segregated from men, except their brothers!, at all times…)

a paradox about likelihood ratios?

Posted in Books, pictures, Statistics, University life on January 15, 2018 by xi'an

Aware of my fascination for paradoxes (and heterodox publications), Ewan Cameron sent me the link to a recent arXival by Louis Lyons (Oxford) on different asymptotic distributions of the likelihood ratio. Which is full of approximations. The overall point of the note is hard to fathom… Unless it simply plans to illustrate Betteridge’s law of headlines, as suggested by Ewan.

For instance, the limiting distribution of the log-likelihood of an exponential sample at the true value of the parameter τ is not asymptotically Gaussian but almost surely infinite. While minus twice the log of the (Wilks) likelihood ratio at the true value of τ is truly (if asymptotically) a χ² variable with one degree of freedom. That it is not a Gaussian is deemed a “paradox” by the author, explained by a cancellation of first order terms… Same thing again for the common Gaussian mean problem!
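The χ² behaviour of the likelihood ratio statistic can be checked by simulation. The following is a minimal sketch, not taken from the note under discussion: it assumes an exponential sample parameterised by its mean τ, and the function names, sample size, number of replications, and seed are all illustrative choices. For that model, minus twice the log-likelihood ratio at the true value τ₀ reduces to 2n(r − 1 − log r) with r the ratio of the sample mean to τ₀.

```python
import math
import random


def wilks_statistic(sample, tau0):
    """Minus twice the log-likelihood ratio for an exponential sample.

    Assumes the density (1/tau) * exp(-x/tau), so the MLE of tau is the
    sample mean and the statistic reduces to 2n(r - 1 - log r), where r is
    the ratio of the sample mean to tau0.
    """
    n = len(sample)
    r = (sum(sample) / n) / tau0
    return 2.0 * n * (r - 1.0 - math.log(r))


def simulate(n=100, reps=4000, tau0=2.0, seed=42):
    """Draw `reps` samples of size n at the true value tau0; return the statistics."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        # random.expovariate takes a rate, hence 1/tau0 for a mean of tau0
        sample = [rng.expovariate(1.0 / tau0) for _ in range(n)]
        stats.append(wilks_statistic(sample, tau0))
    return stats
```

If the χ² approximation with one degree of freedom holds, the average of `simulate()` should be close to 1 and roughly 5% of the values should exceed 3.841, the 95% quantile of that distribution.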

Au’Bayes 17

Posted in Statistics, Travel, University life on December 14, 2017 by xi'an

Some notes scribbled during the O’Bayes 17 conference in Austin, which do not reflect the highly diverse range of talks. And many new faces and topics, meaning O’Bayes is alive and evolving. With all possible objectivity, a fantastic conference! (Not even mentioning the bars where Peter Müller hosted the poster sessions, a feat I would have loved to see duplicated for the posters of ISBA 2018… Or the Ethiopian restaurant just around the corner with the right amount of fierce spices!)

The wiki on objective, reference, vague, neutral [or whichever label one favours] priors that was suggested at the previous O’Bayes meeting in Valencià was introduced as Wikiprevia by Gonzalo Garcia-Donato. It aims at classifying recommended priors in most of the classical models, along with discussion panels, and it should soon get an official launch, when contributors will be welcome to add articles on a wiki principle. I wish the best to this venture which, I hope, will induce O’Bayesians to contribute actively.

In a brilliant talk that quickly reversed my jetlag doziness, Peter Grünwald returned to the topic he presented last year in Sardinia, namely safe Bayes or powered-down likelihoods to handle some degree of misspecification, with a further twist of introducing an impossible value ‘o’ that captures missing mass (to be called Peter’s demon?!), the absolute necessity of which I did not perceive. Food for thought, definitely. (But I feel that the only safe Bayes is the dead Bayes, as protecting against all kinds of misspecifications means no action is possible.)
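To make the notion of a powered-down likelihood concrete, here is a minimal sketch under strong assumptions that are not part of the talk: a Normal mean with known variance, a conjugate Normal prior, and a power η fixed by hand (safe Bayes learns η from the data, which is not attempted here, and the impossible value ‘o’ is not represented). The function name and arguments are hypothetical.

```python
def tempered_normal_posterior(xbar, n, sigma2, prior_mean, prior_var, eta):
    """Posterior of a Normal mean when the likelihood is raised to the power eta.

    Model: n observations from N(theta, sigma2) with sample mean xbar and
    prior theta ~ N(prior_mean, prior_var). Raising the likelihood to the
    power eta amounts to replacing n with eta * n in the conjugate update.
    Returns the posterior mean and variance.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta is expected in [0, 1]")
    precision = 1.0 / prior_var + eta * n / sigma2
    mean = (prior_mean / prior_var + eta * n * xbar / sigma2) / precision
    return mean, 1.0 / precision
```

With η = 1 this is the standard conjugate posterior, with η = 0 it returns the prior, and intermediate values discount the data, which widens the posterior.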

I also appreciated Cristiano Villa’s approach to constructing prior weights in model comparison from a principled and decision-theoretic perspective even though I felt that the notion of ranking parameter importance required too much input to be practically feasible. (Unless I missed that point.)

Laura Ventura gave her talk on using various scores or estimating equations as summary statistics for ABC, rather than the corresponding M-estimators, which offers the appealing feature of reducing computation while being asymptotically equivalent. (A feature we also exploited for the regular score function in our ABC paper with Gael, David, Brendan, and Wonapree.) She mentioned the Hyvärinen score [of which I first heard in Padova!] as a way to bypass issues related to doubly intractable likelihoods. Which is a most interesting proposal that bypasses (ABC) simulations from such complex targets by exploiting a pseudo-posterior.

Veronika Rockova presented a recent work on concentration rates for regression tree methods that produces a rigorous analysis of these methods. Showing that the spike & slab priors plus BART [equals spike & tree] achieve sparsity and optimal concentration. In an oracle sense. With a side entry on assembling partition trees towards creating a new form of BART. Which made me wonder whether or not this was also applicable to random forests. Although they are not exactly Bayes. Demanding work in terms of the theory behind but with impressive consequences!

Just before I left O’Bayes 17 for Houston airport, Nick Polson, along with Peter McCullagh, proposed an intriguing notion of sparse Bayes factors, which corresponds to the limit of a Bayes factor when the prior probability υ of the null goes to zero. The limiting prior is then replaced with an exceedance measure that can be normalised into a distribution, but does it make the limit a special prior? Linking υ with the prior under the null is not an issue (this was the basis of my 1992 Lindley paradox paper) but the sequence of priors indexed by υ needs to be chosen. And reading from the paper at Houston airport, I could not spot a construction principle that would lead to a reference prior of sorts. One thing that Nick mentioned during his talk was that we observed directly realisations of the data marginal, but this is generally not the case as the observations are associated with a given value of the parameter, not one for each observation.

The next edition of the O’Bayes conference will be in… Warwick on June 29-July 2, as I volunteered to organise this edition (16 years after O’Bayes 03 in Aussois!) just after the BNP meeting in Oxford on June 23-28, hopefully creating the environment for fruitful interactions between both communities! (And jumping from Au’Bayes to Wa’Bayes.)

Posted in Books, Kids, pictures, Running, Statistics, Travel, University life on April 12, 2017 by xi'an

The reason for my short visit to Berlin last week was an OxWaSP (Oxford and Warwick Statistics Program) workshop hosted by Amazon Berlin with talks between statistics and machine learning, plus posters from our second year students. While the workshop was quite intense, I enjoyed very much the atmosphere and the variety of talks there. (Just sorry that I left too early to enjoy the social programme at a local brewery, Brauhaus Lemke, and the natural history museum. But still managed nice runs east and west!) One thing I found most interesting (if obvious in retrospect) was the different focus of academic and production talks, where the latter do not aim at full generality or at a guaranteed improvement over the existing, as long as the new methodology brings a gain in efficiency.

This connected nicely with me reading several Nature articles on quantum computing during that trip, where researchers from Google predict commercial products appearing in the coming five years, even though the technology is far from perfect and the output qubits are error-prone. Among the examples they provided, quantum simulation (not meaning what I consider to be simulation!), quantum optimisation (as a way to overcome multimodality), and quantum sampling (targeting given probability distributions). I find the inclusion of the last one puzzling in that simulation (in that sense) shows very little tolerance for errors, especially systematic bias. It may be that specific quantum architectures can be designed for specific probability distributions, just like some are already conceived for optimisation. (It may even be the case that quantum solutions are (just next to) available for intractable constants as in Ising or Potts models!)

automated ABC summary combination

Posted in Books, pictures, Statistics, University life on March 16, 2017 by xi'an

Jonathan Harrison and Ruth Baker (Oxford University) arXived this morning a paper on the optimal combination of summaries for ABC in the sense of deriving the proper weights in a Euclidean distance involving all the available summaries. The idea is to find the weights that lead to the maximal distance between prior and posterior, in a way reminiscent of Bernardo’s (1979) maximal information principle. Plus a sparsity penalty à la Lasso. The associated algorithm is sequential in that the weights are updated at each iteration. The paper does not get into theoretical justifications but considers instead several examples with limited numbers of both parameters and summary statistics. Which may highlight the limitations of the approach in that handling (and eliminating) a large number of parameters may prove impossible this way, when compared with optimisation methods like random forests. Or summary-free distances between empirical distributions like the Wasserstein distance.
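As an illustration of where such weights enter, here is a minimal sketch of ABC rejection with a weighted Euclidean distance. It is not the algorithm of the paper: the weights are supplied by hand rather than optimised, the exact form of the weighted distance is an assumption, and the toy model (a Normal mean with a uniform prior, one informative summary and one pure-noise summary) is hypothetical.

```python
import math
import random


def weighted_distance(s_sim, s_obs, weights):
    """Weighted Euclidean distance sqrt(sum_i w_i * (s_sim_i - s_obs_i)^2)."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, s_sim, s_obs))
    )


def summaries(data, rng):
    """Two summaries: the sample mean (informative) and pure noise (uninformative)."""
    return (sum(data) / len(data), rng.gauss(0.0, 1.0))


def abc_rejection(s_obs, weights, n_sim=2000, n_obs=20, keep=0.05, seed=1):
    """Keep the prior draws whose simulated summaries fall closest to s_obs."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        theta = rng.uniform(-5.0, 5.0)  # draw from the (assumed) uniform prior
        data = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        dist = weighted_distance(summaries(data, rng), s_obs, weights)
        draws.append((dist, theta))
    draws.sort()
    return [theta for _, theta in draws[: int(keep * n_sim)]]
```

Calling `abc_rejection((1.0, 0.0), (1.0, 0.0))` puts all the weight on the sample mean and none on the noise summary, which is the kind of sparse solution a Lasso-type penalty would favour in this toy setting.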

Oxford snapshot [jatp]

Posted in Books, Kids, pictures, Travel, University life on February 9, 2017 by xi'an

relativity is the keyword

Posted in Books, Statistics, University life on February 1, 2017 by xi'an

St John's College, Oxford, Feb. 23, 2012

As I was teaching my introduction to Bayesian Statistics this morning, ending up with the chapter on tests of hypotheses, I found myself reflecting [out loud] on the relative nature of posterior quantities. Just like when I introduced the role of priors in Bayesian analysis the day before, I stressed the relativity of quantities coming out of the BBB [Big Bayesian Black Box], namely that whatever happens as a Bayesian procedure is to be understood, scaled, and relativised against the prior equivalent, i.e., that the reference measure or gauge is the prior. This is sort of obvious, clearly, but bringing the argument forward from the start avoids all sorts of misunderstanding and disagreement, in that it excludes the claims of absoluteness and certainty that may come with the production of a posterior distribution. It also removes the endless debate about the determination of the prior, by making each prior a reference on its own. With an additional possibility of calibration by simulation under the assumed model. Or an alternative. Again nothing new there, but I got rather excited by this presentation choice, as it seems to clarify the path to Bayesian modelling and avoid misapprehensions.

Further, the curious case of the Bayes factor (or of the posterior probability) could possibly be resolved most satisfactorily in this framework, as the [dreaded] dependence on the model prior probabilities then becomes a matter of relativity! Those posterior probabilities depend directly and almost linearly on the prior probabilities, but they should not be interpreted in an absolute sense as the ultimate and unique probability of the hypothesis (which anyway does not mean anything in terms of the observed experiment). In other words, this posterior probability does not need to be scaled against a U(0,1) distribution. Or against the p-value if anyone wishes to do so. By the end of the lecture, I was even wondering [not so loudly] whether or not this perspective was allowing for a resolution of the Lindley-Jeffreys paradox, as the resulting number could be set relative to the choice of the [arbitrary] normalising constant.
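The dependence of the posterior probability on the prior probability can be made explicit with the standard identity that posterior odds equal the Bayes factor times the prior odds. The sketch below is only an illustration of that identity; the function name and argument names are hypothetical.

```python
def posterior_prob_null(bayes_factor_01, prior_prob_null):
    """Posterior probability of the null from the Bayes factor and the prior probability.

    Uses posterior odds = Bayes factor (null against alternative) * prior odds.
    """
    if not 0.0 < prior_prob_null < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    if bayes_factor_01 <= 0.0:
        raise ValueError("Bayes factor must be positive")
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bayes_factor_01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)
```

When the Bayes factor equals one the posterior probability simply returns the prior probability, and for a fixed Bayes factor it increases monotonically with the prior probability, so the output only makes sense relative to that prior input.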