Archive for University of Bristol

p-values, Bayes factors, and sufficiency

Posted in Books, pictures, Statistics on April 15, 2019 by xi'an

Among the many papers published in this special issue of TAS on statistical significance or lack thereof, there is a paper I had already read before (besides ours!), namely the paper by Jonty Rougier (U of Bristol, hence the picture) connecting p-values, likelihood ratios, and Bayes factors. Jonty starts from the notion that the p-value is induced by a summary statistic of the sample, t(x), with larger values of t(x) being less likely under the null hypothesis, of density f⁰(x). He then creates an embedding model by exponential tilting, namely the exponential family with dominating measure f⁰, natural statistic t(x), and a positive parameter θ. In this embedding model, a Bayes factor can be derived from any prior on θ, and the p-value satisfies an interesting double inequality: it is less than the likelihood ratio, itself lower than any (other) Bayes factor. One novel aspect from my perspective is that I had thought up to now that this inequality only held for one-dimensional problems, but there is no constraint here on the dimension of the data x. A remark I presumably made to Jonty on the first version of the paper is that the p-value itself remains invariant under a bijective increasing transform of the summary t(.). This means that there exists an infinity of such embedding families and that the bound remains true over all such families, although the value of the resulting minimum is beyond my reach (could it be the p-value itself?!). This point is also clear in the justification of the analysis through the Pitman-Koopman lemma.

Another remark is that the perspective can be inverted in a more realistic setting when a genuine alternative model M¹ is considered and a genuine likelihood ratio is available. In that case the Bayes factor remains smaller than the likelihood ratio, itself larger than the p-value induced by the likelihood ratio statistic (or its log). The induced embedded exponential tilting is then a geometric mixture of the null and of the locally optimal member of the alternative. I wonder if there is a parameterisation of this likelihood ratio into a p-value that would turn it into a uniform variate (under the null). Presumably not. While the approach remains firmly entrenched within the realm of p-values and Bayes factors, this exploration of a natural embedding of the original p-value is definitely worth mentioning in a class on the topic! (One typo though, namely that the Bayes factor is said to be lower than one, which is incorrect.)
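For the record, here is the embedding written out as I read it from the description above; the notation is mine and the direction of the inequalities (quantities reported in favour of the null) is my interpretation, not the paper's exact statement:

```latex
% Exponential tilting of the null density f^0 by the statistic t(x):
\[
  f_\theta(x) \;=\; f^0(x)\,\exp\{\theta\,t(x) - k(\theta)\},
  \qquad
  k(\theta) \;=\; \log \int f^0(y)\,e^{\theta t(y)}\,\mathrm{d}y,
  \qquad \theta \ge 0 .
\]
% With p(x) the p-value attached to t(x), the double inequality then reads
\[
  p(x)
  \;\le\;
  \frac{f^0(x)}{\sup_{\theta \ge 0} f_\theta(x)}
  \;\le\;
  \frac{f^0(x)}{\int f_\theta(x)\,\pi(\mathrm{d}\theta)}
  \qquad \text{for any prior } \pi \text{ on } \theta .
\]
```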

mixture modelling for testing hypotheses

Posted in Books, Statistics, University life on January 4, 2019 by xi'an

After a fairly long delay (since the first version was posted and submitted in December 2014), we eventually revised and resubmitted our paper with Kaniav Kamary [who has now graduated], Kerrie Mengersen, and Judith Rousseau on the final day of 2018. The main reason for this massive delay is mine, as I got fairly depressed by the general tone of the dozen reviews we received after submitting the paper as a Read Paper to the Journal of the Royal Statistical Society, despite a rather opposite reaction from the community (an admittedly biased sample!), including two dozen citations in other papers. (There seems to be a pattern in my submissions of Read Papers, witness our earlier and unsuccessful attempt with Christophe Andrieu in the early 2000s with the paper on controlled MCMC, which has led to 121 citations so far according to Google Scholar.) Anyway, thanks to my co-authors keeping up the fight, we started working on a revision including stronger convergence results, managing to show that the approach leads to an optimal separation rate, contrary to the Bayes factor which carries an extra √log(n) factor. This may sound paradoxical since, while the Bayes factor converges to 0 exponentially quickly under the alternative model, the convergence rate of the mixture weight α to 1 is only of order 1/√n; but this does not mean that the separation rate of the procedure based on the mixture model is worse than that of the Bayes factor. On the contrary, while it is well known that the Bayes factor leads to a separation rate of order √log(n)/√n in parametric models, we show that our approach can lead to a testing procedure with a better separation rate, of order 1/√n. We also studied a non-parametric setting where the null is a specified family of distributions (e.g., Gaussians) and the alternative is a Dirichlet process mixture, establishing that the posterior distribution concentrates around the null at the rate √log(n)/√n. We thus resubmitted the paper for publication, although not as a Read Paper, with hopefully more luck this time!
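To make the mixture-testing idea concrete, here is a minimal toy sketch of my own (not the paper's code), assuming a point null N(0,1) against a point alternative N(1,1) and a Beta(a0, a0) prior on the mixture weight, choices made purely for illustration:

```python
# Toy sketch of the mixture-based test: embed the two candidate densities in
#     x_i ~ alpha * f1(x_i) + (1 - alpha) * f0(x_i),   alpha ~ Beta(a0, a0),
# and read the test off the posterior of alpha. The Gaussian point hypotheses
# below are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def posterior_alpha(x, a0=0.5, n_iter=5000, burnin=1000):
    """Gibbs sampler on (allocations z, weight alpha) for the encompassing mixture."""
    n = len(x)
    f0 = gaussian_pdf(x, 0.0)   # null density, assumed fully known
    f1 = gaussian_pdf(x, 1.0)   # alternative density, assumed fully known
    alpha, draws = 0.5, []
    for it in range(n_iter):
        # allocate each observation to the alternative with its conditional probability
        p1 = alpha * f1 / (alpha * f1 + (1.0 - alpha) * f0)
        z = rng.random(n) < p1
        # conjugate Beta update of the mixture weight
        alpha = rng.beta(a0 + z.sum(), a0 + n - z.sum())
        if it >= burnin:
            draws.append(alpha)
    return np.array(draws)

# data generated from the alternative: the posterior of alpha should pile up near 1
x = rng.normal(loc=1.0, scale=1.0, size=200)
print("posterior mean of alpha:", posterior_alpha(x).mean())
```

Under data from the null the same posterior drifts towards 0 instead, with the 1/√n rate mentioned above governing how fast this concentration occurs.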

five postdoc positions in top UK universities & Bayesian health data science

Posted in Statistics on March 30, 2018 by xi'an

The EPSRC programme New Approaches to Bayesian Data Science: Tackling Challenges from the Health Sciences, directed by Paul Fearnhead, is offering five 3- or 4-year PDRA positions at the Universities of Bristol, Cambridge, Lancaster, Oxford, and Warwick. Here is the complete call:

Salary:   £29,799 to £38,833
Closing Date:   Thursday 26 April 2018
Interview Date:   Friday 11 May 2018

We invite applications for Post-Doctoral Research Associates to join the New Approaches to Bayesian Data Science: Tackling Challenges from the Health Sciences programme. This is an exciting, cross-disciplinary research project that will develop new methods for Bayesian statistics that are fit for purpose to tackle contemporary Health Science challenges, such as real-time inference and prediction for large-scale epidemics, or synthesizing information from distinct data sources for large-scale studies such as the UK Biobank. Methodological challenges will be around making Bayesian methods scalable to big data and robust to (unavoidable) model errors.

This £3M programme is funded by EPSRC and brings together research groups from the Universities of Lancaster, Bristol, Cambridge, Oxford and Warwick. There is either a 4-year or a 3-year position available at each of these five partner institutions.

You should have, or be close to completing, a PhD in Statistics or a related discipline. You will be experienced in one or more of the following areas: Bayesian statistics, computational statistics, statistical machine learning, statistical genetics, inference for epidemics. You will have demonstrated the ability to develop new statistical methodology. We are particularly keen to encourage applicants with strong computational skills, and are looking to put together a team of researchers with skills that cover theoretical, methodological and applied statistics. A demonstrable ability to produce academic writing of the highest publishable quality is essential.

Applicants must apply through Lancaster University’s website for the Lancaster, Oxford, Bristol and Warwick posts. Please ensure you state clearly which position or positions you wish to be considered for when applying. For applications to the MRC Biostatistics Unit, University of Cambridge vacancy, please go to their website.

Candidates who are considering making an application are strongly encouraged to contact Professor Paul Fearnhead (p.fearnhead@lancaster.ac.uk), Sylvia Richardson (sylvia.richardson@mrc-bsu.cam.ac.uk), Christophe Andrieu (c.andrieu@bristol.ac.uk), Chris Holmes (c.holmes@stats.ox.ac.uk) or Gareth Roberts (Gareth.O.Roberts@warwick.ac.uk) to discuss the programme in greater detail.

We welcome applications from people in all diversity groups.

 

postdoc position in London plus Seattle

Posted in Statistics on March 21, 2018 by xi'an

Here is an announcement from Oliver Ratmann for a postdoc position at Imperial College London with partners in Seattle, on epidemiology and new Bayesian methods for estimating sources of transmission with phylogenetics. As stressed by Ollie, no prior background in phylogenetics is required; they are really looking for someone with solid foundations in Mathematics/Statistics, especially Bayesian Statistics, and good computing skills (R, GitHub, MCMC, Stan). The search is officially for a Postdoc in Statistics and Pathogen Phylodynamics. Reference number is NS2017189LH. Deadline is April 7, 2018.

a summer of British conferences!

Posted in pictures, Statistics, Travel, University life on January 18, 2018 by xi'an

resampling methods

Posted in Books, pictures, Running, Statistics, Travel, University life on December 6, 2017 by xi'an

A paper that was arXived [and that I missed!] last summer is a work on resampling by Mathieu Gerber, Nicolas Chopin (CREST), and Nick Whiteley. Resampling is used to sample from a weighted empirical distribution and to correct for very small weights in a weighted sample that otherwise lead to degeneracy in sequential Monte Carlo (SMC). Since this step is based on random draws, it induces noise (while improving the estimation of the target), so reducing this noise is preferable, hence the appeal of replacing plain multinomial sampling with more advanced schemes. The initial motivation is sequential Monte Carlo, where resampling is rife and seemingly compulsory, but this also applies to importance sampling when considering several schemes at once. I remember discussing alternative schemes with Nicolas, then completing his PhD, as well as with Olivier Cappé, Randal Douc, and Eric Moulines at the time (circa 2004) when we were working on the Hidden Markov book, and getting then only a somewhat vague idea as to why systematic resampling failed to converge.

In this paper, Mathieu, Nicolas and Nick show that stratified sampling (where an independent uniform is generated on every interval of length 1/n) enjoys some form of consistency, while systematic sampling (where the “same” uniform is shifted across every interval of length 1/n) does not necessarily do so: there actually exist cases where convergence does not occur. However, a residual version of systematic sampling (where the integer parts of the n-enlarged weights are assigned deterministically and systematic sampling is applied to the remaining fractional parts) is itself consistent.
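For concreteness, here is a minimal numpy sketch of the schemes discussed above (an illustrative implementation of the standard algorithms, not the authors' code):

```python
# Multinomial, stratified, systematic, and residual-systematic resampling,
# each returning n indices drawn from the weighted empirical distribution.
import numpy as np

def _inverse_cdf(w, u):
    c = np.cumsum(w)
    c[-1] = 1.0                      # guard against floating-point round-off
    return np.searchsorted(c, u)

def multinomial(w, rng):
    # n independent draws from the weighted empirical distribution
    return rng.choice(len(w), size=len(w), p=w)

def stratified(w, rng):
    # one independent uniform on each interval [i/n, (i+1)/n)
    n = len(w)
    return _inverse_cdf(w, (np.arange(n) + rng.random(n)) / n)

def systematic(w, rng):
    # the "same" single uniform shifted across all intervals of length 1/n
    n = len(w)
    return _inverse_cdf(w, (np.arange(n) + rng.random()) / n)

def residual_systematic(w, rng):
    # integer parts of n*w kept deterministically, systematic sampling
    # applied to the remaining fractional parts
    n = len(w)
    counts = np.floor(n * w).astype(int)
    idx = np.repeat(np.arange(n), counts)
    r = n - counts.sum()
    if r > 0:
        frac = n * w - counts
        frac = frac / frac.sum()
        idx = np.concatenate([idx, _inverse_cdf(frac, (np.arange(r) + rng.random()) / r)])
    return idx

rng = np.random.default_rng(1)
w = rng.random(10); w /= w.sum()
for scheme in (multinomial, stratified, systematic, residual_systematic):
    print(scheme.__name__, np.sort(scheme(w, rng)))
```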

The paper also studies the surprising feature uncovered by Kitagawa (1996) that stratified sampling applied to an ordered sample brings an error of O(1/n²) between the cdfs rather than the usual O(1/n). It took me a while to even understand the distinction between the original and the ordered versions (maybe because Nicolas used the empirical cdf during his SAD (Stochastic Algorithm Day!) talk, an ecdf that is the same for the ordered and the initial samples). And both systematic and deterministic sampling become consistent in this case. The result was shown in dimension one by Kitagawa (1996) but extends to larger dimensions via the magical trick of the Hilbert curve.
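A quick, purely illustrative Monte Carlo check of this ordering effect, in a toy importance-sampling setting of my own rather than an experiment from the paper: sorting the particles by value before stratified resampling should give a markedly smaller error on a plain mean estimate than the unsorted version.

```python
# Compare the mean-squared resampling error of stratified resampling with and
# without first sorting the particles by value (toy illustration).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 512, 2000

def stratified(w, rng):
    c = np.cumsum(w); c[-1] = 1.0
    u = (np.arange(len(w)) + rng.random(len(w))) / len(w)
    return np.searchsorted(c, u)

mse_plain, mse_sorted = 0.0, 0.0
for _ in range(reps):
    x = rng.normal(size=n)
    w = np.exp(x); w /= w.sum()        # importance weights for an exponentially tilted target
    ref = np.sum(w * x)                # weighted estimate before resampling
    mse_plain += (x[stratified(w, rng)].mean() - ref) ** 2 / reps
    order = np.argsort(x)              # sort particles by value, then resample
    mse_sorted += (x[order][stratified(w[order], rng)].mean() - ref) ** 2 / reps

print("MSE, unsorted stratified:", mse_plain)
print("MSE, sorted   stratified:", mse_sorted)
```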

more positions in the UK [postdoc & professor]

Posted in Statistics on October 13, 2017 by xi'an

I have received additional emails from England advertising positions in Bristol, Durham, and London, so here they are, with links to the complete adverts!

  1. The University of Bristol is seeking to appoint a number of Chairs in any area of Mathematics or Statistical Science, in support of a major strategic expansion of the School of Mathematics. Deadline is December 4.
  2. Durham University is opening a newly created position of Professor of Statistics, with research and teaching duties. Deadline is November 6.
  3. Oliver Ratmann, in the Department of Mathematics at Imperial College London, is seeking a Research Associate in Statistics and Pathogen Phylodynamics. Deadline is October 30.