## my likelihood is dominating my prior [not!]

Posted in Kids, Statistics with tags , , , , , on August 29, 2019 by xi'an

An interesting misconception read on X validated today, with a confusion between the absolute value of the likelihood function and its variability. Which I have trouble explaining except possibly by the extrapolation from the discrete case and a confusion between the probability density of the data [scaled as a probability] and the likelihood function [scale-less]. I also had trouble convincing the originator of the question of the irrelevance of the scale of the likelihood per se, even when demonstrating that |$$𝚺|$$ could vanish from the posterior with no consequence whatsoever. It is only when I thought of the case when the likelihood is constant in $$𝜃$$ that I managed to make my case.

## are there a frequentist and a Bayesian likelihoods?

Posted in Statistics with tags , , , , , , , , , , on June 7, 2018 by xi'an

A question that came up on X validated and led me to spot rather poor entries in Wikipedia about both the likelihood function and Bayes’ Theorem. Where unnecessary and confusing distinctions are made between the frequentist and Bayesian versions of these notions. I have already discussed the later (Bayes’ theorem) a fair amount here. The discussion about the likelihood is quite bemusing, in that the likelihood function is the … function of the parameter equal to the density indexed by this parameter at the observed value.

“What we can find from a sample is the likelihood of any particular value of r, if we define the likelihood as a quantity proportional to the probability that, from a population having the particular value of r, a sample having the observed value of r, should be obtained.” R.A. Fisher, On the “probable error’’ of a coefficient of correlation deduced from a small sample. Metron 1, 1921, p.24

By mentioning an informal side to likelihood (rather than to likelihood function), and then stating that the likelihood is not a probability in the frequentist version but a probability in the Bayesian version, the W page makes a complete and unnecessary mess. Whoever is ready to rewrite this introduction is more than welcome! (Which reminded me of an earlier question also on X validated asking why a common reference measure was needed to define a likelihood function.)

This also led me to read a recent paper by Alexander Etz, whom I met at E.J. Wagenmakers‘ lab in Amsterdam a few years ago. Following Fisher, as Jeffreys complained about

“..likelihood, a convenient term introduced by Professor R.A. Fisher, though in his usage it is sometimes multiplied by a constant factor. This is the probability of the observations given the original information and the hypothesis under discussion.” H. Jeffreys, Theory of Probability, 1939, p.28

Alexander defines the likelihood up to a constant, which causes extra-confusion, for free!, as there is no foundational reason to introduce this degree of freedom rather than imposing an exact equality with the density of the data (albeit with an arbitrary choice of dominating measure, never neglect the dominating measure!). The paper also repeats the message that the likelihood is not a probability (density, missing in the paper). And provides intuitions about maximum likelihood, likelihood ratio and Wald tests. But does not venture into a separate definition of the likelihood, being satisfied with the fundamental notion to be plugged into the magical formula

posteriorprior×likelihood

## likelihood inflating sampling algorithm

Posted in Books, Statistics, University life with tags , , , , , , , , on May 24, 2016 by xi'an

My friends from Toronto Radu Craiu and Jeff Rosenthal have arXived a paper along with Reihaneh Entezari on MCMC scaling for large datasets, in the spirit of Scott et al.’s (2013) consensus Monte Carlo. They devised an likelihood inflated algorithm that brings a novel perspective to the problem of large datasets. This question relates to earlier approaches like consensus Monte Carlo, but also kernel and Weierstrass subsampling, already discussed on this blog, as well as current research I am conducting with my PhD student Changye Wu. The approach by Entezari et al. is somewhat similar to consensus Monte Carlo and the other solutions in that they consider an inflated (i.e., one taken to the right power) likelihood based on a subsample, with the full sample being recovered by importance sampling. Somewhat unsurprisingly this approach leads to a less dispersed estimator than consensus Monte Carlo (Theorem 1). And the paper only draws a comparison with that sub-sampling method, rather than covering other approaches to the problem, maybe because this is the most natural connection, one approach being the k-th power of the other approach.

“…we will show that [importance sampling] is unnecessary in many instances…” (p.6)

An obvious question that stems from the approach is the call for importance sampling, since the numerator of the importance sampler involves the full likelihood which is unavailable in most instances when sub-sampled MCMC is required. I may have missed the part of the paper where the above statement is discussed, but the only realistic example discussed therein is the Bayesian regression tree (BART) of Chipman et al. (1998). Which indeed constitutes a challenging if one-dimensional example, but also one that requires delicate tuning that leads to cancelling importance weights but which may prove delicate to extrapolate to other models.

## never mind the big data here’s the big models [workshop]

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on December 22, 2015 by xi'an

Maybe the last occurrence this year of the pastiche of the iconic LP of the Sex Pistols!, made by Tamara Polajnar. The last workshop as well of the big data year in Warwick, organised by the Warwick Data Science Institute. I appreciated the different talks this afternoon, but enjoyed particularly Dan Simpson’s and Rob Scheichl’s. The presentation by Dan was so hilarious that I could not resist asking him for permission to post the slides here:

Not only hilarious [and I have certainly missed 67% of the jokes], but quite deep about the meaning(s) of modelling and his views about getting around the most blatant issues. Ron presented a more computational talk on the ways to reach petaflops on current supercomputers, in connection with weather prediction models used (or soon to be used) by the Met office. For a prediction area of 1 km². Along with significant improvements resulting from multiscale Monte Carlo and quasi-Monte Carlo. Definitely impressive! And a brilliant conclusion to the Year of Big Data (and big models).

## never mind the big data here’s the big models [workshop]

Posted in Kids, pictures, Statistics with tags , , , , , , , on December 10, 2015 by xi'an

A perfect opportunity to recycle the pastiche of the iconic LP of the Sex Pistols!, that Mark Girolami posted for the ATI Scoping workshop  last month in Warwick. There is an open workshop on the theme of big data/big models next week in Warwick, organised by the Warwick Data Science Institute. It will take place on December 15, from noon till 5:30pm in the Zeeman Building. Invited speakers are

• Robert Scheichl (University of Bath, Dept of Mathematical Sciences)
• Shiwei Lan (University of Warwick, Dept of Statistics)
• Konstantinos Zygalakis (University of Southampton, Dept of Mathematical Sciences)
Dan Simpson (University of Bath, Dept of Mathematical Sciences), with the enticing title of “To avoid fainting, keep repeating ‘It’s only a model’…”

## intractable likelihoods (even) for Alan

Posted in Kids, pictures, Statistics with tags , , , , , , , , , , , , on November 19, 2015 by xi'an

In connection with the official launch of the Alan Turing Institute (or ATI, of which Warwick is a partner), it funded an ATI Scoping workshop yesterday a week ago in Warwick around the notion(s) of intractable likelihood(s) and how this could/should fit within the themes of the Institute [hence the scoping]. This is one among many such scoping workshops taking place at all partners, as reported on the ATI website. Workshop that was quite relaxed and great fun, if only for getting together with most people (and friends) in the UK interested in the topic. But also pointing out some new themes I had not previously though of as related to ilike. For instance, questioning the relevance of likelihood for inference and putting forward decision theory under model misspecification, connecting with privacy and ethics [hence making intractable “good”!], introducing uncertain likelihood, getting more into network models, RKHS as a natural summary statistic, swarm of solutions for consensus inference… (And thanks to Mark Girolami for this homage to the iconic LP of the Sex Pistols!, that I played maniacally all over 1978…) My own two-cents into the discussion were mostly variations of other discussions, borrowing from ABC (and ABC slides) to call for a novel approach to approximate inference:

## Statistics slides (4)

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , on November 10, 2014 by xi'an

Here is the fourth set of slides for my third year statistics course, trying to build intuition about the likelihood surface and why on Earth would one want to find its maximum?!, through graphs. I am yet uncertain whether or not I will reach the point where I can teach more asymptotics so maybe I will also include asymptotic normality of the MLE under regularity conditions in this chapter…