19 dubious ways to compute the marginal likelihood

Posted in Books, Statistics on December 11, 2018 by xi'an

A recent arXival on nineteen different [and not necessarily dubious!] ways to approximate the marginal likelihood of a given topology of a phylogeny tree reminded me of our San Antonio survey with Jean-Michel Marin. This includes a version of the Laplace approximation called Laplus (!), accounting for the fact that branch lengths on the tree are positive but may have a MAP at zero, by using a Beta, Gamma, or log-Normal distribution instead of a Normal. For importance sampling, the proposals are derived from either the Laplus (!) approximate distributions or from the variational Bayes solution (based on a Normal product). Harmonic means are still used here despite the obvious danger, along with a defensive version that mixes prior and posterior. Naïve Monte Carlo means simulating from the prior, while bridge sampling seems to use samples from both prior and posterior distributions. Path and modified path sampling versions are those proposed in 2008 by Nial Friel and Tony Pettitt (QUT). Stepping stone sampling appears like another version of path sampling, also based on a telescopic product of ratios of normalising constants, the generalised version relying on a normalising reference distribution that needs to be calibrated. CPO and PPD in the above table are two versions based on posterior predictive density estimates.
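Since the simplest of these estimators take only a couple of lines, here is a minimal sketch of my own (not code from the paper, and on an arbitrary conjugate Normal-Normal toy model rather than a phylogeny) contrasting naïve Monte Carlo and harmonic mean estimates against the exact marginal likelihood:

```python
# Toy illustration (mine, not the paper's): naive Monte Carlo vs harmonic mean
# estimates of the marginal likelihood m(x) = ∫ p(x|θ) π(θ) dθ in a conjugate
# Normal-Normal model where the exact value is available in closed form.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, tau, x = 1.0, 3.0, 2.0                  # likelihood sd, prior sd, one observation
true_ml = norm.pdf(x, 0.0, np.sqrt(tau**2 + sigma**2))   # exact marginal: x ~ N(0, τ²+σ²)

N = 100_000
# naive Monte Carlo: average the likelihood over draws from the prior
theta_prior = rng.normal(0.0, tau, N)
naive = norm.pdf(x, theta_prior, sigma).mean()

# harmonic mean: average the inverse likelihood over draws from the (exact) posterior
post_var = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
theta_post = rng.normal(post_var * x / sigma**2, np.sqrt(post_var), N)
harmonic = 1.0 / np.mean(1.0 / norm.pdf(x, theta_post, sigma))

print(f"truth={true_ml:.5f}  naive MC={naive:.5f}  harmonic mean={harmonic:.5f}")
```

Both numbers usually land in the right ballpark on this toy, but since the prior (sd 3) is more diffuse than the likelihood (sd 1), the inverse likelihood has infinite variance under the posterior, which is the obvious danger behind the harmonic mean and the rationale for the defensive mixture of prior and posterior.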

When running the comparison between so many contenders, the ground truth is selected as the values returned by MrBayes in a massive MCMC experiment amounting to 7.5 billion generations, for five different datasets. The above picture describes mean square errors for the split probabilities, over ten replicates [when meaningful], the worst case being naïve Monte Carlo, with nested sampling and harmonic mean solutions close by. Similar assessments proceed from a comparison of Kullback-Leibler divergences. With the (predictable?) note that “the methods do a better job approximating the marginal likelihood of more probable trees than less probable trees”. And massive variability for the poorest methods:

The comparison above does not account for computing time, and since some methods are deterministic (and fast) there is little to be done about this. The stepping stone solutions are very costly, while in the middle range bridge sampling outdoes path sampling. The assessment of nested sampling found in the conclusion is that it “would appear to be an unwise choice for estimating the marginal likelihoods of topologies, as it produces poor approximate posteriors” (p.12). Concluding that the Gamma Laplus approximation is the winner across all categories! (There is no ABC solution studied in this paper as the model likelihood can be computed in this setup, contrary to our own setting.)
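To make the cost remark concrete, here is a companion sketch (again my own toy version on the same conjugate model, not the paper's code) of a stepping stone estimator: the telescopic product needs samples from a whole ladder of power posteriors, one rung per temperature, which is where the computational bill comes from once each rung requires its own MCMC run.

```python
# Toy stepping stone estimator (mine, not the paper's): the marginal likelihood is a
# telescopic product of ratios c_{β_k}/c_{β_{k-1}} along the power posteriors
# π(θ) p(x|θ)^β, each ratio estimated by importance sampling from the previous rung.
# In this conjugate model every rung can be sampled exactly, with no MCMC needed.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma, tau, x = 1.0, 3.0, 2.0
true_ml = norm.pdf(x, 0.0, np.sqrt(tau**2 + sigma**2))

betas = np.linspace(0.0, 1.0, 11)      # temperature ladder β_0 = 0 < … < β_K = 1
n = 10_000
log_ml = 0.0
for b_prev, b_next in zip(betas[:-1], betas[1:]):
    # exact power posterior ∝ π(θ) p(x|θ)^{b_prev} (a Normal, by conjugacy)
    prec = 1.0 / tau**2 + b_prev / sigma**2
    theta = rng.normal(b_prev * x / sigma**2 / prec, np.sqrt(1.0 / prec), n)
    # importance sampling estimate of the ratio c_{b_next} / c_{b_prev}
    log_ml += np.log(np.mean(np.exp((b_next - b_prev) * norm.logpdf(x, theta, sigma))))

print(f"truth={true_ml:.5f}  stepping stone={np.exp(log_ml):.5f}")
```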

a book by C.Robert [not a book review]

Posted in Books, Kids, pictures, Statistics, University life on December 10, 2018 by xi'an

L’Armée Furieuse [book review]

Posted in Books, Travel on December 9, 2018 by xi'an

“They say that Normans do not much like to talk… It is not that they do not like talking, it is that they do not like answering. That is not the same thing.”

I picked this book by Fred Vargas at the airport, mostly because the back cover mentioned Orbec, a town near my hometown in rural Normandy. With a slight misspelling, to avoid legal issues I presume. It made for a nice read on the long trip to Oaxaca, even though it is filled with impossibilities and incoherences. The crux of the story is an interesting medieval myth called l’armée furieuse (the Wild Hunt), which tells of a spectral army crossing the north of France and picking up damned souls soon to die. The wild hunt is also called la mesnie or maisnie Hellequin, from the name of the lord leading the spectral army, according to an English monk from a Norman monastery in the 1100s. A myth that some in the current era want to exploit to cover real crimes. As in the previous novels of Fred Vargas that I read, there is an interesting undercurrent of exposing the machinery of a rural community, with highly unorthodox police officers. Not that I recognized much of my hometown atmosphere. And the Deus ex Machina represented by a local count [historically speaking, Orbec is only a barony] and the industrial plot were by far too implausible! (With the geographical inaccuracy of setting the Touques river nearby, and of mentioning a train station in Cernay, to end on a very picky note.)

hue & cry [book review]

Posted in Statistics on December 8, 2018 by xi'an

While visiting the Blackwell’s bookstore by the University of Edinburgh last June, I spotted this historical whodunit in the local interest section: Hue & Cry by Shirley McKay. It stayed on a to-read pile by my bed until a few weeks ago, when I started reading it and got more and more engrossed in the story. While the style is not always at its best and the crime aspects are somewhat thin, I found the description of the Scottish society of the time (the 1570s) fascinating (and hopefully accurate), especially the absolute dominion of the local Church (Kirk) over every aspect of life, and the helplessness of women, always under the threat of witchcraft accusations. Which could end with the death penalty, as in thousands of cases. The book reminds me to some extent of Susanna Gregory’s early books in that it also involves scholars, teaching well-off students with limited intellectual abilities, while bright but poorer students have to work for the college to make up for their lack of funds. As indicated above, the criminal part is less interesting, as the main investigator unfolds the complicated plot without much of a hint. And convinces the juries rather too easily in my opinion. An overall fine novel, nonetheless!

polluters 3 [taxes] – government 0 [result] – climate minus 1 [or rather +2°]

Posted in pictures on December 7, 2018 by xi'an

selected parameters from observations

Posted in Books, Statistics on December 7, 2018 by xi'an

I recently read a fairly interesting paper by Daniel Yekutieli on a Bayesian perspective for parameters selected after viewing the data, published in Series B in 2012. (Disclaimer: I was not involved in processing this paper!)

The first example is to differentiate the Normal-Normal mean posterior when θ is N(0,1) and x is N(θ,1) from the restricted posterior when θ is N(0,1) and x is N(θ,1) truncated to (0,∞), by restating the latter as repeated generation from the joint until x>0. This does not sound particularly controversial, except for the notion of selecting the parameter after viewing the data. That the posterior support may depend on the data is not that surprising..!
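If I read the restated example correctly, this can be checked numerically: generating repeatedly from the joint until x>0 and then conditioning on the realised value of x returns the usual Normal-Normal posterior N(x/2,1/2), since the selection event is implied by the observed x. A quick toy simulation of mine (window and sample size arbitrary):

```python
# Toy check (mine): under the "repeated generation from the joint until x>0" reading,
# conditioning on the observed x among the selected draws still produces the usual
# Normal-Normal posterior N(x/2, 1/2).
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
theta = rng.normal(0.0, 1.0, n)       # θ ~ N(0,1)
x = rng.normal(theta, 1.0)            # x | θ ~ N(θ,1)
keep = x > 0                          # retain only the generations with x > 0

x0, eps = 1.5, 0.01                   # condition on x close to x0 among the kept draws
band = keep & (np.abs(x - x0) < eps)
print("empirical mean / variance of θ:", theta[band].mean().round(3), theta[band].var().round(3))
print("usual posterior N(x0/2, 1/2):  ", x0 / 2, 0.5)
```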

“The observation that selection affects Bayesian inference carries the important implication that in Bayesian analysis of large data sets, for each potential parameter, it is necessary to explicitly specify a selection rule that determines when inference is provided for the parameter and provide inference that is based on the selection-adjusted posterior distribution of the parameter.” (p.31)

The more interesting distinction is between “fixed” and “random” parameters (Section 2.1), which separates cases where the data is from a truncated distribution (given the parameter) from cases where the joint distribution is truncated but misses the normalising constant (a function of θ) for the truncated sampling distribution. The “mixed” case introduces a hyperparameter λ, and the normalising constant integrates out θ and depends on λ, which amounts to switching to another (marginal) prior on θ. This is quite interesting, even though one can debate whether the very notions of “random” and “mixed” “parameters”, which are those for which the posterior most often changes, qualify as true parameters. Take for instance Stephen Senn’s example (p.6) of the mean associated with the largest observation in a Normal mean sample with distinct means. When accounting for the distribution of the largest variate, this random variable is no longer a Normal variate with a single unknown mean but instead depends on all the means of the sample. Speaking of the mean of the largest observation is therefore misleading, in that it is neither the expectation of the largest observation nor a parameter per se, since the index [of the largest observation] is a random variable induced by the observed sample.
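To make the Senn example concrete, here is a small simulation of mine (means and sample size chosen arbitrarily, not taken from the paper) showing that the observation selected for being the largest no longer behaves like a Normal variate attached to any single one of the original means:

```python
# Toy version of the Senn example (means chosen arbitrarily): the variate selected
# for being the largest of the sample is not N(μ_i, 1) for any fixed index i, since
# the index itself is random and its distribution involves all the means at once.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 0.5, 1.0])                    # distinct means
x = rng.normal(mu, 1.0, size=(1_000_000, 3))      # one row = one sample of three variates
i_max = x.argmax(axis=1)                          # random index of the largest observation
x_max = x.max(axis=1)

print("P(largest observation has index i): ", np.bincount(i_max) / len(i_max))
print("mean / sd of the selected variate:  ", x_max.mean().round(3), x_max.std().round(3))
print("mean / sd of the third variate alone:", x[:, 2].mean().round(3), x[:, 2].std().round(3))
```

The selected variate has an expectation exceeding every μ_i, so calling its mean “the mean of the largest observation” indeed conflates a random index with a fixed parameter.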

In conclusion, a very original article, if a difficult one to assess, as it can be argued that selection models other than the “random” case result from an intentional modelling choice for the joint distribution.

 

support for Remain comes first in latest YouGov survey

Posted in pictures, Travel on December 6, 2018 by xi'an