## Archive for evidence

## unrejected null [xkcd]

Posted in Statistics with tags evidence, Nature, point null hypotheses, preregistered experiments, replication crisis, xkcd on July 18, 2018 by xi'an

## new estimators of evidence

Posted in Books, Statistics with tags Bayesian Analysis, Connecticut, curse of dimensionality, estimating a constant, evidence, harmonic mean estimator, HPD region, importance sampling, marginal likelihood, Monte Carlo Statistical Methods, Old Man of Storr, Pima Indians, Storrs on June 19, 2018 by xi'an

In an incredible accumulation of coincidences, I came across yet another paper about evidence and the harmonic mean challenge, by Yu-Bo Wang, Ming-Hui Chen [same as in Chen, Shao, Ibrahim], Lynn Kuo, and Paul O. Lewis this time, published in Bayesian Analysis. *(Disclaimer: I was not involved in the reviews of any of these papers!)* Authors who are located in Storrs, Connecticut, in geographic and thematic connection with the original Gelfand and Dey (1994) paper! (Private joke about the Old Man of Storr in the above picture!)

“The working parameter space is essentially the constrained support considered by Robert and Wraith (2009) and Marin and Robert (2010).”

The central idea is to use a more general function than our HPD restricted prior but still with a known integral. Not in the sense of control variates, though. The function of choice is a weighted sum of indicators of terms of a finite partition, which implies a compact parameter set Ω. Or a form of HPD region, although it is unclear when the volume can be derived. While the consistency of the estimator of the inverse normalising constant [based on an MCMC sample] is unsurprising, the more advanced part of the paper is about finding the optimal sequence of weights, as in control variates. But it is also unsurprising in that the weights are proportional to the inverses of the inverse posteriors over the sets in the partition. Since these are hard to derive in practice, the authors come up with a fairly interesting alternative, which is to take the value of the posterior at an arbitrary point of the relevant set.
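To make the principle concrete, here is a minimal sketch (not the authors' implementation) of such a partition-based generalised harmonic mean estimator on a conjugate Gaussian toy model, with the cell weights set from the unnormalised posterior at the cell midpoints, as described above; the partition and all tuning choices are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: theta ~ N(0,1) prior, x | theta ~ N(theta,1), observed x = 0.
# The posterior is N(0, 1/2) and the evidence is m = N(0; 0, 2) = 1/sqrt(4*pi).
x = 0.0
log_q = lambda t: -0.5 * t**2 - 0.5 * (x - t)**2 - np.log(2 * np.pi)  # unnormalised posterior
true_m = 1.0 / np.sqrt(4 * np.pi)

# Posterior sample (exact here, standing in for MCMC output)
theta = rng.normal(0.0, np.sqrt(0.5), size=100_000)

# Partition [-3,3] into cells; weight each indicator by the unnormalised
# posterior at the cell midpoint (the cheap surrogate mentioned above),
# then normalise the weighted sum phi so that its integral is one.
edges = np.linspace(-3, 3, 31)
mids = 0.5 * (edges[:-1] + edges[1:])
w = np.exp(log_q(mids))
w /= np.sum(w * np.diff(edges))            # integral of phi is now 1

cell = np.clip(np.searchsorted(edges, theta) - 1, 0, len(mids) - 1)
inside = (theta > edges[0]) & (theta < edges[-1])
phi = np.where(inside, w[cell], 0.0)

# Generalised harmonic mean: E_post[phi/q] = 1/m for any phi integrating to 1
inv_m = np.mean(phi / np.exp(log_q(theta)))
m_hat = 1.0 / inv_m
```

Because phi roughly tracks the shape of the posterior, the ratio phi/q is nearly constant over the sample, which is exactly what keeps the variance finite where the plain harmonic mean fails.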

The paper also contains an extension replacing the weights with functions that are integrable and with known integrals. Which is hard for most choices, even though it contains the regular harmonic mean estimator as a special case. And should also suffer from the curse of dimension when the constraint to keep the target almost constant is implemented (as in Figure 1).

The method, when properly calibrated, does much better than the harmonic mean (not a surprise) and than the Petris and Tardella (2007) alternative, but is compared against no other technique, on toy problems like the Normal, a Normal mixture, and probit regression with three covariates (no Pima Indians this time!). As an aside, I find it hard to understand how the regular harmonic mean estimator takes longer than this more advanced version, which should require more calibration. But I find it hard to see a general application of the principle, because the partition needs to be chosen in terms of the target. Embedded balls cannot work for every possible problem, even with ex-post standardisation.

## unbiased consistent nested sampling via sequential Monte Carlo [a reply]

Posted in pictures, Statistics, Travel with tags auxiliary variable, Brisbane, evidence, marginal likelihood, nested sampling, Og, particle filter, QUT, unbiasedness on June 13, 2018 by xi'an

*Rob Salomone sent me the following reply to my comments of yesterday about their recently arXived paper.*

“Which never occurred as the number one difficulty there, as the simplest implementation runs a Markov chain from the last removed entry, independently from the remaining entries. Even stationarity is not an issue since I believe that the first occurrence within the level set is distributed from the constrained prior.”

“And then, in a twist that is not clearly explained in the paper, the focus moves to an improved nested sampler that moves one likelihood value at a time, with a particle step replacing a single particle. (Things get complicated when several particles may take the very same likelihood value, but randomisation helps.) At this stage the algorithm is quite similar to the original nested sampler. Except for the unbiased estimation of the constants, the final constant, and the replacement of exponential weights exp(-t/N) by powers of (N-1)/N.”

**is** a special case of SMC (with the weights replaced with a suboptimal choice).

## unbiased consistent nested sampling via sequential Monte Carlo

Posted in pictures, Statistics, Travel with tags auxiliary variable, Brisbane, evidence, marginal likelihood, nested sampling, Og, particle filter, QUT, unbiasedness on June 12, 2018 by xi'an

“Moreover, estimates of the marginal likelihood are unbiased.” (p.2)

Rob Salomone, Leah South, Chris Drovandi and Dirk Kroese (from QUT and UQ, Brisbane) recently arXived a paper that frames nested sampling in such a way that marginal likelihoods can be unbiasedly (and consistently) estimated.

“Why isn’t nested sampling more popular with statisticians?” (p.7)

A most interesting question, especially given its popularity in cosmology and other branches of physics. A first drawback pointed out in the paper is the requirement of independence between the elements of the sample produced at each iteration. Which never occurred as the number one difficulty there, as the simplest implementation runs a Markov chain from the last removed entry, independently from the remaining entries. Even stationarity is not an issue since I believe that the first occurrence within the level set is distributed from the constrained prior.

A second difficulty is the use of quadrature, which turns the integrand into step functions at random slices. Indeed, mixing Monte Carlo with numerical integration makes life much harder, as shown by the early avatars of nested sampling that only accounted for the numerical errors. (And which caused Nicolas and me to write our critical paper in Biometrika.) There are few studies of that kind in the literature, the only one I can think of being [my former PhD student] Anne Philippe‘s thesis twenty years ago.

The third issue stands with the difficulty in parallelising the method. Except by jumping k points at once, rather than going one level at a time. While I agree this makes life more complicated, I am also unsure about the severity of that issue as k nested sampling algorithms can be run in parallel and aggregated in the end, from simple averaging to something more elaborate.

The final blemish is that the nested sampling estimator has a stopping mechanism that induces a truncation error, again maybe a lesser problem given the overall difficulty in assessing the total error.

The paper takes advantage of the ability of SMC to produce unbiased estimates of a sequence of normalising constants (or of the normalising constants of a sequence of targets). For nested sampling, the sequence is made of the prior distribution restricted to an embedded sequence of level sets. With another sequence restricted to bands (likelihood between two likelihood boundaries). If all restricted posteriors of the second kind and their normalising constant are known, the full posterior is known. Apparently up to the main normalising constant, i.e. the marginal likelihood *ℨ*, except that it is also the sum of all normalising constants. Handling this sequence by SMC addresses the four concerns of the four authors, apart from the truncation issue, since the largest likelihood bound need be set for running the algorithm.
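To illustrate the unbiasedness identity behind this SMC framing, the following sketch uses a tempered (rather than level-set) sequence of targets on a conjugate Gaussian toy, where each intermediate power posterior can be sampled exactly, so the product of average incremental weights is an unbiased estimate of the evidence. A genuine SMC run would add resampling and move steps; everything here is an illustrative stand-in, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

# Conjugate toy: theta ~ N(0,1), x_i ~ N(theta,1).  The power posterior at
# temperature b is N(b*S/(1+b*n), 1/(1+b*n)), so every intermediate target
# can be sampled exactly and the identity Z = prod_t E_t[w_t] applies with
# independent, individually unbiased factors.
n = 30
x = rng.normal(1.0, 1.0, size=n)
S, Q = x.sum(), (x**2).sum()
loglik = lambda t: -n/2 * np.log(2*np.pi) - 0.5 * (Q - 2*S*t + n*t**2)
logZ_exact = -n/2 * np.log(2*np.pi) - 0.5*np.log(1 + n) - 0.5*(Q - S**2/(n + 1))

betas = np.linspace(0, 1, 26)
logZ_hat = 0.0
for b0, b1 in zip(betas[:-1], betas[1:]):
    s2 = 1.0 / (1.0 + b0 * n)
    draws = rng.normal(b0 * S * s2, np.sqrt(s2), size=20_000)  # target at b0
    logw = (b1 - b0) * loglik(draws)                           # incremental weight
    # stable log of the average weight, accumulated over the sequence
    logZ_hat += np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
```

Each factor estimates the ratio of consecutive normalising constants, and the telescoping product recovers the marginal likelihood, which is the mechanism the paper exploits for the nested (level-set) sequence as well.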

When the sequence of likelihood bounds is chosen based on the observed likelihoods so far, the method becomes adaptive. Requiring again the choice of a stopping rule that may induce bias if stopping occurs too early. And then, in a twist that is not clearly explained in the paper, the focus moves to an improved nested sampler that moves one likelihood value at a time, with a particle step replacing a single particle. (Things get complicated when several particles may take the very same likelihood value, but randomisation helps.) At this stage the algorithm is quite similar to the original nested sampler. Except for the unbiased estimation of the constants, the final constant, and the replacement of exponential weights exp(-t/N) by powers of (N-1)/N.
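A bare-bones nested sampler on a toy problem shows the quadrature and the deterministic shrinkage factors at play, with the ((N-1)/N)^t powers mentioned above in place of the usual exp(-t/N); this is a sketch under strong simplifying assumptions (uniform prior, monotone likelihood, exact constrained-prior draws), not the algorithm of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy: theta ~ U(0,1) prior, likelihood L(theta) = exp(theta), so the
# evidence is Z = e - 1 by direct integration.
N, T = 100, 1000                          # live points, iterations
live = rng.uniform(size=N)
Z_hat, X_prev = 0.0, 1.0
for t in range(1, T + 1):
    i = int(np.argmin(live))              # worst live point (L is increasing)
    L_min = np.exp(live[i])
    X = ((N - 1) / N) ** t                # expected remaining prior mass
    Z_hat += L_min * (X_prev - X)         # quadrature slab L * delta(X)
    X_prev = X
    # exact draw from the constrained prior U(theta_min, 1); a real problem
    # would use an MCMC move within the level set instead
    live[i] = rng.uniform(live[i], 1.0)
# remainder over the final live set, reducing the truncation error
Z_hat += X_prev * np.mean(np.exp(live))
```

The stopping rule here is a fixed T plus a remainder term; the truncation error discussed above is whatever prior mass X_prev still carries when the loop ends.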

The remainder of this long paper (61 pages!) is dedicated to practical implementation, calibration and running a series of comparisons. A nice final touch is the thanks to the ‘Og for its series of posts on nested sampling, which “helped influence this work, and played a large part in inspiring it.”

In conclusion, this paper is certainly a worthy exploration of the nested sampler, providing further arguments towards a consistent version, with first and foremost an (almost?) unbiased resolution. The comparison with a wide range of alternatives remains open, in particular time-wise, if evidence is the sole target of the simulation. For instance, the choice of this sequence of targets in an SMC may be improved by another sequence, since changing one particle at a time does not sound efficient. The complexity of the implementation, and in particular of the simulation from the prior under more and more stringent constraints, needs to be addressed.

## atheism: a very [very] short introduction [book review]

Posted in Books with tags agnosticism, atheism, beliefs, book review, David Hume, ethics, evidence, ex nihilo, existentialism, Friedrich Nietzsche, Julian Baggini, naturalism, Philosophy of religions, philosophy, religion, very short introduction on November 3, 2017 by xi'an

After the rather disappointing Edge of Reason, I gave a try at Baggini’s very brief introduction to atheism, which is very short. And equally very disappointing. Rather than approaching the topic from an (academic) philosophical perspective, *ex nihilo*, and while defending himself from doing so, the author indeed adopts a rather militant tone in trying to justify the arguments and ethics of atheism, setting the approach solely in a defensive opposition to religions. That is, in reverse, as an answer to faiths and creeds. Even when his arguments make complete sense, e.g., in the lack of support for agnosticism against atheism, the link with inductive reasoning (and Hume), and the logical [and obvious] disconnection between morality and religious attitudes.

“…once we accept the inductive method, we should, to be consistent, also accept that it points toward a naturalism that supports atheism…” (p.27)

While he mentions “militant atheism” as a fundamentalist position to be avoided as much as the numerous religious versions, I find the whole exercise in this book missing the point of both an intellectual criticism of atheism [in the sense of Kant’s best seller!] and of the VSI series. Again, to define atheism as an answer to religions and to their irrationality is to reduce the scope of this philosophical branch to a contrarian posture, rather than independently advancing a rationalist and scientific position on the entropic nature of life and the universe, one that does not call for a purpose or a higher cause. And to try to show it provides *better* answers to the *same* questions as those addressed by religions stoops down to their level.

“So it is not the case that atheism follows merely from some shallow commitment to the primacy of scientific inquiry.” (p.77)

The link therein with a philosophical analysis seems so weak that I deem the essay rather belongs to journalosophy. The very short history of atheism and its embarrassed debate on the attributed connections between atheism and some modern era totalitarianisms [found in the last chapter] are an illustration of this divergence from scholarly work. That the author felt the need to include pictures to illustrate his points says it all!

## WBIC, practically

Posted in Statistics with tags Bayes factor, Bayesian model selection, evidence, harmonic mean estimator, MCMC, nested sampling, Pima Indians, power posterior, thermodynamic integration, WBIC on October 20, 2017 by xi'an

“Thus far, WBIC has received no more than a cursory mention by Gelman et al. (2013)”

I had missed this 2015 paper by Nial Friel and co-authors on a practical investigation of Watanabe’s WBIC, where WBIC stands for widely applicable Bayesian information criterion. The thermodynamic integration approach explored by Nial and some co-authors for approximating the evidence, which produces the log-evidence as an integral between temperatures t=0 and t=1 of the expected log-likelihood under the power posterior, is eminently suited for WBIC, as the widely applicable Bayesian information criterion is associated with the specific temperature t⁰ that makes the power posterior equidistant, Kullback-Leibler-wise, from the prior and posterior distributions, and the expectation of the log-likelihood under this very power posterior equal to the (genuine) log-evidence. In fact, WBIC is often associated with the sub-optimal temperature 1/log(n), where n is the (effective?) sample size. (By comparison, if my minimalist description is unclear!, thermodynamic integration requires a whole range of temperatures and associated MCMC runs.) In an ideal Gaussian setting, WBIC improves considerably over thermodynamic integration, the larger the sample the better. In more realistic settings, though, including a simple regression and a logistic [Pima Indians!] model comparison, thermodynamic integration may do better for a given computational cost, although the paper is unclear about these costs. The paper also runs a comparison with harmonic mean and nested sampling approximations. Since the integral of interest involves a power of the likelihood, I wonder if a safe version of the harmonic mean resolution can be derived from simulations of the genuine posterior. Provided the exact temperature t⁰ is known…
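On a conjugate Gaussian toy where the power posterior and the exact log-evidence are available in closed form, WBIC and thermodynamic integration can be compared directly. A hedged sketch follows, with all tuning choices (temperature grid, sample sizes) purely illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Conjugate toy: x_i ~ N(theta,1), theta ~ N(0,1).  The power posterior at
# temperature b is N(b*S/(1+b*n), 1/(1+b*n)) and the exact log-evidence is
# available, so both approximations can be checked against the truth.
n = 50
x = rng.normal(0.5, 1.0, size=n)
S, Q = x.sum(), (x**2).sum()
loglik = lambda t: -n/2 * np.log(2*np.pi) - 0.5 * (Q - 2*S*t + n*t**2)
logZ = -n/2 * np.log(2*np.pi) - 0.5*np.log(1 + n) - 0.5*(Q - S**2/(n + 1))

def power_post(b, size=50_000):
    s2 = 1.0 / (1.0 + b * n)
    return rng.normal(b * S * s2, np.sqrt(s2), size=size)

# WBIC: a single run at the single temperature 1/log(n); it approximates
# the Bayes free energy -log Z
wbic = -np.mean(loglik(power_post(1 / np.log(n))))

# Thermodynamic integration: expected log-likelihood over a whole grid of
# temperatures (the usual power schedule concentrates them near zero),
# integrated by the trapezoidal rule
bs = np.linspace(0, 1, 21) ** 5
Elog = np.array([np.mean(loglik(power_post(b))) for b in bs])
ti_logZ = np.sum(0.5 * (Elog[1:] + Elog[:-1]) * np.diff(bs))
```

The contrast made in the post is visible in the code itself: WBIC needs one expectation at one temperature, while thermodynamic integration needs the full temperature ladder.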

## marginal likelihoods from MCMC

Posted in Books, pictures, Statistics, University life with tags ABC, arXiv, Bayesian Methods in Cosmology, curse of dimensionality, evidence, INLA, k-nearest neighbour, marginal likelihood, nested sampling, Planck experiment, San Antonio, satellite on April 26, 2017 by xi'an

A new arXiv entry on ways to approximate marginal likelihoods based on MCMC output, by astronomers (apparently). With an application to the 2015 Planck satellite analysis of cosmic microwave background radiation data, which reminded me of our joint work with the cosmologists of the Paris Institut d’Astrophysique ten years ago. In the literature review, the authors miss several surveys on the approximation of those marginals, including our San Antonio chapter, on Bayes factors approximations, but mention our ABC survey somewhat inappropriately since it is not advocating the use of ABC for such a purpose. (They mention as well variational Bayes approximations, INLA, powered likelihoods, if not nested sampling.)

The proposal of this paper is to identify the marginal *m* [actually denoted *a* there] as the normalising constant of an unnormalised posterior density. And to do so the authors estimate the posterior by a non-parametric approach, namely a k-nearest-neighbour estimate. With the additional twist of producing a sort of Bayesian posterior on the constant *m*. [And the unusual notion of number density, used for the unnormalised posterior.] The Bayesian estimation of m relies on a Poisson sampling assumption on the k-nearest neighbour distribution. (Sort of, since k is actually fixed, not random.)
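A minimal one-dimensional sketch of the underlying idea, without the Bayesian/Poisson layer of the paper: estimate the posterior density by k-nearest neighbours at each MCMC draw, then read the marginal off the ratio of the unnormalised posterior to that estimate, since this ratio should be (roughly) constant over the sample. The toy model and the choice of k are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy: theta ~ N(0,1) prior, x | theta ~ N(theta,1) with x = 0, so the
# posterior is N(0, 1/2) and the evidence is m = N(0; 0, 2) = 1/sqrt(4*pi).
x = 0.0
log_q = lambda t: -0.5 * t**2 - 0.5 * (x - t)**2 - np.log(2 * np.pi)  # unnormalised
true_m = 1.0 / np.sqrt(4 * np.pi)

theta = np.sort(rng.normal(0.0, np.sqrt(0.5), size=20_000))  # "MCMC" sample
n, k = len(theta), 50

# k-nearest-neighbour density estimate at each draw; in 1-D the ball of
# radius r_k around a sorted point has volume 2*r_k, and its k nearest
# neighbours lie within k positions on either side
r_k = np.empty(n)
for i in range(n):
    lo, hi = max(0, i - k), min(n, i + k + 1)
    d = np.abs(theta[lo:hi] - theta[i])
    r_k[i] = np.sort(d)[k]            # distance to the k-th neighbour (self excluded)
p_hat = k / (n * 2 * r_k)

# the marginal is the (ideally constant) ratio q/p-hat at every draw;
# the median is robust to the noisy tails of the kNN estimate
m_hat = np.median(np.exp(log_q(theta)) / p_hat)
```

This makes the Pandora's box plain as well: in d dimensions the ball volume replaces 2*r_k and the kNN estimate degrades quickly, which is the curse-of-dimensionality worry raised below.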

If the above sounds confusing and imprecise, it is because I am myself rather mystified by the whole approach and find it difficult to see the point in this alternative. The Bayesian numerics do not seem to serve any purpose other than producing a MAP estimate. And using a non-parametric density estimate opens a Pandora's box of difficulties, the most obvious one being the curse of dimension(ality). This reminded me of the paper of Delyon and Portier discussed here previously, where they achieve super-efficient convergence when using a kernel estimator, but at a considerable cost and with a similar sensitivity to dimension.