## Bayes Factors for Forensic Decision Analyses with R [book review]

Posted in Books, R, Statistics with tags , , , , , , , , , , , , , on November 28, 2022 by xi'an

My friend EJ Wagenmaker pointed me towards an entire book on the BF by Bozza (from Ca’Foscari, Venezia), Taroni and Biederman. It is providing a sort of blueprint for using Bayes factors in forensics for both investigative and evaluative purposes. With R code and free access. I am of course unable to judge of the relevance of the approach for forensic science (I was under the impression that Bayesian arguments were usually not well-received in the courtroom) but find that overall the approach is rather one of repositioning the standard Bayesian tools within a forensic framework.

“The [evaluative] purpose is to assign a value to the result of a comparison between an item of unknown source and an item from a known source.”

And thus I found nothing shocking or striking from this standard presentation of Bayes factors, including the call to loss functions, if a bit overly expansive in its exposition. The style is also classical, with a choice of grey background vignettes for R coding parts that we also picked in our R books! If anything, I would have expected more realistic discussions and illustrations of prior specification across the hypotheses (see e.g. page 34), while the authors are mostly centering on conjugate priors and the (de Finetti) trick of the equivalent prior sample size. Bayes factors are mostly assessed using a conservative version of Jeffreys’ “scale of evidence”. The computational section of the book introduces MCMC (briefly) and mentions importance sampling, harmonic mean (with a minimalist warning), and Chib’s formula (with no warning whatsoever).

“The [investigative] purpose is to provide information in investigative proceedings (…) The scientist (…) uses the findings to generate hypotheses and suggestions for explanations of observations, in order to give guidance to investigators or litigants.”

Chapter 2 is about standard models: inferring about a proportion, with some Monte Carlo illustration,  and the complication of background elements, normal mean, with an improper prior making an appearance [on p.69] with no mention being made of the general prohibition of such generalised priors when using Bayes factors or even of the Lindley-Jeffreys paradox. Again, the main difference with Bayesian textbooks stands with the chosen examples.

Chapter 3 focus on evidence evaluation [not in the computational sense] but, again, the coverage is about standard models: processing the Binomial, multinomial, Poisson models, again though conjugates. (With the side remark that Fig 3.2 is rather unhelpful: when moving the prior probability of the null from zero to one, its posterior probability also moves from zero to one!) We are back to the Normal mean case with the model variance being known then unknown. (An unintentionally funny remark (p.96) about the dependence between mean and variance being seen as too restrictive and replaced with… independence!). At last (for me!), the book is pointing [p.99] out that the BF is highly sensitive to the choice of the prior variance (Lindley-Jeffreys, where art thou?!), but with a return of the improper prior (on said variance, p.102) with no debate on the ensuing validity of the BF. Multivariate Normals are also presented, with Wishart priors on the precision matrix, and more details about Chib’s estimate of the evidence. This chapter also contains illustrations of the so-called score-based BF which is simply (?) a Bayes factor using a distribution on a distance summary (between an hypothetical population and the data) and an approximation of the distributions of these summaries, provided enough data is available… I also spotted a potentially interesting foray into BF variability (Section 3.4.2), although not reaching all the way to a notion of BF posterior distributions.

Chapter 4 stands for Bayes factors for investigation, where alternative(s) is(are) less specified, as testing eg Basmati rice vs non-Basmati rice. But there is no non-parametric alternative considered in the book. Otherwise, it looks to me rather similar to Chapter 3, i.e. being back to binomial, multinomial models, with more discussions onm prior specification, more normal, or non-normal model, where the prior distribution is puzzingly estimated by a kernel density estimator, a portmanteau alternative (p.157), more multivariate Normals with Wishart priors and an entry on classification & discrimination.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE. As appropriate for a book about Chance!]

## mixtures at BNP [slides]

Posted in Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on October 29, 2022 by xi'an

After chatting with some BNP13 participants at the Puerto Montt airport, I gave in (!) to their kind request to put my slides on-line and here is the link to the slideshare depository. It was quite the nice coincidence that Sanjib Basu (whom I met in Purdue in 1987!) gave the invited talk in our session since we were building on the under-appreciated Basu-Chib approximation of the evidence. Overall, this was an exhilarating week and I now have to recover from this sensory overload. (Incidentally, and uninterestingly, I got swindled by not one but two taxis on my way back to Santiago!)

## Another harmonic mean

Posted in Books, Statistics, University life with tags , , , , , , , , on May 21, 2022 by xi'an

Yet another paper that addresses the approximation of the marginal likelihood by a truncated harmonic mean, a popular theme of mine. A 2020 paper by Johannes Reich, entitled Estimating marginal likelihoods from the posterior draws through a geometric identity and published in Monte Carlo Methods and Applications.

The geometric identity it aims at exploiting is that

$m(x) = \frac{\int_A \,\text d\theta}{\int_A \pi(\theta|x)\big/\pi(\theta)f(x|\theta)\,\text d\theta}$

for any (positive volume) compact set $A$. This is exactly the same identity as in an earlier and uncited 2017 paper by Ana Pajor, with the also quite similar (!) title Estimating the Marginal Likelihood Using the Arithmetic Mean Identity and which I discussed on the ‘Og, linked with another 2012 paper by Lenk. Also discussed here. This geometric or arithmetic identity is again related to the harmonic mean correction based on a HPD region A that Darren Wraith and myself proposed at MaxEnt 2009. And that Jean-Michel and I presented at Frontiers of statistical decision making and Bayesian analysis in 2010.

In this avatar, the set A is chosen close to an HPD region, once more, with a structure that allows for an exact computation of its volume. Namely an ellipsoid that contains roughly 50% of the simulations from the posterior (rather than our non-intersecting union of balls centered at the 50% HPD points), which assumes a Euclidean structure of the parameter space (or, in other words, depends on the parameterisation)In the mixture illustration, the author surprisingly omits Chib’s solution, despite symmetrised versions avoiding the label (un)switching issues. . What I do not get is how this solution gets around the label switching challenge in that set A remains an ellipsoid for multimodal posteriors, which means it either corresponds to a single mode [but then how can a simulation be restricted to a “single permutation of the indicator labels“?] or it covers all modes but also the unlikely valleys in-between.

## evidence estimation in finite and infinite mixture models

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on May 20, 2022 by xi'an

Adrien Hairault (PhD student at Dauphine), Judith and I just arXived a new paper on evidence estimation for mixtures. This may sound like a well-trodden path that I have repeatedly explored in the past, but methinks that estimating the model evidence doth remain a notoriously difficult task for large sample or many component finite mixtures and even more for “infinite” mixture models corresponding to a Dirichlet process. When considering different Monte Carlo techniques advocated in the past, like Chib’s (1995) method, SMC, or bridge sampling, they exhibit a range of performances, in terms of computing time… One novel (?) approach in the paper is to write Chib’s (1995) identity for partitions rather than parameters as (a) it bypasses the label switching issue (as we already noted in Hurn et al., 2000), another one is to exploit  Geyer (1991-1994) reverse logistic regression technique in the more challenging Dirichlet mixture setting, and yet another one a sequential importance sampling solution à la  Kong et al. (1994), as also noticed by Carvalho et al. (2010). [We did not cover nested sampling as it quickly becomes onerous.]

Applications are numerous. In particular, testing for the number of components in a finite mixture model or against the fit of a finite mixture model for a given dataset has long been and still is an issue of much interest and diverging opinions, albeit yet missing a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish there the consistence of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric ‘strongly identifiable’ Dirichlet Process Mixture (DPM) model.

## [more than] everything you always wanted to know about marginal likelihood

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , , on February 10, 2022 by xi'an

Earlier this year, F. Llorente, L. Martino, D. Delgado, and J. Lopez-Santiago have arXived an updated version of their massive survey on marginal likelihood computation. Which I can only warmly recommend to anyone interested in the matter! Or looking for a base camp to initiate a graduate project. They break the methods into four families

1. Deterministic approximations (e.g., Laplace approximations)
2. Methods based on density estimation (e.g., Chib’s method, aka the candidate’s formula)
3. Importance sampling, including sequential Monte Carlo, with a subsection connecting with MCMC
4. Vertical representations (mostly, nested sampling)

Besides sheer computation, the survey also broaches upon issues like improper priors and alternatives to Bayes factors. The parts I would have done in more details are reversible jump MCMC and the long-lasting impact of Geyer’s reverse logistic regression (with the noise contrasting extension), even though the link with bridge sampling is briefly mentioned there. There is even a table reporting on the coverage of earlier surveys. Of course, the following postnote of the manuscript

The Christian Robert’s blog deserves a special mention , since Professor C. Robert has devoted several entries of his blog with very interesting comments regarding the marginal likelihood estimation and related topics.

does not in the least make me less objective! Some of the final recommendations

• use of Naive Monte Carlo [simulate from the prior] should be always considered [assuming a proper prior!]
• a multiple-try method is a good choice within the MCMC schemes
• optimal umbrella sampling estimator is difficult and costly to implement , so its best performance may not be achieved in practice
• adaptive importance sampling uses the posterior samples to build a suitable normalized proposal, so it benefits from localizing samples in regions of high posterior probability while preserving the properties of standard importance sampling
• Chib’s method is a good alternative, that provide very good performances [but is not always available]
• the success [of nested sampling] in the literature is surprising.