## Bayes Factors for Forensic Decision Analyses with R [book review]

Posted in Books, R, Statistics with tags , , , , , , , , , , , , , on November 28, 2022 by xi'an

My friend EJ Wagenmaker pointed me towards an entire book on the BF by Bozza (from Ca’Foscari, Venezia), Taroni and Biederman. It is providing a sort of blueprint for using Bayes factors in forensics for both investigative and evaluative purposes. With R code and free access. I am of course unable to judge of the relevance of the approach for forensic science (I was under the impression that Bayesian arguments were usually not well-received in the courtroom) but find that overall the approach is rather one of repositioning the standard Bayesian tools within a forensic framework.

“The [evaluative] purpose is to assign a value to the result of a comparison between an item of unknown source and an item from a known source.”

And thus I found nothing shocking or striking from this standard presentation of Bayes factors, including the call to loss functions, if a bit overly expansive in its exposition. The style is also classical, with a choice of grey background vignettes for R coding parts that we also picked in our R books! If anything, I would have expected more realistic discussions and illustrations of prior specification across the hypotheses (see e.g. page 34), while the authors are mostly centering on conjugate priors and the (de Finetti) trick of the equivalent prior sample size. Bayes factors are mostly assessed using a conservative version of Jeffreys’ “scale of evidence”. The computational section of the book introduces MCMC (briefly) and mentions importance sampling, harmonic mean (with a minimalist warning), and Chib’s formula (with no warning whatsoever).

“The [investigative] purpose is to provide information in investigative proceedings (…) The scientist (…) uses the findings to generate hypotheses and suggestions for explanations of observations, in order to give guidance to investigators or litigants.”

Chapter 2 is about standard models: inferring about a proportion, with some Monte Carlo illustration,  and the complication of background elements, normal mean, with an improper prior making an appearance [on p.69] with no mention being made of the general prohibition of such generalised priors when using Bayes factors or even of the Lindley-Jeffreys paradox. Again, the main difference with Bayesian textbooks stands with the chosen examples.

Chapter 3 focus on evidence evaluation [not in the computational sense] but, again, the coverage is about standard models: processing the Binomial, multinomial, Poisson models, again though conjugates. (With the side remark that Fig 3.2 is rather unhelpful: when moving the prior probability of the null from zero to one, its posterior probability also moves from zero to one!) We are back to the Normal mean case with the model variance being known then unknown. (An unintentionally funny remark (p.96) about the dependence between mean and variance being seen as too restrictive and replaced with… independence!). At last (for me!), the book is pointing [p.99] out that the BF is highly sensitive to the choice of the prior variance (Lindley-Jeffreys, where art thou?!), but with a return of the improper prior (on said variance, p.102) with no debate on the ensuing validity of the BF. Multivariate Normals are also presented, with Wishart priors on the precision matrix, and more details about Chib’s estimate of the evidence. This chapter also contains illustrations of the so-called score-based BF which is simply (?) a Bayes factor using a distribution on a distance summary (between an hypothetical population and the data) and an approximation of the distributions of these summaries, provided enough data is available… I also spotted a potentially interesting foray into BF variability (Section 3.4.2), although not reaching all the way to a notion of BF posterior distributions.

Chapter 4 stands for Bayes factors for investigation, where alternative(s) is(are) less specified, as testing eg Basmati rice vs non-Basmati rice. But there is no non-parametric alternative considered in the book. Otherwise, it looks to me rather similar to Chapter 3, i.e. being back to binomial, multinomial models, with more discussions onm prior specification, more normal, or non-normal model, where the prior distribution is puzzingly estimated by a kernel density estimator, a portmanteau alternative (p.157), more multivariate Normals with Wishart priors and an entry on classification & discrimination.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE. As appropriate for a book about Chance!]

## martingale posteriors

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , on November 7, 2022 by xi'an

A new Royal Statistical Society Read Paper featuring Edwin Fong, Chris Holmes, and Steve Walker. Starting from the predictive

$p(y_{n+1:+\infty}|y_{1:n})\ \ \ (1)$

rather than from the posterior distribution on the parameter is a fairly novel idea, also pursued by Sonia Petrone and some of her coauthors. It thus adopts a de Finetti’s perspective while adding some substance to the rather metaphysical nature of the original. It however relies on the “existence” of an infinite sample in (1) that assumes a form of underlying model à la von Mises or at least an infinite population. The representation of a parameter θ as a function of an infinite sequence comes as a shock first but starts making sense when considering it as a functional of the underlying distribution. Of course, trading (modelling) a random “opaque” parameter θ for (envisioning) an infinite sequence of random (un)observations may sound like a sure loss rather than as a great deal, but it gives substance to the epistemic uncertainty about a distributional parameter, even when a model is assumed, as in Example 1, which defines θ in the usual parametric way (i.e., the mean of the iid variables). Furthermore, the link with bootstrap and even more Bayesian bootstrap becomes clear when θ is seen this way.

Always a fan of minimal loss approaches, but (2.4) defines either a moment or a true parameter value that depends on the parametric family indexed by θ. Hence does not exist outside the primary definition of said parametric family. The following construct of the empirical cdf based on the infinite sequence as providing the θ function is elegant but what is its Bayesian justification? (I did not read Appendix C.2. in full detail but could not spot the prior on F.)

“The resemblance of the martingale posterior to a bootstrap estimator should not have gone unnoticed”

I am always fan of minimal loss approaches, but I wonder at (2.4), as it defines either a moment or a true parameter value that depends on the parametric family indexed by θ. Hence it does not exist outside the primary definition of said parametric family, which limits its appeal. The following construct of the empirical cdf based on the infinite sequence as providing the θ function is elegant and connect with bootstrap, but I wonder at its Bayesian justification. (I did not read Appendix C.2. in full detail but could not spot a prior on F.)

While I completely missed the resemblance, it is indeed the case that, if the predictive at each step is build from the earlier “sample”, the support is not going to evolve. However, this is not particularly exciting as the Bayesian non-parametric estimator is most rudimentary. This seems to bring us back to Rubin (1981) ?! A Dirichlet prior is mentioned with no further detail. And I am getting confused at the complete lack of structure, prior, &tc. It seems to contradict the next section:

“While the prescription of (3.1) remains a subjective task, we find it to be no more subjective than the selection of a likelihood function”

Copulas!!! Again, I am very glad to see copulas involved in the analysis. However, I remain unclear as to why Corollary 1 implies that any sequence of copulas could do the job. Further, why does the Gaussian copula appear as the default choice? What is the computing cost of the update (4.4) after k steps? Similarly (4.7) is using a very special form of copula, with independent-across-dimension increments. I am also missing a guided tour on the implementation, as it sounds explosive in book-keeping and multiplying, while relying on a single hyperparameter in (4.5.2)?

In the illustration section, the use of the galaxy dataset may fail to appeal to Radford Neal, in a spirit similar to Chopin’s & Ridgway’s call to leave the Pima Indians alone, since he delivered a passionate lecture on the inappropriateness of a mixture model for this dataset (at ICMS in 2001). I am unclear as to where the number of modes is extracted from the infinite predictive. What is $\theta$ in this case?

Copulas!!! Although I am unclear why Corollary 1 implies that any sequence of copulas does the job. And why the Gaussian copula appears as the default choice. What is the computing cost of the update (4.4) after k steps? Similarly (4.7) is using a very special form of copula, with independent-across-dimension increments. Missing a guided tour on the implementation, as it sounds explosive in book-keeping and multiplying. A single hyperparameter (4.5.2)?

## BNP13

Posted in Mountains, pictures, Running, Statistics, Travel with tags , , , , , , , , , , , , , , , , on October 28, 2022 by xi'an

BNP13 is set in this incredible location on a massive lake (almost as large as Lac Saint Jean!) facing several tantalizing snow-capped volcanoes… My trip from Paris to Puerto Varas was quite smooth if relatively longish (but I slept close to 8 hours on the first leg and busied myself with Biometrika submissions the rest of the way). Leaving from Paris at midnight proved a double advantage as this was one of the last flights leaving, with hardly anyone in the airport. On Sunday, I arrived early enough to take a quick dip in Lake Llanquihue which was fairly cold and choppy!

Overall the conference is quite exhilarating as all talks are of interest and often covering on-going research. This may be one of the most engaging meetings I have attended in the past years! Plus a refreshing variety of topics and seniority in the speakers.

To start with a bang!, Sonia Petrone (Bocconi) gave a very nice plenary lecture in the most auspicious manner, covering her recent works on Bayesian prediction as an alternative way to run Bayesian inference (in connection with the incoming Read Paper by Fong et al.). She covered so much ground that I got lost before long (jetlag did not help!). However, an interesting feature underlying her talk is that, under exchangeability, the sequence of predictives converges to a random probability measure, a de Finetti way to construct the prior that is based on predictives. Avoiding in a sense the model and the prior on the parameters of that process. (The parameter is derived from the infinite exchangeable [or conditionally iid] sequence, but the sequence of predictives need be defined.) The drawback is that this approach involves infinite sequences, with practical truncation to a finite horizon being an approximation whose precision / error may prove elusive to characterise. The predictive approach also allows to recover a limiting Normal distribution (not a Bernstein-von Mises type!) and hence credible intervals on parameters and distributions.

While this is indeed a BNP conference (!), I was surprised to see lot of talks paying attention to clustering and even to mixtures, with again a recurrent imprecision on the meaning of a cluster. (Maybe this was already the case for BNP11 in Paris but I may have been too busy helping with catering to notice!) For instance, Brian Trippe (MIT) gave a quick intro on his (AISTATS 2022) work on parallel MCMC with coupling. As unbiased MCMC strongly improving upon naïve parallel MCMC relative to the computing cost. With an interesting example where coupling is agnostic to the labeling of random partitions in clustering problems, involving optimal transport, manageable in O(K³log(K)) time when K is the number of clusters.

## day one at ISBA 22

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , , , , , , on June 29, 2022 by xi'an

Started the day with a much appreciated swimming practice in the [alas warm⁺⁺⁺] outdoor 50m pool on the Island with no one but me in the slooow lane. And had my first ride with the biXi system, surprised at having to queue behind other bikes at red lights! More significantly, it was a great feeling to reunite at last with so many friends I had not met for more than two years!!!

My friend Adrian Raftery gave the very first plenary lecture on his work on the Bayesian approach to long-term population projections, which was recently  a work censored by some US States, then counter-censored by the Supreme Court [too busy to kill Roe v. Wade!]. Great to see the use of Bayesian methods validated by the UN Population Division [with at least one branch of the UN

Stephen Lauritzen returning to de Finetti notion of a model as something not real or true at all, back to exchangeability. Making me wonder when exchangeability is more than a convenient assumption leading to the Hewitt-Savage theorem. And sufficiency. I mean, without falling into a Keynesian fallacy, each point of the sample has unique specificities that cannot be taken into account in an exchangeable model. Nice to hear some measure theory, though!!! Plus a comment on the median never being sufficient, recouping an older (and presumably not original) point of mine. Stephen’s (or Fisher’s?) argument being that the median cannot be recursively computed!

Antonietta Mira and I had our ABC session this afternoon with Cecilia Viscardi, Sirio Legramanti, and Massimiliano Tamborino (Warwick) as speakers. Cecilia linked ABC with normalising flows, in collaboration with Dennis Prangle (whose earlier paper on this connection was presented as the first One World ABC seminar). Thus using past simulations to approximate the posterior by a neural network, possibly with a significant increase in computing time when compared with more rudimentary SMC-ABC methods in larger dimensions. Sirio considered summary-free ABC based on discrepancies like Rademacher complexity. Which more or less contains MMD, Kullback-Leibler, Wasserstein and more, although it seems to be dependent on the parameterisation of the observations. An interesting opening at the end was that this approach could apply to non iid settings. Massi presented a paper coauthored with Umberto that had just been arXived. On sequential ABC with a dependence on the summary statistic (hence guided). Further bringing copulas into the game, although this forces another choice [for the marginals] in the method.

Tamara Broderick talked about a puzzling leverage effect of some observations in economic studies where a tiny portion of individuals may modify the significance or the sign of a coefficient, for which I cannot tell whether the data or the reliance on statistical significance are to blame. Robert Kohn presented mixture-of-Gaussian copulas [not to be confused with mixture of Gaussian-copulas!] and Nancy Reid concluded my first [and somewhat exhausting!] day at ISBA with a BFF talk on the different statistical paradigms take on confidence (for which the notion of calibration seems to remain frequentist).

Side comments: First, most people in the conference are wearing masks, which is great! Also, I find it hard to read slides from the screen, which I presume is an age issue (?!) Even more aside, I had Korean lunch in a place that refused to serve me a glass of water, which I find amazing.

## statistical illiteracy

Posted in Statistics with tags , , , , , , , , , , , on October 27, 2020 by xi'an

An opinion tribune in the Guardian today about the importance of statistical literacy in these COVIdays, entitled “Statistical illiteracy isn’t a niche problem. During a pandemic, it can be fatal“, by Carlo Rovelli (a physics professor on Luminy campus) which, while well-intended, is not particularly helping. For instance, the tribune starts with a story of a cluster of a rare disease happening in a lab along with the warning that [Poisson] clusters also occur with uniform sampling. But.. being knowledgeable about the Poisson process may help in reducing the psychological stress within the lab only if the cluster size is compatible with the prevalence of the disease in the neighbourhood. Obviously, a poor understanding of randomness and statistical tools has not help with the handling of the pandemics by politicians, decision-makers, civil servants and doctors (although I would have added the fundamental misconception about scientific models which led most people to confuse the map with the territory and later cry wolf…)

Rovelli also cites Bruno de Finetti as “the key to understanding probability”, as a representation of one’s beliefs rather than a real thing. While I agree with this Bayesian perspective, I am unsure it will percolate well enough with the Guardian audience. And bring more confidence in the statistical statements made by experts…

It is only when I finished reading the column that I realised it was adapted from a book soon to appear by the author. And felt slightly cheated. [Obviously, I did not read it so this is NOT a book review!]