## Bayes Factors for Forensic Decision Analyses with R [book review]

My friend EJ Wagenmakers pointed me towards an entire book on the BF by Bozza (from Ca’Foscari, Venezia), Taroni and Biedermann. It provides a sort of blueprint for using Bayes factors in forensics for both investigative and evaluative purposes. With R code and free access. I am of course unable to judge the relevance of the approach for forensic science (I was under the impression that Bayesian arguments were usually not well-received in the courtroom) but find that overall the approach is rather one of repositioning the standard Bayesian tools within a forensic framework.

“The [evaluative] purpose is to assign a value to the result of a comparison between an item of unknown source and an item from a known source.”

And thus I found nothing shocking or striking from this standard presentation of Bayes factors, including the call to loss functions, if a bit overly expansive in its exposition. The style is also classical, with a choice of grey background vignettes for R coding parts that we also picked in our R books! If anything, I would have expected more realistic discussions and illustrations of prior specification across the hypotheses (see e.g. page 34), while the authors are mostly centering on conjugate priors and the (de Finetti) trick of the equivalent prior sample size. Bayes factors are mostly assessed using a conservative version of Jeffreys’ “scale of evidence”. The computational section of the book introduces MCMC (briefly) and mentions importance sampling, harmonic mean (with a minimalist warning), and Chib’s formula (with no warning whatsoever).

“The [investigative] purpose is to provide information in investigative proceedings (…) The scientist (…) uses the findings to generate hypotheses and suggestions for explanations of observations, in order to give guidance to investigators or litigants.”

Chapter 2 is about standard models: inferring about a proportion, with some Monte Carlo illustration and the complication of background elements, then a normal mean, with an improper prior making an appearance [on p.69] with no mention being made of the general prohibition of such generalised priors when using Bayes factors or even of the Lindley-Jeffreys paradox. Again, the main difference with Bayesian textbooks lies in the chosen examples.
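To make the proportion case concrete, here is a minimal Python sketch (not the book's R code) of a Bayes factor comparing two conjugate Beta priors on a Binomial proportion, along with a crude Monte Carlo check of the marginal likelihood obtained by averaging the likelihood over prior draws. The function names, the prior parameters and the counts are all illustrative assumptions of mine.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(x, n, a, b):
    """Log marginal likelihood of x successes in n trials under a Beta(a, b) prior."""
    log_binom = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_binom + log_beta(a + x, b + n - x) - log_beta(a, b)


def bayes_factor(x, n, prior1, prior2):
    """Bayes factor of hypothesis 1 against hypothesis 2, each a (a, b) Beta prior."""
    return math.exp(log_marginal(x, n, *prior1) - log_marginal(x, n, *prior2))


def mc_marginal(x, n, a, b, draws=100_000, seed=1):
    """Monte Carlo estimate of the same marginal: mean likelihood over prior draws."""
    rng = random.Random(seed)
    binom = math.comb(n, x)
    total = 0.0
    for _ in range(draws):
        theta = rng.betavariate(a, b)
        total += binom * theta**x * (1 - theta) ** (n - x)
    return total / draws


# Hypothetical data: 3 successes out of 10, two competing (made-up) priors.
bf = bayes_factor(3, 10, (2, 8), (8, 2))
```

With proper conjugate priors both marginals are available in closed form, so the Monte Carlo version is only there as a sanity check. With an improper prior the normalising constant of the prior is arbitrary and so is the resulting ratio.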

Chapter 3 focuses on evidence evaluation [not in the computational sense] but, again, the coverage is about standard models: processing the Binomial, multinomial, Poisson models, again through conjugates. (With the side remark that Fig 3.2 is rather unhelpful: when moving the prior probability of the null from zero to one, its posterior probability also moves from zero to one!) We are back to the Normal mean case with the model variance being known then unknown. (An unintentionally funny remark (p.96) about the dependence between mean and variance being seen as too restrictive and replaced with… independence!). At last (for me!), the book points out [p.99] that the BF is highly sensitive to the choice of the prior variance (Lindley-Jeffreys, where art thou?!), but with a return of the improper prior (on said variance, p.102) with no debate on the ensuing validity of the BF. Multivariate Normals are also presented, with Wishart priors on the precision matrix, and more details about Chib’s estimate of the evidence. This chapter also contains illustrations of the so-called score-based BF which is simply (?) a Bayes factor using a distribution on a distance summary (between a hypothetical population and the data) and an approximation of the distributions of these summaries, provided enough data is available… I also spotted a potentially interesting foray into BF variability (Section 3.4.2), although not reaching all the way to a notion of BF posterior distributions.

Chapter 4 stands for Bayes factors for investigation, where alternative(s) is(are) less specified, as when testing e.g. Basmati rice vs non-Basmati rice. But there is no non-parametric alternative considered in the book. Otherwise, it looks to me rather similar to Chapter 3, i.e. being back to binomial, multinomial models, with more discussions on prior specification, more normal, or non-normal model, where the prior distribution is puzzlingly estimated by a kernel density estimator, a portmanteau alternative (p.157), more multivariate Normals with Wishart priors and an entry on classification & discrimination.

## day five at ISBA 22

Woke up even earlier today! Which left me time to work on switching to Leonard Cohen’s song titles for my slide frametitles this afternoon (last talk of the whole conference!), run once again to Mon(t) Royal as all pools are closed (Happy Canada Day!, except to “freedom convoy” antivaxxxers.) Which led to me meeting a raccoon by the side of the path (and morons feeding wildlife).

Had an exciting time at the morning session, where Giacomo Zanella (formerly Warwick) talked on a mixture approach to leave-one-out predictives, with pseudo-harmonic mean representation, averaging inverse density across all observations. Better than harmonic? Some assumptions allow for finite variance, although I am missing the deep argument (in part due to Giacomo’s machine-gun delivery pace!) Then Alice Corbella (Warwick) presented a promising entry into PDMP by proposing an automated zig-zag sampler. Pointing out on the side to Joris Bierkens’ webpage on the state-of-the-art PDMP methodology. In this approach, joint with my other Warwick colleagues Simon Spencer and Gareth Roberts, the zig-zag sampler relies on automatic differentiation and sub-sampling and bound derivation, with “no further information on the target needed”. And finally Chris Carmona presented a joint work with Geoff Nicholls that is merging cut posteriors and variational inference to create a meta posterior. Work and talk were motivated by a nice medieval linguistic problem where the latent variables impact the (convergence of the) MCMC algorithm [as in our k-nearest neighbour experience]. Interestingly using normalising [neural spline] flows. The pseudo-posterior seems to depend very much on their modularization rate η, which penalises how much one module influences the next one.

In the afternoon, I attended sort of by chance [due to a missing speaker in the copula session] the end of a session on migration modelling, with a talk by Jason Hilton and Martin Hinsch focussing on the 2015 mass exodus of Syrians through the Mediterranean, away from the joint evils of al-Assad and ISIS. As this was a tragedy whose modelling I had vainly tried to contribute to, I was obviously captivated and frustrated (learning of the IOM missing migrant project!) Fitting the agent-based model was actually using ABC, and most particularly our ABC-PMC!!!

My own and final session had Gareth (Warwick) presenting his recent work with Jun Yang and Krys Łatuszyński (Warwick) on the stereographic projection improvement over regular MCMC, which involves turning the target into a distribution supported by a hypersphere and hence considering a distribution with compact support and higher efficiency. Krys had explained the principle while driving back from Gregynog two months ago. The idea is somewhat similar to our origaMCMC, which I presented at MCqMC 2016 in Stanford (and never completed), except our projection was inside a ball. Looking forward to the adaptive version, in the making!

And to conclude this subjective journal from the ISBA conference, borrowing this title by (Westmount born) Leonard Cohen, “Hey, that’s no way to say goodbye”… To paraphrase Bilbo Baggins, I have not interacted with at least half the participants half as much as I would have liked. But this was still a reunion, albeit in the new Normal. Hopefully, the conference will not have induced a massive COVID cluster on top of numerous scientific and social exchanges! The following days will tell. Congrats to the ISBA 2022 organisers for achieving a most successful event in these times of uncertainty. And looking forward to the next edition, in 2024 at Ca’Foscari, Venezia!!!

## capture-recapture rediscovered

A recent Science paper applies capture-recapture to estimating how much medieval literature has been lost, using ancient lists of works and comparing with the currently known corpus. To arrive at a 91% loss. Which begets the next question of how many ancient lists have been lost! Or how many of the observed ones are sheer copies of the others. First I thought I had no access to the paper so could not comment on the specific data and accounting for the uneven and non-random sampling behind this modelling… But I still would not share the anti-modelling bias of this Harvard historian, given the superlative record of Anne Chao in capture-recapture methodology!

“The paper seems geared more toward systems theorists and statisticians, says Daniel Smail, a historian at Harvard University who studies medieval social and cultural history, and the authors haven’t done enough to establish why cultural production should follow the same rules as life systems. But for him, the bigger question is: Given that we already have catalogs of ancient texts, and previous estimates were pretty close to the model’s new one, what does the new work add?”

Once at Ca’Foscari, I realised the local network gave me access to the paper. The description of the Chao1 method, as far as I can tell, does not describe how the problematic collection of catalogs where duplicates (recaptures) can be observed is taken into account. For one thing, the collection is far from iid since some catalogs must have built on earlier ones. It is also surprising imho that the authors spend space on discussing unbiasedness when a more crucial issue is the randomness assumption behind the collected data.
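For reference, the classical Chao1 estimator only uses the number of items seen exactly once (f1) and exactly twice (f2) to produce a lower bound on the total number of items, seen or unseen. Below is a minimal Python sketch of that textbook formula, applied to made-up sighting counts that have nothing to do with the paper's data. The function names are mine, and the bias-corrected fallback when f2 = 0 is the usual convention rather than something I checked against the paper.

```python
from collections import Counter


def chao1(sighting_counts):
    """Classical Chao1 lower bound on the total number of distinct items.

    sighting_counts: one positive integer per observed item, giving how many
    times that item was sighted (e.g. surviving copies of a work).
    """
    counts = [c for c in sighting_counts if c > 0]
    s_obs = len(counts)                 # number of distinct items observed
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]           # singletons and doubletons
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        unseen = f1 * (f1 - 1) / 2      # bias-corrected form when f2 = 0
    return s_obs + unseen


def lost_fraction(sighting_counts):
    """Estimated share of items never observed, 1 - S_obs / S_hat."""
    s_obs = sum(1 for c in sighting_counts if c > 0)
    return 1 - s_obs / chao1(sighting_counts)


# Hypothetical toy data: seven works, four of them known from a single copy.
toy = [1, 1, 1, 1, 2, 2, 3]
estimate = chao1(toy)
```

The formula treats sightings as independent draws from the same population, which is exactly the assumption that copied or interdependent catalogs would violate.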

## Ca’ Foscari closed due to 19nCoV scare

An email from the Rettore I just received on my Ca’ Foscari account, announcing the University is closed over all its campi due to some cases of Coronavirus in the Veneto region:

Dear colleagues, dear students,

yesterday and today we received news of several cases of Coronavirus 19nCoV infection in Veneto, a situation that calls for appropriate measures and the utmost attention.

First of all, I invite the whole academic community to follow most scrupulously the instructions that the regional crisis unit has issued, and will issue depending on how the contagion evolves. The following preventive measures are recommended in particular (WHO note and MUR guidelines):
• Wash your hands often with soap and water or with disinfectant gel
• When coughing or sneezing, cover your mouth and nose with your elbow or a disposable tissue, then wash your hands
• Avoid contact with anyone who has a fever and a cough.

Regarding academic activities, in order to reduce the chances of contagion, and following the instructions of President Luca Zaia and in coordination with the universities of Veneto, classes and exams are suspended at all sites of the university, in Venezia, Mestre, Treviso and Roncade, from 24/02 to 29/02 inclusive. Libraries and study rooms will be closed from 23/02 to 01/03.
The rescheduling of classes and exams will be announced as soon as possible on the Ca’ Foscari website and on the official communication channels.
For all staff, teaching and non-teaching, activities will proceed as normal, subject to any local orders restricting people’s movement.
The University is in constant contact with the crisis unit and with the relevant ministries, and will update the measures currently in force as the situation evolves.
The Rector
Michele Bugliesi