**W**hen fishing for an illustration for this post on Google, I came upon this Bayesian methods for hackers cover, a book about which I have no clue whatsoever (!) but that mentions probabilistic programming. Which serves as a perfect (?!) introduction to the call for discussion in Bayesian Analysis of the forthcoming Bayesian conjugate gradient method by Jon Cockayne, Chris Oates (formerly Warwick), Ilse Ipsen and Mark Girolami (still partially Warwick!), since indeed the paper is about probabilistic numerics à la Mark and co-authors, surprisingly dealing with solving the deterministic equation Ax=b by Bayesian methods. The method produces a posterior distribution on the solution x⁰, given a fixed computing effort, which makes it pertain to the class of anytime algorithms. It also relates to an earlier 2015 paper by Philipp Hennig where the posterior is on A⁻¹ rather than x⁰ (a surprising if valid approach to the problem!). The computing effort is translated here into computations of projections of Ax along (possibly random) search directions, which can be made compatible with conjugate gradient steps. Interestingly, the choice of the prior on x is quite important, including setting a low or high convergence rate… **Deadline is August 04!**
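To make the conditioning step concrete, here is a minimal numpy sketch of the underlying idea (my own illustration, not the authors' exact conjugate-gradient construction): place a Gaussian prior on the solution x of Ax=b, observe the noiseless projections sᵢᵀAx = sᵢᵀb along a few search directions sᵢ, and condition the Gaussian on those linear observations.

```python
import numpy as np

def bayes_linear_solve(A, b, mu0, Sigma0, S):
    """Gaussian posterior on the solution x of Ax = b after observing
    the exact projections S.T @ A @ x = S.T @ b (one column of S per
    search direction).  A sketch of the probabilistic-numerics idea,
    not the paper's exact algorithm."""
    L = S.T @ A                           # observation operator: y = L x
    y = S.T @ b                           # observed values
    G = L @ Sigma0 @ L.T                  # Gram matrix of the observations
    K = Sigma0 @ L.T @ np.linalg.inv(G)   # "gain" matrix
    mu = mu0 + K @ (y - L @ mu0)          # posterior mean
    Sigma = Sigma0 - K @ L @ Sigma0       # posterior covariance
    return mu, Sigma

rng = np.random.default_rng(0)
n = 10
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)               # a well-conditioned s.p.d. system
b = rng.standard_normal(n)
mu0, Sigma0 = np.zeros(n), np.eye(n)

# with m < n directions, a genuine posterior; with m = n, the exact solution
S = rng.standard_normal((n, 4))
mu4, Sig4 = bayes_linear_solve(A, b, mu0, Sigma0, S)
mun, Sign = bayes_linear_solve(A, b, mu0, Sigma0, np.eye(n))
```

With fewer directions than the dimension, the posterior mean already matches the data exactly along the observed projections, which is what makes the anytime reading of the method natural: stopping earlier simply leaves more posterior uncertainty on x.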

## Archive for Bayesian Analysis

## Bayesian conjugate gradients [open for discussion]

Posted in Books, pictures, Statistics, University life with tags Bayesian Analysis, Bayesian methods for hackers, discussion paper, probabilistic numerics, probabilistic programming, University of Warwick on June 25, 2019 by xi'an

## from tramway to Panzer (or back!)…

Posted in Books, pictures, Statistics with tags Bayesian Analysis, German tank problem, Laplace succession rule, order statistics, The Bayesian Choice, tramway problem, tramways on June 14, 2019 by xi'an

**A**lthough it is usually presented as *the tramway problem*, namely estimating the number of tram or bus lines in a city after observing a single line number (including in The Bayesian Choice by yours truly), the original version of the problem is about German tanks, Panzer V tanks to be precise, whose total number *M* was to be estimated by the Allies from the serial numbers of the *k* tanks they had observed. The Riddler restates the problem when the only available information is the smallest, 22, and largest, 144, of these serial numbers, with no information about the number *k* itself. I am unsure what the Riddler means by “best” estimate, but a posterior distribution on *M* (and *k*) can certainly be constructed for a prior like *1/k x 1/M²* on *(k,M)*. (Using *M²* makes sure the posterior mean does exist.) The joint distribution of the order statistics is

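[A hedged reconstruction of the display lost in extraction, writing a=22 for the smallest and b=144 for the largest of the k serial numbers sampled without replacement from {1,…,M}: the k−2 remaining numbers must fall strictly between a and b.]

```latex
\pi(a, b \mid k, M) \;=\; \frac{\dbinom{b-a-1}{k-2}}{\dbinom{M}{k}},
\qquad 1 \le a < b \le M,\quad 2 \le k \le b-a+1,
```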
which makes the computation of the posterior distribution rather straightforward. Here is the posterior surface (with an unfortunate rendering of an artefactual horizontal line at 237!), showing a concentration near the lower bound M=144. The posterior mode is actually achieved at M=144 and k=7, while the posterior means are (rounded) M=169 and k=9.
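As a sanity check, a small Python script reproducing these numbers under the stated prior (the grid bounds and the truncation of the M tail at 5000 are my choices; the tail decays at least as fast as M⁻⁴, so the truncation is harmless):

```python
from math import exp, lgamma, log

a, b = 22, 144                        # smallest and largest observed serial numbers

def log_comb(n, k):
    """log of the binomial coefficient C(n, k), via lgamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_post(k, M):
    """log posterior, up to a constant, combining the order-statistics
    likelihood C(b-a-1, k-2) / C(M, k) with the prior 1 / (k M^2)."""
    return log_comb(b - a - 1, k - 2) - log_comb(M, k) - log(k) - 2 * log(M)

M_max = 5000                          # truncation of the fast-decaying tail in M
grid = [(k, M) for k in range(2, b - a + 2) for M in range(b, M_max + 1)]
logw = [log_post(k, M) for k, M in grid]
shift = max(logw)                     # stabilise the exponentials
w = [exp(l - shift) for l in logw]
Z = sum(w)

mode_k, mode_M = max(zip(w, grid))[1]
mean_k = sum(wi * k for wi, (k, M) in zip(w, grid)) / Z
mean_M = sum(wi * M for wi, (k, M) in zip(w, grid)) / Z
print(mode_M, mode_k, round(mean_M), round(mean_k))
```

The posterior mode lands at (M, k) = (144, 7), while the posterior means sit noticeably above the observed maximum, in line with the surface shown above.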

## leave Bayes factors where they once belonged

Posted in Statistics with tags Bayes factors, Bayesian Analysis, Bayesian decision theory, cross validated, prior comparison, prior predictive, prior selection, The Bayesian Choice, The Beatles, using the data twice, xkcd on February 19, 2019 by xi'an

**I**n the past weeks I have received and read several papers (and X validated entries) where the Bayes factor is used to compare priors. Which does not look right to me, not only on the basis of my general dislike of Bayes factors!, but simply because this seems to clash with my concept of Bayesian model choice, and because the data should not play a role in that situation: it gets used to select a *prior*, hence at least twice to run the inference; the comparison resorts to a *single* parameter value (namely the one behind the data) to decide between two distributions; it enjoys no asymptotic justification; and it eventually favours the prior concentrated on the maximum likelihood estimator. And more. But I fear that this reticence to test for prior adequacy also extends to the prior predictive, or Box’s p-value, namely the probability under the prior predictive of observing something “more extreme” than the current observation, to quote from David Spiegelhalter.
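For readers unfamiliar with Box's p-value, here is a toy illustration in a conjugate Normal setting (the model and numbers are my own choices, not taken from the post): the prior predictive of the sample mean is available in closed form, and the p-value is the probability, under that predictive, of something "more extreme" than the observed mean.

```python
import math

def box_p_value(xbar, n, sigma=1.0, tau=1.0, mu0=0.0):
    """Box's prior-predictive p-value, toy Normal version:
    xbar | theta ~ N(theta, sigma^2 / n) and theta ~ N(mu0, tau^2),
    hence the prior predictive of xbar is N(mu0, tau^2 + sigma^2 / n).
    Returns the two-sided tail probability beyond the observed xbar."""
    sd = math.sqrt(tau ** 2 + sigma ** 2 / n)
    z = abs(xbar - mu0) / sd
    return math.erfc(z / math.sqrt(2))  # = 2 * (1 - Phi(z))

# a prior centred far from the data is flagged by a small p-value,
# one compatible with the data by a large one
print(box_p_value(xbar=2.5, n=25))             # prior N(0, 1): small
print(box_p_value(xbar=2.5, n=25, mu0=2.0))    # prior near the data: large
```

This makes the objection in the post tangible: the same data that produced xbar is being used to pass judgment on the prior itself.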

## Bayesian intelligence in Warwick

Posted in pictures, Statistics, Travel, University life, Wines with tags ABC, AI, artificial intelligence, Bayesian Analysis, Bayesian intelligence, CRiSM, effective dimension, estimating constants, Monte Carlo integration, neural network, paradoxes, seminar, University of Warwick on February 18, 2019 by xi'an

**T**his is an announcement for an exciting CRiSM Day in Warwick on 20 March 2019, with speakers

10:00-11:00 Xiao-Li Meng (Harvard): “Artificial Bayesian Monte Carlo Integration: A Practical Resolution to the Bayesian (Normalizing Constant) Paradox”

11:00-12:00 Julien Stoehr (Dauphine): “Gibbs sampling and ABC”

14:00-15:00 Arthur Ulysse Jacot-Guillarmod (École Polytechnique Fédérale de Lausanne): “Neural Tangent Kernel: Convergence and Generalization of Deep Neural Networks”

15:00-16:00 Antonietta Mira (Università della Svizzera italiana e Università degli studi dell’Insubria): “Bayesian identifications of the data intrinsic dimensions”

[whose abstracts are available on the workshop webpage], with free attendance. The title of the workshop mentions Bayesian Intelligence: this obviously includes human intelligence and not just AI!

## statistics in Nature [a tale of the two Steves]

Posted in Books, pictures, Statistics with tags Bayesian Analysis, causality, clinical trials, frequentism, Nature, p-value hacking, placebo effect, statistical evidence, Stephen Senn, variability on January 15, 2019 by xi'an

**I**n the 29 November issue of Nature, Stephen Senn (formerly at Glasgow) wrote an article about the pitfalls of personalized medicine, for the statistics behind the reasoning are flawed.

“What I take issue with is the de facto assumption that the differential response to a drug is consistent for each individual, predictable and based on some stable property, such as a yet-to-be-discovered genetic variant.” S. Senn

One (striking) reason being that these studies rest on a sort of low-level determinism that does not account for many sources of variability, resulting in over-confidence in causality. Stephen argues that improvement lies in insisting on repeated experiments on the same subjects (with an increased modelling challenge, since this requires longitudinal models with dependent observations), and in “drop[ping] the use of dichotomies”, favouring instead continuous modelling of measurements.

And in the 6 December issue, Steven Goodman calls (in the World View tribune) for probability statements to be attached, as confidence indices, to scientific claims, indices that he takes great pains to distinguish from p-values and links with Bayesian analysis. (Bayesian analysis that Stephen regularly objects to.) While I applaud the call, I am quite pessimistic about the follow-up it will generate, the primary reply being that posterior probabilities can be manipulated as easily as p-values. And that Bayesian probabilities are not “real” probabilities (dixit Don Fraser or Deborah Mayo).

## talks at CIRM with special tee-shirts

Posted in Books, pictures, Statistics, University life with tags Þe Norse face, Bayesian Analysis, Centre International de Rencontres Mathématiques, CIRM, CNRS, HMC, JASP, logo, Luminy, Marseille, master class, Monte Carlo Statistical Methods, STAN, tee-shirt, Université Aix Marseille, videoed lectures, ye Norse farce on November 21, 2018 by xi'an

## X entropy

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags Bayesian Analysis, Bayesian econometrics on November 16, 2018 by xi'an

**A**nother discussion on X validated related to maximum entropy priors and their dependence on the dominating measure μ chosen to define the entropy, with the same electrical engineering student as previously, in the wee hours at Casa Matemática Oaxaca. As I took the [counter-]example of a Lebesgue dominating measure versus a Normal density times the Lebesgue measure, both producing the same maximum entropy distribution [with obviously the same density wrt the Lebesgue measure] when the constraints involve the second moment, this confused the student and I spent some time constructing another example with different outcomes, contrasting the Lebesgue measure with the [artificial] dx/√|x| measure.
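Spelled out (my sketch of the textbook Lagrangian argument, not taken from the X validated thread itself): with the entropy defined relative to a dominating density μ and a second-moment constraint,

```latex
\max_{p}\; -\int p(x)\,\log\frac{p(x)}{\mu(x)}\,\mathrm{d}x
\quad\text{s.t.}\quad \int x^{2}\,p(x)\,\mathrm{d}x=\sigma^{2}
\qquad\Longrightarrow\qquad
p(x)\;\propto\;\mu(x)\,e^{-\lambda x^{2}} .
```

Hence μ = Lebesgue and μ = Normal density times Lebesgue both return a Normal maximum entropy distribution (the exponential factors merge into a single Gaussian kernel with a rescaled λ), while μ = dx/√|x| returns p(x) ∝ |x|^(−1/2) e^(−λx²), a different family altogether.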

I am actually surprised at how little this point is discussed in the literature (or at least in my googling attempts), with just a mention made in Bayesian Analysis in Statistics and Econometrics.