In an X validated question, an interesting proposal was made: at each (component-wise) step of a Gibbs sampler, replace simulation from the exact full conditional with simulation from an alternate density and weight the resulting simulation with a term made of a product of (a) the previous weight (b) the ratio of the true conditional over the substitute for the new value and (c) the inverse ratio for the earlier value of the same component. Which does not work for several reasons:

the reweighting is doomed by its very propagation in that it keeps multiplying ratios of expectation one, which means an almost sure chance of degenerating;

the weights are computed for a previous value that has not been generated from the same proposal and is anyway already properly weighted;

due to the change in dimension produced by Gibbs, the actual target is the full conditional, which involves an intractable normalising constant;

there is no guarantee for the weights to have finite variance, esp. when the proposal has thinner tails than the target.

as can be readily checked by a quick simulation experiment. The funny thing is that a proper importance weight can be constructed when envisioning the sequence of Gibbs steps as a Metropolis proposal (in the dimension of the target).

A very short book (128 pages, but with a very high price!) I received from CRC Press is Henk Tijms’ Surprises in Probability (Seventeen Short Stories).Henk Tijms is an emeritus professor of econometrics at the Vrije University in Amsterdam and he wrote these seventeen pieces either for the Dutch Statistical Society magazine or for a blog he ran for the NYt. (The video of A Night in Casablanca above is only connected to this blog through Chico mimicking the word surprise as soup+rice.)

The author mentions that the book can be useful for teachers and indeed this is a collection of surprising probability results, surprising in the sense that the numerical probabilities are not necessarily intuitive. Most illustrations involve betting of one sort or another, with only basic (combinatorial) probability distributions involved. Readers should not worry about even this basic probability background since most statements are exposed without a proof. Most examples are very classical, from the prisoner’s problem, to the Monty Hall paradox, to the birthday problem, to Benford’s distribution of digits, to gambler’s ruin, gambler’s fallacy, and the St Petersbourg paradox, to the secretary’s problem and stopping rules. The most advanced notion is the one of (finite state) Markov chains. As martingales are only mentionned in connection with pseudo-probabilist schemes for winning the lottery. For which (our very own!) Jeff Rosenthal makes an appearance, thanks to his uncovering of the Ontario Lottery scam!

“In no other branch of mathematics is it so easy for experts to blunder as in probability theory.” Martin Gardner

A few stories have entries about Bayesian statistics, with mentions made of the O.J. Simpson, Sally Clark and Lucia de Berk miscarriages of justice, although these mentions make the connection most tenuous. Simulation is also mentioned as a manner of achieving approximations to more complex probabilities. But not to the point of discussing surprises about simulation, which could have been the case with the simulation of rare events.

Ten most beautiful probability formulas (Story 10) reminded me of Ian Steward 17 formulas that changed the World. Obviously at another scale and in a much less convincing way. To wit, the Normal (or Gauss) density, Bayes’ formula, the gambler’s ruin formula, the squared-root formula (meaning standard deviation decreases as √n), Kelly’s betting formula (?), the asymptotic law of distribution of prime numbers (??), another squared-root formula for the one-dimensional random walk, the newsboy formula (?), the Pollaczek-Khintchine formula (?), and the waiting-time formula. I am not sure I would have included any of these…

All in all this is a nice if unsurprising database for illustrations and possibly exercises in elementary probability courses, although it will require some work from the instructor to link the statements to their proof. As one would expect from blog entries. But this makes for a nice reading, especially while traveling and I hope some fellow traveler will pick the book from where I left it in Mexico City airport.

Read this article on Mark Girolami (Warwick), now Lloyd’s Register Foundation / Royal Academy of Engineering Research Chair in Data Centric Engineering, who is starting a new project on the monitoring of the first 3D-printed bridge, soon to be installed in Amsterdam, by creating a virtual twin, fed by sensors from the real bridge, in order to check for safety and integrity. I like this notion of data-centric engineering! (Which sounds like the revenge of the statistician, at least in the ancient era of French engineering schools, when statistics was not considered a part of engineering.)

Another question on X validated got me highly interested for a while, as I had considered myself the problem in the past, until I realised while discussing with Murray Pollock in Warwick that there was no general answer: when a density f is represented as an infinite series decomposition into weighted densities, some weights being negative, is there an efficient way to generate from such a density? One natural approach to the question is to look at the mixture with positive weights, f⁺, since it gives an upper bound on the target density. Simulating from this upper bound f⁺ and accepting the outcome x with probability equal to the negative part over the sum of the positive and negative parts f⁻(x)/f(x) is a valid solution. Except that it is not implementable if

the positive and negative parts both involve infinite sums with no exploitable feature that can turn them into finite sums or closed form functions,

the sum of the positive weights is infinite, which is the case when the series of the weights is not absolutely converging.

Even when the method is implementable it may be arbitrarily inefficient in the sense that the probability of acceptance is equal to to the inverse of the sum of the positive weights and that simulating from the bounding mixture in the regular way uses the original weights which may be unrelated in size with the actual importance of the corresponding components in the actual target. Hence, when expressed in this general form, the problem cannot allow for a generic solution.

Obviously, if more is known about the components of the mixture, as for instance the sequence of weights being alternated, there exist specialised methods, as detailed in the section of series representations in Devroye’s (1985) simulation bible. For instance, in the case when positive and negative weight densities can be paired, in the sense that their weighted difference is positive, a latent index variable can be included. But I cannot think of a generic method where the initial positive and negative components are used for simulation, as it may on the opposite be the case that no finite sum difference is everywhere positive.

Quentin F. Gronau, Henrik Singmann and Eric-Jan Wagenmakers have arXived a detailed documentation about their bridgesampling R package. (No wonder that researchers from Amsterdam favour bridge sampling!) [The package relates to a [52 pages] tutorial on bridge sampling by Gronau et al. that I will hopefully comment soon.] The bridge sampling methodology for marginal likelihood approximation requires two Monte Carlo samples for a ratio of two integrals. A nice twist in this approach is to use a dummy integral that is already available, with respect to a probability density that is an approximation to the exact posterior. This means avoiding the difficulties with bridge sampling of bridging two different parameter spaces, in possibly different dimensions, with potentially very little overlap between the posterior distributions. The substitute probability density is chosen as Normal or warped Normal, rather than a t which would provide more stability in my opinion. The bridgesampling package also provides an error evaluation for the approximation, although based on spectral estimates derived from the coda package. The remainder of the document exhibits how the package can be used in conjunction with either JAGS or Stan. And concludes with the following words of caution:

“It should also be kept in mind that there may be cases in which the bridge sampling procedure may not be the ideal choice for conducting Bayesian model comparisons. For instance, when the models are nested it might be faster and easier to use the Savage-Dickey density ratio (Dickey and Lientz 1970; Wagenmakers et al. 2010). Another example is when the comparison of interest concerns a very large model space, and a separate bridge sampling based computation of marginal likelihoods may take too much time. In this scenario, Reversible Jump MCMC (Green 1995) may be more appropriate.”