## Jeffreys priors for mixtures [or not]

Posted in Books, Statistics, University life with tags , , , , , on July 25, 2017 by xi'an

Clara Grazian and I have just arXived [and submitted] a paper on the properties of Jeffreys priors for mixtures of distributions. (An earlier version had not been deemed of sufficient interest by Bayesian Analysis.) In this paper, we consider the formal Jeffreys prior for a mixture of Gaussian distributions and examine whether or not it leads to a proper posterior with a sufficient number of observations.  In general, it does not and hence cannot be used as a reference prior. While this is a negative result (and this is why Bayesian Analysis did not deem it of sufficient importance), I find it definitely relevant because it shows that the default reference prior [in the sense that the Jeffreys prior is the primary choice in nonparametric settings] does not operate in this wide class of distributions. What is surprising is that the use of a Jeffreys-like prior on a global location-scale parameter (as in our 1996 paper with Kerrie Mengersen or our recent work with Kaniav Kamary and Kate Lee) remains legit if proper priors are used on all the other parameters. (This may be yet another illustration of the tequilla-like toxicity of mixtures!)

Francisco Rubio and Mark Steel already exhibited this difficulty of the Jeffreys prior for mixtures of densities with disjoint supports [which reveals the mixture latent variable and hence turns the problem into something different]. Which relates to another point of interest in the paper, derived from a 1988 [Valencià Conference!] paper by José Bernardo and Javier Giròn, where they show the posterior associated with a Jeffreys prior on a mixture is proper when (a) only estimating the weights p and (b) using densities with disjoint supports. José and Javier use in this paper an astounding argument that I had not seen before and which took me a while to ingest and accept. Namely, the Jeffreys prior on a observed model with latent variables is bounded from above by the Jeffreys prior on the corresponding completed model. Hence if the later leads to a proper posterior for the observed data, so does the former. Very smooth, indeed!!!

Actually, we still support the use of the Jeffreys prior but only for the mixture mixtures, because it has the property supported by Judith and Kerrie of a conservative prior about the number of components. Obviously, we cannot advocate its use over all the parameters of the mixture since it then leads to an improper posterior.

## and the travelling salesman is…

Posted in Books, pictures, Statistics, University life with tags , , , on July 21, 2017 by xi'an

Here is another attempt at using StippleGen on… Alan Turing‘s picture. My reason for attempting a travelling salesman rendering of this well-known picture towards creating a logo for PCI Comput Stats, the peer community project I am working on this summer. With the help of the originators of PCI Evol Biol.

## what makes variables randoms [book review]

Posted in Books, Mountains, Statistics with tags , , , , , , on July 19, 2017 by xi'an

When the goal of a book is to make measure theoretic probability available to applied researchers for conducting their research, I cannot but applaud! Peter Veazie’s goal of writing “a brief text that provides a basic conceptual introduction to measure theory” (p.4) is hence most commendable. Before reading What makes variables random, I was uncertain how this could be achieved with a limited calculus background, given the difficulties met by our third year maths students. After reading the book, I am even less certain this is feasible!

“…it is the data generating process that makes the variables random and not the data.”

Chapter 2 is about basic notions of set theory. Chapter 3 defines measurable sets and measurable functions and integrals against a given measure μ as

$\sup_\pi \sum_{A\in\pi}\inf_{\omega\in A} f(\omega)\mu(A)$

which I find particularly unnatural compared with the definition through simple functions (esp. because it does not tell how to handle 0x∞). The ensuing discussion shows the limitation of the exercise in that the definition is only explained for finite sets (since the notion of a partition achieving the supremum on page 29 is otherwise meaningless). A generic problem with the book, in that most examples in the probability section relate to discrete settings (see the discussion of the power set p.66). I also did not see a justification as to why measurable functions enjoy well-defined integrals in the above sense. All in all, to see less than ten pages allocated to measure theory per se is rather staggering! For instance,

$\int_A f\text{d}\mu$

does not appear to be defined at all.

“…the mathematical probability theory underlying our analyses is just mathematics…”

Chapter 4 moves to probability measures. It distinguishes between objective (or frequentist) and subjective measures, which is of course open to diverse interpretations. And the definition of a conditional measure is the traditional one, conditional on a set rather than on a σ-algebra. Surprisingly as this is in my opinion one major reason for using measures in probability theory. And avoids unpleasant issues such as Bertrand’s paradox. While random variables are defined in the standard sense of real valued measurable functions, I did not see a definition of a continuous random variables or of the Lebesgue measure. And there are only a few lines (p.48) about the notion of expectation, which is so central to measure-theoretic probability as to provide a way of entry into measure theory! Progressing further, the σ-algebra induced by a random variable is defined as a partition (p.52), a particularly obscure notion for continuous rv’s. When the conditional density of one random variable given the realisation of another is finally introduced (p.63), as an expectation reconciling with the set-wise definition of conditional probabilities, it is in a fairly convoluted way that I fear will scare newcomers out of their wit. Since it relies on a sequence of nested sets with positive measure, implying an underlying topology and the like, which somewhat shows the impossibility of the overall task…

“In the Bayesian analysis, the likelihood provides meaning to the posterior.”

Statistics is hurriedly introduced in a short section at the end of Chapter 4, assuming the notion of likelihood is already known by the readers. But nitpicking (p.65) at the representation of the terms in the log-likelihood as depending on an unspecified parameter value θ [not to be confused with the data-generating value of θ, which does not appear clearly in this section]. Section that manages to include arcane remarks distinguishing maximum likelihood estimation from Bayesian analysis, all this within a page! (Nowhere is the Bayesian perspective clearly defined.)

“We should no more perform an analysis clustered by state than we would cluster by age, income, or other random variable.”

The last part of the book is about probabilistic models, drawing a distinction between data generating process models and data models (p.89), by which the author means the hypothesised probabilistic model versus the empirical or bootstrap distribution. An interesting way to relate to the main thread, except that the convergence of the data distribution to the data generating process model cannot be established at this level. And hence that the very nature of bootstrap may be lost on the reader. A second and final chapter covers some common or vexing problems and the author’s approach to them. Revolving around standard errors, fixed and random effects. The distinction between standard deviation (“a mathematical property of a probability distribution”) and standard error (“representation of variation due to a data generating process”) that is followed for several pages seems to boil down to a possible (and likely) model mis-specification. The chapter also contains an extensive discussion of notations, like indexes (or indicators), which seems a strange focus esp. at this location in the book. Over 15 pages! (Furthermore, I find quite confusing that a set of indices is denoted there by the double barred I, usually employed for the indicator function.)

“…the reader will probably observe the conspicuous absence of a time-honoured topic in calculus courses, the “Riemann integral”… Only the stubborn conservatism of academic tradition could freeze it into a regular part of the curriculum, long after it had outlived its historical importance.” Jean Dieudonné, Foundations of Modern Analysis

In conclusion, I do not see the point of this book, from its insistence on measure theory that never concretises for lack of mathematical material to an absence of convincing examples as to why this is useful for the applied researcher, to the intended audience which is expected to already quite a lot about probability and statistics, to a final meandering around linear models that seems at odds with the remainder of What makes variables random, without providing an answer to this question. Or to the more relevant one of why Lebesgue integration is preferable to Riemann integration. (Not that there does not exist convincing replies to this question!)

## ABC at sea and at war

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , , , , on July 18, 2017 by xi'an

While preparing crêpes at home yesterday night, I browsed through the  most recent issue of Significance and among many goodies, I spotted an article by McKay and co-authors discussing the simulation of a British vs. German naval battle from the First World War I had never heard of, the Battle of the Dogger Bank. The article was illustrated by a few historical pictures, but I quickly came across a more statistical description of the problem, which was not about creating wargames and alternate realities but rather inferring about the likelihood of the actual income, i.e., whether or not the naval battle outcome [which could be seen as a British victory, ending up with 0 to 1 sunk boat] was either a lucky strike or to be expected. And the method behind solving this question was indeed both Bayesian and ABC-esque! I did not read the longer paper by McKay et al. (hard to do while flipping crêpes!) but the description in Significance was clear enough to understand that the six summary statistics used in this ABC implementation were the number of shots, hits, and lost turrets for both sides. (The answer to the original question is that indeed the British fleet was lucky to keep all its boats afloat. But it is also unlikely another score would have changed the outcome of WWI.) [As I found in this other history paper, ABC seems quite popular in historical inference! And there is another completely unrelated arXived paper with main title The Fog of War…]

## Le Monde puzzle [#1016]

Posted in Books, Kids with tags , , , on July 16, 2017 by xi'an

An even more straightforward Le Monde mathematical puzzle that took a few minutes to code in the train to Cambridge:

1. Breaking {1,…,8} into two sets of four integrals, what is (or are) the division into two groups of equal size such that the sums of the squared terms from each are equal? Same question for the set {21,…,28}.
2.  Considering the integers from 1 to 12, how many divisions into two groups of size six satisfy the above property? Same question when the two groups are of different sizes.

The first code is

nop=TRUE
while (nop){
s=sample(1:8)
nop=(sum(s[1:4]^2)!=sum(s[5:8]^2))}


with result

1 6 4 7

while the second set leads to the unique [drifted] solution (up to symmetries)

21 24 26 27


and the divisions for the larger set {1,…,12} is unique in the equal case, and are four in the unequal case.

## RNG impact on MCMC [or lack thereof]

Posted in Books, R, Statistics, Travel, University life with tags , , , , , , , on July 13, 2017 by xi'an

Following the talk at MCM 2017 about the strange impact of the random generator on the outcome of an MCMC generator, I tried in Montréal airport the following code on the banana target of Haario et al. (1999), copied from Soetaert and Laine and using the MCMC function of the FME package:

library(FME)
Banana <- function (x1, x2) {
return(x2 - (x1^2+1)) }
pmultinorm <- function(vec, mean, Cov) {
diff <- vec - mean
ex <- -0.5*t(diff) %*% solve(Cov) %*% diff
rdet <- sqrt(det(Cov))
power <- -length(diff)*0.5
return((2.*pi)^power / rdet * exp(ex)) }
BananaSS <- function (p) {
P <- c(p[1], Banana(p[1], p[2]))
Cov <- matrix(nr = 2, data = c(1, 0.9, 0.9, 1))
N=1e3
ejd=matrix(0,4,N)
RNGkind("Mars")
for (t in 1:N){
MCMC <- modMCMC(f = BananaSS, p = c(0, 0.7),
jump = diag(nrow = 2, x = 5), niter = 1e3)
ejd[1,t]=mean((MCMC$pars[-1,2]-MCMC$pars[1,2])^2)}


since this divergence from the initial condition seemed to reflect the experiment of the speaker at MCM 2017. Unsurprisingly, no difference came from using the different RNGs in R (which may fail to contain those incriminated by the study)…

## easy riddle

Posted in Books, Kids, R with tags , , , , , on July 12, 2017 by xi'an

From the current Riddler, a problem that only requires a few lines of code and a few seconds of reasoning. Or not.

N households each stole the earnings from one of the (N-1) other households, one at a time. What is the probability that a given household is not burglarised? And what are the expected final earnings of each household in the list, assuming they all start with \$1?

The first question is close to Feller’s enveloppe problem in that

$\left(1-\frac{1}{N-1}\right)^{N-1}$

is close to exp(-1) for N large. The second question can easily be solved by an R code like

N=1e3;M=1e6
fina=rep(1,N)
for (v in 1:M){
ordre=sample(1:N)
vole=sample(1:N,N,rep=TRUE)
while (min(abs(vole-(1:N)))==0)
vole[abs(vole-(1:N))==0]=sample(1:N,
sum(vole-(1:N)==0))
cash=rep(1,N)
for (t in 1:N){
cash[ordre[t]]=cash[ordre[t]]+cash[vole[t]];cash[vole[t]]=0}
fina=fina+cash[ordre]}


which returns a pretty regular exponential-like curve, although I cannot figure the exact curve beyond the third burglary. The published solution gives the curve

${\frac{N-2}{N-1}}^{999}\times 2+{\frac{1}{N-1}}^{t-1}\times{\frac{N-1}{N}}^{N-t}\times\frac{N}{N-1}$

corresponding to the probability of never being robbed (and getting on average an extra unit from the robbery) and of being robbed only before robbing someone else (with average wealth N/(N-1)).