not converging to London for an [extra]ordinary Read Paper

Posted in Kids, Statistics, University life, Books, Travel, pictures on November 21, 2014 by xi'an

On December 10, I will alas not travel to London to attend the Read Paper on sequential quasi-Monte Carlo presented by Mathieu Gerber and Nicolas Chopin to the Royal Statistical Society, as I fly instead to Montréal for the NIPS workshops… I am quite sorry to miss this event, as this major paper brings quasi-Monte Carlo methods into mainstream statistics. I will most certainly write a discussion, and I remind the 'Og's readers that contributed discussions (800 words) are welcome from everyone, the deadline for submission being January 02.

Bayesian evidence and model selection

Posted in Statistics on November 20, 2014 by xi'an

Another arXived paper on a topic close to my interests, posted by Knuth et al. today: Bayesian model selection. However, after reading the paper in Gainesville, I am rather uncertain about its prospects, besides providing an entry to the issue (for physicists?). Indeed, the description of (Bayesian) evidence concentrates on rough approximations, from a physics perspective, with a notion of Occam's factor that measures the divergence from the maximum likelihood. (As usual when reading the physics literature, I am uncertain as to why one should always resort to approximations.) The numerical part mentions the tools of importance sampling, Laplace approximations, path sampling, and nested sampling. The main part of the paper consists in applying those tools to signal processing models. One of them is a mixture example where nested sampling is used to evaluate the most likely number of components, using uniform priors over non-specified hypercubes. In an example about the photometric signal from an exoplanet, two models are distinguished by evidences of 37,764 and 37,765, with another one at 37,748. It seems to me that this very proximity simply prevents the comparison of those models, even without accounting for the Monte Carlo variability, and does not suffice to conclude about a scientific theory ("effectively characterize exoplanetary systems"). Which leads to my current thinking, already expressed on this blog, that Bayes factors and posterior probabilities should be replaced with an alternative that incorporates the uncertainty about the very Bayes factor (or evidence).
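To put those numbers in perspective, here is a minimal R sketch (the evidence values are those reported above, taken as log-evidences, which is an assumption on my part; the Monte Carlo error question is left aside):

```r
# log-evidences as reported for the exoplanet example (assumed to be logs)
logev <- c(m1=37764, m2=37765, m3=37748)
# Bayes factor of m2 against m1: a unit difference in log-evidence gives
# exp(1) ~ 2.72, "barely worth mentioning" on Jeffreys' scale, and easily
# swamped by the Monte Carlo variability of a nested sampling estimate
exp(logev["m2"]-logev["m1"])
```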

differences between Bayes factors and normalised maximum likelihood

Posted in Books, Kids, Statistics, University life on November 19, 2014 by xi'an

A recent arXival by Heck, Wagenmakers, and Morey attracted my attention: Three Qualitative Differences Between Bayes Factors and Normalized Maximum Likelihood, as it provides an analysis of the differences between Bayesian analysis and Rissanen's Optimal Estimation of Parameters, which I reviewed a while ago. As detailed in that review, I had difficulties with considering the normalised likelihood

$p(x|\hat\theta_x) \big/ \int_\mathcal{X} p(y|\hat\theta_y)\,\text{d}y$

as the relevant quantity. One reason is that the distribution does not make experimental sense: for instance, how can one simulate from this distribution? [I mean, when considering only the original distribution.] Working with the simple binomial B(n,θ) model, the authors show that the quantity corresponding to the posterior probability may be constant for most of the data values, produces a different upper bound and hence a different penalty for model complexity, and may lead to different conclusions for some observations. This means that the apparent proximity between using a Jeffreys prior and Rissanen's alternative does not go all the way. While it is a short note, focussed only on producing an illustration in the binomial case, I find it interesting that researchers investigate the Bayesian nature (vs. artifice!) of this approach…
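As a concrete illustration, the normalised maximum likelihood is directly computable in the binomial case, since the denominator is a finite sum over all possible outcomes (a minimal sketch of the quantity, not the authors' code):

```r
# normalised maximum likelihood for the Binomial(n, theta) model:
# the denominator sums the maximised likelihood over all possible samples y,
# the MLE being theta-hat = y/n (dbinom handles the 0^0 endpoints as 1)
nml <- function(x, n){
  maxlik <- function(y) dbinom(y, n, y/n)
  sapply(x, maxlik)/sum(sapply(0:n, maxlik))
}
round(nml(0:10, 10), 3)
```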

importance sampling schemes for evidence approximation [revised]

Posted in Statistics, University life on November 18, 2014 by xi'an

After a rather intense period of new simulations and versions, Jeong Eun (Kate) Lee and I have now resubmitted our paper on (some) importance sampling schemes for evidence approximation in mixture models to Bayesian Analysis. There is no fundamental change in the new version, rather a more detailed description of what those importance schemes mean in practice. The original idea in the paper is to improve upon the Rao-Blackwellisation solution proposed by Berkhof et al. (2003) and later by Marin et al. (2005) to avoid the impact of label switching on Chib's formula. The Rao-Blackwellisation consists in averaging over all permutations of the labels, while the improvement relies on the elimination of useless permutations, namely those that produce a negligible conditional density in Chib's (candidate's) formula. While the improvement implies truncating the overall sum and hence induces a potential bias (which was the concern of one referee), determining the irrelevant permutations after relabelling next to a single mode does not appear to cause any bias, while reducing the computational overload. Referees also made us aware of many recent proposals that lead to different evidence approximations, albeit not directly related to our purpose. (One was Rodrigues and Walker, 2014, discussed and commented in a recent post.)
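The truncation idea can be sketched as follows (a toy version for illustration only, not the paper's implementation; the threshold `eps` is a hypothetical choice):

```r
# toy sketch of the truncation: average Chib's conditional densities over
# all k! label permutations, dropping the (near) zero contributions;
# dividing by the full number of permutations, rather than by the number
# kept, is what keeps the bias negligible
rb_chib <- function(dens, eps=1e-10){
  keep <- dens > eps*max(dens)
  sum(dens[keep])/length(dens)
}
```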

a probabilistic proof to a quasi-Monte Carlo lemma

Posted in Books, Statistics, Travel, University life on November 17, 2014 by xi'an

As I was reading in the Paris métro a new textbook on quasi-Monte Carlo methods, Introduction to Quasi-Monte Carlo Integration and Applications, written by Gunther Leobacher and Friedrich Pillichshammer, I came upon the lemma that, given two sequences (uᵢ) and (vᵢ) in (0,1) such that, for all i's,

$|u_i-v_i|\le\delta\quad\text{then}\quad\left|\prod_{i=1}^s u_i-\prod_{i=1}^s v_i\right|\le 1-(1-\delta)^s$

and the geometric bound made me wonder whether there was an easy probabilistic proof of this inequality, rather than the algebraic proof contained in the book. Unsurprisingly, there is one, based on associating with each pair (uᵢ,vᵢ) a pair of nested events (Aᵢ,Bᵢ), independent across i's, such that, for all i's,

$A_i\subset B_i\,,\ u_i=\mathbb{P}(A_i)\,,\ v_i=\mathbb{P}(B_i)$

and representing

$\left|\prod_{i=1}^s u_i-\prod_{i=1}^s v_i\right| = \mathbb{P}(\cap_{i=1}^s B_i) - \mathbb{P}(\cap_{i=1}^s A_i)\,.$
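The bound itself is easy to sanity-check numerically, e.g. in R:

```r
# numerical check of |prod(u)-prod(v)| <= 1-(1-delta)^s on random sequences
set.seed(42)
s <- 10; delta <- 0.05
u <- runif(s)
v <- pmin(pmax(u+runif(s,-delta,delta),0),1)  # ensures |u_i-v_i| <= delta
abs(prod(u)-prod(v)) <= 1-(1-delta)^s         # TRUE, as the lemma guarantees
```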

Obviously, there is no visible consequence to this remark, but it was a good way to switch off the métro hassle for a while! (The book is under review and the review will hopefully be posted on the ‘Og as soon as it is completed.)

Le Monde puzzle [#887bis]

Posted in Kids, R, Statistics, University life on November 16, 2014 by xi'an

As mentioned in the previous post, an alternative consists in finding the permutation of {1,…,N} by “adding” squares left and right until the permutation is complete or no solution is available. While this sounds like the dual of the initial solution, it brings a considerable improvement in computing time, as shown below. I thus redefined the construction of the solution by initialising the permutation at random (it could start at 1 just as well)

perfect=(1:trunc(sqrt(2*N)))^2      # perfect squares up to 2N
perm=friends=(1:N)                  # permutation being built; unused integers
t=1
perm[t]=sample(friends,1)           # random starting value
friends=friends[friends!=perm[t]]   # remove it from the pool


and then completing only with possible neighbours, left

out=outer(perfect-perm[t],friends,"==")  # unused integers completing a square?
if (max(out)==1){
  t=t+1
  # pick one feasible square at random and deduce the new right neighbour
  perm[t]=sample(rep(perfect[apply(out,1,max)==1],2),1)-perm[t-1]
  friends=friends[friends!=perm[t]]}


or right

out=outer(perfect-perm[1],friends,"==")  # same check at the left end
if (max(out)==1){
  t=t+1
  perf=sample(rep(perfect[apply(out,1,max)==1],2),1)-perm[1]
  perm[1:t]=c(perf,perm[1:(t-1)])        # shift right and prepend the new value
  friends=friends[friends!=perf]}


(If you wonder why the rep() appears in the sample step, it is a trick I just found to avoid the insufferable feature that sample(x,1) with a length-one x is equivalent to sample(1:x,1)! It costs basically nothing but bypasses reprogramming sample() each time I use it… I am very glad I found this trick!) The gain in computing time is amazing:

> system.time(for (i in 1:50) pick(15))
   user  system elapsed
  5.397   0.000   5.395
> system.time(for (i in 1:50) puck(15))
   user  system elapsed
  0.285   0.000   0.287


An unrelated point is that a more interesting (?) alternative problem consists in adding a toroidal constraint, namely requiring the first and last entries of the permutation to also sum to a perfect square. Is it at all possible?
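A first step towards exploring this variant is a function checking the wrap-around condition on candidate permutations (a quick sketch; `is_square` and `toroidal` are names I make up here):

```r
# checks whether a permutation satisfies the toroidal version of the
# constraint: all consecutive sums, plus first+last, are perfect squares
is_square <- function(x) round(sqrt(x))^2==x
toroidal <- function(perm){
  s <- length(perm)
  all(is_square(c(perm[-1]+perm[-s], perm[1]+perm[s])))
}
```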

Le Monde puzzle [#887]

Posted in Books, Kids, R, Statistics on November 15, 2014 by xi'an

A simple combinatorics Le Monde mathematical puzzle:

N is a golden number if the sequence {1,2,…,N} can be reordered so that the sum of any consecutive pair is a perfect square. What are the golden numbers between 1 and 25?

Indeed, from an R programming point of view, all I have to do is go over all possible permutations of {1,2,…,N} until one works or until I have exhausted all possible permutations for a given N. However, 25!≈1.55×10²⁵ is a wee bit too large… Instead, I resorted once again to brute-force simulation, by first introducing the possible neighbours of each integer

perfect=(1:trunc(sqrt(2*N)))^2        # perfect squares up to 2N
friends=NULL
le=1:N
for (perm in 1:N){
  amis=perfect[perfect>perm]-perm     # candidates completing a square with perm
  amis=amis[amis<=N & amis!=perm]     # keep those in {1,...,N}, excluding perm
  le[perm]=length(amis)
  friends=c(friends,list(amis))
}


and then proceeding to construct the permutation one integer at time by picking from its remaining potential neighbours until there is none left or the sequence is complete

orderin=0*(1:N)
t=1
orderin[1]=sample((1:N),1)   # random starting integer
for (perm in 1:N){           # remove it from every neighbour list
  friends[[perm]]=friends[[perm]][friends[[perm]]!=orderin[1]]
  le[perm]=length(friends[[perm]])
}
while (t<N){
  if (length(friends[[orderin[t]]])==0)
    break()                  # dead end: no neighbour left
  if (length(friends[[orderin[t]]])>1){
    orderin[t+1]=sample(friends[[orderin[t]]],1)
  }else{
    orderin[t+1]=friends[[orderin[t]]]
  }
  for (perm in 1:N){         # remove the new value from all neighbour lists
    friends[[perm]]=friends[[perm]][friends[[perm]]!=orderin[t+1]]
    le[perm]=length(friends[[perm]])
  }
  t=t+1}


and then repeating this attempt until a full sequence is produced or a certain number of failed attempts has been reached. I gained in efficiency by proposing a second completion on the left of the first integer once a break occurs:

while (t<N){
  if (length(friends[[orderin[1]]])==0)
    break()                         # no left extension possible
  orderin[2:(t+1)]=orderin[1:t]     # shift the current sequence right
  if (length(friends[[orderin[2]]])>1){
    orderin[1]=sample(friends[[orderin[2]]],1)
  }else{
    orderin[1]=friends[[orderin[2]]]
  }
  for (perm in 1:N){                # remove the new value from all lists
    friends[[perm]]=friends[[perm]][friends[[perm]]!=orderin[1]]
    le[perm]=length(friends[[perm]])
  }
  t=t+1}


(An alternative would have been to complete left and right with squared numbers taken at random…) The result of running this program showed that there exist permutations with the above property for N=15, 16, 17, 23, 25, 26, …, 77. Here is a solution for N=49:

25 39 10 26 38 43 21 4 32 49 15 34 30 6 3 22 42 7 9 27 37 12 13 23 41 40 24 1 8 28 36 45 19 17 47 2 14 11 5 44 20 29 35 46 18 31 33 16 48

As an aside, the authors of the Le Monde puzzle claimed (in the Tuesday, Nov. 12, edition) that there was no solution for N=23, while this sequence

22 3 1 8 17 19 6 10 15 21 4 12 13 23 2 14 11 5 20 16 9 7 18

sounds fine enough to me… More generally, I wonder about the general principle behind the existence of such sequences. It seems quite probable that they exist for all N>24. (The published solution does not shed any light on this issue, so I assume the authors have no mathematical analysis to offer.)
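Indeed, the proposed N=23 sequence can be checked in a few lines of R:

```r
# check the proposed N=23 sequence: a permutation of 1:23 whose
# consecutive pairs all sum to a perfect square
seq23 <- c(22,3,1,8,17,19,6,10,15,21,4,12,13,23,2,14,11,5,20,16,9,7,18)
sums <- seq23[-1]+seq23[-23]
all(sort(seq23)==1:23)            # TRUE: it is a permutation of 1:23
all(round(sqrt(sums))^2==sums)    # TRUE: every consecutive pair sums to a square
```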