Archive for July, 2010

Vancouver skyline [1]

Posted in pictures, Travel with tags , on July 31, 2010 by xi'an


Posted in Mountains, Travel, University life with tags , , , , , , on July 31, 2010 by xi'an

Before leaving for Vancouver, I just bough this pair of Garmont dragontail approach shoes. They will hopefully prove useful in the approach walk to the Smoke Bluffs in Squamish today, where I plan to go climbing with Julien (once again with a brand new 10mm Mammut rope), weather and shape (15 hours of travelling) permitting… And later on the trails in Yosemite (no climbing there, I am afraid!).

Actually, the flight to Vancouver was packed, which means I could not even open my macbook… So I looked instead at the translation of Introducing Monte Carlo Methods with R into French, read the Economist from start till end (with an interesting discussion on the impact of fast-speed trains on freight transportation!), slept, watched the incredible landscape of Greenland with its immense icefields and deep red striated rock formations, and started an Icelandic crime novel. Plus talked with my neighbour of arithmetic Monte Carlo as  I saw him proving a theorem about commutative algebra: he happened to be a professor in computer science at the University of Victoria. The Bose earphones did marvel to cancel the engine noise in the plane, if not the screams of the kids in the next row…


Posted in R, Statistics, University life with tags , , , , , , , , on July 30, 2010 by xi'an

A most interesting paper by Adrian Raftery and Le Bao appeared in the Early View section of Biometrics.  It aims at better predictions for HIV prevalence—in the original UNAIDS implementation, a naïve SIR procedure was used, based on the prior as importance function, which sometimes resulted in terrible degeneracy—, but its methodological input is about incremental mixture importance sampling (IMIS), thus relates to the general topic of adaptive Monte Carlo methods I am interested in. (And to some extent to our recent AMIS paper.) Actually, a less elaborate (and less related) version of the IMIS algorithm first appeared in a 2006 paper by Steele, Raftery and Edmond in JCGS in the setting of finite mixture likelihoods and I somehow managed to miss it…

Raftery and Bao propose to replace SIR with an iterative importance sampling technique developed in 2003 by Steele et al. that has some similarities with population Monte Carlo (PMC). (A negligible misrepresentation of PMC in the current paper is that our method does not use “the prior as importance function'”.) In its current format, the IMIS algorithm starts from a first guess (e.g., the prior distribution) and builds a sequence of Gaussian (or Gaussian mixture) approximations whose parameters are estimated from the current population, while all simulation are merged together at each step, using a mixture stabilising weight

\pi(\theta_i^s|x) / \omega_0 p_0(\theta_i^0)+\sum_r \omega_r \hat q_r(\theta_i^s)

where the weights \omega_r depend on the number of simulations at step r. This pattern also appears in our adaptive multiple importance sampling (AMIS) algorithm developed in this arXiv paper with Jean-Marie Cornuet, Jean-Michel Marin and Antonietta Mira, and in the original paper by Owen and Zhou (2000, JASA) that inspired us. Raftery and Bo extend the methodology to an IMIS with optimisation at the initial stage, while AMIS incorporates the natural population Monte Carlo stepwise optimisation developed in Douc et al. (2008, Annals of Statistics) that brings the proposal kernel closer to the target after each iteration. The application of the simulations to conduct model choice found in the current paper and in Steele et al. can also be paralleled with the one using population Monte Carlo we conducted for cosmological data in MNRAS.

Interestingly, Raftery and Bo (and also Steele et al.) refer to the defensive mixture paper of Hesterberg (1995, Technometrics), which has been very influential in my research on importance sampling, and (less directly) to Owen and Zhou (2000, JASA), who did propose the deterministic mixture scheme that inspired AMIS. Besides the foundational papers of Oh and Berger (1991, JASA) and West (1993, J. Royal Statistical Society Series B), they also mention a paper by Raghavan and Cox (1998, J. Statistical Simulation & Computation) I was not aware of, which introduces as well a mixture of importance proposals as a variance stabilising technique.

Le Monde puzzle [29]

Posted in Books with tags , , on July 29, 2010 by xi'an

Although I am not receiving Le Monde magazine by mail any longer (maybe a consequence of highly critical posts?!), I bought it at a newsagent last Saturday and found a fairly easy mathematical puzzle. Or is it that easy? The question is to decide about the unicity of the solution to the equation

\left[\begin{matrix} 1 &0 &1 &0 &\ldots &0 &0\\ 0 &1 &0 &1 &\ldots &0 &0\\ 0 &0 &1 &0 &\ldots &0 &0\\ \vdots &\vdots &\vdots &\vdots &\ddots &\vdots &\vdots \\ 1 &0 &0 &0 &\ldots &1 &0\\ 0 &1 &0 &0 &\ldots &0 &1\end{matrix}\right]\, \mathbf{x} = \mathbf{y}

when the matrix is (p,p). When p=9,10,11, the matrix is invertible, so there is no problem. But, looking at p=12, it is no longer invertible! When running a quick check like

for (p in 3:30){ A=diag(p) A[(1:p)+p*(((1:p)+1)%%(p))]=1 print(p) b=try(solve(A))}

we actually spot that p=4,8,12,16,20,24,28,…, i.e. all multiples of 4, lead to a non-invertible matrix. It is easy to see why for the (4,4) matrix

\mathbf{A}_4 = \left[\begin{matrix} 1 &0 &1 &0\\ 0 &1 &0 &1\\ 1 &0 &1 &0\\ 0 &1 &0 &1\end{matrix}\right]

since the first and second lines are repeated. The (8,8) matrix is equivalent to the bloc matrix

\left[\begin{matrix} \mathbf{A}_4 & \mathbf{O}\\ \mathbf{B} &\mathbf{I_4}\end{matrix}\right],

hence with the same determinant as \mathbf{A}_4… In the general case, the determinant of the matrix is a circular determinant. Because most entries (but two) are zero, the determinant can easily be written as

\prod_{j=0}^{p-1} \left\{1 + e^{\iota 4\pi j/p}\right\},

where ι is the basic complex number. Therefore, if, and only if, p is a multiple of 4, j in the above product can take the value  n/4, leading to a zero in the product… I can see why this makes a good puzzle, because this case of the multiples of 4 is quite counterintuitive!

A summer of books

Posted in Books, R, Statistics, University life with tags , , , , , , , on July 28, 2010 by xi'an

The summer started with a research in pair session in CiRM on the R edition of Bayesian Core, but I am also involved two other book projects. The first one was mentioned in a previous post, namely the translation of Introducing Monte Carlo Methods with R into French. I have now recovered all translated chapters, involving not less than six translators! (My son completed his two last chapters while in CiRM, another benefit of the fortnight there!) So I need to get over those chapters to ensure some minimal homogeneity in the style and the notations. Not an immense amount of work given the near perfect productions of Robin Ryder, Julyan Arbel and Pierre Jacob, but still needs to be done (a perIntroducting Monte Carlofect opportunity for the long flight to Vancouver!) The second book is an edited volume following the exciting meeting on mixtures last March in Edinburgh. I have so far received twelve of the fifteen chapters from the contributors and hope against all odds to pack the volume for Vancouver, in order to discuss with the Wiley representative there. (I also hope it will be possible to include a picture I took during my trip to Ben Nevis as the cover picture..!)

On raw milk

Posted in Kids, Travel with tags , , , , , on July 27, 2010 by xi'an

While it is almost impossible to find raw milk (ie unpasteurised) cheese in North America, especially for those cheese aged last than 60 days, there are still a few producers making Camembert from raw milk in Normandy. (There is indeed a risk of contamination by listeria, E. Coli and salmonella in unpasteurised products, but the taste and structure of those cheese are clearly above those of pasteurised camemberts.) The risk of contamination is rather restricted by the limited production of those small manufacturers and the sales are mostly limited to Normandy and cheesemongers in the Paris area. I was thus quite surprised to find the other day a raw milk camembert sold in the local supermarket, meaning a higher production volume.

The explanation stands in the lack of A.O.C. label on the box, meaning that the Isigny-Sainte Mère company uses a processing that is not accepted as traditional by the A.O.C. for Camembert… The company is (legitimately) worried about being sued in case of food poisoning and is thus using a filtering device that avoids pasteurisation while removing bacterias. Now, the resulting cheese is very close to the “real” thing and, while I prefer the confidential brands of Carel and Jort, it still tastes very good!

Ps-Now, one may wonder about my sudden passion for Camembert! The reason is that, as a teenager looking for summer jobs, I used to make (raw milk) camemberts for a company (that no longer manufactures them)… Hence a certain lasting interest in the product.

Asher’s enigma

Posted in R, Statistics with tags , , , on July 26, 2010 by xi'an

On his Probability and statistics blog, Matt Asher put a funny question (with my rephrasing):

Take a unit square. Now pick two spots at random along the perimeter, uniformly. For each of these two locations, pick another random point from one of the three other sides of the square and draw the segment. What is the probability the two segments intersect? And what is the distribution for the intersection points?

The (my) intuition for the first question was 1/2, but a quick computation led to another answer. The key to the computation is to distinguish whether or not both segments share one side of the square. They do with probability

\dfrac{2}{4}\times 1 + \dfrac{2}{4}\times\dfrac{2}{3} = \dfrac{5}{6},

in which case they intersect with probability 1/2. They occupy the four sides with probability 1/6, in which case they intersect with probability 1/3. So the final answer is 17/36 (as posted by several readers and empirically found by Matt). The second question is much more tricky: the histogram of the distribution of the coordinates is peaked towards the boundaries, thus reminding me of an arc-sine distribution, but there is a bump in the middle as well. Computing the coordinates of the intersection depending on the respective positions of the endpoints of both segments and simulating those distributions led me to histograms that looked either like beta B(a,a) distributions, or like beta B(1,a) distributions, or like beta B(a,1) distributions… Not exactly, though. So not even a mixture of beta distributions is enough to explain the distribution of the intersection points… For instance, the intersection points corresponding to segments were both segments start from the same side and end up in the opposite side are distributed as


where all u‘s are uniform on (0,1) and under the constraint (u_2-u_1)(u_4-u_3)<0. The following graph shows how well a beta distribution fits in that case. (Not perfectly, though!)
The R code is


Similarly, if the two segments start from the same side but end up on different sides, the distribution of one coordinate is given by


under the constraint u_3<u_1. The outcome is once again almost distributed as a beta:
The corresponding R code is