## MCMC with strings and branes

Posted in Books, pictures, Statistics with tags , , , , , on June 6, 2016 by xi'an

“Roughly speaking, the idea [of MCMC] is that the thermal fluctuations of a particle moving in an energy landscape provides a conceptually elegant way to sample from a target distribution.”

A short version of their earlier long paper was arXived last week by Heckman et al. The starting point of the paper is to consider simultaneously M parallel samplers by envisioning the M-uple as a single object. This reminded me of the attempt we had made in our 1995 pinball sampler paper with Kerrie Mengersen, where we introduced a repulsive scheme to keep the particles apart and individually stationary against the same target. The joint target being defined as the product of the individual targets. As a single Markov chain, the MCMC sampler can take advantage of the parallel chains to possibly improve the efficiency when compared with running M parallel chains. Possibly only, because the target has moved to a space that is M times larger…

“…although the physics of point particles underlies much of our modern understanding of natural phenomena, it has proven fruitful to consider objects such as strings and branes with finite extent in p spatial dimensions.”

The details of the proposal are somewhat unclear in that the notion of brane remains a mystery to me. It sounds like a sort of random graph over the indices of the particles, but endowed with further (magical?) physical properties. As defined in the paper the suburban algorithm picks a random (neighbourhood) graph at each iteration that is used in the proposal over the particle system (or ensemble). As justified by the authors, the fact that this choice is independent of the current state of the Markov chain implies stationarity. If not efficiency, when compared with the independent parallel MCMC scheme. And because there are so many ways in taking into account the neighbours. (I did not see (11) as a particularly relevant implementation of the algorithm, mixing a random walk move with another random walk on the time differences.) Actually, I have somewhat of a worry about the term “nearest neighbour” which may be defined by the graph (which is fine) or by the configuration of the particle system at time t (which is not fine).

“The effective dimension rather than the overall topology of the grid plays the dominant role in the performance of the algorithm.”

The limited influence of the grid topology is quite understandable in that the chain targets an iid sample, so there is no reason one index value is more relevant than another. All particles in the vector are interchangeable in this respect and in the long run only the number of connected particles should matter. A more interesting feature is that the suburban sampler seems to perform best for strings, i.e. when the number of connected particles is larger than two. This somewhat agrees with my initial remark that dealing with the M particles as a single object should slow down convergence because of the dimension increase. This is one reason why we did not pursue our pinball sampler any further, as it seemed to converge quite slowly.

“To summarize: With too few friends one drifts into oblivion, but with too many friends one becomes a boring conformist.”

The notion of generating a sample all at once is quite appealing, especially because of the iid nature of the target, but I am not convinced the approach followed in this paper is sufficiently involved for this purpose. Alex Shestopaloff and Radford Neal’s recent work on ensemble MCMC is certainly pushing things further in the analysis of efficient moves. I also wonder if a repulsive component shouldn’t be added to the target as in pinball sampling, borrowing maybe from determinental processes, towards a more thorough exploration of the target. Even though it remains unclear that this will be more efficient than running M parallel chains.

## AISTATS 2016 [#2]

Posted in Kids, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , on May 13, 2016 by xi'an

The second and third days of AISTATS 2016 passed like a blur, with not even the opportunity to write my impressions in real time! Maybe long tapa breaks are mostly to blame for this… In any case, we had two further exciting plenary talks about privacy-preserving data analysis by Kamalika Chaudhuri and crowdsourcing and machine learning by Adam Tauman Kalai. The talk by Kamalika was covering recent results by Kamalika and coauthors about optimal privacy preservation in classification and a generalisation to correlated data, with the neat notion of a Markov Quilt.  Other talks that same day also dwelt on this privacy issue, but I could not be . The talk by Adam was full of fun illustrations on humans training learning systems (with the unsolved difficulty of those humans deliberately mis-training the system, as exhibited recently by the short-lived Microsoft Tay experiment).

Both poster sessions were equally exciting, with the addition of MLSS student posters on the final day. Among many, I particularly enjoyed Iain Murray’s pseudo-marginal slice sampling, David Duvenaud’s fairly intriguing use of early stopping for non-parametric inference,  Garrett Bernstein’s work on aggregated Markov chains, Ye Wang’s scalable geometric density estimation [with a special bonus for his typo on the University of Turing, instead of Torino], Gemma Moran’s and Chengtao Li’s posters on determinantal processes, and Matej Balog’s Mondrian forests with a Laplace kernel [envisioning potential applications for ABC]. Again, just to mention a few…

The participants [incl. myself] also took one evening off to visit a sherry winery in Jerez, with a well-practiced spiel on the story of the company, with some building designed by Gutave Eiffel, and with a wine-tasting session. As I personally find this type of brandy too strong in alcohol, I am not a big fan of sherry but it was nonetheless an amusing trip! With no visible after-effects the next morning, since the audience was as large as usual for Adam’s talk [although I did not cross a machine-learning soul on my 6am run…]

In short, I enjoyed very much AISTATS 2016 and remain deeply impressed by the efficiency of the selection process and the amount of involvement of the actors of this selection, as mentioned earlier on the ‘Og. Kudos!

## simulating determinantal processes

Posted in Statistics, Travel with tags , , , , , , , , , , on December 6, 2013 by xi'an

In the plane to Atlanta, I happened to read a paper called Efficient simulation of the Ginibre point process by Laurent Decreusefond, Ian Flint, and Anaïs Vergne (from Telecom Paristech). “Happened to” as it was a conjunction of getting tipped by my new Dauphine colleague (and fellow blogger!) Djalil Chaffaï about the paper, having downloaded it prior to departure, and being stuck in a plane (after watching the only Chinese [somewhat] fantasy movie onboard, Saving General Yang).

This is mostly a mathematics paper. While indeed a large chunk of it is concerned with the rigorous definition of this point process in an abstract space, the last part is about simulating such processes. They are called determinantal (and not detrimental as I was tempted to interpret on my first read!) because the density of an n-set (x1x2,…,xn) is given by a kind of generalised Vandermonde determinant

$p(x_1,\ldots,x_n) = \dfrac{1}{n!} \text{det} \left( T(x_i,x_j) \right)$

where T is defined in terms of an orthonormal family,

$T(x,y) = \sum_{i=1}^n \psi_i(x) \overline{\psi_i(y)}.$

(The number n of points can be simulated via an a.s. finite Bernoulli process.) Because of this representation, the sequence of conditional densities for the xi‘s (i.e. x1, x2 given x1, etc.) can be found in closed form. In the special case of the Ginibre process, the ψi‘s are of the form

$\psi_i(z) =z^m \exp\{-|z|^2/2\}/\sqrt{\pi m!}$

and the process cannot be simulated for it has infinite mass, hence an a.s. infinite number of points. Somehow surprisingly (as I thought this was the point of the paper), the authors then switch to a truncated version of the process that always has a fixed number N of points. And whose density has the closed form

$p(x_1,\ldots,x_n) = \dfrac{1}{\pi^N} \prod_i \frac{1}{i!} \exp\{-|z_i|^2/2\}\prod_{i

It has an interestingly repulsive quality in that points cannot get close to one another. (It reminded me of the pinball sampler proposed by Kerrie Mengersen and myself at one of the Valencia meetings and not pursued since.) The conclusion (of this section) is anticlimactic, though,  in that it is known that this density also corresponds to the distribution of the eigenvalues of an Hermitian matrix with standardized complex Gaussian entries. The authors mentions that the fact that the support is the whole complex space Cn is a difficulty, although I do not see why.

The following sections of the paper move to the Ginibre process restricted to a compact and then to the truncated Ginibre process restricted to a compact, for which the authors develop corresponding simulation algorithms. There is however a drag in that the sequence of conditionals, while available in closed-form, cannot be simulated efficiently but rely on a uniform accept-reject instead. While I am certainly missing most of the points in the paper, I wonder if a Gibbs sampler would not be an interesting alternative given that the full (last) conditional is a Gaussian density…