voices – strange shores [book reviews]

Posted in Books, Mountains, Travel on July 5, 2014 by xi'an

Following my recent trip to Iceland, I read two more books by Arnaldur Indriðason, Voices (Röddin, 2003) and Strange Shores (Furðustrandir, 2010).

As usual, Indriðason’s books are more about the past (of characters as well as of the whole country) than about current times. Voices does not depart from this pattern, all the more because it is one of the earliest Inspector Erlendur books. Beyond the murder of a hotel employee on the fringe of homelessness lies the almost constant questioning in Indriðason’s books of the difficult or even impossible relations between parents and children and/or between siblings, and of the long-lasting consequences of this generation gap. The murder itself is but a pretext for investigations on that theme, and the murder resolution is far from the central point of the book. The story itself is thus less compelling than others I have read, maybe because the main character spends so much time closeted in his hotel room. But it nonetheless fits well within the Erlendur series. And although it is unrelated to the story, the cover reminded me very much of the Gullfoss waterfalls.

The second book, Strange Shores, is the farthest from a detective story in the whole series. Indeed, Erlendur is back at his childhood cottage in Eastern Iceland, looking for a resolution of his childhood trauma, losing his younger brother during a snowstorm. He also investigates another snowstorm disappearance, interrogating the few survivors and reluctant witnesses from that time. Outside any legal mandate. Sometimes very much outside! While the story is not completely plausible, both in the present and in the past, it remains a striking novel, even on its own. (Although it could read better after the earlier novels in the series.) Not only does the resolution of the additional disappearance bring additional pain and no comfort to those involved, but the ending of Erlendur’s own quest is quite ambiguous. As the book reached its final pages, I could not decide whether he had reached redemption and deliverance and the potential to save his own children, or whether he was beyond redemption, reaching another circle of Hell. As explained by the author in an interview, this is intentional and not the consequence of my poor understanding: “Readers of Strange Shores are not quite certain what to make of the ending regarding Erlendur, and I’m quite happy to leave them in the dark!” If the main character of this series is focussing more on missing persons than on detective work, what’s next?!

delayed acceptance with prefetching

Posted in Books, Kids, Statistics, Travel, University life on June 12, 2014 by xi'an

In a somewhat desperate rush (started upon my return from Iceland and terminated on my return from Edinburgh), Marco Banterle, Clara Grazian and I managed to complete and submit our paper by last Friday evening… It is now arXived as well. The full title of the paper is Accelerating Metropolis-Hastings algorithms: Delayed acceptance with prefetching and the idea behind the generic acceleration is (a) to divide the acceptance step into parts, towards a major reduction in computing time that outranks the corresponding reduction in acceptance probability, and (b) to exploit this division to build a dynamic prefetching algorithm. The division consists in breaking the prior × likelihood target into a product such that some terms are much cheaper to compute than others or, equivalently, in representing the acceptance-rejection ratio in the Metropolis-Hastings algorithm as

$\dfrac{\pi(\theta)\,q(\theta,\eta)}{\pi(\eta)q(\eta,\theta)} = \prod_{k=1}^d \rho_k(\eta,\theta)$

again with significant differences in the computing cost of those terms. Indeed, this division can be exploited by checking for each term sequentially, in the sense that the overall acceptance probability

$\prod_{k=1}^d \min\left\{\rho_k(\eta,\theta),1\right\}$

is associated with the right (posterior) target! This lemma can be directly checked via the detailed balance condition, but it is also a consequence of a 2005 paper by Andrés Christen and Colin Fox on using approximate transition densities (with the same idea of gaining time: in case of an early rejection, the exact target need not be computed). While the purpose of the recent [commented] paper by Doucet et al. is fundamentally orthogonal to ours, a special case of this decomposition of the acceptance step in the Metropolis–Hastings algorithm can be found therein. The division of the likelihood into parts also allows for a precomputation of the target solely based on a subsample, hence gaining time and allowing for a natural prefetching version, following recent developments in this direction. (Discussed on the ‘Og.) We study the novel method within two realistic environments, the first one made of logistic regression targets using benchmarks found in the earlier prefetching literature and a second one handling an original analysis of a parametric mixture model via genuine Jeffreys priors. [As I made preliminary notes along those weeks using the 'Og as a notebook, several posts in the coming days will elaborate on the above.]
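To make the sequential check concrete, here is a minimal Python sketch of the delayed acceptance step alone, without the prefetching part. It is an illustration under stated assumptions, not the code of the paper: the function name `delayed_acceptance_mh`, the toy conjugate normal model, the choice of the prior as the cheap factor and of a symmetric random-walk proposal (so that the proposal densities cancel from each ratio) are all hypothetical.

```python
import numpy as np

def delayed_acceptance_mh(log_factors, theta0, n_iter, step, rng):
    """Random-walk MH whose target is the product of exp(log_factors[k]).

    The factors are checked in the given order (cheapest first) and the
    proposal is rejected as soon as one partial test fails, so that the
    overall acceptance probability is prod_k min{rho_k, 1}.
    """
    theta = theta0
    current = [f(theta) for f in log_factors]
    n_evals = [1] * len(log_factors)      # evaluations of each factor
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()   # symmetric proposal (assumption)
        proposed, accepted = [], True
        for k, f in enumerate(log_factors):
            value = f(prop)
            n_evals[k] += 1
            proposed.append(value)
            # accept this factor with probability min{rho_k, 1}
            if np.log(1.0 - rng.uniform()) >= value - current[k]:
                accepted = False             # early rejection: skip later factors
                break
        if accepted:
            theta, current = prop, proposed
        chain[i] = theta
    return chain, n_evals

# Toy check: N(0,1) prior (cheap factor) and N(theta,1) likelihood (costly factor)
rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=20)
log_prior = lambda t: -0.5 * t ** 2
log_lik = lambda t: -0.5 * np.sum((y - t) ** 2)

n_iter = 20000
chain, n_evals = delayed_acceptance_mh([log_prior, log_lik], 0.0, n_iter, 0.5, rng)
post_mean = y.sum() / (len(y) + 1)        # exact posterior mean in this toy model
print(chain[2000:].mean(), post_mean, n_evals)
```

In this toy setting the likelihood is only evaluated for the proposals that survive the prior check, which is where the saving would come from when the second factor is genuinely expensive.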

bridging the gap between machine learning and statistics

Posted in pictures, Statistics, Travel, University life on May 10, 2014 by xi'an

Today in Warwick, I had a very nice discussion with Michael Betancourt on many statistical and computational issues, but at one point in the conversation we came upon the trouble of bridging the gap between the machine learning and statistics communities. While a conference like AISTATS is certainly contributing to this, it does not reach the main bulk of the statistics community. In Reykjavik, we had already discussed the corresponding difficulty of publishing a longer and “more” statistical paper in a “more” statistical journal once the central idea has appeared in a machine learning conference proceeding like NIPS or AISTATS. Hence we had this idea that creating a special fast-track in a mainstream statistics journal for a subset of those papers, using for instance a tailor-made committee in that original conference, or creating an annual survey of the top machine learning conference proceedings rewritten in a “more” statistical way (and once again selected by an ad hoc committee), would help induce machine learners to make the extra effort of switching to another style, at not too much of a cost. From there, we enlarged the suggestion to enlisting a sufficient number of (diverse) bloggers in each major conference towards producing quick but sufficiently informative entries on their epiphany talks (if any), possibly supported by the conference organisers or the sponsoring societies. (I am always happy to welcome any guest blogger in conferences I attend!)

AISTATS 2014 (tee-shirt)

Posted in Kids, pictures, Statistics, Travel, University life on May 6, 2014 by xi'an

It took me a fairly long while to realise there was a map of Iceland as a tag-cloud at the back of the AISTATS 2014 tee-shirt! As it was far too large for me, I thought about leaving it at the conference desk last week. I did bring it back for someone of the proper size, though, and discovered the above when unfolding the tee… Nice but still not my size!

Reykjavik street art

Posted in Kids, pictures, Running, Travel on May 3, 2014 by xi'an

controlled thermodynamic integral for Bayesian model comparison [reply]

Posted in Books, pictures, Running, Statistics, University life on April 30, 2014 by xi'an

Chris Oates wrote the following reply to my Icelandic comments on his paper with Theodore Papamarkou and Mark Girolami, a reply that is detailed enough to deserve a post of its own:

Thank you Christian for your discussion of our work on the Og, and also for your helpful thoughts in the early days of this project! It might be interesting to speculate on some aspects of this procedure:

(i) Quadrature error is present in all estimates of evidence that are based on thermodynamic integration. It remains unknown how to exactly compute the optimal (variance minimising) temperature ladder “on-the-fly”; indeed this may be impossible, since the optimum is defined via a boundary value problem rather than an initial value problem. Other proposals for approximating this optimum are compatible with control variates (e.g. Grosse et al, NIPS 2013, Friel and Wyse, 2014). In empirical experiments we have found that the second order quadrature rule proposed by Friel and Wyse 2014 leads to substantially reduced bias, regardless of the specific choice of ladder.

(ii) Our experiments considered first and second degree polynomials as ZV control variates. In fact, intuition specifically motivates the use of second degree polynomials: Let us presume a linear expansion of the log-likelihood in θ. Then the implied score function is constant, not depending on θ. The quadratic ZV control variates are, in effect, obtained by multiplying the score function by θ. Thus control variates can be chosen to perfectly correlate with the log-likelihood, leading to zero-variance estimators. Of course, there is an empirical question of whether higher-order polynomials are useful when this Taylor approximation is inappropriate, but they would require the estimation of many more coefficients and in practice may be less stable.

(iii) We require that the control variates are stored along the chain and that their sample covariance is computed after the MCMC has terminated. For the specific examples in the paper such additional computation is a negligible fraction of the total computational cost, so that we did not provide specific timings. When non-differential-geometric MCMC is used to obtain samples, or when the score is unavailable in closed-form and must be estimated, the computational cost of the procedure would necessarily increase.
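As a hedged illustration of points (ii) and (iii), the following Python sketch applies second-degree ZV control variates after the sampling is over. Everything in it is an assumption made for the example rather than taken from the paper: the target is a toy one-dimensional Gaussian "posterior" N(mu, sigma2), the integrand is a quadratic log-likelihood for a single hypothetical observation `y`, and iid draws stand in for stored MCMC output. With z = -½ ∂ log π/∂θ, a polynomial P(θ) = aθ + bθ² yields the control variates z and 2θz - 1, whose coefficients are estimated by least squares, the intercept of the regression being the ZV estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma2 = 1.5, 0.7      # toy Gaussian "posterior" N(mu, sigma2) (assumption)
y = 2.0                    # hypothetical single observation

# iid draws standing in for the stored MCMC chain
theta = rng.normal(mu, np.sqrt(sigma2), size=2000)

def integrand(t):
    """Quadratic log-likelihood (up to a constant) whose expectation is sought."""
    return -0.5 * (y - t) ** 2

# z = -1/2 d/dtheta log pi(theta), available in closed form here
z = (theta - mu) / (2.0 * sigma2)

# Control variates induced by P(theta) = a*theta + b*theta^2: both have zero mean
cv1 = z
cv2 = 2.0 * theta * z - 1.0

# Post-hoc regression: the intercept equals mean(f + a'cv) with the
# sample-optimal coefficients a, i.e. the ZV estimate
X = np.column_stack([np.ones_like(theta), cv1, cv2])
coef, *_ = np.linalg.lstsq(X, integrand(theta), rcond=None)

zv_estimate = coef[0]
plain_estimate = integrand(theta).mean()
exact = -0.5 * ((y - mu) ** 2 + sigma2)
print(plain_estimate, zv_estimate, exact)
```

Because the integrand is itself a quadratic in θ and the target is Gaussian, the regression fit is exact and the ZV estimate matches the true expectation up to rounding, whereas the plain average keeps its Monte Carlo error. Outside such a toy case the variance would be reduced rather than removed.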

For the wide class of statistical models with tractable likelihoods, employed in almost all areas of statistical application, the CTI we propose should provide state-of-the-art estimation performance with negligible increase in computational costs.
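The quadrature issue raised in (i) can be sketched in Python on a toy conjugate model where every quantity is available in closed form. This is an illustration under assumptions and not the experiments of the paper: the data `y`, the prior variance `tau2`, the ladder t_i = (i/10)^5 and the helper `power_posterior_moments` are hypothetical, and the second-order rule is taken here to be the trapezoidal rule corrected by differences of the power-posterior variances of the log-likelihood, which is my reading of the Friel and Wyse proposal. In practice the means and variances below would be estimated by MCMC at each temperature.

```python
import numpy as np

# Hypothetical toy model: y_i ~ N(theta, 1), prior theta ~ N(0, tau2)
y = np.array([0.8, 1.9, 1.1, 2.4, 1.3])
tau2 = 10.0
n, s, ss = len(y), y.sum(), np.sum(y ** 2)

def power_posterior_moments(t):
    """Mean and variance of log L(theta) under prior(theta) * L(theta)^t."""
    prec = 1.0 / tau2 + t * n            # power-posterior precision
    m, v = t * s / prec, 1.0 / prec      # power-posterior mean and variance
    ybar = s / n
    const = -0.5 * n * np.log(2.0 * np.pi)
    mean = const - 0.5 * (np.sum((y - m) ** 2) + n * v)
    var = 0.25 * n ** 2 * (2.0 * v ** 2 + 4.0 * (m - ybar) ** 2 * v)
    return mean, var

ladder = (np.arange(11) / 10.0) ** 5     # temperatures from 0 to 1
moments = np.array([power_posterior_moments(t) for t in ladder])
means, variances = moments[:, 0], moments[:, 1]
h = np.diff(ladder)

# First-order (trapezoidal) estimate of the log evidence
trapezoid = np.sum(0.5 * h * (means[1:] + means[:-1]))
# Second-order correction using the derivative d/dt E_t[log L] = Var_t[log L]
corrected = trapezoid - np.sum(h ** 2 / 12.0 * (variances[1:] - variances[:-1]))

# Exact log evidence of the conjugate model, for comparison
exact = (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * np.log(1.0 + n * tau2)
         - 0.5 * (ss - tau2 * s ** 2 / (1.0 + n * tau2)))
print(trapezoid, corrected, exact)
```

On this example the only source of error is the quadrature, so the gap between each estimate and the exact log evidence isolates the bias that the second-order rule is meant to reduce.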

AISTATS 2014 [day #3]

Posted in Mountains, pictures, Statistics, Travel, University life on April 28, 2014 by xi'an

The third day at AISTATS 2014 started with Michael Jordan giving his plenary lecture, or rather three short talks on “Big Data” privacy, communication risk, and (bag of) bootstrap. I had not previously heard Michael talking about the first two topics and further found interesting the attempt to put computation into the picture (a favourite notion of Michael’s). However, I was a bit surprised at the choice of a minimax criterion. Indeed, getting away from the minimax criterion was one of the major reasons I moved to the B side of the Force. Because it puts exactly the same importance on every single value of the parameter. Even the most impossible ones. I was also a wee bit surprised at the optimal solution produced by this criterion: in a multivariate binary data setting (e.g., multiple drug usage), the optimal privacy solution was to create a random binary vector and pick at random between this vector and its complement, depending on which one is closest to the observable. The loss of information seems formidable if the dimension of the vector is large. (Implementing ABC as a privacy [privacizing?] strategy would sound better if less optimal…) The next session was about deep learning, of which I knew [and know] nothing, but the talk by Yoshua Bengio raised very relevant questions, like how to learn where the main part of the mass of a probability distribution is, besides pointing at a recent survey of his. The survey points at some notions that I master and some that I don’t, but a cursory reading does not lead me to put an intuitive meaning on deep learning.

The last session of the day and of the conference was on more statistical issues, like a Gaussian process modelling of a spatio-temporal dataset on Afghanistan attacks by Guido Sanguinetti, the use of Rao-Blackwellisation and control variates to build black-box variational inference by Rajesh Ranganath, the construction of conditional exponential families on mixed graphs by Pradeep Ravikumar, and a presentation of probabilistic programming with Anglican by Frank Wood that I had already seen in Banff. In particular, I found the result on the existence of joint exponential families on graphs when defined by those full conditionals quite exciting!

The second poster session was in the early evening, with many more posters (and plenty of food and drinks!), as it also included the (non-refereed) MLSS posters. Among the many interesting ones I spotted were a hit-and-run approach for quasi-concave densities, estimating mixtures with negative weights, a failing particle algorithm for a flu epidemic, an exact EP algorithm, and a fairly intense discussion around Richard Wilkinson’s poster on a Gaussian process ABC algorithm (that I discussed on the ‘Og a while ago).