Archive for the Books Category

Fool’s quest [book review]

Posted in Books, Kids on October 23, 2016 by xi'an

Although I bought this second volume in the Fitz and the Fool trilogy quite a while ago, I only came to read it very recently. And enjoyed it unreservedly! While the novel builds upon the universe Hobb created in the Liveship Traders trilogy (forget the second trilogy!) and the Assassin and Fool trilogies, the story is compelling enough to bring out excitement and longing for further adventures of Fitz and the Fool. Many characters introduced in the earlier volume suddenly take on substance and meaning, while the main characters are not only heroes of past eras but also acquire further depth and subtlety. Even long-standing ones like Chade. I cannot tell whether this new dimension of the plights affecting the Six Duchies and its ruler, King Verity, was conceived from the start or came later to the author, but it fits seamlessly and increases the epic feeling of the creation by several orders of magnitude. Although it is hard to rank this book against the very first ones, like Royal Assassin, I feel this is truly one of Hobb's best books, with the right mixture of action, plotting, missed opportunities, and ambiguous angles on the main characters. So many characters truly come to life in this volume that I bemoan the sluggish pace of the first one even more now. While one could see Fool's Quest as the fourteenth book in the Realm of the Elderlings series, and hence hint at senseless exploitation of the same saga, there are just too many new threads and perspectives there to maintain this posture. A wonderful book, and a rarity for a middle volume to be so. I am clearly looking forward to the third instalment!

a grim knight [cont’d]

Posted in Books, Kids, pictures, R, Statistics on October 20, 2016 by xi'an

As discussed in the previous entry, there are two interpretations to this question from The Riddler:

“…how long is the longest path a knight can travel on a standard 8-by-8 board without letting the path intersect itself?”

as to what constitutes a path. As a (terrible) chess player, I would opt for the version of the previous post, the knight moving two steps in one direction and one in the other (or vice-versa), thus occupying three squares on the board. But one can consider instead the graph of the moves of that knight, as in the picture above and as in The Riddler. And since I could not let the problem go, I also wrote an R code (as clumsy as the previous one!) to explore at random (with some minimal degree of annealing) the maximum length of a self-avoiding knight canter. The first maximal length I found this way is 32, although I also came up by hand with a spiralling solution of length 33.
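For the record, here is a minimal R sketch of such a random exploration (not the original code, and with plain random restarts in place of the annealing step): it grows a path of knight moves whose straight segments may neither cross an earlier segment nor revisit a square, and records the best length found.

# random growth of non-crossing knight paths, with restarts
ccw <- function(A, B, C)
  (C[2] - A[2]) * (B[1] - A[1]) > (B[2] - A[2]) * (C[1] - A[1])
crosses <- function(P1, P2, P3, P4)  # proper crossing of segments P1P2 and P3P4
  (ccw(P1, P3, P4) != ccw(P2, P3, P4)) && (ccw(P1, P2, P3) != ccw(P1, P2, P4))
moves <- rbind(c(1, 2), c(2, 1), c(-1, 2), c(-2, 1),
               c(1, -2), c(2, -1), c(-1, -2), c(-2, -1))
best <- 0
for (r in 1:1e3) {                   # random restarts
  path <- matrix(sample(1:8, 2, replace = TRUE), 1, 2)
  repeat {
    cur <- path[nrow(path), ]
    cand <- sweep(moves, 2, cur, "+")
    ok <- apply(cand, 1, function(nxt) {
      if (any(nxt < 1) || any(nxt > 8)) return(FALSE)                    # off the board
      if (any(path[, 1] == nxt[1] & path[, 2] == nxt[2])) return(FALSE)  # revisited square
      if (nrow(path) > 1)                                                # crossing check
        for (i in 1:(nrow(path) - 1))
          if (crosses(path[i, ], path[i + 1, ], cur, nxt)) return(FALSE)
      TRUE
    })
    if (!any(ok)) break
    w <- which(ok)
    path <- rbind(path, cand[w[sample(length(w), 1)], ])
  }
  best <- max(best, nrow(path) - 1)  # length of a path = number of moves
}
best

Purely random restarts like these rarely reach the optimum, hence the appeal of some annealing or of much longer runs, as in the post.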

Running the R code longer over the weekend however led to a path of length 34, while the exact solution to the riddle is 35, as provided by The Riddler (and earlier in many forms, including Martin Gardner's and Donald Knuth's).

[An unexpected side-effect of this riddle was ending up watching half of Bergman’s Seventh Seal in Swedish…]

scalable Langevin exact algorithm

Posted in Books, Statistics, Travel, University life on October 18, 2016 by xi'an

“By employing a modification to existing naïve subsampling techniques we can obtain an algorithm which is still exact but has sub-linear iterative cost as a function of data size.”

A few weeks ago Murray Pollock, Paul Fearnhead, Adam Johansen and Gareth Roberts (all from Warwick except for Paul) arXived a paper The Scalable Langevin Exact Algorithm: Bayesian Inference for Big Data. (This was also the topic of Murray's talk last year at JSM in Seattle.) One major advance found in the paper is the derivation of an "exact" algorithm that is sub-linear in the data size. As discussed in the introduction, the current approaches to large data problems either suffer from being approximate (like divide-and-conquer methods) or do not achieve significant reduction in the computing time, being of order O(n). The authors mention Welling and Teh (2011) and their stochastic gradient approximation to the Langevin diffusion, where the gradient is based on a subsample, without the Metropolis correction that would ensure an exact target but at a cost of order O(n). (Which makes the technique rather difficult to recommend.)
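For context, here is a minimal R illustration of the stochastic gradient Langevin scheme on a toy Normal-mean posterior (my own toy version with a fixed step size, unrelated to the paper's code or examples):

# stochastic gradient Langevin dynamics for a N(theta,1) sample with a N(0,10^2) prior
set.seed(1)
n <- 1e4
y <- rnorm(n, mean = 2)
m <- 100                      # subsample size
eps <- 1e-4                   # step size (kept fixed here; Welling and Teh decrease it)
theta <- 0
for (t in 1:1e3) {
  idx <- sample(n, m)
  grad <- -theta / 100 + (n / m) * sum(y[idx] - theta)  # prior + rescaled subsample gradient
  theta <- theta + eps / 2 * grad + rnorm(1, sd = sqrt(eps))
}
theta                         # wanders about the posterior mean, close to mean(y)

Each iteration only touches m data points, but the missing Metropolis correction leaves a discretisation bias in the stationary distribution.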

A novel [for me] notion at the core of this paper is the concept of a quasi-stationary distribution, which is the limiting distribution of a Markov chain X[t] conditional on a Markov stopping time [being larger than t]. The approach is based on diffusions with appropriate stationary distributions, like the Langevin diffusion. (Actually, as in most papers I have read and remember, the current paper only considers the Langevin diffusion.) In order to avoid the issues with unadjusted and Metropolis-adjusted Langevin schemes, a killed Brownian motion is created, that is, a Brownian motion conditional on being alive until time T, where the instantaneous killing rate is a given function of the chain, Φ(X[t]), related to the stationary measure ν of the Langevin diffusion. Under appropriate conditions, the density of this killed Brownian motion converges [in T] to √ν. Which immediately hints at creating a new Langevin diffusion targeting ν² instead of ν, and killing it at the proper rate, which can be done by thinning a Poisson process. Simulating the original path can be done by path-space rejection sampling, following the technique set by Gareth Roberts and co-authors more than ten years ago, based on finite-dimensional realisations of the path on [0,T]. And including the killing part can be done by importance sampling and by checking that the simulated killing time is larger than the current (exponentially simulated) time.
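To illustrate the thinning part alone, here is a naive R sketch that kills a one-dimensional Brownian motion at a bounded rate Φ by thinning a dominating Poisson process (with a made-up Φ, and quite remote from the layered path-space constructions actually used in the paper):

# killing a 1-d Brownian motion at rate Phi(x) <= M by thinning a Poisson(M) process
Phi <- function(x) x^2 / (1 + x^2)   # hypothetical killing rate, bounded by M = 1
M <- 1; Tmax <- 10
killedBM <- function() {
  t <- 0; x <- 0
  repeat {
    dt <- rexp(1, rate = M)                     # next candidate killing event
    if (t + dt > Tmax)                          # no further event before the horizon
      return(c(alive = 1, x = x + rnorm(1, sd = sqrt(Tmax - t))))
    t <- t + dt
    x <- x + rnorm(1, sd = sqrt(dt))            # exact Brownian increment up to the event
    if (runif(1) < Phi(x) / M)                  # thinning: kill with probability Phi/M
      return(c(alive = 0, x = x))
  }
}
sims <- replicate(1e4, killedBM())
surv <- sims["x", sims["alive", ] == 1]         # positions of the paths still alive at Tmax
hist(surv, breaks = 50, main = "killed Brownian motion, survivors at T", xlab = "x")

The histogram of the survivors gives a crude picture of the quasi-stationary behaviour induced by the killing rate.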

One practical difficulty in the implementation of this neat principle is the derivation of the normalising constant, whose evaluation degrades with the time horizon T. The solution adopted in the paper is a sequential Monte Carlo method, using another discretisation of the time interval [0,T] (relying on the original one would get too costly?). As for subsampling, since the survival probability for the Brownian motion is based on an unbiased estimator, subsampling does not hurt if conducted in a random manner. Although this increases the variance in principle, the use of a control variate computed just once helps in reducing the complexity to O(1).
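As a crude illustration of the last point, the following R sketch (my own notation, not the paper's) shows how a control variate computed once at a reference value keeps a subsampled estimate of a full-data sum unbiased while reducing its variance:

# unbiased subsampled estimate of sum_i g(theta, y[i]) with a fixed control variate
n <- 1e5
y <- rnorm(n, mean = 1)
g <- function(theta, y) -abs(y - theta)   # hypothetical per-datum term (Laplace log-likelihood)
theta_star <- mean(y)                     # reference value, computed once at O(n) cost
full_star <- sum(g(theta_star, y))        # control variate, also computed once
naive <- function(theta, m = 10) {        # plain subsampling, O(m) per call
  idx <- sample(n, m, replace = TRUE)
  n * mean(g(theta, y[idx]))
}
cvsub <- function(theta, m = 10) {        # subsampling recentred at theta_star, O(m) per call
  idx <- sample(n, m, replace = TRUE)
  full_star + n * mean(g(theta, y[idx]) - g(theta_star, y[idx]))
}
c(exact = sum(g(0.8, y)),
  sd_naive = sd(replicate(1e3, naive(0.8))),
  sd_cv = sd(replicate(1e3, cvsub(0.8))))

Both estimators are unbiased for the full-data sum, but the recentred one only pays for the variability of g(θ,·)−g(θ*,·), which is small when θ stays close to the reference value.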

This is a tough paper and I have not gone through the effort of trying to implement it, but this is an original and innovative construct I would like to monitor in further detail on a toy example, maybe next week while in Warwick. Or at least to discuss with the authors.

Nature highlights

Posted in Books, Kids, pictures, Statistics on October 16, 2016 by xi'an

Among several interesting (general public) entries and the fascinating article reconstructing the death of Lucy by a fall from a tree, I spotted in the current Sept. 22 issue of Nature two short summaries involving statistical significance, one in linguistics about repeated (and significant) links between some sounds and some concepts (like 'n' and 'nose') shared between independent languages, another about the (significant) discovery of a π meson and a K meson. The first anonymous editorial, entitled "Algorithm and blues", was rather gloomy about the impact of proprietary algorithms on our daily life and on our democracies (or what is left of them), like the reliance on such algorithms to grant loans or to determine the length of a sentence (based on the estimated probability of re-offending). The article called for more accountability of such tools, from going completely open-source to allowing for some form of strong auditing. This reminded me of the current (regional) debate about the algorithm allocating Greater Paris high-school students to local universities and colleges based on their grades, wishes, and available positions. The apparent randomness and arbitrariness of those allocations prompted many (parents) to complain about the algorithm and to ask for it to be made open. (Besides the pun in the title, the paper also contained a line about "affirmative algorithmic action"!) There was also a perfectly irrelevant tribune from a representative of the Church of England about its desire to give a higher profile to science in the/their church. Whatever. And I was also bemused by a news article on the difficulty of building a genetic map of Australian Aboriginals, due to the cultural reticence of Aboriginal communities towards the use of body parts from their members in genetic research. While I understand and agree with the concept of data privacy, so as to restrict the exposure of personal information, it is much less clear [to me] why data collected a century ago should come under such protections if it does not create a risk of exposing living individuals. It reminded me of an earlier Nature news article about North-American Aboriginals claiming rights to an 8,000-year-old skeleton. On a more positive side, this news section also mentioned the first catalogue produced by the Gaia European Space Agency project, from the publication of more than a billion star positions to the open-access nature of the database, in that the Gaia team had hardly any prior access to such a wealth of data. A special section of the journal was dedicated to the impact of social inequalities on the production of (future) scientists, but this sounds rather shallow, at least at the level of the few pages devoted to the topic, and it did not mention a comparison with other areas of society, where such inequalities are also most obviously at work!

Hastings without Metropolis

Posted in Books, Kids, Travel on October 14, 2016 by xi'an

Today marks the 950th anniversary of the battle of Hastings, when Guillaume, Duke of Normandy and pretender to the throne of England following the death of the childless King Edward the Confessor in January 1066, defeated Harold Godwinson, recently elected King and even more recently the victor of a battle against another pretender, Harald Hardrada of Norway. (I had always thought that Guillaume had fought the battle the day his fleet of hundreds of Viking drakkars landed, but he arrived two weeks earlier and had time to build a fort in Hastings.) One of the consequences of this victory would be significant changes in the structure and vocabulary of the English language. [One may wonder why I am mentioning this anniversary, but being "born and raised" in the heart of Guillaume's Norman duchy prompted some long-lasting interest in the Norman invasion.]

like a rollin’ stone..!

Posted in Books, pictures on October 13, 2016 by xi'an

importance sampling by kernel smoothing [experiment]

Posted in Books, R, Statistics on October 13, 2016 by xi'an

Following my earlier post on Delyon and Portier's proposal of replacing the true importance distribution ƒ with a leave-one-out (!) kernel estimate in the importance sampling estimator, I ran a simple one-dimensional experiment to compare the performances of the traditional method with this alternative. The true distribution is a N(0,½), the importance proposal is a N(0,1) distribution, the target is the function h(x)=x⁶[1−0.9 sin(3x)], n=2643 is the number of simulations, and the density is estimated via a call to the default density() R function. The first three boxes are for the regular importance sampler, the kernel version, and the corrected kernel version of Delyon and Portier, while the second set of three considers the self-normalised alternatives. In all kernel versions, the variability is indeed much lower than with regular importance sampling, but the bias persists, with no clear correction brought by the first-order proposal in the paper, while these corrections induce a significant increase in computing time:

> benchmark(
+ for (t in 1:100){
+   x=sort(rnorm(N));fx=dnorm(x)
+  imp1=dnorm(x,sd=.5)/fx})

  replications elapsed user.self sys.self
1          100   7.948      7.94    0.012
> benchmark(
+ for (t in 1:100){
+   x=sort(rnorm(N));hatf=density(x)
+   hatfx=approx(hatf$x,hatf$y, x)$y
+   imp2=dnorm(x,sd=.5)/hatfx})
  replications elapsed user.self sys.self
1          100  19.272    18.473     0.94

> benchmark(
+ for (t in 1:100){
+   x=sort(rnorm(N));hatf=density(x)
+   hatfx=approx(hatf$x,hatf$y, x)$y
+   bw=hatf$bw
+   for (i in 1:N) Kx[i]=1-sum((dnorm(x[i],
+     mean=x[-i],sd=bw)-hatfx[i])^2)/NmoNmt/hatfx[i]^2
+   imp3=dnorm(x,sd=.5)*Kx/hatfx})

  replications  elapsed user.self sys.self
1          100 11378.38  7610.037   17.239

which follows from the O(n) cost of computing the kernel correction at each of the n observations, hence an overall O(n²) cost (and I did not even use the leave-one-out option…). The R computation of the variance term is certainly not optimal, far from it, but those enormous values give an indication of the added cost of this step, which does not even seem productive in terms of variance reduction… [Warning: the comparison is only done over one model and one target integrand, and hence does not pretend at generality!]
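For completeness, here is a minimal R sketch of the estimators behind the above comparison, my own reconstruction rather than the code used for the boxplots, reading N(0,½) as a standard deviation of ½ as in the benchmark calls:

# one replication of the comparison: regular, kernel, and self-normalised versions
N <- 2643
h <- function(x) x^6 * (1 - 0.9 * sin(3 * x))
x <- rnorm(N)                          # proposal N(0,1)
wx <- dnorm(x, sd = 0.5) / dnorm(x)    # importance weights towards the N(0,½) target
hatf <- density(x)                     # kernel estimate of the proposal density
hatfx <- approx(hatf$x, hatf$y, x)$y
wk <- dnorm(x, sd = 0.5) / hatfx       # weights with the kernel-estimated denominator
c(IS   = mean(h(x) * wx),              # regular importance sampling
  kIS  = mean(h(x) * wk),              # kernel-smoothed version
  sIS  = sum(h(x) * wx) / sum(wx),     # self-normalised importance sampling
  skIS = sum(h(x) * wk) / sum(wk))     # self-normalised kernel version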