Archive for the R Category

a grim knight [cont’d]

Posted in Books, Kids, pictures, R, Statistics on October 20, 2016 by xi'an

As discussed in the previous entry, there are two interpretations to this question from The Riddler:

“…how long is the longest path a knight can travel on a standard 8-by-8 board without letting the path intersect itself?”

as to what constitutes a path. As a (terrible) chess player, I would opt for the version of the previous post, the knight moving two steps in one direction and one in the other (or vice versa), thus occupying three squares on the board. But one can consider instead the graph of the moves of that knight, as in the above picture and as in The Riddler. And since I could not let the problem go, I also wrote an R code (as clumsy as the previous one!) to explore at random (with some minimal degree of annealing) the maximum length of a self-avoiding knight canter. The first maximal length I found this way was 32, although I also came by hand to a spiralling solution of length 33.

Running the R code longer over the weekend however led to a path of length 34, while the exact solution to the riddle is 35, as provided by The Riddler (and earlier in many forms, including Martin Gardner's and Donald Knuth's).
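In a bare-bones version (annealing aside), the random exploration only requires testing that a candidate move neither revisits a square nor properly crosses an earlier segment of the path. Since a knight segment contains no lattice point besides its endpoints, a strict orientation test suffices, as in the following sketch (a stand-in for my clumsier code):

offs=cbind(c(1,1,-1,-1,2,2,-2,-2),c(2,-2,2,-2,1,-1,1,-1)) #knight moves
#strict crossing test between segments [p1,p2] and [p3,p4]
ccw=function(a,b,c) sign((b[1]-a[1])*(c[2]-a[2])-(b[2]-a[2])*(c[1]-a[1]))
crossing=function(p1,p2,p3,p4)
 (ccw(p1,p2,p3)*ccw(p1,p2,p4)<0)&&(ccw(p3,p4,p1)*ccw(p3,p4,p2)<0)

graphcanter=function(){
 path=matrix(sample(1:8,2,replace=TRUE),1,2) #random starting square
 repeat{
  curr=path[nrow(path),];cand=NULL
  for (k in 1:8){
   nxt=curr+offs[k,]
   if (any(nxt<1)||any(nxt>8)) next #off the board
   if (any((path[,1]==nxt[1])&(path[,2]==nxt[2]))) next #no revisit
   ok=TRUE
   if (nrow(path)>2) for (j in 1:(nrow(path)-2))
    if (crossing(path[j,],path[j+1,],curr,nxt)){ok=FALSE;break}
   if (ok) cand=rbind(cand,nxt)}
  if (is.null(cand)) break #no acceptable move left
  path=rbind(path,cand[sample(nrow(cand),1),])}
 nrow(path)-1} #number of moves in the canter

max(replicate(1e3,graphcanter())) #longest canter over 1,000 attempts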

[An unexpected side-effect of this riddle was ending up watching half of Bergman’s Seventh Seal in Swedish…]

an attempt at EP-ABC from scratch, nothing more… [except for a few bugs]

Posted in R, Statistics, University life on October 19, 2016 by xi'an

Following a request from one of the reviewers of our chapter Likelihood-free model choice, I tried to run EP-ABC on a toy problem and to compare it with the outcome of a random forest ABC. Literally starting from scratch, namely from the description found in Simon Barthelmé and Nicolas Chopin's JASA paper. To run my test, I chose as my modelled data an exponential Exp(λ) sample of size 20, with a Gaussian N(0,1) prior on the log parameter (here replicated 100 times):
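In R, this setup translates into a few lines (with the number Nep of EP sweeps and the Exp(1) scale of the simulated data being arbitrary choices of mine):

n=20 #sample size
N=100 #number of replicated datasets
Nep=10 #number of EP sweeps (arbitrary)
trueda=matrix(rexp(N*n),ncol=n) #Exp(1) data, say, one replicate per row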

#prior values
er0=0;Q0=1 #natural parameters of the N(0,1) prior on log(lambda)

Then I applied the formulas found in the paper for approximating the evidence, first defining the log normalising constant for an unnormalised Gaussian density as on the top of p.318
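Namely, for the unnormalised Gaussian exp{rx-Qx²/2}, the normalising constant is √(2π/Q) exp{r²/2Q}, leading to (in a version consistent with the calls to Psi below):

Psi=function(er,Q){
 #log normalising constant of exp(er*x-Q*x^2/2)
 .5*(log(2*pi/Q)+er*er/Q)}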


[which exhibited a typo in the paper, as Simon Barthelmé figured out after email exchanges, the right formulation being the inverse]


and iterating the site updates as described in Sections 2.2 and 3.1, and in Algorithms 1 and 3, with the site update relying on standard Gaussian moment matching (names and minor details in the sketch below being arbitrary):

 epabc=function(data){ #wrapper (name assumed)
  #global and ith natural parameters for Gaussian approximations
  er=er0;Q=Q0
  eri=rep(0,n);Qi=rep(0,n)
  #constants Ci in (2.6)
  normz=rep(0,n)
  for (t in 1:Nep){
   for (i in sample(1:n)){
   #site i update
    Qm=Q-Qi[i] #m for minus i
    erm=er-eri[i]
    #ABC sample from the cavity N(erm/Qm,1/Qm) on log(lambda)
    thetas=rnorm(M,erm/Qm,sqrt(1/Qm))
    psdata=rexp(M,exp(thetas))
    #update by formula (3.3)
    keep=(abs(psdata-data[i])<eps)
    Mh=sum(keep)
    if (Mh>1e3){
    #enough proposals accepted: moment matching of the tilted distribution
     Q=1/var(thetas[keep]);er=mean(thetas[keep])/var(thetas[keep])
     Qi[i]=Q-Qm;eri[i]=er-erm
     #update of Ci as in formula (2.7)
     #with correction bottom of p.319
     normz[i]=log(Mh/M/2/eps)-Psi(er,Q)+Psi(erm,Qm)
  }}}
  #approximation of the log evidence as on p.318
  return(sum(normz)+Psi(er,Q)-Psi(er0,Q0))}

except that I made an R beginner's mistake (!) in the call to the normal simulation, the output being then compared with the genuine evidence under a conjugate Exp(1) prior [values of the evidence should differ but should not differ that much].
After correcting those bugs and the too small sample sizes, thanks to Simon's kind-hearted help!, running this code results in minor discrepancies between both quantities:

M=1e4 #number of ABC samples
propep=.1 #tolerance factor
epab=evid=rep(0,N)
for (t in 1:N){
 data=trueda[t,]
 eps=propep*sd(data) #tolerance scaled by data
 epab[t]=epabc(data) #EP-ABC log evidence
 evid[t]=lgamma(n+1)-(n+1)*log(1+sum(data))} #exact log evidence, conjugate Exp(1) prior

as shown by the comparison below between the evidence and the EP-ABC approximations to the evidence (the dashed line is the diagonal):

Obviously, this short (if lengthy at my scale) experiment is not intended to conclude that EP-ABC works or does not work. (Simon also provided an additional comparison with the genuine evidence, that is, under the right prior, and with the zero Monte Carlo error version of the EP-ABC evidence, achieving a high concentration over the diagonal.) I just conclude that the method does require a certain amount of calibration to become stable. Calibration steps that are clearly spelled out in the paper (adaptive choice of M, stabilisation by quasi-random samples, stabilisation by stochastic optimisation steps and importance sampling), but that also imply that EP-ABC is "far from routine use because it may take days to tune on real problems" (pp.315-316). Thinking about the reasons for such discrepancies over the past days, one possible explanation I see for the impact of the calibration parameters is that the validation of EP, if any, occurs at a numerical rather than Monte Carlo level, hence that the Monte Carlo error must be really small to avoid interfering with the numerical aspects.

tractable Bayesian variable selection: beyond normality

Posted in R, Statistics, University life on October 17, 2016 by xi'an

bird tracks in the first snow of Winter, Feb. 05, 2012

David Rossell and Francisco Rubio (both from Warwick) arXived a month ago a paper on non-normal variable selection. They use two-piece error models that preserve manageable inference and allow for simple computational algorithms, but also characterise the behaviour of the resulting variable-selection process under model misspecification. Interestingly, they show that the existence of asymmetries or heavy tails leads to power losses when using the Normal model. The two-piece error distribution is made of two halves of location-scale transforms of the same reference density, on either side of the common location parameter. In this paper, the reference density is either Gaussian or Laplace (i.e., double-exponential). In both cases the (log-)likelihood has a nice compact expression (although it does not allow for a useful sufficient statistic): one is the L¹ version, the other the L² version. Which is the main reason, I presume, for using this formalism based on only two parametric families. (As mentioned in an earlier post, I do not consider those distributions as mixtures, because the component of a given observation can always be identified, and because, as shown in the current paper, maximum likelihood estimates can easily be derived.) The prior construction follows the non-local prior principles of Johnson and Rossell (2010, 2012), also discussed in earlier posts. The construction is very detailed and hence highlights how many calibration steps are needed in the process.
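In generic form, and assuming side-specific scales s1 and s2 (one possible parameterisation, not necessarily the paper's), the two-piece density writes in a few lines of R:

#two-piece density: two halves of location-scale transforms of a
#symmetric reference density f0, on either side of the location mu
dtwopiece=function(x,mu=0,s1=1,s2=1,f0=dnorm)
 2/(s1+s2)*ifelse(x<mu,f0((x-mu)/s1),f0((x-mu)/s2))
dlap=function(x) exp(-abs(x))/2 #Laplace reference, for the L¹ version

with f0=dnorm returning the two-piece Gaussian and f0=dlap the two-piece Laplace.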

“Bayes factor rates are the same as when the correct model is assumed [but] model misspecification often causes a decrease in the power to detect truly active variables.”

When there are too many models to compare at once, the authors propose a random walk over the finite set of models (which does not require advanced measure-theoretic tools like reversible jump MCMC). One interesting aspect is that moving away from the normal to another member of this small family is driven by the marginal density of the data, which means only moving to interesting alternatives, while sticking to the normal for adequate datasets. In a sense this is not extremely surprising, given that the marginal likelihoods are available model-wise. It is also interesting that, on real datasets, one of the four models is heavily favoured against the others, be it Normal (6.3) or Laplace (6.4), and that the four-model framework returns almost identical values when compared with the single most likely model. Although not immensely surprising when acknowledging that the frequency of the most likely model is 0.998 in both cases.
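For illustration, a generic Metropolis random walk over inclusion indicators could look as follows, where margin() is a hypothetical function returning the log marginal likelihood (plus log prior) of the model indexed by gamma:

#Metropolis random walk over variable-inclusion vectors gamma,
#flipping one indicator at a time (a symmetric proposal)
mcvarsel=function(Tmc,p,margin){
 gam=rep(0,p);cur=margin(gam) #start from the null model
 visit=matrix(0,Tmc,p)
 for (t in 1:Tmc){
  prop=gam;j=sample(1:p,1)
  prop[j]=1-prop[j]
  new=margin(prop)
  if (log(runif(1))<new-cur){gam=prop;cur=new}
  visit[t,]=gam}
 visit}

Moving between the four error models only requires one extra coordinate of the same flavour.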

“Our framework represents a middle-ground to add flexibility in a parsimonious manner that remains analytically and computationally tractable, facilitating applications where either p is large or n is too moderate to fit more flexible models accurately.”

Overall, I find the experiment quite conclusive and do not object [much] to this choice of parametric family, in that it is always more general and generic than the sempiternal Gaussian model, which we picked in our Bayesian Essentials, following tradition. In a sense, it would be natural to pick the most general possible parametric family that allows for fast computations, if this notion makes any sense…

grim knight [a riddle]

Posted in Kids, pictures, R on October 14, 2016 by xi'an

This week's Riddler proposed a riddle that is a variation on the knight's tour problem, namely

“…how long is the longest path a knight can travel on a standard 8-by-8 chessboard without letting the path intersect itself?”

the riddle being then one of the self-avoiding random walk [kind]… As I could not get back to sleep last night, I spent a couple of hours (!) on this riddle, programming a random walk [or more accurately, a random canter]. This is a brute-force approach, in that I pick any acceptable move with the same probability and stop when no further move is available. [The title refers to the recommendation to keep a knight away from the rim of the chessboard: "a knight on the rim is grim"…]

while (stop){

with my function nexx a rather clumsy 50-line business of selecting one acceptable move from the current position curr. This function returns the proposed move as well as the updated board, with zeros in the squares already visited by the knight. Which highlights the ambiguity in the question, namely, how does one define the path of a knight? For an acceptable knight move from A to B, there are two possible paths: either take two steps in one direction and then one step in the orthogonal direction, or the opposite. I thus pick one of the two (at random) and prohibit further visits to those squares. An alternative meaning of the question could be that the line joining A to B cannot ever be crossed again, which excludes fewer moves (but is more cumbersome to code). Anyway, with the former interpretation of a path, repeating the self-avoiding moves led to a maximum of 19 moves, with one solution exhibited below. (Since (64-1)/3=21, it is conceivable that the true maximum is 20 or even 21. In the path representation below, it seems possible to include yet another move by going to (4,1) instead of (4,5). But this is apparently excluded by the square representation on the right. Which is why the path representation is somewhat confusing!)
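In condensed form, the random canter could look like the following sketch (a stand-in for, not a copy of, the 50-line nexx, with the uniform draw made over all acceptable pairs of a move and a sub-path):

inb=function(z) all(z>=1)&&all(z<=8) #within the board

canter=function(){
 board=matrix(TRUE,8,8) #TRUE = square not yet visited
 curr=sample(1:8,2,replace=TRUE) #random starting square
 board[curr[1],curr[2]]=FALSE
 offs=cbind(c(1,1,-1,-1,2,2,-2,-2),c(2,-2,2,-2,1,-1,1,-1))
 len=0
 repeat{
  cand=list()
  for (k in 1:8){
   dx=offs[k,1];dy=offs[k,2]
   land=curr+c(dx,dy)
   if (!inb(land)||!board[land[1],land[2]]) next
   #two sub-paths for the same move: long leg first or last
   if (abs(dx)==2){
    paths=list(rbind(curr+c(dx/2,0),curr+c(dx,0)),
               rbind(curr+c(0,dy),curr+c(dx/2,dy)))
   } else {
    paths=list(rbind(curr+c(0,dy/2),curr+c(0,dy)),
               rbind(curr+c(dx,0),curr+c(dx,dy/2)))
   }
   for (pa in paths)
    if (all(apply(pa,1,inb))&&all(board[pa]))
     cand[[length(cand)+1]]=list(mid=pa,land=land)}
  if (length(cand)==0) break #knight is stuck
  pick=cand[[sample(length(cand),1)]]
  board[pick$mid]=FALSE #intermediate squares now occupied
  board[pick$land[1],pick$land[2]]=FALSE
  curr=pick$land
  len=len+1}
 len}

max(replicate(1e3,canter())) #longest canter over 1,000 attempts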


Today, namely October 15, I received a solution of length 21, hence covering the entire board without ever using the same square twice. It was sent to me by Paul-Henry Cournède (a geographical neighbour!) and is "obvious" once you see it. Which may be why the alternative interpretation of "path" was chosen in The Riddler. And why my right-hand side representation is clearly misleading!


importance sampling by kernel smoothing [experiment]

Posted in Books, R, Statistics on October 13, 2016 by xi'an

Following my earlier post on Delyon and Portier's proposal to replace the true importance distribution ƒ with a leave-one-out (!) kernel estimate in the importance sampling estimator, I ran a simple one-dimensional experiment to compare the performances of the traditional method and of this alternative. The true distribution is a N(0,½), the importance proposal is a N(0,1) distribution, the target is the function h(x)=x⁶ [1-0.9 sin(3x)], n=2643 is the number of simulations, and the density is estimated via a call to the default density() R function. The first three boxes are for the regular importance sampler, and for the kernel and corrected-kernel versions of Delyon and Portier, while the second set of three considers the self-normalised alternatives. In all kernel versions, the variability is indeed much lower than with regular importance sampling, but the bias is persistent, with no clear correction brought by the first-order proposal in the paper, while the kernel versions induce a significant increase in computing time:
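Stripped to its simplest form, the comparison is between the following estimates of E[h(X)] under the true N(0,½) distribution (a sketch, with σ=.5 matching the dnorm(x,sd=.5) calls below):

N=2643
h=function(x) x^6*(1-.9*sin(3*x))
x=rnorm(N) #proposal N(0,1) sample
fx=dnorm(x) #proposal density
px=dnorm(x,sd=.5) #true density
hatf=density(x) #kernel estimate of the proposal density
hatfx=approx(hatf$x,hatf$y,x)$y
mean(h(x)*px/fx) #regular importance sampling
mean(h(x)*px/hatfx) #kernel version
sum(h(x)*px/fx)/sum(px/fx) #self-normalised version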

> library(rbenchmark) #for benchmark()
> N=2643
> benchmark(
+ for (t in 1:100){
+   x=sort(rnorm(N));fx=dnorm(x)
+   imp1=dnorm(x,sd=.5)/fx})

  replications elapsed relative user.child sys.child
1          100   7.948     7.94      0.012
> benchmark(
+ for (t in 1:100){
+   x=sort(rnorm(N));hatf=density(x)
+   hatfx=approx(hatf$x,hatf$y, x)$y
+   imp2=dnorm(x,sd=.5)/hatfx})
  replications elapsed relative user.child sys.child
1          100  19.272   18.473      0.94

> benchmark(
+ for (t in 1:100){
+   x=sort(rnorm(N));hatf=density(x)
+   hatfx=approx(hatf$x,hatf$y, x)$y
+   bw=hatf$bw;Kx=rep(0,N)
+   NmoNmt=(N-1)*(N-2) #presumably N-minus-one times N-minus-two
+   for (i in 1:N) Kx[i]=1-sum((dnorm(x[i],
+     mean=x[-i],sd=bw)-hatfx[i])^2)/NmoNmt/hatfx[i]^2
+   imp3=dnorm(x,sd=.5)*Kx/hatfx})

  replications  elapsed relative user.child sys.child
1          100 11378.38 7610.037    17.239

which follows from the O(n) cost of evaluating the kernel estimate at each of the n observations (and I did not even use the leave-one-out option…). The R computation of the variance is certainly not optimal, far from it, but those enormous values give an indication of the added cost of the step, which does not even seem productive in terms of variance reduction… [Warning: the comparison is only run over one model and one target integrand, hence does not pretend to generality!]

Journal of Open Source Software

Posted in Books, R, Statistics, University life on October 4, 2016 by xi'an

A week ago, I received a request to referee a paper for the Journal of Open Source Software, a journal I had never seen (or heard of) before. The concept is quite interesting, with a scope much broader than statistical computing (indeed, I do not know anyone on the board and no-one there seems affiliated with a Statistics department). Papers are very terse, describing the associated code in one or two pages, and the purpose of refereeing is to check the code. (I was asked to evaluate an MCMC R package but declined for lack of time.) Which is a pretty light task if the code is friendly enough to run right away and provide demos. Best of luck to this endeavour!

approximate lasso

Posted in pictures, R, Statistics with tags , , , on October 2, 2016 by xi'an

Here is a representation of the precision of a kernel density estimate (second axis) against the true value of the density (first axis), which looks like a lasso of sorts, hence the title. I am not sure this tells us much, except that the estimated values are close to the true values, and that a given value of f(x) is associated with two different estimates, predictably…
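For the record, such a lasso shape is easy to reproduce (as an illustration, with a standard normal sample and the default density() estimate):

x=rnorm(1e4)
hatf=density(x)
hatfx=approx(hatf$x,hatf$y,x)$y #kernel estimate at the sample points
plot(dnorm(x),hatfx,cex=.3,
 xlab="true density",ylab="kernel estimate")
abline(0,1,lty=2) #diagonal of perfect estimation

the two branches of the lasso corresponding to x and -x sharing the same true density value but not the same estimate.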