399 safe[[r] for now]

Posted in Kids, Mountains, pictures, Travel on October 14, 2018 by xi'an

Argentan half-marathon [1:26:03, 22nd/329, 4th V2/63, 15°]

Posted in pictures, Running on October 13, 2018 by xi'an

Despite failing to reach the podium this year in my V2 category, a predictable first as I get closer to moving up to the next category, V3, or “senior grandmaster”, and despite missing my 1:25 target by a few seconds, I was rather happy with this half-marathon, having run at my own pace the whole race, with none of the low points I had in previous races, and instead feeling quite well in the last kilometres, where I left the first female runner behind. The weather was perfect, with no sun, no rain, and no wind in the second half. The three V2 runners in front of me were much faster (1:22, 1:24, and 1:25:21) than those of the previous years, an illustration of the law of small numbers found in these races…

Juan Antonio Cano Sanchez (1956-2018)

Posted in Statistics, University life on October 12, 2018 by xi'an

I have just learned the very sad news that Juan Antonio Cano, from Universidad de Murcia, with whom Diego Salmerón and I wrote two papers on integral priors, has passed away after a long fight against a kidney disease. Having communicated with him recently, I am quite shocked by his passing, as I was not aware of his poor health. The last time we met was at the O’Bayes 2015 meeting in València, with a long chat in the botanical gardens of the Universitat de València. Juan Antonio was a very kind and unassuming person, open and friendly, with a continued flow of research in Objective Bayes methodology and in particular on integral priors. Hasta luego, Juan Antonio!

accelerating HMC by learning the leapfrog scale

Posted in Books, Statistics on October 12, 2018 by xi'an

In this new arXiv submission, part of Changye Wu’s thesis [defended last week], we try to reduce the high sensitivity of the HMC algorithm to its hand-tuned parameters, namely the step size ε of the discretisation scheme, the number of steps L of the integrator, and the covariance matrix of the auxiliary variables, by calibrating the number of leapfrog steps so as to avoid both slow-mixing chains and wasteful computation costs. We do so by learning from the No-U-Turn Sampler (NUTS) of Hoffman and Gelman (2014), which already automatically tunes both the step size and the number of leapfrog steps.

The core idea behind NUTS is to pick the step size via primal-dual averaging in a burn-in (warmup, Andrew would say) phase and to build at each iteration a proposal based on following a locally longest path on a level set of the Hamiltonian. This is achieved by a recursive algorithm that, at each call to the leapfrog integrator, requires evaluating both the gradient of the target distribution and the Hamiltonian itself. Roughly speaking, an iteration of NUTS costs twice as much as regular HMC with the same number of calls to the integrator. Our approach is to learn from NUTS the scale of the leapfrog length and to use the resulting empirical distribution of the longest leapfrog path to randomly pick the value of L at each iteration of an HMC scheme. This obviously preserves the validity of the HMC algorithm.
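To make the mechanism concrete, here is a minimal R sketch of one such transition, under the assumption that a vector emp_L of longest leapfrog lengths has already been recorded during a NUTS-type warmup (not shown); U and grad_U stand for the negative log-target and its gradient, and all names are illustrative rather than taken from the paper’s code.

leapfrog <- function(theta, r, eps, L, grad_U) {
  r <- r - eps / 2 * grad_U(theta)   # initial half-step on the momentum
  for (l in seq_len(L - 1)) {
    theta <- theta + eps * r         # full step on the position
    r <- r - eps * grad_U(theta)     # full step on the momentum
  }
  theta <- theta + eps * r
  r <- r - eps / 2 * grad_U(theta)   # final half-step on the momentum
  list(theta = theta, r = r)
}

ehmc_step <- function(theta, eps, emp_L, U, grad_U) {
  L <- sample(emp_L, 1)              # leapfrog length drawn from the empirical distribution
  r0 <- rnorm(length(theta))         # standard Gaussian momentum
  prop <- leapfrog(theta, r0, eps, L, grad_U)
  # Metropolis acceptance based on the Hamiltonian (potential plus kinetic energy)
  logA <- U(theta) + sum(r0^2) / 2 - U(prop$theta) - sum(prop$r^2) / 2
  if (log(runif(1)) < logA) prop$theta else theta
}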

While a theoretical comparison of the convergence performances of NUTS and this eHMC proposal seems beyond our reach, we ran a series of experiments to evaluate these performances, using as a criterion an ESS value calibrated by the evaluation cost of the logarithm of the target density function and of its gradient, as this is usually the most costly part of the algorithms, as well as a similarly calibrated expected squared jumping distance (ESJD). Above is one such illustration for a stochastic volatility model, the first axis representing the targeted acceptance probability in the Metropolis step. Some of the gains in either ESS or ESJD are by a factor of ten, which relates to our argument that NUTS somewhat wastes computation effort by proposing uniformly over the candidate set, rather than near its end-points, which automatically reduces the distance between the current position and the proposal.
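For reference, a small R sketch of this comparison criterion, with the bookkeeping of the number n_grad of gradient evaluations assumed to be done inside each sampler (taking the worst coordinate for the ESS is one common convention, not necessarily the one used in the paper):

library(coda)                                # for effectiveSize()
ess_per_grad <- function(chain, n_grad)      # chain: iterations x dimensions matrix
  min(effectiveSize(as.mcmc(chain))) / n_grad
esjd_per_grad <- function(chain, n_grad)     # mean squared jump between successive states
  mean(rowSums(diff(chain)^2)) / n_grad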

Le Monde puzzle [#1070]

Posted in Books, Kids, R, University life on October 11, 2018 by xi'an

A rewording of the fifth competition problem in the Le Monde mathematical puzzle series:

For the 3×3 tables below, what is the minimal number of steps to move from left to right when a yellow token can only be moved to an empty location surrounded by two other tokens?

In the 4×4 table below, there are 6 green tokens. How many steps from left to right?

Painful and moderately mathematical, once more… For the first question, a brute-force simulation of random valid moves of length less than 100 returns a solution in 4 steps:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
     1    1    1    0    0    1    0    0    1
     1    0    1    0    1    1    0    0    1
     0    0    1    1    1    1    0    0    1
     0    0    1    1    0    1    0    1    1
     0    0    1    0    0    1    1    1    1

But this is not an acceptable move because of the “other” constraint. Imposing this constraint leads to a solution in 9 steps, but is this the lowest possible?! It actually took me most of the weekend (apart from a long drive to and from a short half-marathon!) to figure out a better strategy than brute-force random exploration: the trick I eventually figured out is to start from the finishing (rightmost) configuration F of the grid and look at the configurations from which a solution is available in 1, 2, … steps. This is not exactly dynamic programming, but it keeps the running time under control if there is a solution associated with the starting (leftmost) configuration S. (Robin proceeded the reverse way, which in retrospect is presumably equivalent, if faster!) The 3×3 grid has 9 choose 5, i.e. 126, possible configurations with 5 coins, which means the number of cases remains under control, and the same holds for the 4×4 grid with 6 coins and its 8008 configurations. A rough R sketch of this level-by-level exploration is given at the end of this post. This led to a 9-step solution for n=3 and the proposed starting grid in yellow:

[1] 1 1 1 0 0 1 0 0 1
[1] 1 1 0 0 1 1 0 0 1
[1] 1 1 0 1 1 0 0 0 1
[1] 0 1 0 1 1 1 0 0 1
[1] 0 1 1 1 0 1 0 0 1
[1] 1 1 1 1 0 0 0 0 1
[1] 0 1 1 1 1 0 0 0 1
[1] 0 0 1 1 1 0 0 1 1
[1] 0 0 1 1 0 0 1 1 1
[1] 0 0 1 0 0 1 1 1 1

and a 19-step solution for n=4:

[1] 1 0 0 0 0 1 0 0 0 1 1 1 0 0 0 1
[1] 1 0 0 0 0 1 0 0 0 0 1 1 0 0 1 1
[1] 1 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0
[1] 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0
[1] 1 1 0 0 0 0 1 1 0 0 0 1 0 0 1 0
[1] 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 0
[1] 1 0 1 1 0 0 1 1 0 0 0 0 0 0 1 0
[1] 1 1 1 1 0 0 1 0 0 0 0 0 0 0 1 0
[1] 1 1 0 1 0 1 1 0 0 0 0 0 0 0 1 0
[1] 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0
[1] 0 0 0 1 1 1 1 0 0 0 1 0 0 0 1 0
[1] 0 0 0 1 1 1 0 0 0 1 1 0 0 0 1 0
[1] 0 0 0 1 1 1 0 0 1 1 0 0 0 0 1 0
[1] 0 0 0 1 0 1 0 0 1 1 0 0 0 1 1 0
[1] 0 0 0 1 0 1 0 0 1 0 0 0 1 1 1 0
[1] 0 0 0 1 1 1 0 0 1 0 0 0 1 0 1 0
[1] 0 0 0 1 0 1 0 0 1 1 0 0 1 0 1 0
[1] 0 0 0 1 0 1 0 0 0 1 1 0 1 0 1 0
[1] 0 0 0 1 0 1 1 0 0 1 1 0 1 0 0 0

The first resolution takes less than a minute and the second one just a few minutes (or less than my short morning run!). Surprisingly, using this approach does not require more work, which makes me wonder about the solution the Le Monde journalists will propose. Given the (misguided) effort put into my resolution, it is no wonder this puzzle was worth a larger number of points.
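As promised above, here is a rough R sketch of the level-by-level exploration, run forward from the starting grid rather than backward from the finishing one (presumably equivalent, as noted above). The move rule encoded below is my own reading of the puzzle and thus an assumption: a token may jump to an empty cell that is orthogonally adjacent to at least two other tokens.

n <- 3
idx <- matrix(seq_len(n^2), n, n)
neigh <- lapply(seq_len(n^2), function(k) {       # orthogonal neighbours of cell k
  i <- (k - 1) %% n + 1
  j <- (k - 1) %/% n + 1
  c(if (i > 1) idx[i - 1, j], if (i < n) idx[i + 1, j],
    if (j > 1) idx[i, j - 1], if (j < n) idx[i, j + 1])
})
moves <- function(x)                              # all legal one-step successors of x
  do.call(c, lapply(which(x == 1), function(from)
    lapply(Filter(function(to)                    # empty target cell adjacent to
      sum(x[setdiff(neigh[[to]], from)]) >= 2,    # at least two *other* tokens (assumed rule)
      which(x == 0)), function(to) {
        y <- x; y[from] <- 0; y[to] <- 1; y })))
nsteps <- function(start, finish) {               # breadth-first search over configurations
  seen <- new.env(hash = TRUE)
  key <- function(x) paste(x, collapse = "")
  assign(key(start), TRUE, envir = seen)
  frontier <- list(start)
  steps <- 0
  repeat {
    if (any(vapply(frontier, function(x) all(x == finish), logical(1))))
      return(steps)
    frontier <- Filter(function(y) {              # keep only configurations not seen before
      new <- !exists(key(y), envir = seen, inherits = FALSE)
      if (new) assign(key(y), TRUE, envir = seen)
      new
    }, do.call(c, lapply(frontier, moves)))
    if (length(frontier) == 0) return(NA)         # no path under the assumed rule
    steps <- steps + 1
  }
}
# e.g., with the first and last lines of the 3x3 listing above
# (the count returned depends on the exact move rule):
nsteps(c(1,1,1,0,0,1,0,0,1), c(0,0,1,0,0,1,1,1,1))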

against the death penalty

Posted in pictures on October 10, 2018 by xi'an

simulated summary statistics [in the sky]

Posted in Statistics on October 10, 2018 by xi'an

Thinking it was related to ABC (although in the end it is not!), I recently read a baffling cosmology paper by Jeffrey and Abdalla. The data d there stands for an observed (summary) statistic, while the summary statistic is a transform of the parameter, μ(θ), which, together with nuisance parameters, calibrates the distribution of the data. More intriguing to me is the sentence that the correct likelihood of d is indexed by a simulated version of μ(θ), μ'(θ), rather than by μ(θ), which seems to assume that the pseudo- or simulated data can be produced for the same value of the parameter as the observed data. The rest of the paper remains incomprehensible to me, as I do not understand how the simulated versions are simulated.

“…the corrected likelihood is more than a factor of exp(30) more probable than the uncorrected. This is further validation of the corrected likelihood; the model (i.e. the corrected likelihood) shows a better goodness-of-fit.”

The authors further resort to Bayes factors to compare the corrected and uncorrected versions of the likelihood, which leads (see the quote above) to picking the corrected version. But are they comparable as such, given that the corrected version involves simulations that are treated as supplementary data? As noted by the authors, the Bayes factor unsurprisingly goes to one as the number M of simulations grows to infinity, as supported by the graph below.