The nice logo of MCqMC 2016 was a collection of eight series of QMC dots on the unit (?) cube. The organisers set a competition to identify the principles behind those quasi-random sets and, as I had no idea for most of them, I entered very random sets unconnected with any algorithm, for which I got an honourable mention and a CD prize (if not the conference staff tee-shirt I was coveting!). Art Owen sent me back my entry, posted below and hopefully (or not!) readable.
To sort of make up for the failed attempt at Monte Rosa, we stayed an extra day and took a hike in Val d'Aosta, starting from Cogne, where we had a summer school a few years ago. And from where we started for another failed attempt at La Grivola. It was a brilliant day and we climbed to the Rifugio Vittorio Sella (2588m) [along with many many other hikers], then lost the crowds on the way to the Colle della Rossa (3195m), which meant an easy 1700m climb. By the end of the valley, we came across steinbock (aka bouquetins, stambecchi) resting in the sun by a creek and unfazed by our cameras. (Abele Blanc later told us that they usually stay there, licking whatever salt they can find on the stones.)
The final climb to the pass was a bit steeper but enormously rewarding, with views of the Western Swiss Alps in full glory (Matterhorn, Combin, Breithorn) and all to ourselves. From there it was a downhill hike all the way back to our car in Cogne, 1700m, with no technical difficulty once we had crossed the few hundred meters of residual snow. And with the added reward of seeing several herds of the shy chamois mountain goat.
Except that my daughter’s rental mountaineering shoes started to make themselves felt and she could barely walk downhill. (She eventually lost her big toe nails!) It thus took us forever to get down (despite my running to the car and back to fetch lighter shoes) and we reached the car at 8:30, too late to contemplate driving back to Paris.
“One of our contribution comes from the mathematical analysis of the consequence of conditioning the parameters of interest on consistent statistics and intrinsically inconsistent statistics”
Xiaolong Zhong and Malay Ghosh have just arXived an ABC paper focussing on the convergence of the method, and on the use of sufficient dimension reduction techniques for the construction of summary statistics. I had not heard of this approach before, so I read the paper with interest. I however regret that the paper does not link with the recent consistency results of Li and Fearnhead and of David Frazier, Gael Martin, Judith Rousseau and myself. When conditioning upon the MLE [or the posterior mean] as the summary statistic, Theorem 1 states that the Bernstein-von Mises theorem holds, but it misses a limit in the tolerance ε, and apparently misses conditions on the speed at which this tolerance goes to zero, even though the conditioning event involves the true value of the parameter. This makes me wonder at the relevance of the result. The part about partial posteriors and the characterisation of limiting posterior distributions starts with the natural remark that the mean of the summary statistic must identify the whole parameter θ to achieve consistency, a point central to our 2014 JRSS B paper. The authors suggest using a support vector machine to derive the summary statistics, an idea already exploited by Heiko Strathmann et al. There is no consistency result of relevance for ABC in this second and final part, which ends rather abruptly. Overall, while the paper contributes to the current reflection on the convergence properties of ABC, the lack of scaling of the tolerance ε calls for further investigation.
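For readers less familiar with the setting, here is a minimal rejection-ABC sketch (my own toy normal-mean example, not taken from the paper) showing how the tolerance ε enters when conditioning on a summary statistic such as the sample mean:

```python
import numpy as np

# Toy illustration (not the paper's setting): rejection ABC for a normal mean,
# conditioning on the sample mean as summary statistic (sufficient here).
rng = np.random.default_rng(0)

theta_true = 1.5
n = 100
y = rng.normal(theta_true, 1.0, size=n)
s_obs = y.mean()                              # observed summary statistic

def abc_sample(eps, n_prop=50_000):
    """Keep prior draws whose simulated summary is eps-close to s_obs."""
    theta = rng.normal(0.0, 10.0, size=n_prop)      # vague normal prior
    s_sim = rng.normal(theta, 1.0 / np.sqrt(n))     # sampling dist. of the mean
    return theta[np.abs(s_sim - s_obs) < eps]

for eps in (1.0, 0.1, 0.01):
    kept = abc_sample(eps)
    print(f"eps={eps}: {kept.size} draws kept, ABC posterior mean {kept.mean():.2f}")
```

As ε shrinks (at a fast enough rate), the retained draws concentrate around the true posterior; it is precisely the scaling of that limit that a Bernstein-von Mises statement needs to control.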
[Disclaimer: I am not involved in handling this paper as an AE or as a referee for the Annals of Statistics!]
Last week, Vakilzadeh, Beck and Abrahamsson arXived a paper entitled “Using Approximate Bayesian Computation by Subset Simulation for Efficient Posterior Assessment of Dynamic State-Space Model Classes”. It follows an earlier paper by Beck and co-authors on ABC by subset simulation, a paper that I did not read. The model of interest is a hidden Markov model with continuous components and covariates (inputs), e.g., a stochastic volatility model. There is however a catch in the definition of the model, namely that the observable part of the HMM includes an extra measurement error term linked with the tolerance level of the ABC algorithm. This error term is dependent across time, the vector of errors being constrained to a ball of radius ε. This reminds me of noisy ABC, obviously (and as acknowledged by the authors), but also of some ABC developments of Ajay Jasra and co-authors. Indeed, as in those papers, Vakilzadeh et al. use the raw data sequence to compute their tolerance neighbourhoods, which obviously bypasses the selection of a summary statistic [vector] but may also drown the signal under noise for long enough series.
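To illustrate the raw-data tolerance neighbourhood, here is a hedged sketch (model, prior and all numerical values are my own choices, not the authors’): ABC for a toy linear-Gaussian state-space series, accepting a parameter draw only when the whole vector of discrepancies lies within a sup-norm ball of radius ε:

```python
import numpy as np

# Hypothetical toy model, not the authors' stochastic volatility example:
# latent AR(1) state observed with measurement noise.
rng = np.random.default_rng(1)

def simulate(phi, T=50):
    """x_t = phi * x_{t-1} + state noise, observed with measurement noise."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(0.0, 0.5)
    return x + rng.normal(0.0, 0.2, size=T)

y_obs = simulate(0.8)                          # "observed" series

def abc_raw(eps, n_prop=5_000):
    """Keep draws of phi whose simulated series is eps-close at EVERY time."""
    kept = []
    for _ in range(n_prop):
        phi = rng.uniform(-1.0, 1.0)           # uniform prior on (-1, 1)
        if np.max(np.abs(simulate(phi) - y_obs)) < eps:
            kept.append(phi)
    return np.array(kept)

for eps in (2.0, 3.0, 4.0):
    kept = abc_raw(eps)
    print(f"eps={eps}: acceptance rate {kept.size / 5_000:.3f}")
```

The acceptance rate collapses as the series grows or ε shrinks, which is the sense in which a raw-data distance may drown the signal under noise for long enough series.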
“In this study, we show that formulating a dynamical system as a general hierarchical state-space model enables us to independently estimate the model evidence for each model class.”
Subset simulation is a nested technique that produces a sequence of nested balls (and related tolerances) such that the conditional probability of being in the next ball given the previous one remains large enough, requiring a new round of simulation each time. This somewhat reminds me of nested sampling, even though the two methods differ. For subset simulation, estimating the level probabilities means that there also exists a converging (and even unbiased!) estimator of the evidence associated with different tolerance levels. This is not a particularly natural object, unless one wants to turn it into a tolerance selection principle, which would be quite a novel perspective. But not one adopted in the paper, seemingly. Given that the application section truly compares models, I must have missed something there. (Blame the long flight from San Francisco to Sydney!) Interestingly, the different models in Table 4 relate to different tolerance levels, which may be a hindrance for the overall validation of the method.
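The nested-ball mechanism can be caricatured in a few lines (a crude sketch of the subset-simulation idea with my own toy target; the rejuvenation step is a deliberately simplistic jitter, not the authors’ MCMC scheme): the tolerances are set adaptively so that each conditional level probability stays near a target p₀, and the probability of the final ball is estimated by the product of the level probabilities:

```python
import numpy as np

# Crude sketch of subset simulation on a toy ABC target (my own example).
rng = np.random.default_rng(2)

y_obs = 0.0
p0 = 0.1                      # target conditional probability per level
N = 2_000                     # particles per level

def distance(theta):
    """Simulate one pseudo-observation under theta, return its discrepancy."""
    return abs(theta + rng.normal(0.0, 1.0) - y_obs)

theta = rng.normal(0.0, 5.0, size=N)           # draws from the prior
d = np.array([distance(t) for t in theta])

prob, eps = 1.0, np.inf
for _ in range(3):
    eps = np.quantile(d, p0)                   # next (smaller) ball radius
    prob *= np.mean(d < eps)                   # conditional level probability
    survivors = theta[d < eps]
    # rejuvenate: resample survivors and jitter them (crude, no MH correction)
    theta = rng.choice(survivors, size=N) + rng.normal(0.0, 0.1, size=N)
    d = np.array([distance(t) for t in theta])

print(f"final tolerance eps={eps:.3f}, estimated P(ball) ~ {prob:.2e}")
```

The product of level probabilities is exactly the kind of quantity that, level by level, yields the evidence estimates attached to the successive tolerance levels.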
I find the subsequent part on getting rid of uncertain prediction error model parameters of lesser [personal] interest as it essentially replaces the marginal posterior on the parameters of interest by a BIC approximation, with the unsurprising conclusion that “the prior distribution of the nuisance parameter cancels out”.