Just mentioning that a second version of our paper has been arXived and submitted to JMLR, the main addition being the inclusion of a reference to the abcrf package. And just repeating our best-selling arguments: (a) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when a large number of useless white-noise variables is included; (b) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural if rudimentary distance, namely being or not being in the same leaf as the observed vector of summary statistics η(y); (c) the size of the reference table simulated from the prior (predictive) distribution does not need to be as large as in usual ABC settings, hence significant gains in computing time, since producing the reference table is usually the costly part! To the point that deriving a different forest for each univariate transform of interest is truly a minor drag on the overall computing cost of the approach.
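The gist of arguments (a) and (c) can be sketched in a few lines. This is a hypothetical toy model in Python with scikit-learn, not the abcrf package itself: a forest trained on a (modest) reference table of (summaries, parameter) pairs predicts the parameter at the observed summaries, with no summary selection and no tolerance level, even when white-noise summaries are thrown in.

```python
# Sketch of the ABC random forest idea on a toy normal-mean model
# (illustrative only; the actual methodology lives in the abcrf R package).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_ref = 2000  # reference table far smaller than typical ABC runs

# reference table simulated from the prior predictive:
# theta ~ U(0, 10), y | theta ~ N(theta, 1) with 20 observations
theta = rng.uniform(0, 10, n_ref)
y = rng.normal(theta[:, None], 1.0, (n_ref, 20))
summaries = np.column_stack([
    y.mean(axis=1),                # informative summary
    np.median(y, axis=1),          # informative summary
    rng.normal(size=(n_ref, 5)),   # useless white-noise summaries, kept in
])

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(summaries, theta)

# "observed" data generated with theta = 4
y_obs = rng.normal(4.0, 1.0, 20)
eta_obs = np.concatenate([[y_obs.mean(), np.median(y_obs)],
                          rng.normal(size=5)])
print(forest.predict(eta_obs.reshape(1, -1))[0])  # close to 4
```

The forest simply ignores the noise columns when splitting, which is the point of argument (a).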
A very interesting issue of Nature I read this morning over breakfast. A post-Brexit read of a pre-Brexit issue. Apart from the several articles arguing against Brexit and its dire consequences for British science [but preaching to the converted: what percentage of Brexit voters reads Nature?!], a short vignette on the differences between fields in the average time spent refereeing a paper (maths takes twice as long as the social sciences, and academics older than 65 half the time of researchers under 36!). A letter calling for action against predatory publishers. And the first maths paper published since I started reading Nature on an almost-regular basis: it studies mean first-passage times for non-Markov random walks, specified as having time-homogeneous increments. It is sort of a weird maths paper in that I do not see where the mathematical novelty stands or why the paper contains only half a dozen formulas… Maybe not a maths paper after all.
The latest [June] issue of Statistics & Computing is full of interesting Bayesian and Monte Carlo entries, some of which are even open access!
As my daughter is working at a McDonald’s close to Paris-Dauphine [as a summer job], I paid a neighbourly visit two days ago and had a salad there! While there was nothing exciting about the salad, it was my first meal at McDonald’s in at least twenty-five years (although I may have had an occasional tea there in the meantime) and there was nothing wrong with it either. Judging solely from my daughter’s (limited) experience, I am actually impressed by the degree of Taylorism in the preparation and handling of food and the management of staff. Not that I am contemplating going back to this chain in the next twenty years, for the food served there remains junk food, but the industrial size of the company means that health and safety regulations and labour laws are more likely to be respected there than in a small local restaurant. Again judging solely from my daughter’s experience.
In an apt contrast, we went to celebrate her admission to medical school last weekend and picked a bento restaurant in Le Marais that had good press. And was open on a Sunday evening. The place is called Nanashis and looks like an immense railway dining hall. Somewhat noisy but ultimately not unpleasant. And very good if pricey cold soba noodles. (Just avoid the wine. And possibly the desserts, since our homemade matcha cake can compete with theirs!)
A paper on control variates by Chris Oates, Mark Girolami (Warwick) and Nicolas Chopin (CREST) appeared in a recent issue of Series B. I had read and discussed the paper with them previously, and the following is a set of comments I wrote at some stage, to be taken with enough grains of salt since Chris, Mark and Nicolas answered them either orally or in the paper. Note also that I already discussed an earlier version, with comments that are not necessarily coherent with the following ones! [Thanks to the busy softshop this week, I resorted to publishing some older drafts, so mileage may vary in the coming days.]
First, it took me quite a while to get through the paper, mostly because I had never worked with reproducing kernel Hilbert spaces (RKHS) before. I looked at some proofs in the appendix and at the whole paper but could not spot anything amiss. It is obviously a major step to uncover a manageable method with an error rate faster than the standard 1/√n Monte Carlo rate. When I set my PhD student Anne Philippe on the approach via Riemann sums, we were quickly hindered by the dimension issue and could not find a way out. In the first versions of the nested sampling approach, John Skilling had also thought he could get higher convergence rates, before realising the Monte Carlo error had not disappeared and hence was keeping the error at the same 1/√n speed.
The core proof in the paper leading to the 7/12 convergence rate relies on a mathematical result of Sun and Wu (2009), that a certain rate of regularisation of the function of interest leads to an average variance of order 1/6. I have no reason to mistrust the result (and anyway did not check the original paper), but I am still puzzled by the fact that it almost immediately leads to the control variate estimator having a smaller-order variance (or at least variability), on average or in probability. (I am also uncertain whether the boxplot figures can be interpreted as establishing super-√n speed.)
Another thing I cannot truly grasp is how the control functional estimator of (7) can be both a mere linear combination of individual unbiased estimators of the target expectation and an improvement in the variance rate. I acknowledge that the coefficients of the matrices are functions of the sample simulated from the target density, but still…
Another source of inner puzzlement is the choice of the kernel in the paper, which seems too simple to cover all problems, despite being used in every illustration there. I see the kernel as centred at zero, which means a central location must be known, decreasing to zero away from this centre, so possibly missing aspects of the integrand that are too far away, and isotropic in the reference norm, which also seems to preclude settings where the integrand is not that compatible with the geometry.
I am equally nonplussed by the existence of a deterministic bound on the error, although it is not completely deterministic, depending as it does on the values of the reproducing kernel at the points of the sample. Does it imply anything restrictive about the function to be integrated?
A side remark about the use of “intractable” in the paper: given the development of a whole new branch of computational statistics handling likelihoods that cannot be computed at all, “intractable” should possibly be reserved for such higher-complexity models.
Quite a coincidence! I just came across another bug in Lynch’s (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Already discussed here and on X validated. While working with one participant in the post-ISBA softshop, we were looking for efficient approaches to simulating correlation matrices and came [via Google] across the book’s R code for a 3×3 correlation matrix, which misses the additional constraint that the determinant must be positive (equivalently here, that the matrix is positive definite, since the 2×2 principal minors 1−r² are already positive). As shown e.g. by the example
> eigen(matrix(c(1,-.8,.7,-.8,1,.6,.7,.6,1),ncol=3))$values
[1]  1.8169834  1.5861960 -0.4031794
having all correlations between -1 and 1 is not enough. Just. Not. Enough.
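One standard way around the bug (a sketch in Python with numpy, not a fix of Lynch's R code) is to build the correlation matrix as a normalised Gram matrix, which is positive semi-definite by construction, rather than drawing the off-diagonal entries independently in (-1, 1).

```python
# Generate a random 3x3 correlation matrix that is valid by construction:
# any Gram matrix A A' is positive semi-definite, and normalising by the
# square roots of its diagonal puts ones on the diagonal.
import numpy as np

rng = np.random.default_rng(2)
d = 3
A = rng.normal(size=(d, d + 1))
S = A @ A.T                                        # PSD by construction
corr = S / np.sqrt(np.outer(np.diag(S), np.diag(S)))

print(np.round(np.linalg.eigvalsh(corr), 4))       # all eigenvalues >= 0
```

Fancier schemes (e.g. onion or vine constructions) control the distribution over valid matrices, but even this minimal version never produces a negative eigenvalue.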
While I have no idea how the results of the Brexit referendum of last Thursday will be interpreted, I am definitely worried by the possibility (and consequences) of an exit, and wonder why those results should inevitably lead to Britain leaving the EU. Indeed, referenda are not legally binding in the UK, and Parliament could choose to ignore the majority opinion expressed by this vote. For instance, because of the negative consequences of a withdrawal. Or because the differential is too small to justify such a dramatic change. In this, it relates to hypothesis testing, in that only an overwhelming score can lead to the rejection of a natural null hypothesis corresponding to the status quo, rather than the posterior probability merely being above ½. Which is the decision associated with a 0–1 loss function. Of course, the analogy can be attacked from many sides, from a denial of democracy (simple majority being determined by a single extra vote) to a lack of randomness in the outcome of the referendum (since everyone in the population is supposed to have voted). But I still see some value in requiring major societal changes to be backed by more than a simple majority. All this musing is presumably wishful thinking since every side seems eager to move further (away from one another), but it would be great if it could take place.