Archive for the Books Category

Statistics & Computing [toc]

Posted in Books, Statistics on June 29, 2016 by xi'an

The latest [June] issue of Statistics & Computing is full of interesting Bayesian and Monte Carlo entries, some of which are even open access!


control functionals for Monte Carlo integration

Posted in Books, Statistics, University life on June 28, 2016 by xi'an

A paper on control variates by Chris Oates, Mark Girolami (Warwick) and Nicolas Chopin (CREST) appeared in a recent issue of Series B. I had read and discussed the paper with them previously and the following is a set of comments I wrote at some stage, to be taken with enough grains of salt since Chris, Mark and Nicolas answered them either orally or in the paper. Note also that I already discussed an earlier version, with comments that are not necessarily coherent with the following ones! [Thanks to the busy softshop this week, I resorted to publishing some older drafts, so mileage may vary in the coming days.]

First, it took me quite a while to get through the paper, mostly because I had never worked with reproducing kernel Hilbert spaces (RKHS) before. I looked at some proofs in the appendix and at the whole paper but could not spot anything amiss. It is obviously a major step to uncover a manageable method with convergence faster than the standard √n Monte Carlo rate. When I set my PhD student Anne Philippe on the approach via Riemann sums, we were quickly hindered by the dimension issue and could not find a way out. In the first versions of his nested sampling approach, John Skilling had also thought he could get higher convergence rates before realising the Monte Carlo error had not disappeared and hence was keeping the rate at the same √n speed.

The core proof in the paper leading to the 7/12 convergence rate relies on a mathematical result of Sun and Wu (2009), namely that a certain rate of regularisation of the function of interest leads to an average variance of order 1/6. I have no reason to mistrust the result (and anyway did not check the original paper), but I am still puzzled by the fact that it almost immediately leads to the control variate estimator having a variance of smaller order, or at least smaller variability, on average or in probability. (I am also uncertain whether the boxplot figures can be interpreted as establishing super-√n speed.)

Another thing I cannot truly grasp is how the control functional estimator of (7) can be both a mere linear recombination of individual unbiased estimators of the target expectation and an improvement in the variance rate. I acknowledge that the coefficients of the matrices are functions of the sample simulated from the target density, but still…
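
Since the above may remain abstract, here is a minimal R illustration of a plain control variate with its coefficient estimated from the sample itself. This is emphatically not the RKHS construction of the paper, merely the classical device it generalises, and the toy target E[exp(X)] for a standard normal X is my own choice:

set.seed(42)
n <- 1e4
x <- rnorm(n)
f <- exp(x)                   # integrand evaluations, E[f(X)] = exp(1/2)
h <- x                        # control variate with known mean E[h(X)] = 0
beta <- cov(f, h) / var(h)    # plug-in (sample-based) optimal coefficient
c(crude = mean(f), cv = mean(f - beta * h), truth = exp(1/2))

Even with the coefficient estimated from the same sample, the recombined estimator typically exhibits a much smaller variance than the crude average, while the rate remains √n in this vanilla version.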

Another source of inner puzzlement is the choice of the kernel in the paper, which seems too simple to cover all problems despite being used in every illustration there. I see the kernel as centred at zero, which means a central location must be known, decreasing to zero away from this centre, so possibly missing aspects of the integrand that are too far away, and isotonic in the reference norm, which also seems to preclude settings where the integrand is not that compatible with the geometry.

I am equally nonplussed by the existence of a deterministic bound on the error, although it is not completely deterministic, depending on the values of the reproducing kernel at the points of the sample. Does it imply anything restrictive on the function to be integrated?

A side remark about the use of intractable in the paper: given the development of a whole new branch of computational statistics for handling likelihoods that cannot be computed at all, intractable should possibly be reserved for such higher-complexity models.

another wrong entry

Posted in Books, Kids, R, Statistics, University life on June 27, 2016 by xi'an

Quite a coincidence! I just came across another bug in Lynch's (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists, already discussed here and on X validated. While working with one participant in the post-ISBA softshop, we were looking for efficient approaches to simulating correlation matrices and came [by Google] across the above R code associated with a 3×3 correlation matrix, which misses the additional constraint that the determinant must be positive. As shown e.g. by the example

> eigen(matrix(c(1,-.8,.7,-.8,1,.6,.7,.6,1),ncol=3))
$values
[1] 1.8169834 1.5861960 -0.4031794

having all correlations between -1 and 1 is not enough. Just. Not. Enough.
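
For the sake of illustration, here is a hedged sketch of a corrected simulation in R, rejecting candidate draws until the matrix is positive definite, which is the constraint missing from the quoted code (the function name sim_corr3 is mine, not Lynch's):

sim_corr3 <- function() {
  # draw uniform correlations, reject until positive definiteness holds
  repeat {
    r <- runif(3, -1, 1)
    R <- matrix(c(1, r[1], r[2],
                  r[1], 1, r[3],
                  r[2], r[3], 1), ncol = 3)
    if (min(eigen(R, symmetric = TRUE, only.values = TRUE)$values) > 0)
      return(R)
  }
}

(Rejection obviously gets inefficient in higher dimensions, where direct constructions such as the onion method are preferable.)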

art brut

Posted in Books, pictures on June 25, 2016 by xi'an

another riddle

Posted in Books, Kids on June 22, 2016 by xi'an

A riddle on The Riddler that pertains to game theory [and does not truly require coding]:

Ten pirates have ten gold pieces to divide and each have a unique rank, from the captain on down. The captain puts forth the first plan to divide up the gold, whereupon everyone votes. If at least half the pirates vote for the plan, it is enacted, and the gold is distributed accordingly. If the plan gets fewer than half the votes, however, the captain is killed, the second-in-command is promoted, and the process starts over.

Pirates always vote by the following rules, with the earliest rule taking precedence in a conflict:

  1. Self-preservation: A pirate values his life above all else.
  2. Greed: A pirate seeks as much gold as possible.
  3. Bloodthirst: Failing a threat to his life or bounty, a pirate always votes to kill.

Under this system, how do the pirates divide up their gold?

An obvious final configuration is the one where only two pirates remain: whatever division the first pirate proposes, including the one where he keeps all the coins, the second (and last) pirate cannot oppose it. Hence the last pirate must avoid reaching a two-pirate configuration, while the second-to-last pirate should aim for it, but he is the only one to do so! This means that in a three-pirate configuration the leader can buy the last pirate's vote with a single coin, since one coin beats the zero he would get in the two-pirate round, so the optimal partition there is (9,0,1). Moving to the general case, when 2p crew members remain, the pirate in command must secure a positive vote from (p-1) other pirates below him, which means those (p-1) pirates must get one coin more than what they would get in the next round. This amounts to picking the (p-1) pirates with no reward in the next round and giving them one coin each, leaving a reward of 11-p for the current leader. When 2p-1 crew members remain, the optimal solution is identical, namely to award one coin each to the (p-1) pirates with no reward in the next round. With a total of 10 pirates, the optimal configuration for the captain is thus to split the 10 coins as (6,0,1,0,1,0,1,0,1,0).
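
Although the riddle does not truly require coding, the backward induction is easily automated. Here is a minimal R sketch (the function name pirate_split is mine), assuming as above that the proposer votes for his own plan and that ties among equally cheap bribes are broken by position:

pirate_split <- function(n, coins = 10) {
  div <- c(coins, 0)                   # two pirates left: leader takes all
  if (n == 2) return(div)
  for (k in 3:n) {
    need <- ceiling(k / 2) - 1         # votes to buy beyond the leader's own
    bribe <- order(div)[seq_len(need)] # cheapest pirates to buy off
    newdiv <- rep(0, k - 1)
    newdiv[bribe] <- div[bribe] + 1    # one coin more than next round's share
    div <- c(coins - sum(newdiv), newdiv)
  }
  div
}
pirate_split(10)                       # returns 6 0 1 0 1 0 1 0 1 0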

ISBA 2016 [#2]

Posted in Books, pictures, Running, Statistics, Travel, University life, Wines on June 15, 2016 by xi'an

Today I attended Persi Diaconis' de Finetti Lecture at ISBA, and not only because I was an invited discussant, by all means!!! Persi was discussing his views on Bayesian numerical analysis. As already expressed in his 1988 paper. Which now appears as a foundational precursor to probabilistic numerics. And which is why I had a very easy time preparing my discussion, as I mostly borrowed from my NIPS slides. With some degree of legitimacy since I was already a discussant there. Anyway, here is the most novel slide in the discussion, built upon my realisation that the principle behind nested sampling is fairly generic for integral approximation, rather than being restricted to marginal likelihood approximation.

Among many interesting things, Persi's talk made me think anew about infinite variance importance sampling. And about the paper by Sourav Chatterjee and Persi that I discussed a few months ago. In that some regularisation of those “useless” importance estimates can stem from prior modelling. Not as an aside, let me add I am very grateful to the ISBA 2016 organisers and to the chair of the de Finetti lecture committee for their invitation to discuss this talk!

Terry Tao on Bayes… and Trump

Posted in Books, Kids, Statistics, University life on June 13, 2016 by xi'an

“From the perspective of Bayesian probability, the grade given to a student can then be viewed as a measurement (in logarithmic scale) of how much the posterior probability that the student’s model was correct has improved over the prior probability.” T. Tao, what’s new, June 1

Jean-Michel Marin pointed out to me the recent post of Terry Tao on setting a subjective prior for allocating partial credit in multiple answer questions. (Although I would argue that the main purpose of multiple answer questions is to expedite grading!) The post considers only true-false questionnaires and the case when the student produces a probabilistic assessment of her confidence in the answer, in the format of a probability p for each question. The goal is then to devise a grading principle, f, such that the student receives f(p) when her answer is correct and f(1-p) when it is wrong. This sounds very much like scoring weather forecasters and hence like designing proper scoring rules. Which reminds me of the first time I heard a talk about this: it was at Purdue, circa 1988, and Morrie DeGroot gave a talk on scoring forecasters, based on a joint paper he had written with Susie Bayarri. The scoring rule is proper if maximising the expected reward leads the student to pick p=q, where p is the reported probability and q her true belief. Terry Tao reaches the well-known conclusion that the grading function f should be f(p)=log₂(2p), where log₂ denotes the base-2 logarithm. One property I was unaware of is that the total expected score can be written as N+log₂(L), where L is the likelihood associated with the student's subjective model. (This is the only truly Bayesian aspect of the problem.)
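
As a quick sanity check of propriety, here is a small R verification of my own (not Tao's) that the expected score q·f(p)+(1-q)·f(1-p) is indeed maximised by reporting p=q:

f <- function(p) log2(2 * p)              # Tao's grading function
q <- 0.7                                  # the student's true belief
p <- seq(0.01, 0.99, by = 0.01)           # candidate reported probabilities
expected <- q * f(p) + (1 - q) * f(1 - p) # expected score of each report
p[which.max(expected)]                    # maximised at p = q = 0.7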

An interesting and more Bayesian last question from Terry Tao is what to do when the probabilities themselves are uncertain. More Bayesian because this is where I would introduce a prior model on this uncertainty, in a hierarchical fashion, in order to estimate the true probabilities. (A non-informative prior makes its way into the comments.) Of course, all this leads to a lot of work, which rather defeats the first incentive for asking multiple choice questions…

One may wonder at the link with scary Donald and there is none! But the next post by Terry Tao is entitled “It ought to be common knowledge that Donald Trump is not fit for the presidency of the United States of America”. And unsurprisingly, as an opinion post, it attracted a large number of non-mathematical comments.
