Archive for bootstrap

Nonparametric applications of Bayesian inference

Posted in Books, Statistics, University life on April 22, 2016 by xi'an

Gary Chamberlain and Guido Imbens published this paper in the Journal of Business & Economic Statistics in 2003. I just came to read it in connection with the paper by Luke Bornn, Neil Shephard and Reza Solgi that I commented on a few months ago. The setting is somewhat similar: given a finite support distribution with associated probability parameter θ, a natural prior on θ is a Dirichlet prior. This prior induces a prior on transforms of θ, whether or not they are available in closed form (for instance, as the solution β of a moment equation E[F(X,β)]=0), as in Bornn et al. In this paper, Chamberlain and Imbens argue in favour of the limiting Dirichlet with all coefficients equal to zero as a way to avoid the prior having a dominating influence when the number of classes J goes to infinity and the data size remains fixed. But they fail to address the issue that the posterior is no longer defined when some classes go unobserved. They consider instead that the parameters corresponding to those classes are equal to zero with probability one, a convention and not a result. (The computational advantage in using the improper prior sounds at best incremental.) The notion of letting some Dirichlet hyper-parameters go to zero is somewhat foreign to a Bayesian perspective, as those quantities should be either fixed or distributed according to a hyper-prior, rather than set to converge according to a certain topology that has nothing to do with prior modelling. (Another reason why setting those quantities to zero does not have the same meaning as picking a Dirac mass at zero.)
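To fix ideas, here is a minimal sketch of this induced-prior mechanism (my own illustration, not the paper's code), taking F(x,β)=x−β so that β is simply the mean of the finite-support distribution; the support points and Dirichlet hyper-parameters are arbitrary choices.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
support = np.array([0.0, 1.0, 2.5, 4.0])      # finite support x_1, ..., x_J (illustrative)
alpha = np.full(len(support), 0.5)            # Dirichlet hyper-parameters (illustrative)

def moment_solution(theta, F, lo, hi):
    """Solve sum_j theta_j F(x_j, beta) = 0 for a scalar beta by root bracketing."""
    return brentq(lambda beta: np.dot(theta, F(support, beta)), lo, hi)

F = lambda x, beta: x - beta                  # moment function: beta is the mean E[X]
thetas = rng.dirichlet(alpha, size=5000)      # draws from the Dirichlet prior on theta
betas = np.array([moment_solution(t, F, support.min(), support.max()) for t in thetas])
print(np.quantile(betas, [0.05, 0.5, 0.95]))  # quantiles of the induced prior on beta
```

Letting the entries of alpha go to zero, the limiting case discussed above, has no proper counterpart in this simulation since a Dirichlet with all coefficients equal to zero cannot be sampled, which is precisely part of the issue.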

“To allow for the possibility of an improper posterior distribution…” (p.4)

This is a weird way to begin a sentence, especially when followed by the notion of an expected posterior distribution, which is actually a bootstrap expectation. Not as in Bayesian bootstrap, mind. And thus it feels quite orthogonal to the Bayesian approach. I do however find most interesting this notion of constructing a true expected posterior by imposing samples that ensure properness, as it reminds me of our approach to mixtures with Jean Diebolt, where (latent) allocations leading to improper posteriors were prohibited. The bootstrapped posterior distribution seems to be proposed mostly for assessing the impact of the prior modelling, albeit in a non-quantitative manner. (I fail to understand how the very small bootstrap sample sizes are chosen.)
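For what it is worth, here is how I would picture such a bootstrapped posterior, as a heavily hedged sketch rather than the paper's actual construction: average, over bootstrap subsamples that contain every class (and hence lead to a proper posterior under the limiting Dirichlet prior), the corresponding Dirichlet posteriors. The subsample size, the acceptance rule and all constants below are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
J = 4
data = rng.integers(0, J, size=30)                     # observations over J classes (toy data)

def bootstrapped_posterior(data, m, n_boot=1000, n_draw=50):
    """Mixture of Dirichlet posteriors over bootstrap subsamples hitting every class."""
    draws = []
    for _ in range(n_boot):
        sub = rng.choice(data, size=m, replace=True)   # bootstrap subsample of size m
        counts = np.bincount(sub, minlength=J)
        if np.all(counts > 0):                         # retain only subsamples giving a proper posterior
            draws.append(rng.dirichlet(counts, size=n_draw))
    return np.concatenate(draws)

theta = bootstrapped_posterior(data, m=10)
print(theta.mean(axis=0))                              # averaged posterior mean of the class probabilities
```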

Obviously, there is a massive difference between this paper and Bornn et al., where the authors use two competing priors in parallel, one on θ and one on β, which induces difficulties in setting priors since the parameter space is concentrated upon a manifold. (In which case I wonder what would happen if one implemented the preposterior idea of Berger and Pérez, 2002, to derive a fixed-point solution, an idea we implemented recently with Diego Salmerón and Juan Antonio Caño in a paper published in Statistica Sinica. This exhibits a similarity with the above bootstrap proposal in that the posterior gets averaged wrt another posterior.)

Peter Hall (1951-2016)

Posted in Books, Statistics, Travel, University life on January 10, 2016 by xi'an

I just heard that Peter Hall passed away yesterday in Melbourne. Very sad news from down under. Besides being a giant in the fields of statistics and probability, with an astounding publication record, Peter was also a wonderful man and so very much involved in running local, national and international societies. His contributions to the field and the profession are innumerable and his loss impacts the entire community. Peter was a regular visitor at Glasgow University in the 1990s and I crossed paths with him a few times, appreciating his kindness as well as his utmost dedication to research. In addition, he was a gifted photographer and I recall that the [now closed] wonderful guest-house where we used to stay at the top of Hillhead had a few pictures of his, taken in the Highlands, framed on its walls. (If I remember correctly, there were also beautiful pictures of the Belgian countryside by him at CORE, in Louvain-la-Neuve.) I think the last time we met was in Melbourne, three years ago… Farewell, Peter, you certainly left an indelible imprint on a lot of us.

[Song Chen from Beijing University has created a memorial webpage for Peter Hall to express condolences and share memories.]

MCMskv #3 [town with a view]

Posted in Statistics on January 8, 2016 by xi'an

Third day at MCMskv, where I took advantage of the gap left by the elimination of the Tweedie Race [second time in a row!] to complete and submit our mixture paper, despite the nice weather. The rest of the day was quite busy, with David Dunson giving a plenary talk on various approaches to approximate MCMC solutions, offering a broad overview of the potential methods and of the need for better solutions. (On a personal note, great line from David: “five minutes or four minutes?”. It almost beat David’s question on the previous day, about the weight of a finch, which sounded suspiciously close to the question about the air-speed velocity of an unladen swallow. I was quite surprised the speaker did not reply with the Arthurian “An African or a European finch?”) In particular, I appreciated the notion that some problems were calling for a reduction in the number of parameters, rather than in the number of observations. At which point I wrote down “multiscale approximations required” in my black pad, a requirement David stated a few minutes later. (The talk conditions were also much better than during Michael’s talk, in that the man standing between the screen and myself was David rather than the cameraman! Joking aside, it did not really prevent me from reading the slides, except for most of the jokes in small print!)

The first session of the morning involved a talk by Marc Suchard, who used continued fractions to find a closed-form likelihood for the SIR epidemiology model (I love continued fractions!), and a talk by Donatello Telesca, who studied non-local priors to build a regression tree. While I am somewhat skeptical about non-local testing priors, I found this approach to the construction of a tree quite interesting! In the afternoon, I obviously went to the intractable likelihood session, with talks by Chris Oates on a control variate method for doubly intractable models, Brenda Vo on mixing sequential ABC with the Bayesian bootstrap, and Gael Martin on our consistency paper. I was not aware of the Bayesian bootstrap proposal and need to read through the paper, as I fail to see the appeal of the bootstrap part! I later attended a session on exact Monte Carlo methods that was pleasantly homogeneous. With talks by Paul Jenkins (Warwick) on the exact simulation of the Wright-Fisher diffusion, Anthony Lee (Warwick) on designing perfect samplers for chains with atoms, and Chang-han Rhee and Sebastian Vollmer on extensions of the Glynn-Rhee debiasing technique I previously discussed on the blog. (Once again, I regretted having to make a choice between the parallel sessions!)
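For readers unfamiliar with the Glynn-Rhee debiasing trick, here is a toy sketch of the single-term estimator (my own example and tuning, not the content of the talk): the bias of a sequence of approximations is telescoped away by truncating the sequence at a random level and reweighting the increments by the survival probabilities of that level.

```python
import numpy as np

def rhee_glynn(sample_seq, p_geq, n_max=20, rng=None):
    """Single-term debiased estimator Z = sum_{n=0}^N (Y_n - Y_{n-1}) / P(N >= n),
    with N a random truncation level such that P(N >= n) = p_geq(n)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform()
    N = 0
    while N + 1 <= n_max and u < p_geq(N + 1):     # invert the survival function to draw N
        N += 1
    ys = sample_seq(N, rng)                        # coupled approximations Y_0, ..., Y_N
    z, prev = 0.0, 0.0
    for n, y in enumerate(ys):
        z += (y - prev) / p_geq(n)                 # reweighted telescoping increments
        prev = y
    return z

# toy target: mu = exp(E[X]) = 1 for X ~ N(0,1), approximated by the biased
# sequence Y_n = exp(sample mean of 2^n draws), all sharing the same draws
def sample_seq(N, rng):
    x = rng.standard_normal(2 ** N)
    return [np.exp(x[: 2 ** n].mean()) for n in range(N + 1)]

p_geq = lambda n: 1.5 ** (-n)                      # survival probabilities P(N >= n)
print(np.mean([rhee_glynn(sample_seq, p_geq) for _ in range(2000)]))   # close to 1 on average
```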

The poster session (after a quick home-made pasta dish with an exceptional Valpolicella!) was almost universally great and with just the right number of posters to go around all of them in the allotted time. With in particular the Breaking News! posters of Giacomo Zanella (Warwick), Beka Steorts and Alexander Terenin. A high quality session that made me regret not touring the previous one due to my own poster presentation.

bootstrap(ed) likelihood for ABC

Posted in pictures, Statistics on November 6, 2015 by xi'an

This recently arXived paper by Weixuan Zhu, Juan Miguel Marín, and Fabrizio Leisen proposes an alternative to our 2013 empirical likelihood ABC paper, or BCel. Besides the mostly personal appeal for me of reporting on a Juan Miguel Marín working [in Madrid] on ABC topics, along with my friend Jean-Michel Marin!, this paper is another entry on ABC that connects with yet another statistical perspective, namely the bootstrap. The proposal, called BCbl, is based on a reference paper by Davison, Hinkley and Worton (1992), which defines a bootstrap likelihood, a notion that relies on a double-bootstrap step to produce a non-parametric estimate of the distribution of a given estimator of the parameter θ. This estimate includes a smooth curve-fitting algorithm step, for which little description is available in the current paper. The bootstrap non-parametric substitute then plays the role of the actual likelihood, with no correction for the substitution, just as in our BCel. Both approaches are convergent, with Monte Carlo simulations exhibiting similar or even identical convergence speeds, although [unsurprisingly!] no deep theory is available on the comparative advantage.
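To make the double-bootstrap construction more concrete, here is a minimal sketch of a Davison-Hinkley-Worton style bootstrap likelihood for a scalar estimator (my own toy implementation: the Gaussian kernel, the polynomial curve-fitting step and all tuning constants are stand-ins for whatever the authors actually use).

```python
import numpy as np
from scipy.stats import gaussian_kde

def bootstrap_log_likelihood(x, estimator, B1=200, B2=500, rng=None):
    """Double-bootstrap log-likelihood for a scalar estimator (toy sketch)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    theta_hat = estimator(x)                              # estimate from the observed data
    thetas, loglik = [], []
    for _ in range(B1):
        xb = rng.choice(x, size=n, replace=True)          # first-level bootstrap sample
        tb = estimator(xb)
        # second-level resamples estimate the sampling density of the estimator at theta_hat
        tbb = np.array([estimator(rng.choice(xb, size=n, replace=True)) for _ in range(B2)])
        dens = gaussian_kde(tbb)(theta_hat).item()
        if dens > 0:
            thetas.append(tb)
            loglik.append(np.log(dens))
    # smooth curve-fitting step: a crude cubic fit in place of the (undescribed) smoother
    return np.poly1d(np.polyfit(thetas, loglik, deg=3))

# usage: bootstrap log-likelihood surface for the mean of an exponential sample
x = np.random.default_rng(1).exponential(2.0, size=50)
loglik = bootstrap_log_likelihood(x, np.mean)
print(loglik(np.linspace(1.5, 2.5, 5)))
```

In an ABC setting, this non-parametric surface is then what stands in for the intractable likelihood, exactly where BCel would plug in the empirical likelihood.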

An important issue from my perspective is that, while the empirical likelihood approach relies on a choice of identifying constraints that strongly impacts the numerical value of the likelihood approximation, the bootstrap version starts directly from a subjectively chosen estimator of θ, which may just as well impact that numerical value. In some ABC settings, finding a primary estimator of θ may be a real issue or a computational burden, except when using a preliminary ABC step as in semi-automatic ABC. This would be an interesting crash-test for the BCbl proposal! (And it would not necessarily increase the computational cost by a large amount.) In addition, I am not sure the method easily extends to the larger collections of summary statistics used in ABC, in particular because it necessarily relies on non-parametric estimates, which only operate in small enough dimensions where smooth curve-fitting algorithms can be used. Critically, the paper only covers examples with a few parameters.

The comparisons between BCel and BCbl produced in the paper show some gain in favour of BCbl. Obviously, this depends on the respective calibrations of the non-parametric methods and of regular ABC, as well as on the available computing time. I find the population genetics example somewhat puzzling: the paper refers to our composite likelihood to set the moment equations. Since this is a pseudo-likelihood, I wonder how the authors select their parameter estimates in the double-bootstrap experiment. And for the Ising model, it is not straightforward to conceive of a bootstrap algorithm on such a model: (a) how does one subsample pixels? and (b) what are the validity guarantees for the estimation procedure?

Mathematical underpinnings of Analytics (theory and applications)

Posted in Books, Statistics, University life on September 25, 2015 by xi'an

“Today, a week or two spent reading Jaynes’ book can be a life-changing experience.” (p.8)

I received this book by Peter Grindrod, Mathematical underpinnings of Analytics (theory and applications), from Oxford University Press quite a while ago. (Not that long ago, since the book got published in 2015.) As a book to review for CHANCE. And I let it sit on my desk and in my travel bag for the same while, as it was unclear to me that it was connected with Statistics and CHANCE. What is [are?!] analytics?! I did not find much of a definition of analytics when I at last opened the book, and even fewer mentions of statistics or machine learning, but Wikipedia told me the following:

“Analytics is a multidimensional discipline. There is extensive use of mathematics and statistics, the use of descriptive techniques and predictive models to gain valuable knowledge from data—data analysis. The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology.”

Barring the absurdity of speaking of a "multidimensional discipline" [and, even worse, of linking it with the mathematical notion of dimension!], this tells me analytics is a mix of data analysis and decision making. Hence relying on (some) statistics. Fine.

“Perhaps in ten years’ time, the mathematics of behavioural analytics will be commonplace: every mathematics department will be doing some of it.” (p.10)

First, and to start with some positive words (!), a book that quotes both Friedrich Nietzsche and Patti Smith cannot get everything wrong! (Of course, including a most likely apocryphal quote from the now late Yogi Berra does not fall within this category!) Second, from a general perspective, I feel the book meanders its way through chapters towards a higher level of statistical consciousness, from graphs to clustering to hidden Markov models, without precisely mentioning statistics or statistical models, while insisting very much upon Bayesian procedures and Bayesian thinking. Overall, I can relate to most items mentioned in Peter Grindrod’s book, but mostly by first reconstructing the notions behind them. While I personally appreciate the distanced and often ironic tone of the book, reflecting upon the author’s experience in retail modelling, I am thus wondering at which audience Mathematical underpinnings of Analytics aims, for a practitioner would have a hard time bridging the gap between the concepts exposed therein and one’s practice, while a theoretician would require more formal and deeper entries on the topics broached by the book. I just doubt this entry will be enough to lead maths departments to adopt behavioural analytics as part of their curriculum…

evolution of correlations [award paper]

Posted in Books, pictures, Statistics, University life on September 15, 2015 by xi'an

“Many researchers might have observed that the magnitude of a correlation is pretty unstable in small samples.”

On the statsblog aggregator, I spotted an entry that eventually led me to this post about the best paper award for the evolution of correlations, a paper published in the Journal of Research in Personality. A journal not particularly well-known for its statistical methodology input. The main message of the paper is that, while the empirical correlation is highly variable for small n’s, an interval (or corridor of stability!) can be constructed so that a Z-transform of the correlation does not vary away from the true value by more than a chosen quantity like 0.1. And the point of stability is then defined as the sample size after which the trajectory of the estimate does not leave the corridor… Both corridor and point depending on the true and unknown value of the correlation parameter, by the by. Which implies resorting to the bootstrap to assess the distribution of this point of stability. And to deduce quantiles that can be used for… For what exactly?! Setting the necessary sample size? But this requires a preliminary run to assess the possible value of the true correlation ρ. The paper concludes that “for typical research scenarios reasonable trade-offs between accuracy and confidence start to be achieved when n approaches 250”. This figure was obtained by a bootstrap study on a bivariate Gaussian population with 10⁶ datapoints, yes indeed 10⁶!, and bootstrap samples of maximal size 10³. All in all, while I am at a loss as to why the Journal of Research in Personality would promote the estimation of a correlation coefficient with 250 datapoints, there is nothing fundamentally wrong with the paper (!), except for this recommendation of 250 datapoints, as the figure stems from a specific setting with particular calibrations and cannot be expected to apply in each and every case.
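Here is a rough sketch of how this point of stability can be assessed by bootstrap (my own rendering of the idea, with the corridor taken on Fisher's z scale, the population scaled down from the paper's 10⁶ datapoints, and bootstrap trajectories of maximal size 10³ as in the paper).

```python
import numpy as np

def point_of_stability(x, y, rho, width=0.1, n_min=20):
    """Sample size after which the running correlation stays within a corridor of
    half-width `width` (on Fisher's z scale) around the true value rho."""
    z_true = np.arctanh(rho)
    last_exit = n_min - 1
    for m in range(n_min, len(x) + 1):
        r = np.corrcoef(x[:m], y[:m])[0, 1]
        if abs(np.arctanh(r) - z_true) > width:
            last_exit = m                       # the trajectory is outside the corridor at size m
    return last_exit + 1

rng = np.random.default_rng(0)
rho = 0.4
pop = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=100_000)   # stand-in population
pos = []
for _ in range(100):                            # bootstrap trajectories of maximal size 10^3
    idx = rng.integers(0, len(pop), size=1000)
    x, y = pop[idx].T
    pos.append(point_of_stability(x, y, rho))
print(np.quantile(pos, [0.5, 0.8, 0.95]))       # quantiles of the point of stability
```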

Actually, the graph in the paper was the thing that first attracted my attention, because it looks very much like the bootstrap examples I show my third-year students to demonstrate the appeal of the bootstrap. Which is not particularly useful in the current case. A quick simulation on 100 samples of size 300 showed [above] that Monte Carlo simulations produce a tighter confidence band than the one created by the bootstrap, in the Gaussian case.
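For the record, here is a much simplified re-creation of that comparison (not the original code, and looking only at the terminal sample size n = 300 rather than at whole trajectories): a pointwise 95% band from 100 Monte Carlo replications of the correlation estimate versus one from 100 bootstrap resamples of a single Gaussian sample.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.4, 300, 100
cov = [[1, rho], [rho, 1]]

# Monte Carlo: correlation estimates from independent samples of size n
mc = [np.corrcoef(*rng.multivariate_normal([0, 0], cov, n).T)[0, 1] for _ in range(reps)]

# bootstrap: correlation estimates from resamples of a single observed sample
base = rng.multivariate_normal([0, 0], cov, n)
boot = [np.corrcoef(*base[rng.integers(0, n, n)].T)[0, 1] for _ in range(reps)]

print("Monte Carlo 95% band:", np.quantile(mc, [0.025, 0.975]))
print("bootstrap   95% band:", np.quantile(boot, [0.025, 0.975]))
```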

Conditional love [guest post]

Posted in Books, Kids, Statistics, University life on August 4, 2015 by xi'an

[When Dan Simpson told me he was reading Terenin’s and Draper’s latest arXival in a nice Bath pub—and not a nice bath tub!—I asked him for a blog entry and he agreed. Here is his piece, read at your own risk! If you remember to skip the part about Céline Dion, you should enjoy it very much!!!]

Probability has traditionally been described, as per Kolmogorov and his ardent follower Katy Perry, unconditionally. This is, of course, excellent for those of us who really like measure theory, as the maths is identical. Unfortunately mathematical convenience is not necessarily enough and a large part of the applied statistical community is working with Bayesian methods. These are unavoidably conditional and, as such, it is natural to ask if there is a fundamentally conditional basis for probability.

Bruno de Finetti—and later Richard Cox and Edwin Jaynes—considered conditional bases for Bayesian probability that are, unfortunately, incomplete. The critical problem is that they mainly consider finite state spaces and construct finitely additive systems of conditional probability. For a variety of reasons, neither of these restrictions holds much truck in the modern world of statistics.

In a recently arXiv’d paper, Alexander Terenin and David Draper devise a set of axioms that make the Cox-Jaynes system of conditional probability rigorous. Furthermore, they show that the complete set of Kolmogorov axioms (including countable additivity) can be derived as theorems from their axioms by conditioning on the entire sample space.
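For reference, these are the (unconditional) Kolmogorov axioms recovered in this way, writing the unconditional probability of an event A as its probability conditional on the whole sample space Ω:

```latex
\begin{align*}
&\text{(K1)}\quad \mathbb{P}(A \mid \Omega) \ge 0 \quad \text{for every event } A,\\
&\text{(K2)}\quad \mathbb{P}(\Omega \mid \Omega) = 1,\\
&\text{(K3)}\quad \mathbb{P}\Big(\bigcup_{i \ge 1} A_i \,\Big|\, \Omega\Big)
  = \sum_{i \ge 1} \mathbb{P}(A_i \mid \Omega)
  \quad \text{for pairwise disjoint } A_1, A_2, \ldots
\end{align*}
```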

This is a deep and fundamental paper, which unfortunately means that I most probably do not grasp its complexities (especially as, for some reason, I keep reading it in pubs!). However, I’m going to have a shot at having some thoughts on it, because I feel like it’s the sort of paper one should have thoughts on.
