Archive for central limit theorem

Paret’oothed importance sampling and infinite variance [guest post]

Posted in Kids, pictures, R, Statistics, University life on November 17, 2015 by xi'an

[Figure: IS_vs_PSIS_k09]

[Here are some comments sent to me by Aki Vehtari as a follow-up to the previous posts.]

The following is mostly based on our arXived paper with Andrew Gelman and the references mentioned there.

Koopman, Shephard, and Creal (2009) proposed a sample-based estimate of the number of existing moments, obtained by fitting a generalized Pareto distribution to the tail of the weight distribution. The number of existing moments is less than 1/k (when k>0), where k is the shape parameter of the generalized Pareto distribution.
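As a rough illustration of estimating the tail shape from the weights, here is a minimal Python sketch. It is not the generalized Pareto fit of Koopman, Shephard, and Creal, nor the PSIS procedure: it swaps in the simpler Hill estimator, which targets the same k when the tail is exactly Pareto. The function names, the 20% tail fraction, and the Exp(1) proposal with an exponential target used as a test case are all assumptions made for the example.

```python
import math
import random


def exp_weights(n, target_rate, rng):
    """Importance weights for an Exp(1) proposal and an Exp(target_rate) target."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    # w(x) = target density / proposal density = rate * exp((1 - rate) * x)
    return [target_rate * math.exp((1.0 - target_rate) * x) for x in draws]


def hill_tail_shape(weights, tail_fraction=0.2):
    """Hill estimate of the tail shape k from the largest weights.

    Stand-in for a generalized Pareto fit (an assumption of this sketch):
    it averages the log-exceedances of the top weights over a threshold.
    """
    if len(weights) < 10:
        raise ValueError("need at least 10 weights")
    ordered = sorted(weights)
    m = max(2, int(tail_fraction * len(ordered)))  # number of tail weights
    threshold = ordered[-m - 1]                    # largest weight below the tail
    tail = ordered[-m:]
    return sum(math.log(w / threshold) for w in tail) / m


if __name__ == "__main__":
    rng = random.Random(2015)
    for rate in (0.5, 0.1):
        k_hat = hill_tail_shape(exp_weights(10_000, rate, rng))
        print(f"target Exp({rate}): estimated k = {k_hat:.2f}")
```

An estimate below 1/2 would suggest a finite variance, while an estimate between 1/2 and 1 would suggest that only the mean exists. The estimate itself is noisy and depends on the chosen tail fraction.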

When k<1/2, the variance exists and the central limit theorem holds. Chen and Shao (2004) show further that the rate of convergence to normality is faster when higher moments exist. When 1/2≤k<1, the variance does not exist (but the mean does), the generalized central limit theorem holds, and we may assume that the rate of convergence is faster when k is closer to 1/2.

In the example with “Exp(1) proposal for an Exp(1/2) target”, k=1/2 and we are truly on the border.

[Figure: IS_vs_PSIS_k05]
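The value k=1/2 in this example can be checked directly. As a sketch, write λ for the rate of the exponential target (a symbol introduced here for convenience, not used in the original comments) and let X be a draw from the Exp(1) proposal:

```latex
\begin{align*}
\omega(X) &= \frac{\lambda e^{-\lambda X}}{e^{-X}} = \lambda\, e^{(1-\lambda) X},
  \qquad 0 < \lambda < 1, \\
\Pr\{\omega(X) > t\} &= \Pr\left\{ X > \frac{\log(t/\lambda)}{1-\lambda} \right\}
  = \left( \frac{t}{\lambda} \right)^{-1/(1-\lambda)},
  \qquad t \ge \lambda .
\end{align*}
```

The weights therefore have an exact Pareto tail with index 1/(1-λ), that is, shape k=1-λ. With λ=1/2 this gives k=1/2, so moments of order below 1/k=2 exist but the variance itself does not.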

In our experiments in the arXived paper and in Vehtari, Gelman, and Gabry (2015), we have observed that Pareto smoothed importance sampling (PSIS) usually still converges well when k>1/2 but k is close to 1/2 (let’s say k<0.7). But if k<1 and k is close to 1 (let’s say k>0.7), the convergence is much worse and both naïve importance sampling and PSIS are unreliable.

Two figures are attached, which show the results comparing IS and PSIS in the Exp(1/2) and Exp(1/10) examples. The results were computed by repeating 1000 times a simulation with 10000 samples in each. We can see the bad performance of IS in both examples, as you also illustrated. In the Exp(1/2) case, PSIS is able to produce much more stable results. In the Exp(1/10) case, PSIS is able to reduce the variance of the estimate, but it is not enough to avoid a big bias.
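The replication scheme described above can be sketched for plain IS as follows. This is only an illustration under several assumptions: the estimand is taken to be the mean of the weights (whose true value is 1), which may differ from the quantity shown in the figures; the default replication counts are smaller than the 1000 × 10000 used above, to keep the run short; PSIS is not implemented; and the function names are hypothetical.

```python
import math
import random
import statistics


def is_estimate_of_one(n, target_rate, rng):
    """Plain IS estimate of the mean weight (true value 1) from n draws."""
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0)  # draw from the Exp(1) proposal
        total += target_rate * math.exp((1.0 - target_rate) * x)
    return total / n


def replicate(reps, n, target_rate, seed=0):
    """Repeat the experiment and summarise the spread of the estimates."""
    rng = random.Random(seed)
    estimates = sorted(is_estimate_of_one(n, target_rate, rng) for _ in range(reps))
    return {
        "median": statistics.median(estimates),
        "q05": estimates[int(0.05 * (reps - 1))],
        "q95": estimates[int(0.95 * (reps - 1))],
    }


if __name__ == "__main__":
    for rate in (0.5, 0.1):
        summary = replicate(reps=200, n=2000, target_rate=rate, seed=2015)
        print(f"target Exp({rate}): {summary}")
```

Comparing the median and the 5% and 95% quantiles with the true value 1 gives a quick view of both the spread and the typical bias of the plain IS estimate in each case.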

It would be interesting to have more theoretical justification for why infinite variance is not such a big problem if k is close to 1/2 (e.g. how the convergence rate is related to the number of fractional moments).

I guess that max ω[t] / ∑ ω[t] in Chatterjee and Diaconis has some connection to the tail shape parameter of the generalized Pareto distribution, but it is likely to be much noisier, as it depends on the maximum value instead of a larger number of tail samples as in the approach by Koopman, Shephard, and Creal (2009).

[Figure: IS_vs_PSIS_exp19]

A third figure shows an example where the variance is finite, with “an Exp(1) proposal for an Exp(1/1.9) target”, which corresponds to k≈0.475 < 1/2. Although the variance is finite, we are close to the border and the performance of basic IS is bad. There is no sharp change in the practical behaviour with a finite number of draws when going from finite variance to infinite variance. Thus, I think it is not enough to focus on the discrete number of moments; the Pareto shape parameter k, for example, gives us more information. Koopman, Shephard, and Creal (2009) also estimated the Pareto shape k, but they formed a hypothesis test of whether the variance is finite, thus discretising the information in k and assuming that finite variance is enough to get good performance.
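The ratio max ω[t] / ∑ ω[t] mentioned above is straightforward to compute from a set of weights. Here is a minimal sketch, with a hypothetical function name and no claim to reproduce the exact diagnostic used by Chatterjee and Diaconis:

```python
def max_weight_share(weights):
    """Share of the total importance weight carried by the largest weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return max(weights) / total


if __name__ == "__main__":
    print(max_weight_share([1.0, 1.0, 1.0, 1.0]))  # equal weights: 1/n
    print(max_weight_share([0.1, 0.1, 0.1, 9.7]))  # one dominant weight
```

The ratio equals 1/n for equal weights and approaches 1 when a single draw dominates. It uses only the largest weight, whereas a tail fit uses many of the largest weights.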


Posted in Kids, pictures, University life on December 21, 2014 by xi'an


Abraham De Moivre

Posted in Books, Statistics, Travel, University life on March 7, 2012 by xi'an

During my week in Roma, I read David Bellhouse’s book on Abraham De Moivre (at night and on local transport and even in Via del Corso waiting for my daughter!)… This is a very scholarly piece of work, with many references to original documents, and it may not completely appeal to a general audience: The Baroque Cycle by Neal Stephenson covers the same period and the rise of the “scientific man” (or Natural Philosopher) in a much more novelised manner, while centering on Newton as its main character and on the earlier Newton-Leibniz dispute, rather than the later Newton-(De Moivre)-Bernoulli dispute. (De Moivre does not appear in the books, at least under his name.)

Bellhouse’s book should however fascinate most academics in that, besides going into De Moivre’s contributions to probability in the utmost detail, it uncovers the way (mathematical) research was done in 17th- and 18th-century England: De Moivre never got an academic position (although he applied for several, incl. in Cambridge), in part because he was an emigrated French Huguenot (after the revocation of the Édit de Nantes by Louis XIV), and he made a living by tutoring the sons of the gentry and aristocracy in mathematics and accounting. He also was a consultant on annuities. His interactions with other mathematicians of the time took place through coffee-houses, the newly founded Royal Society, and letters. De Moivre published most of his work in the Philosophical Transactions and in self-edited books that he financed by subscriptions. (As a Frenchman, I personally [and so did Jacob Bernoulli!] found it puzzling that De Moivre never wrote anything in French but assimilated very quickly into English society.)

Another fascinating aspect of the book is the way English (incl. De Moivre) and Continental mathematicians fought and bickered over the priority of discoveries. Because their papers were rarely and slowly published, and even more slowly distributed throughout Western Europe, they had to rely on private letters for those priority claims.

De Moivre’s main achievement is his book, The Doctrine of Chances, which contains, among clever binomial derivations for various games of chance, an occurrence of the central limit theorem for binomial experiments, as well as the use of generating functions. De Moivre had a surprisingly long life since he died at 87 in London, still giving private lessons as old as 72. Besides being seen as a leading English mathematician, he eventually got recognised by the French Académie Royale des Sciences, albeit as a foreign member, a few months prior to his death (as well as by the Berlin Academy of Sciences).

There is also a small section in the book on the connections between De Moivre and Thomas Bayes (pp. 200-203), although very little is known of their personal interactions. Bayes was close to one of De Moivre’s former students, Philip Stanhope, and he worked on several of De Moivre’s papers to gain entry to the Royal Society. An open question is whether or not Bayes was ever tutored by De Moivre, although there is no material proof that he was. The book also mentions Bayes’ theorem in connection with a comment on The Doctrine of Chances by Hartley (p.191), as if De Moivre had a hand in it or at least a knowledge of it, but this seems unlikely…

In conclusion, this is a highly pleasant and easily readable book on the career of a major mathematician and one of the founding fathers of probability theory. David Bellhouse is to be congratulated on the scholarship exhibited by this book and on the painstaking pursuit of all historical documents related to De Moivre’s life.

workshop in Columbia [talk]

Posted in Statistics, Travel, University life on September 25, 2011 by xi'an

Here are the slides of my talk yesterday at the Computational Methods in Applied Sciences workshop in Columbia:

The last section of the talk covers our new results with Jean-Michel Marin, Natesh Pillai and Judith Rousseau on the necessary and sufficient conditions for a summary statistic to be used in ABC model choice. (The paper is about to be completed.) This obviously comes as the continuation of our reflections on ABC model choice started last January. The major message of the paper is that the statistics used for running model choice cannot have a mean value common to both models, which strongly implies using ancillary statistics with different means under each model. (I am afraid that, thanks to the mixture of no-jetlag fatigue and of slide inflation [95 vs. 40mn] and of asymptotic technicalities in the last part, the talk was far from comprehensible. I started on the wrong foot by not getting an XL [Xiao-Li’s] comment on the measure-theory problem with the limit in ε going to zero. A pity, given the great debate we had in Banff with Jean-Michel, David Balding, and Mark Beaumont, years ago, and our more recent paper about the arbitrariness of the density value in the Savage-Dickey paradox. I then compounded the confusion by stating that the empirical mean was sufficient in the Laplace case… which is not even an exponential family. I hope I will be more articulate next week in Zürich, where at least I will not speak past my bedtime!)