Archive for asymptotics

statistical analysis of GANs

Posted in Books, Statistics with tags , , , , , , , , on May 24, 2021 by xi'an

My friend Gérard Biau and his coauthors have published a paper in the Annals of Statistics last year on the theoretical [statistical] analysis of GANs, which I had missed and recently read with a definitive interest in the issues. (With no image example!)

If the discriminator is unrestricted the unique optimal solution is the Bayes posterior probability

\dfrac{p^\star(x)}{p^\star(x)+p_\theta(x)}

when the model density is everywhere positive. And the optimal parameter θ corresponds to the closest model in terms of Kullback-Leibler divergence. The pseudo-true value of the parameter. This is however the ideal situation, while in practice D is restricted to a parametric family. In this case, if the family is wide enough to approximate the ideal discriminator in the sup norm, with error of order ε, and if the parameter space Θ is compact, the optimal parameter found under the restricted family approximates the pseudo-true value in the sense of the GAN loss, at the order ε². With a stronger assumption on the family ability to approximate any discriminator, the same property holds for the empirical version (and in expectation). (As an aside, the figure illustrating this property confusedly uses an histogramesque rectangle to indicate the expectation of the discriminator loss!) And both parameter (θ and α) estimators converge to the optimal ones with the sample size. An interesting foray from statisticians in a method whose statistical properties are rarely if ever investigated. Missing a comparison with alternative approaches, like MLE, though.

Don Fraser (1925-2020)

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , on December 24, 2020 by xi'an

I just received the very sad news that Don Fraser, emeritus professor of statistics at the University of Toronto, passed away this Monday, 21 December 2020. He was a giant of the field, with a unique ability for abstract modelling and he certainly pushed fiducial statistics much further than Fisher ever did. He also developed a theory of structural  inference that came close to objective Bayesian statistics, although he remained quite critical of the Bayesian approach (always in a most gentle manner, as he was a very nice man!). And most significantly contributed to high order asymptotics, to the critical analysis of ancilarity and sufficiency principles, and more beyond. (Statistical Science published a conversation with Don, in 2004, providing more personal views on his career till then.) I met with Don and Nancy rather regularly over the years, as they often attended and talked at (objective) Bayesian meetings, from the 1999 edition in Granada, to the last one in Warwick in 2019. I also remember a most enjoyable barbecue together, along with Ivar Ekeland and his family, during JSM 2018, on Jericho Park Beach, with a magnificent sunset over the Burrard Inlet. Farewell, Don!

essentials of probability theory for statisticians

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on April 25, 2020 by xi'an

On yet another confined sunny lazy Sunday morning, I read through Proschan and Shaw’s Essentials of Probability Theory for Statisticians, a CRC Press book that was sent to me quite a while ago for review. The book was indeed published in 2016. Before moving to serious things, let me evacuate the customary issue with the cover. I have trouble getting the point of the “face on Mars” being adopted as the cover of a book on probability theory (rather than a book on, say, pareidolia). There is a brief paragraph on post-facto probability calculations, stating how meaningless the question of the probability of this shade appearing on a Viking Orbiter picture by “chance”, but this is so marginal I would have preferred any other figure from the book!

The book plans to cover the probability essentials for dealing with graduate level statistics and in particular convergence, conditioning, and paradoxes following from using non-rigorous approaches to probability. A range that completely fits my own prerequisite for statistics students in my classes and that of course involves the recourse to (Lebesgue) measure theory. And a goal that I find both commendable and comforting as my past experience with exchange students led me to the feeling that rigorous probability theory was mostly scrapped from graduate programs. While the book is not extremely formal, it provides a proper motivation for the essential need of measure theory to handle the complexities of statistical analysis and in particular of asymptotics. It thus relies as much as possible on examples that stem from or relate to statistics, even though most examples may appear as standard to senior readers. For instance the consistency of the sample median or a weak version of the Glivenko-Cantelli theorem. The final chapter is dedicated to applications (in the probabilist’ sense!) that emerged from statistical problems. I felt these final chapters were somewhat stretched compared with what they could have been, as for instance with the multiple motivations of the conditional expectation, but this simply makes for more material. If I had to teach this material to students, I would certainly rely on the book! in particular because of the repeated appearances of the quincunx for motivating non-Normal limites. (A typo near Fatou’s lemma missed the dominating measure. And I did not notice the Riemann notation dx being extended to the measure in a formal manner.)

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

sampling-importance-resampling is not equivalent to exact sampling [triste SIR]

Posted in Books, Kids, Statistics, University life with tags , , , , , , on December 16, 2019 by xi'an

Following an X validated question on the topic, I reassessed a previous impression I had that sampling-importance-resampling (SIR) is equivalent to direct sampling for a given sample size. (As suggested in the above fit between a N(2,½) target and a N(0,1) proposal.)  Indeed, when one produces a sample

x_1,\ldots,x_n \stackrel{\text{i.i.d.}}{\sim} g(x)

and resamples with replacement from this sample using the importance weights

f(x_1)g(x_1)^{-1},\ldots,f(x_n)g(x_n)^{-1}

the resulting sample

y_1,\ldots,y_n

is neither “i.” nor “i.d.” since the resampling step involves a self-normalisation of the weights and hence a global bias in the evaluation of expectations. In particular, if the importance function g is a poor choice for the target f, meaning that the exploration of the whole support is imperfect, if possible (when both supports are equal), a given sample may well fail to reproduce the properties of an iid example ,as shown in the graph below where a Normal density is used for g while f is a Student t⁵ density:

asymptotics of M³C²L

Posted in Statistics with tags , , , , , , , on August 19, 2018 by xi'an
In a recent arXival, Blazej Miasojedow, Wojciech Niemiro and Wojciech Rejchel establish the convergence of a maximum likelihood estimator based on an MCMC approximation of the likelihood function. As in intractable normalising constants. The main result in the paper is a Central Limit theorem for the M³C²L estimator that incorporates an additional asymptotic variance term for the Monte Carlo error. Where both the sample size n and the number m of simulations go to infinity. Independently so. However, I do not fully perceive the relevance of using an MCMC chain to target an importance function [which is used in the approximation of the normalising constant or otherwise for the intractable likelihood], relative to picking an importance function h(.) that can be directly simulated.