Archive for mathematical statistics

mathematical theory of Bayesian statistics [book review]

Posted in Books, Statistics, Travel, University life on May 6, 2021 by xi'an

I came by chance (and not by CHANCE) upon this 2018 CRC Press book by Sumio Watanabe and ordered it myself to gather which material it really covered, as the back-cover blurb was not particularly clear and the title sounded quite general. After reading it, I found out that this is a mathematical treatise on some aspects of Bayesian information criteria, in particular on the Widely Applicable Information Criterion (WAIC) that was introduced by the author in 2010. The result is a rather technical and highly focussed book with little motivation or intuition surrounding the mathematical results, which may make for arduous reading. Some background on mathematical statistics and Bayesian inference is clearly preferable and the book cannot be used as a textbook for most audiences, as opposed to e.g. An Introduction to Bayesian Analysis by J.K. Ghosh et al. or even more to Principles of Uncertainty by J. Kadane. In connection with this remark, the exercises found in the book are closer to the delivery of additional material than to textbook-style exercises.

“posterior distributions are often far from any normal distribution, showing that Bayesian estimation gives the more accurate inference than other estimation methods.”

The overall setting is one where both the sampling and the prior distributions differ from the respective “true” distributions, which requires a tool to assess the discrepancy when utilising a specific pair of such distributions, especially when the posterior distribution cannot be approximated by a Normal distribution. (Lindley’s paradox makes an interesting incognito incursion on p.238.) The WAIC is supported for the determination of the “true” model, in opposition to AIC and DIC, including on a mixture example that reminded me of our eight versions of DIC paper. In the “Basic Bayesian Theory” chapter (§3), the “basic theorem of Bayesian statistics” (p.85) states that the various losses related with WAIC can be expressed as second-order Taylor expansions of some cumulant generating functions, with order o(n⁻¹), “even if the posterior distribution cannot be approximated by any normal distribution” (p.87). With the intuition that

“if a log density ratio function has a relatively finite variance then the generalization loss, the cross validation loss, the training loss and WAIC have the same asymptotic behaviors.”

Obviously, these “basic” aspects should come as a surprise to a fair percentage of Bayesians (in the sense of not being particularly basic). Myself included. Chapter 4 exposes why, for regular models, the posterior distribution accumulates in an ε neighbourhood of the optimal parameter at a speed O(n^{2/5}), with the normalised partition function being of order n^{-d/2} in the neighbourhood and exponentially negligible outside. A consequence of this regular asymptotic theory is that all above losses are asymptotically equivalent to the negative log likelihood plus similar order n⁻¹ terms that can be ordered. Chapters 5 and 6 deal with “standard” [the likelihood ratio is a multi-index power of the parameter ω] and general posterior distributions that can be written as mixtures of standard distributions, with expressions of the above losses in terms of new universal constants. Again, a rather remote concern of mine. The book also includes a chapter (§7) on MCMC, with a rather involved proof that a Metropolis algorithm satisfies detailed balance (p.210). The Gibbs sampling section contains an extensive example on a two-dimensional two-component unit-variance Normal mixture, with an unusual perspective on the posterior, which is considered as “singular” when the true means are close. (Label switching or the absence thereof is not mentioned.) In terms of approximating the normalising constant (or free energy), the only method discussed there is path sampling, with a cryptic remark about harmonic mean estimators (not identified as such). In a final knapsack chapter (§9), Bayes factors (confusingly denoted as L(x)) are shown to be most powerful tests in a Bayesian sense when comparing hypotheses without prior weights on said hypotheses, while posterior probability ratios are the natural statistics for comparing models with prior weights on said models. (With Lindley’s paradox making another appearance, still incognito!) And a notion of phase transition for hyperparameters is introduced, with the meaning of a radical change of behaviour at a critical value of said hyperparameter. For instance, for a simple normal-mixture outlier model, the critical value of the Beta hyperparameter is α=2. Which is a wee bit of a surprise when considering Rousseau and Mengersen (2011) since their bound for consistency was α=d/2.
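For readers less familiar with the losses compared throughout the book, here is a minimal Python sketch of how the training loss, WAIC and the (importance-sampling) cross-validation loss can be estimated from posterior draws. Everything in it is an assumption made for illustration and not taken from the book: the toy model (unit-variance Normal observations with a conjugate N(0, τ²) prior on the mean), the sample sizes, and the function names `log_mean_exp`, `variance` and `losses`. WAIC is computed as the training loss plus the functional variance divided by n.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def losses(loglik):
    """loglik[i][s] = log p(x_i | w_s) for observation i and posterior draw s."""
    n = len(loglik)
    # training loss: minus the average log posterior predictive density
    train = -sum(log_mean_exp(row) for row in loglik) / n
    # functional variance: sum over observations of the posterior variance
    # of the pointwise log-likelihood
    func_var = sum(variance(row) for row in loglik)
    waic = train + func_var / n
    # leave-one-out cross-validation loss via posterior averages of 1 / p(x_i | w)
    cv = sum(log_mean_exp([-v for v in row]) for row in loglik) / n
    return train, waic, cv


# Hypothetical toy example: x_i ~ N(mu, 1), prior mu ~ N(0, tau2)
rng = random.Random(1)
n, draws, tau2 = 50, 2000, 100.0
data = [rng.gauss(0.5, 1.0) for _ in range(n)]

# conjugate posterior on mu is N(post_mean, post_var)
post_var = 1.0 / (n + 1.0 / tau2)
post_mean = post_var * sum(data)
mus = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(draws)]

loglik = [
    [-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for mu in mus]
    for x in data
]
train, waic, cv = losses(loglik)
print(f"training loss {train:.4f}  WAIC {waic:.4f}  CV loss {cv:.4f}")
```

In this regular toy case, WAIC and the cross-validation loss should nearly coincide and both should exceed the training loss, in line with the asymptotic equivalence mentioned above. The sketch says nothing about the singular cases that are the actual focus of the book.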

In conclusion, this is quite an original perspective on Bayesian models, covering the somewhat unusual (and potentially controversial) issue of misspecified priors and centered on the use of information criteria. I find the book could have benefited from further editing as I noticed many typos and somewhat unusual sentences (at least unusual to me).

[Disclaimer about potential self-plagiarism: this post or an edited version should eventually appear in my Books Review section in CHANCE.]

factorisation theorem on densities

Posted in Statistics on December 23, 2020 by xi'an

Another occurrence, while building my final math stat exam for my (quarantined!) third year students, of a question on X validated that led me to write down more precisely an argument for the decomposition of densities in exponential families. The decomposition is however somewhat moot (and lost on the initiator of the question since this person later posted an answer ignoring measures), as it all depends on the choice of the dominating measures over X, T(X), and the slices {x; T(x)=t}. The fact that the slice does depend on t requires the measure to accept a potential dependence on t, in which case the conditional density wrt this measure can as well be constant.
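One way to make this dependence on the dominating measures explicit is sketched below. The notation is an assumption of this sketch and not the argument posted on X validated: the family has density h(x) exp{θ·T(x) − ψ(θ)} with respect to a measure μ, ν denotes the image of the measure h·μ under T (assumed σ-finite), and μ_t denotes a disintegration of h·μ along the slices {x; T(x)=t}, assumed to exist.

```latex
\begin{align*}
f_\theta(x) &= h(x)\,\exp\{\theta\cdot T(x)-\psi(\theta)\}
  && \text{density of } X \text{ w.r.t. } \mu,\\
g_\theta(t) &= \exp\{\theta\cdot t-\psi(\theta)\}
  && \text{density of } T(X) \text{ w.r.t. } \nu=(h\cdot\mu)\circ T^{-1},\\
k(x\mid t) &= 1
  && \text{density of } X \text{ given } T(X)=t \text{ w.r.t. } \mu_t,
\end{align*}
% where the slice measures satisfy (h\cdot\mu)(\mathrm{d}x)=\int \mu_t(\mathrm{d}x)\,\nu(\mathrm{d}t)
```

With this choice all the dependence on θ sits in the density of T(X), while the conditional density is constant because the slice measure μ_t is allowed to change with t. A different choice of dominating measures would shift factors between the terms.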

sans sérif & sans chevron

Posted in Books, R, Statistics, University life on June 17, 2020 by xi'an
{\sf df=function(x)2*pi*x-4*(x>1)*acos(1/(x+(1-x)*(x<1)))}

As I was LaTeXing a remote exam for next week, including some R code questions, I came across the apparent impossibility of using the < and > symbols in the sans-sérif “\sf” font… Which is a surprise, given the ubiquity of the symbols in R and my LaTeXing books over the years. Must have always used “\tt” and “\verb” then! On the side, I tried to work with the automultiplechoice LaTeX package [which should be renamed velomultiplechoice!] of Alexis Bienvenüe, which proved a bit of a challenge as the downloadable version contained a flawed file of automultiplechoice.sty! Still managed to produce a 400 question exam with random permutations of questions and potential answers. But not looking forward to the 4 or 5 hours of delivering the test on Zoom…

PhD position for research in ABC in Chalmers University

Posted in Statistics on May 27, 2020 by xi'an

[Posting a call for PhD candidates from Umberto Picchini as the deadline is June 1, next Monday!]

A PhD student position in mathematical statistics on simulation-based inference methods for models with an “intractable” likelihood is available at the Dept. Mathematical Sciences, Chalmers University, Gothenburg (Sweden).

You will be part of an international collaboration to create new methodology bridging between simulation-based inference (such as approximate Bayesian computation and other likelihood-free methods) and deep neural networks. The goal is to ease inference for stochastic modelling.

Details on the project and the essential requirements are at

The PhD student position is fully funded for up to 5 years, in the dynamic and international city of Gothenburg, the second largest city in Sweden. As a PhD student in Mathematical Sciences you will have opportunities for many inspiring conversations, a lot of autonomous work and some travel.

The position will be supervised by Assoc. Prof. Umberto Picchini.

Apply by 01 June 2020 following the instructions at

For informal enquiries, please get in touch with Umberto Picchini

PhD studentships at Warwick

Posted in Kids, pictures, Statistics, University life on May 2, 2019 by xi'an

There is an exciting opening for several PhD positions at Warwick, in the departments of Statistics and of Mathematics, as part of the Centre for Doctoral Training in Mathematics and Statistics newly created by the University. CDT studentships are funded for four years and funding is open to students from the European Union without restrictions. (No Brexit!) Funding includes a stipend at UK/RI rates and tuition fees at UK/EU rates. Applications are made via the University of Warwick Online Application Portal and should be made as quickly as possible since the funding will be allocated on a first come, first served basis. For more details, contact the CDT director, Martyn Plummer. I cannot but strongly encourage interested students to apply as this is a great opportunity to start a research career in a fantastic department!