Archive for loss function

the three i’s of poverty

Posted in Books, pictures, Statistics, Travel, University life on September 15, 2019 by xi'an

Today I made a “quick” (10h door to door!) round trip to Marseille (by train) to sit on the PhD thesis defense committee of Edwin Fourrier-Nicolaï, whose thesis was entitled Poverty, inequality and redistribution: an econometric approach. While this was mainly a thesis in economics, meaning defending some theory on inequalities based on East German data, there were Bayesian components in the thesis that justified (to some extent!) my presence in the jury. Especially around mixture estimation by Gibbs sampling. (On which I started working almost exactly 30 years ago, when I joined Paris 6 and met Gilles Celeux and Jean Diebolt.) One intriguing [for me] question stemmed from this defense, namely the notion of a Bayesian estimation of a three i’s of poverty (TIP) curve. The three i’s stand for incidence, intensity, and inequality; as introduced in Jenkins and Lambert (1997), this curve measures the average income loss below the poverty line for the 100p% lowest incomes, when p varies between 0 and 1. It thus depends on the distribution F of the incomes and, when using a mixture distribution, its computation requires a numerical cdf inversion to determine the p-th income quantile. A related question is thus how to define a Bayesian estimate of the TIP curve. Using an average over the values of an MCMC sample does not sound absolutely satisfactory, since the upper bound in the integral varies with each realisation of the parameter. The use of another estimate would however require a specific loss function, an issue not discussed in the thesis.
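To make the computation concrete, here is a minimal Python sketch of a TIP curve under a mixture model. It assumes, purely for illustration, a mixture of log-normal income components (the post does not say which family the thesis used) and takes TIP(p) = ∫_0^{F^{-1}(p)} (z − y)_+ dF(y), with z the poverty line and gaps measured in income units. The quantile F^{-1}(p) is found by bisection, while the log-normal partial expectations have a closed form. All function names are hypothetical, and `naive_posterior_tip` is the plain pointwise MCMC average, included only to show where the draw-dependent upper bound enters.

```python
import math


def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mix_cdf(y, weights, mus, sigmas):
    """Cdf F(y) of a mixture of log-normal income components (hypothetical model)."""
    if y <= 0.0:
        return 0.0
    ly = math.log(y)
    return sum(w * norm_cdf((ly - m) / s) for w, m, s in zip(weights, mus, sigmas))


def mix_partial_mean(a, weights, mus, sigmas):
    """Partial expectation E[Y 1{Y <= a}], in closed form for each log-normal component."""
    if a <= 0.0:
        return 0.0
    la = math.log(a)
    return sum(w * math.exp(m + 0.5 * s * s) * norm_cdf((la - m - s * s) / s)
               for w, m, s in zip(weights, mus, sigmas))


def mix_quantile(p, weights, mus, sigmas, tol=1e-10):
    """Numerical inversion of the mixture cdf by bisection, for 0 < p < 1."""
    lo, hi = 0.0, 1.0
    while mix_cdf(hi, weights, mus, sigmas) < p:  # grow the bracket until it holds the quantile
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if mix_cdf(mid, weights, mus, sigmas) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tip(p, z, weights, mus, sigmas):
    """TIP(p) = int_0^{F^{-1}(p)} (z - y)_+ dF(y) for poverty line z."""
    if p <= 0.0:
        return 0.0
    # Only incomes below min(F^{-1}(p), z) contribute a poverty gap.
    a = z if p >= mix_cdf(z, weights, mus, sigmas) else mix_quantile(p, weights, mus, sigmas)
    return z * mix_cdf(a, weights, mus, sigmas) - mix_partial_mean(a, weights, mus, sigmas)


def naive_posterior_tip(p, z, draws):
    """Pointwise average of TIP(p) over MCMC draws, each a (weights, mus, sigmas) tuple.

    Each draw has its own quantile F^{-1}(p), hence its own upper integration bound:
    this is the naive estimate questioned above, not a recommended one.
    """
    return sum(tip(p, z, *draw) for draw in draws) / len(draws)
```

Evaluating `tip` on a grid of p values, for instance with the illustrative parameters `z = 12.0`, `weights = [0.6, 0.4]`, `mus = [math.log(10.0), math.log(30.0)]`, `sigmas = [0.5, 0.4]`, traces a nondecreasing concave curve that becomes flat once p exceeds F(z), the incidence of poverty.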

admissible estimators that are not Bayes

Posted in Statistics on December 30, 2017 by xi'an

A question that popped up on X validated made me search for a while for point estimators that are both admissible (under a certain loss function) and not generalised Bayes (under the same loss function), before asking Larry Brown, Jim Berger, or Ed George. The answer came through Larry’s book on exponential families, with the two examples attached. (Following our 1989 collaboration with Roger Farrell at Cornell U, I knew about the existence of testing procedures that were both admissible and not Bayes.) The most surprising feature is that the associated loss function is strictly convex, as I would have thought that a less convex loss would have helped find such counter-examples.

MAP as Bayes estimators

Posted in Books, Kids, Statistics on November 30, 2016 by xi'an

Robert Bassett and Julio Deride just arXived a paper discussing the position of MAPs within Bayesian decision theory. A point I have discussed extensively on the ‘Og!

“…we provide a counterexample to the commonly accepted notion of MAP estimators as a limit of Bayes estimators having 0-1 loss.”

The authors mention The Bayesian Choice as stating this property without further precautions, and I plead guilty to having been careless in this regard! The difficulty lies in the fact that the limit of the maximisers is not necessarily the maximiser of the limit. The paper includes an example to this effect, with a specific prior associated with a sampling distribution that does not depend on the parameter. The sufficient conditions proposed therein are that the posterior density is almost surely proper or quasiconcave.

This is a neat mathematical characterisation that cleans up this “folk theorem” about MAP estimators, and for which the authors are to be congratulated! However, I am not very excited by the limiting property, whether it holds or not, as I find it hard to conceive of using a sequence of losses in a mildly realistic case. I rather prefer the alternative characterisation of MAP estimators by Burger and Lucka as proper Bayes estimators under another type of loss function, albeit a rather artificial one.
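The folk theorem can at least be checked numerically in a well-behaved case. The sketch below assumes a hypothetical log-normal(m, s²) posterior, which is unimodal (hence quasiconcave) and skewed, so that its mode exp(m − s²) differs from its mean and median. Under the loss 1{|θ − d| > ε}, the posterior expected loss is 1 − P(|θ − d| ≤ ε | x), so the Bayes estimate maximises the posterior probability of [d − ε, d + ε]; as ε shrinks, it should approach the MAP. The paper's counterexample is not reproduced here, and the function names are illustrative only.

```python
import math


def lognormal_cdf(x, m, s):
    """Cdf of a log-normal(m, s^2) distribution, standing in for a posterior."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - m) / (s * math.sqrt(2.0))))


def window_bayes_estimate(eps, m, s, iters=200):
    """Bayes estimate under the 0-1 type loss 1{|theta - d| > eps}.

    Minimising 1 - P(|theta - d| <= eps | x) means maximising the posterior mass
    of [d - eps, d + eps]. For a unimodal posterior this mass is unimodal in d,
    so a golden-section search applies.
    """
    def mass(d):
        return lognormal_cdf(d + eps, m, s) - lognormal_cdf(d - eps, m, s)

    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, math.exp(m + 4.0 * s)  # bracket assumed to contain the maximiser
    for _ in range(iters):
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if mass(c) < mass(d):
            a = c  # maximiser lies in [c, b]
        else:
            b = d  # maximiser lies in [a, d]
    return 0.5 * (a + b)


def lognormal_mode(m, s):
    """MAP estimate of the log-normal(m, s^2) posterior."""
    return math.exp(m - s * s)


if __name__ == "__main__":
    for eps in (0.3, 0.1, 0.01, 0.001):
        print(eps, window_bayes_estimate(eps, 0.0, 0.5), lognormal_mode(0.0, 0.5))
```

With m = 0 and s = 0.5, the estimates drift towards the mode as ε decreases. This only illustrates the quasiconcave case; it says nothing about posteriors that fall outside the conditions identified in the paper.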

ISBA 2016 [#3]

Posted in pictures, Running, Statistics, Travel, University life, Wines on June 16, 2016 by xi'an

Among the sessions I attended yesterday, I really liked the one on robustness and model misspecification, especially the talk by Steve McEachern on Bayesian inference based on insufficient statistics, with a striking graph of the degradation of the Bayes factor as the prior variance increases. I sadly had no time to grab a picture of the graph, which contrasted this poor performance with a stable behaviour when using a proper summary statistic. It clearly relates to our work on ABC model choice, as well as to my worries about the Bayes factor, which explains why I am quite excited about this notion of restricted inference. In this session, Chris Holmes also summarised his two recent papers on loss-based inference, which I discussed here in a few posts, including the Statistical Science discussion Judith and I wrote recently. I also went to the j-ISBA [section] session, which was sadly under-attended, maybe due to too many parallel sessions, maybe due to the lack of a unifying statistical theme.

likelihood-free Bayesian inference on the minimum clinically important difference

Posted in Books, Statistics, University life on January 20, 2015 by xi'an

Last week, Likelihood-free Bayesian inference on the minimum clinically important difference was arXived by Nick Syring and Ryan Martin. I read it over the weekend, slowly coming to the realisation that their [meaning of] “likelihood free” was not my [meaning of] “likelihood free”, namely that it has nothing to do with ABC! The idea therein is to create a likelihood out of a loss function, in the spirit of Bissiri, Holmes and Walker, the loss being inspired here by a clinical trial concept, the minimum clinically important difference (MCID), defined as

\theta^* = \arg\min_\theta\mathbb{P}(Y\ne\text{sign}(X-\theta))

which defines a loss function per se when considering its empirical version L_n(θ), the proportion of observed pairs (X_i,Y_i) for which Y_i ≠ sign(X_i−θ). In clinical trials, Y is a binary outcome (coded as ±1 so that it can be compared with a sign) and X a vector of explanatory variables. This model-free concept avoids setting a joint distribution on the pair (X,Y), since creating a distribution on a large vector of covariates is always an issue. As an aside, the authors actually mention our MCMC book in connection with a logistic regression (Example 7.11), and for a while I thought we had mentioned the MCID therein, before realising it was a standard description of MCMC for logistic models.

The central and interesting part of the paper is obviously defining the likelihood-free posterior as

\pi_n(\theta) \propto \exp\{-n L_n(\theta) \}\pi(\theta)

The authors manage to obtain the rate necessary for the estimation to be asymptotically consistent, which seems [to me] to mean that a better representation of the likelihood-free posterior should be

\pi_n(\theta) \propto \exp\{-n^{-2/5} L_n(\theta) \}\pi(\theta)

(even though this rescaling does not appear verbatim in the paper). This is quite an interesting application of the concept developed by Bissiri, Holmes and Walker, even though it also illustrates the difficulty of defining a specific prior, given that the minimised target above can be transformed by an arbitrary increasing function, as well as the mathematical difficulty of finding a rate.
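A minimal sketch of this loss-based posterior, computed on a grid of θ values, may help fix ideas. It assumes a scalar X (rather than the vector of covariates mentioned above), a flat prior, and data simulated from a hypothetical threshold model: X ~ N(0,1) and Y = sign(X − 0.5), flipped with probability 0.2. Since the flip rate is below one half, the misclassification probability is minimised at θ* = 0.5. The `scale` argument defaults to n, giving exp{−n L_n(θ)}; passing another value, such as a rescaled weight, is a tuning assumption of this sketch rather than the paper's prescription. All function names are illustrative.

```python
import math
import random


def empirical_loss(theta, xs, ys):
    """L_n(theta): proportion of pairs with y != sign(x - theta); y in {-1, +1}, sign(0) taken as -1."""
    errors = sum(1 for x, y in zip(xs, ys) if y != (1 if x > theta else -1))
    return errors / len(xs)


def gibbs_posterior(xs, ys, grid, log_prior, scale=None):
    """Normalised weights proportional to exp{-scale * L_n(theta)} * prior(theta) on a grid.

    scale defaults to n, i.e. exp{-n L_n(theta)}; any other value is a tuning assumption.
    """
    if scale is None:
        scale = len(xs)
    logw = [-scale * empirical_loss(t, xs, ys) + log_prior(t) for t in grid]
    top = max(logw)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return [v / total for v in w]


def simulate(n, threshold, flip, seed):
    """Hypothetical data: X ~ N(0, 1), Y = sign(X - threshold) flipped with probability flip."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [(1 if x > threshold else -1) * (-1 if rng.random() < flip else 1) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(2000, 0.5, 0.2, seed=1)
    grid = [-2.0 + 0.01 * i for i in range(401)]
    weights = gibbs_posterior(xs, ys, grid, log_prior=lambda t: 0.0)  # flat prior on the grid
    print("posterior mean of theta:", sum(t * w for t, w in zip(grid, weights)))
```

Changing `scale` only changes how sharply the weights concentrate around the minimiser of L_n, which is exactly where the choice of rate discussed above comes in.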