Archive for Bayes estimators

admissible estimators that are not Bayes

Posted in Statistics with tags , , , , , , on December 30, 2017 by xi'an

A question that popped up on X validated made me search a little while for point estimators that are both admissible (under a certain loss function) and not generalised Bayes (under the same loss function), before asking Larry Brown, Jim Berger, or Ed George. The answer came through Larry’s book on exponential families, with the two examples attached. (Following our 1989 collaboration with Roger Farrell at Cornell U, I knew about the existence of testing procedures that were both admissible and not Bayes.) The most surprising feature is that the associated loss function is strictly convex as I would have thought that a less convex loss would have helped to find such counter-examples.

hierarchical models are not Bayesian models

Posted in Books, Kids, Statistics, University life with tags , , , , , , , on February 18, 2015 by xi'an

When preparing my OxWaSP projects a few weeks ago, I came perchance on a set of slides, entitled “Hierarchical models are not Bayesian“, written by Brian Dennis (University of Idaho), where the author argues against Bayesian inference in hierarchical models in ecology, much in relation with the previously discussed paper of Subhash Lele. The argument is the same, namely a possibly major impact of the prior modelling on the resulting inference, in particular when some parameters are hardly identifiable, the more when the model is complex and when there are many parameters. And that “data cloning” being available since 2007, frequentist methods have “caught up” with Bayesian computational abilities.

Let me remind the reader that “data cloning” means constructing a sequence of Bayes estimators corresponding to the data being duplicated (or cloned) once, twice, &tc., until the point estimator stabilises. Since this corresponds to using increasing powers of the likelihood, the posteriors concentrate more and more around the maximum likelihood estimator. And even recover the Hessian matrix. This technique is actually older than 2007 since I proposed it in the early 1990’s under the name of prior feedback, with earlier occurrences in the literature like D’Epifanio (1989) and even the discussion of Aitkin (1991). A more efficient version of this approach is the SAME algorithm we developed in 2002 with Arnaud Doucet and Simon Godsill where the power of the likelihood is increased during iterations in a simulated annealing version (with a preliminary version found in Duflo, 1996).

I completely agree with the author that a hierarchical model does not have to be Bayesian: when the random parameters in the model are analysed as sources of additional variations, as for instance in animal breeding or ecology, and integrated out, the resulting model can be analysed by any statistical method. Even though one may wonder at the motivations for selecting this particular randomness structure in the model. And at an increasing blurring between what is prior modelling and what is sampling modelling as the number of levels in the hierarchy goes up. This rather amusing set of slides somewhat misses a few points, in particular the ability of data cloning to overcome identifiability and multimodality issues. Indeed, as with all simulated annealing techniques, there is a practical difficulty in avoiding the fatal attraction of a local mode using MCMC techniques. There are thus high chances data cloning ends up in the “wrong” mode. Moreover, when the likelihood is multimodal, it is a general issue to decide which of the modes is most relevant for inference. In which sense is the MLE more objective than a Bayes estimate, then? Further, the impact of a prior on some aspects of the posterior distribution can be tested by re-running a Bayesian analysis with different priors, including empirical Bayes versions or, why not?!, data cloning, in order to understand where and why huge discrepancies occur. This is part of model building, in the end.

minimaxity of a Bayes estimator

Posted in Books, Kids, Statistics, University life with tags , , , , , on February 2, 2015 by xi'an

Today, while in Warwick, I spotted on Cross Validated a question involving “minimax” in the title and hence could not help but look at it! The way I first understood the question (and immediately replied to it) was to check whether or not the standard Normal average—reduced to the single Normal observation by sufficiency considerations—is a minimax estimator of the normal mean under an interval zero-one loss defined by

\mathcal{L}(\mu,\hat{\mu})=\mathbb{I}_{|\mu-\hat\mu|>L}=\begin{cases}1 &\text{if }|\mu-\hat\mu|>L\\ 0&\text{if }|\mu-\hat{\mu}|\le L\\ \end{cases}

where L is a positive tolerance bound. I had not seen this problem before, even though it sounds quite standard. In this setting, the identity estimator, i.e., the normal observation x, is indeed minimax as (a) it is a generalised Bayes estimator—Bayes estimators under this loss are given by the centre of an equal posterior interval—for this loss function under the constant prior and (b) it can be shown to be a limit of proper Bayes estimators and its Bayes risk is also the limit of the corresponding Bayes risks. (This is a most traditional way of establishing minimaxity for a generalised Bayes estimator.) However, this was not the question asked on the forum, as the book by Zacks it referred to stated that the standard Normal average maximised the minimal coverage, which amounts to the maximal risk under the above loss. With the strange inversion of parameter and estimator in the minimax risk:

\sup_\mu\inf_{\hat\mu} R(\mu,\hat{\mu})\text{ instead of } \sup_\mu\inf_{\hat\mu} R(\mu,\hat{\mu})

which makes the first bound equal to 0 by equating estimator and mean μ. Note however that I cannot access the whole book and hence may miss some restriction or other subtlety that would explain for this unusual definition. (As an aside, note that Cross Validated has a protection against serial upvoting, So voting up or down at once a large chunk of my answers on that site does not impact my “reputation”!)

MAP estimators (cont’d)

Posted in Statistics with tags , , , , on September 13, 2009 by xi'an

In connection with Anthony’s comments, here are the details for the normal example. I am using a flat prior on \mu when x\sim\mathcal{N}(\mu,1). The MAP estimator of \mu is then \hat\mu=x. If I consider the change of variable \mu=\text{logit}(\eta), the posterior distribution on \eta is

\pi(\eta|x) = \exp[ -(\text{logit}(\eta)-x)^2/2 ] / \sqrt{2\pi} \eta (1-\eta)

and the MAP in \eta is then obtained numerically. For instance, the R code

f=function(x,mea) dnorm(log(x/(1-x)),mean=mea)/(x*(1-x))
g=function(x){ a=optimise(f,int=c(0,1),maximum=TRUE,mea=x)$max;log(a/(1-a))}
plot(seq(0,4,.01),apply(as.matrix(seq(0,4,.01)),1,g),type="l",col="sienna",lwd=2)
abline(a=0,b=1,col="tomato2",lwd=2)

shows the divergence between the MAP estimator \hat\mu and the reverse transform of the MAP estimator \hat\eta of the transform… The second estimator is asymptotically (in x) equivalent to x+1.

An example I like very much in The Bayesian Choice is Example 4.1.2, when observing x\sim\text{Cauchy}(\theta,1) with a double exponential prior on \theta\sim\exp\{-|\theta|\}/2. The MAP is then always \hat\theta=0!

The dependence of the MAP estimator on the dominating measure is also studied in a BA paper by Pierre Druihlet and Jean-Michel Marin, who propose a solution that relies on Jeffreys’ prior as the reference measure.