## hierarchical models are not Bayesian models

**W**hen preparing my OxWaSP projects a few weeks ago, I came perchance on a set of slides, entitled “Hierarchical models are not Bayesian”, written by Brian Dennis (University of Idaho), where the author argues against Bayesian inference in hierarchical models in ecology, much in line with the previously discussed paper of Subhash Lele. The argument is the same, namely a possibly major impact of the prior modelling on the resulting inference, in particular when some parameters are hardly identifiable, all the more so when the model is complex and when there are many parameters. And that, “data cloning” being available since 2007, frequentist methods have “caught up” with Bayesian computational abilities.

Let me remind the reader that “data cloning” means constructing a sequence of Bayes estimators corresponding to the data being duplicated (or cloned) once, twice, etc., until the point estimator stabilises. Since this corresponds to using increasing powers of the likelihood, the posteriors concentrate more and more around the maximum likelihood estimator, and even allow for recovering the Hessian matrix. This technique is actually older than 2007 since I proposed it in the early 1990s under the name of prior feedback, with earlier occurrences in the literature like D’Epifanio (1989) and even the discussion of Aitkin (1991). A more efficient version of this approach is the SAME algorithm we developed in 2002 with Arnaud Doucet and Simon Godsill, where the power of the likelihood is increased during iterations in a simulated annealing version (with a preliminary version found in Duflo, 1996).
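As a minimal sketch of this concentration effect, here is a toy conjugate case where no simulation is needed: a binomial likelihood raised to the power K, combined with a Beta(a, b) prior. The model, the prior values and the function names are all assumptions made for illustration and do not come from the slides or from the data cloning papers. The posterior mean should drift towards the MLE and K times the posterior variance towards the inverse observed information, whatever the prior.

```python
def cloned_beta_posterior(successes, trials, k, a=1.0, b=1.0):
    """Beta parameters of the posterior when the binomial likelihood
    is raised to the power k (i.e. the data is cloned k times).
    The Beta(a, b) prior is an illustrative assumption."""
    return a + k * successes, b + k * (trials - successes)


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, var


def data_cloning_table(successes, trials, clone_counts, a=1.0, b=1.0):
    """For each number of clones k, return (k, posterior mean, k * posterior variance)."""
    rows = []
    for k in clone_counts:
        alpha, beta = cloned_beta_posterior(successes, trials, k, a, b)
        mean, var = beta_mean_var(alpha, beta)
        rows.append((k, mean, k * var))
    return rows


if __name__ == "__main__":
    s, n = 3, 10                      # hypothetical data: 3 successes out of 10
    mle = s / n                       # maximum likelihood estimator
    mle_var = mle * (1.0 - mle) / n   # inverse of the observed information
    print(f"MLE = {mle:.4f}, asymptotic variance = {mle_var:.5f}")
    # A deliberately lopsided prior, to show that its influence fades with k.
    for k, mean, scaled_var in data_cloning_table(s, n, [1, 10, 100, 1000], a=5.0, b=1.0):
        print(f"K = {k:5d}  posterior mean = {mean:.4f}  K * posterior variance = {scaled_var:.5f}")
```

In realistic hierarchical models the cloned posterior is not available in closed form and has to be explored by MCMC for each K, which is where the practical difficulties start.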

I completely agree with the author that a hierarchical model *does not have to be* Bayesian: when the random parameters in the model are analysed as sources of additional variations, as for instance in animal breeding or ecology, and integrated out, the resulting model can be analysed by *any* statistical method. Even though one may wonder at the motivations for selecting this particular randomness structure in the model. And at an increasing blurring between what is prior modelling and what is sampling modelling as the number of levels in the hierarchy goes up. This rather amusing set of slides somewhat misses a few points, in particular about the ability of data cloning to overcome identifiability and multimodality issues. Indeed, as with all simulated annealing techniques, there is a practical difficulty in avoiding the fatal attraction of a local mode using MCMC techniques. There are thus high chances that data cloning ends up in the “wrong” mode. Moreover, when the likelihood is multimodal, it is a general issue to decide which of the modes is most relevant for inference. In which sense is the MLE more objective than a Bayes estimate, then? Further, the impact of a prior on some aspects of the posterior distribution can be tested by re-running a Bayesian analysis with different priors, including empirical Bayes versions or, why not?!, data cloning, in order to understand where and why huge discrepancies occur. This is part of model building, in the end.

February 18, 2015 at 10:14 am

Imagine a parameter which is bounded (a variance, bounded at 0). The confidence interval given by the Hessian matrix may span beyond the boundary (a nasty feature for practitioners).

But this “data cloning” stuff will not recover the correct Hessian matrix, because the Bayesian estimator will never estimate parameters out of bounds (or am I wrong?).

One of the points of Bayesian estimators coupled with MCMC is their ability to handle small data correctly, giving correct finite-size confidence intervals, without using asymptotic results. These people suggest that this is a bad feature. I do not understand. I would rather suggest using bootstrapping.

February 18, 2015 at 2:53 pm

So there’s a recent-ish AoS paper by Bochkina and Green on Bernstein-von Mises theorems for non-regular problems (such as the true parameter being on the boundary of the space) which says, under fairly ok conditions on the prior, that if the gradient of the log-likelihood (asymptotically) vanishes at the true value*, the posterior looks exactly like the corresponding asymptotics for the MLE.

* For a bounded problem, the derivative doesn’t have to vanish at the maximum! It could just be an end-point of the domain. In this case, weird things happen.

February 18, 2015 at 3:00 pm

“But this “data cloning” stuff will not recover the correct Hessian matrix, because the Bayesian estimator will never estimate parameters out of bounds (or am I wrong?).”

The sample covariance based on MCMC draws from the cloned posterior, multiplied by the number of clones K, will give you the covariance of the MLE (for an infinite K). Therefore yes, the multiplication by K might well produce confidence intervals “out of bounds”.
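A rough sketch of this effect, under assumptions that are mine and not the commenter's: zero-mean normal data with unknown variance, an inverse-gamma prior chosen for conjugacy, and direct inverse-gamma draws standing in for MCMC output from the cloned posterior. Every draw is positive, yet the interval built from K times the sample variance can still dip below zero when the sample is tiny.

```python
import random
import statistics


def cloned_variance_posterior(sum_sq, n, k, a=1.0, b=1.0):
    """Inverse-gamma (shape, rate) posterior for sigma^2 when the zero-mean
    normal likelihood is raised to the power k.
    The InverseGamma(a, b) prior is an illustrative assumption."""
    return a + k * n / 2.0, b + k * sum_sq / 2.0


def draw_inverse_gamma(shape, rate, size, rng):
    """Direct draws, used here in place of MCMC output.
    If X ~ Gamma(shape, scale=1/rate), then 1/X ~ InverseGamma(shape, rate)."""
    return [1.0 / rng.gammavariate(shape, 1.0 / rate) for _ in range(size)]


def wald_interval_from_clones(draws, k, z=1.96):
    """Interval centred at the posterior mean, with the variance of the
    draws multiplied by the number of clones k."""
    centre = statistics.fmean(draws)
    scaled_var = k * statistics.variance(draws)
    half_width = z * scaled_var ** 0.5
    return centre - half_width, centre + half_width, scaled_var


if __name__ == "__main__":
    rng = random.Random(2015)
    n, sum_sq, k = 3, 3.0, 200        # hypothetical data: 3 observations, sum of squares 3
    mle = sum_sq / n                  # MLE of sigma^2
    mle_var = 2.0 * mle ** 2 / n      # asymptotic variance of the MLE
    shape, rate = cloned_variance_posterior(sum_sq, n, k)
    draws = draw_inverse_gamma(shape, rate, 20000, rng)
    lo, hi, scaled_var = wald_interval_from_clones(draws, k)
    print(f"smallest draw = {min(draws):.3f}")
    print(f"K * sample variance = {scaled_var:.3f} (asymptotic value {mle_var:.3f})")
    print(f"interval = ({lo:.3f}, {hi:.3f})")
```

The draws themselves never leave the parameter space; it is the rescaling by K that restores the asymptotic spread of the MLE and, with it, the possibility of an interval crossing the boundary.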

February 18, 2015 at 1:58 am

OK, much later, but in 2007/8, when I overheard a couple of graduate students at Duke fretting over how to simulate a data set twice the size that was similar enough to the original data set to determine if convergence problems would disappear with more data, I suggested they simply raise the likelihood to various powers instead. I was a bit surprised that they did not get it and seemed to think I was talking nonsense.

A couple of years later, I noticed the data cloning R package and reference and sent it to one of the students who had graduated. They replied, “Oh, now it makes sense.”

It is not always obvious what is obvious to others. On the other hand, Don Fraser once said, it is not obvious how to make the obvious, obvious.