## Bayesian parameter estimation versus model comparison

Posted in Books, pictures, Statistics with tags , , , , , , on December 5, 2016 by xi'an

John Kruschke [of puppies’ fame!] wrote a paper in Perspectives in Psychological Science a few years ago on the comparison between two Bayesian approaches to null hypotheses. Of which I became aware through a X validated question that seemed to confuse Bayesian parameter estimation with Bayesian hypothesis testing.

“Regardless of the decision rule, however, the primary attraction of using parameter estimation to assess null values is that the an explicit posterior distribution reveals the relative credibility of all the parameter values.” (p.302)

After reading this paper, I realised that Kruschke meant something completely different, namely that a Bayesian approach to null hypothesis testing could operate from the posterior on the corresponding parameter, rather than to engage into formal Bayesian model comparison (null versus the rest of the World). The notion is to check whether or not the null value stands within the 95% [why 95?] HPD region [modulo a buffer zone], which offers the pluses of avoiding a Dirac mass at the null value and a long-term impact of the prior tails on the decision, with the minus of replacing the null with a tolerance region around the null and calibrating the rejection level. This opposition is thus a Bayesian counterpart of running tests on point null hypotheses either by Neyman-Pearson procedures or by confidence intervals. Note that in problems with nuisance parameters this solution requires a determination of the 95% HPD region associated with the marginal on the parameter of interest, which may prove a challenge.

“…the measure provides a natural penalty for vague priors that allow a broad range of parameter values, because a vague prior dilutes credibility across a broad range of parameter values, and therefore the weighted average is also attenuated.” (p. 306)

While I agree with most of the critical assessment of Bayesian model comparison, including Kruschke’s version of Occam’s razor [and Lindley’s paradox] above, I do not understand how Bayesian model comparison fails to return a full posterior on both the model indices [for model comparison] and the model parameters [for estimation]. To state that it does not because the Bayes factor only depends on marginal likelihoods (p.307) sounds unfair if only because most numerical techniques to approximate the Bayes factors rely on preliminary simulations of the posterior. The point that the Bayes factor strongly depends on the modelling of the alternative model is well-taken, albeit the selection of the null in the “estimation” approach does depend as well on this alternative modelling. Which is an issue if one ends up accepting the null value and running a Bayesian analysis based on this null value.

“The two Bayesian approaches to assessing null values can be unified in a single hierarchical model.” (p.308)

Incidentally, the paper briefly considers a unified modelling that can be interpreted as a mixture across both models, but this mixture representation completely differs from ours [where we also advocate estimation to replace testing] since the mixture is at the likelihood x prior level, as in O’Neill and Kypriaos.

## rehabilitation of the communards

Posted in Books, Kids, pictures with tags , , , , on December 4, 2016 by xi'an

Today the French National Assembly voted the rehabilitation of all victims of the repression of the Commune de Paris (March 18—May 28, 1871), which saw between 10,000 and 20,000 Parisian men and women summarily executed by troops of the provisional government of Adolphe Thiers during the Bloody Week.

## ratio-of-uniforms [#4]

Posted in Books, pictures, R, Statistics, University life with tags , , , , on December 2, 2016 by xi'an

Possibly the last post on random number generation by Kinderman and Monahan’s (1977) ratio-of-uniform method. After fiddling with the Gamma(a,1) distribution when a<1 for a while, I indeed figured out a way to produce a bounded set with this method: considering an arbitrary cdf Φ with corresponding pdf φ, the uniform distribution on the set Λ of (u,v)’s in R⁺xX such that

0≤u≤Φοƒ[φοΦ⁻¹(u)v]

induces the distribution with density proportional to ƒ on φοΦ⁻¹(U)V. This set Λ has a boundary that is parameterised as

u=Φοƒ(x),  v=1/φοƒ(x), x∈Χ

which remains bounded in u since Φ is a cdf and in v if φ has fat enough tails. At both 0 and ∞. When ƒ is the Gamma(a,1) density this can be achieved if φ behaves like log(x)² near zero and like a inverse power at infinity. Without getting into all the gory details, closed form density φ and cdf Φ can be constructed for all a’s, as shown for a=½ by the boundaries in u and v (yellow) below

which leads to a bounded associated set Λ

At this stage, I remain uncertain of the relevance of such derivations, if only because the set A thus derived is ill-suited for uniform draws proposed on the enclosing square box. And also because a Gamma(a,1) simulation can rather simply be derived from a Gamma(a+1,1) simulation. But, who knows?!, there may be alternative usages of this representation, such as innovative slice samplers. Which means the ratio-of-uniform method may reappear on the ‘Og one of those days…

## MAP as Bayes estimators

Posted in Books, Kids, Statistics with tags , , , , on November 30, 2016 by xi'an

Robert Bassett and Julio Deride just arXived a paper discussing the position of MAPs within Bayesian decision theory. A point I have discussed extensively on the ‘Og!

“…we provide a counterexample to the commonly accepted notion of MAP estimators as a limit of Bayes estimators having 0-1 loss.”

The authors mention The Bayesian Choice stating this property without further precautions and I completely agree to being careless in this regard! The difficulty stands with the limit of the maximisers being not necessarily the maximiser of the limit. The paper includes an example to this effect, with a prior as above,  associated with a sampling distribution that does not depend on the parameter. The sufficient conditions proposed therein are that the posterior density is almost surely proper or quasiconcave.

This is a neat mathematical characterisation that cleans this “folk theorem” about MAP estimators. And for which the authors are to be congratulated! However, I am not very excited by the limiting property, whether it holds or not, as I have difficulties conceiving the use of a sequence of losses in a mildly realistic case. I rather prefer the alternate characterisation of MAP estimators by Burger and Lucka as proper Bayes estimators under another type of loss function, albeit a rather artificial one.

## asymptotically exact inference in likelihood-free models

Posted in Books, pictures, Statistics with tags , , , , , , , on November 29, 2016 by xi'an

“We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs.”

Following my earlier post on that paper by Matt Graham and Amos Storkey (University of Edinburgh), I now read through it. The beginning is somewhat unsettling, albeit mildly!, as it starts by mentioning notions like variational auto-encoders, generative adversial nets, and simulator models, by which they mean generative models represented by a (differentiable) function g that essentially turn basic variates with density p into the variates of interest (with intractable density). A setting similar to Meeds’ and Welling’s optimisation Monte Carlo. Another proximity pointed out in the paper is Meeds et al.’s Hamiltonian ABC.

“…the probability of generating simulated data exactly matching the observed data is zero.”

The section on the standard ABC algorithms mentions the fact that ABC MCMC can be (re-)interpreted as a pseudo-marginal MCMC, albeit one targeting the ABC posterior instead of the original posterior. The starting point of the paper is the above quote, which echoes a conversation I had with Gabriel Stolz a few weeks ago, when he presented me his free energy method and when I could not see how to connect it with ABC, because having an exact match seemed to cancel the appeal of ABC, all parameter simulations then producing an exact match under the right constraint. However, the paper maintains this can be done, by looking at the joint distribution of the parameters, latent variables, and observables. Under the implicit restriction imposed by keeping the observables constant. Which defines a manifold. The mathematical validation is achieved by designing the density over this manifold, which looks like

$p(u)\left|\frac{\partial g^0}{\partial u}\frac{\partial g^0}{\partial u}^\text{T}\right|^{-\textonehalf}$

if the constraint can be rewritten as g⁰(u)=0. (This actually follows from a 2013 paper by Diaconis, Holmes, and Shahshahani.) In the paper, the simulation is conducted by Hamiltonian Monte Carlo (HMC), the leapfrog steps consisting of an unconstrained move followed by a projection onto the manifold. This however sounds somewhat intense in that it involves a quasi-Newton resolution at each step. I also find it surprising that this projection step does not jeopardise the stationary distribution of the process, as the argument found therein about the approximation of the approximation is not particularly deep. But the main thing that remains unclear to me after reading the paper is how the constraint that the pseudo-data be equal to the observable data can be turned into a closed form condition like g⁰(u)=0. As mentioned above, the authors assume a generative model based on uniform (or other simple) random inputs but this representation seems impossible to achieve in reasonably complex settings.

## simulation by hand

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , on November 28, 2016 by xi'an

A rather weird question on X validated this week was about devising a manual way to simulate (a few) normal variates. By manual I presume the author of the question means without resorting to a computer or any other business machine. Now, I do not know of any real phenomenon that is exactly and provably Normal. As analysed in a great philosophy of science paper by Aidan Lyon, the standard explanations for a real phenomenon to be Normal are almost invariably false, even those invoking the Central Limit Theorem. Hence I cannot think of a mechanical device that would directly return Normal generations from a Normal distribution with known parameters. However, since it is possible to simulate by hand Uniform U(0,1) variates [up to a given precision] using a chronometre or a wheel, calls to versions of the Box-Müller algorithm that do not rely on logarithmic or trigonometric functions are feasible, for instance by generating two Exponential variates, x and y, until 2y>(1-x)², x being the output. And generating Exponential variates is easy provided a radioactive material with known half-life is available, along with a Geiger counter. Or, if not, by calling von Neumann’s exponential generator. As detailed in Devroye’s simulation book.

After proposing this solution, I received a comment from the author of the question towards a simpler solution based, e.g., on the Central Limit Theorem. Presumably for simple iid random variables such as coin tosses or dice experiments. While I used the CLT for simulating Normal variables in my very early days [just after programming on punched cards!], I do not think this is a very good or efficient method, as the tails grow very slowly to normality. By comparison, using the same amount of coin tosses to create a sufficient number of binary digits of a Uniform variate produces a computer-precision exact Uniform variate, which can be exploited in Box-Müller-like algorithms to return exact Normal variates… Even by hand if necessary. [For some reason, this question attracted a lot of traffic and an encyclopaedic answer on X validated, despite being borderline to the point of being proposed for closure.]

## “Stein deviates from the statistical norm”

Posted in Books, pictures, Statistics, University life with tags , , , , , , , on November 27, 2016 by xi'an