## ABC for bivariate betas

Posted in Statistics, University life on February 19, 2014 by xi'an

Crakel and Flegal just arXived a short paper running ABC for doing inference on the parameters of two families of bivariate betas. I could not but read it through, and wonder why ABC was really necessary to handle the model. The bivariate betas in question are defined from

$V_1=(U_1+U_5+U_7)/(U_3+U_6+U_8)\,,$

$V_2=(U_2+U_5+U_8)/(U_4+U_6+U_7)$

when

$U_i\sim \text{Ga}(\delta_i,1)$

and

$X_1=V_1/(1+V_1)\,,\ X_2=V_2/(1+V_2)$

This makes each term in the pair Beta and the two components dependent. This construct was proposed by Arnold and Ng (2011). (The five-parameter version cancels the gammas for i=3,4,5.)
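To make the construction concrete, here is a minimal simulation sketch using only Python's standard library. The function names `rgamma` and `rbivbeta` are mine, and reading a zero shape parameter as a point mass at zero (to cover the five-parameter version) is an assumption of this sketch, not something taken from the paper.

```python
import random


def rgamma(shape, rng):
    """Draw from Ga(shape, 1); a zero shape is read as a point mass at 0 (assumption)."""
    return rng.gammavariate(shape, 1.0) if shape > 0 else 0.0


def rbivbeta(delta, n, seed=0):
    """Simulate n pairs (X1, X2) from the eight-gamma construction.

    delta holds the eight shape parameters (delta_1, ..., delta_8).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Pad with None so that u[i] matches U_i in the formulas above.
        u = [None] + [rgamma(d, rng) for d in delta]
        v1 = (u[1] + u[5] + u[7]) / (u[3] + u[6] + u[8])
        v2 = (u[2] + u[5] + u[8]) / (u[4] + u[6] + u[7])
        pairs.append((v1 / (1 + v1), v2 / (1 + v2)))
    return pairs
```

Since $V_1/(1+V_1)$ is a ratio of the form $A/(A+B)$ with independent gammas $A$ and $B$, the first component is Beta$(\delta_1+\delta_5+\delta_7,\delta_3+\delta_6+\delta_8)$, which gives a quick check on the empirical mean of the simulated $X_1$'s (and similarly for $X_2$).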

Since the pdf of the joint distribution is not available in closed form, Crakel and Flegal zoom in on ABC-MCMC as the method of choice and discuss simulation experiments. (The choice of the tolerance ε as an absolute rather than relative value, ε=0.2,0.0.6,0.8, puzzles me, esp. since the distance between the summary statistics is not scaled.) I however wonder why other approaches are impossible. (Or why it is necessary to use this distribution to model correlated betas. Unless I am confused, copulas were invented to this effect.) First, this is a latent variable model, so latent variables could be introduced inside an MCMC scheme. A wee bit costly but feasible. Second, several moments of those distributions are known, so an empirical likelihood approach could be considered.
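For readers unfamiliar with ABC, here is a rough sketch of the idea on the five-parameter version. It is plain rejection ABC, not the ABC-MCMC sampler of the paper, and everything specific in it is my own assumption: the uniform prior on (0.1, 3), the five summary statistics (means, standard deviations, correlation), and the unscaled Euclidean distance with an absolute tolerance.

```python
import math
import random


def simulate(delta5, n, rng):
    """Five-parameter version, delta5 = (d1, d2, d6, d7, d8); U3, U4, U5 are dropped."""
    sample = []
    for _ in range(n):
        u1, u2, u6, u7, u8 = (rng.gammavariate(d, 1.0) for d in delta5)
        # X = V/(1+V) rewritten as numerator/(numerator + denominator).
        x1 = (u1 + u7) / (u1 + u7 + u6 + u8)
        x2 = (u2 + u8) / (u2 + u8 + u6 + u7)
        sample.append((x1, x2))
    return sample


def summaries(sample):
    """Hypothetical summaries: both means, both standard deviations, correlation."""
    n = len(sample)
    m1 = sum(x for x, _ in sample) / n
    m2 = sum(y for _, y in sample) / n
    s1 = math.sqrt(sum((x - m1) ** 2 for x, _ in sample) / n)
    s2 = math.sqrt(sum((y - m2) ** 2 for _, y in sample) / n)
    cov = sum((x - m1) * (y - m2) for x, y in sample) / n
    corr = cov / (s1 * s2) if s1 > 0 and s2 > 0 else 0.0
    return (m1, m2, s1, s2, corr)


def abc_rejection(observed, n_prop, eps, seed=0, lo=0.1, hi=3.0):
    """Keep the prior draws whose simulated summaries fall within eps of the observed ones."""
    rng = random.Random(seed)
    s_obs = summaries(observed)
    accepted = []
    for _ in range(n_prop):
        delta = tuple(rng.uniform(lo, hi) for _ in range(5))  # assumed prior
        s_sim = summaries(simulate(delta, len(observed), rng))
        dist = math.dist(s_obs, s_sim)  # unscaled, absolute tolerance
        if dist <= eps:
            accepted.append((delta, dist))
    return accepted
```

Because the distance is not scaled, the summaries with the largest natural spread dominate the accept/reject decision, which is the concern raised above about an absolute ε.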

## On the use of marginal posteriors in marginal likelihood estimation via importance-sampling

Posted in R, Statistics, University life on November 20, 2013 by xi'an

Perrakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the components. The only critical issue is to find good approximations to the marginal posterior densities. This is handled in the paper either by normal approximations or by Rao-Blackwell estimates, the latter being rather costly since one importance weight involves B×L computations, where B is the number of blocks and L the number of samples used in the Rao-Blackwell estimates. The time factor does not seem to be included in the comparison studies run by the authors, although it would seem necessary when comparing scenarii.
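The mechanics can be sketched on a toy problem of my own choosing (not one from the paper): a single observation y ~ N(θ1+θ2, 1) with independent N(0,1) priors, for which the evidence is the N(0,3) density at y. Exact posterior draws stand in for the MCMC output, and the normal approximation option is used for the marginals; all function names are hypothetical.

```python
import math
import random


def log_norm_pdf(x, mean, sd):
    """Log-density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def mean_sd(values):
    """Empirical mean and standard deviation of a list."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def evidence_from_marginals(y, n_draws=100_000, seed=0):
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(n_draws):
        # Posterior of the sum is N(2y/3, 2/3); the difference keeps its N(0, 2) prior.
        s = rng.gauss(2 * y / 3, math.sqrt(2 / 3))
        d = rng.gauss(0.0, math.sqrt(2.0))
        t1.append((s + d) / 2)
        t2.append((s - d) / 2)
    # Normal approximations to the two marginal posteriors, fitted on the draws.
    m1, sd1 = mean_sd(t1)
    m2, sd2 = mean_sd(t2)
    # Permuting one block turns joint draws into draws from the product of marginals.
    rng.shuffle(t2)
    total = 0.0
    for a, b in zip(t1, t2):
        log_w = (log_norm_pdf(y, a + b, 1.0)                           # likelihood
                 + log_norm_pdf(a, 0.0, 1.0) + log_norm_pdf(b, 0.0, 1.0)  # prior
                 - log_norm_pdf(a, m1, sd1) - log_norm_pdf(b, m2, sd2))   # proposal
        total += math.exp(log_w)
    return total / n_draws
```

With two blocks and a closed-form normal approximation, each weight costs a handful of density evaluations; replacing the normal approximation with a Rao-Blackwell estimate is what brings in the B×L cost per weight.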

After a standard regression example (that did not include Chib’s solution in the comparison), the paper considers 2- and 3-component mixtures. The discussion centres around label switching (of course) and the deficiencies of Chib’s solution against the current method and Neal’s reference. The study does not include averaging Chib’s solution over permutations as in Berkhof et al. (2003) and Marin et al. (2005), an approach that does eliminate the bias, especially for a small number of components. Instead, the authors stick to the log(k!) correction, despite it being known to be quite unreliable (depending on the amount of overlap between modes). The final example is the longitudinal Poisson regression with random effects on epileptic patients of Diggle et al. (1995). The appeal of this model is the unavailability of the integrated likelihood, which implies either estimating it by Rao-Blackwellisation or including the 58 latent variables in the analysis. (There is no comparison with other methods.)

As a side note, among the many references provided by this paper, I did not find any trace of Skilling’s nested sampling or of safe harmonic means (as presented in our own survey on the topic).

## Correlated Poissons

Posted in Statistics on March 2, 2011 by xi'an

A graduate student came to see me the other day with a bivariate Poisson distribution and a question about using EM in this framework. The problem boils down to adding one correlation parameter and an extra term in the likelihood

$(1-\rho)^{n_1}(1+\lambda\rho)^{n_2}(1+\mu\rho)^{n_3}(1-\lambda\mu\rho)^{n_4}\quad 0\le\rho\le\min(1,\frac{1}{\lambda\mu})$

Both terms involving sums are easy to deal with, using latent variables as in mixture models. The subtractions are trickier, as the negative parts cannot appear in a conditional distribution. Even though the problem can be handled by a direct numerical maximisation or by an almost standard Metropolis-within-Gibbs sampler, my suggestion regarding EM per se was to proceed by conditional EM, one parameter at a time. For instance, when considering $\rho$ conditional on both Poisson parameters, depending on whether $\lambda\mu>1$ or not, one can consider either

$(1-\theta/\lambda\mu)^{n_1}(1+\theta/\mu)^{n_2}(1+\theta/\lambda)^{n_3}(1-\theta)^{n_4}\quad0<\theta<1$

and turn

$(1-\theta/\lambda\mu) \text{ into } (1-\theta+\theta\{1-\frac{1}{\lambda\mu}\})$

thus producing a Beta-like target function in $\theta$ after completion, or turn

$(1-\lambda\mu\rho) \text{ into } (1-\rho+\{1-\lambda\mu\}\rho)$

to produce a Beta-like target function in $\rho$ after completion. In the end, this is a rather pedestrian exercise and I am still frustrated at missing the trick to handle the subtractions directly. It was nonetheless a nice question!
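For comparison, the direct numerical maximisation mentioned above is easy to sketch for the $\rho$ term alone, with both Poisson parameters held fixed. This is not the conditional EM: it simply exploits the fact that the logarithm of the extra term is a sum of logarithms of affine functions of $\rho$, hence concave, so a ternary search suffices. Function names and the counts used in the usage note are made up for illustration.

```python
import math


def log_extra_term(rho, lam, mu, counts):
    """Log of (1-rho)^n1 (1+lam*rho)^n2 (1+mu*rho)^n3 (1-lam*mu*rho)^n4."""
    factors = (1 - rho, 1 + lam * rho, 1 + mu * rho, 1 - lam * mu * rho)
    total = 0.0
    for n, f in zip(counts, factors):
        if n == 0:
            continue  # a zero exponent contributes nothing, even at the boundary
        if f <= 0:
            return -math.inf
        total += n * math.log(f)
    return total


def maximise_rho(lam, mu, counts, iters=200):
    """Ternary search for the maximiser over 0 <= rho <= min(1, 1/(lam*mu))."""
    lo, hi = 0.0, min(1.0, 1.0 / (lam * mu))
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if log_extra_term(a, lam, mu, counts) < log_extra_term(b, lam, mu, counts):
            lo = a  # the maximum cannot lie to the left of a
        else:
            hi = b  # the maximum cannot lie to the right of b
    return (lo + hi) / 2
```

For instance, `maximise_rho(1.5, 2.0, (10, 30, 25, 5))` searches over the interval (0, 1/3), since $\lambda\mu=3>1$ in that case.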

## Computing evidence

Posted in Books, R, Statistics on November 29, 2010 by xi'an

The book Random effects and latent variable model selection, edited by David Dunson in 2008 as a Springer Lecture Note, contains several chapters dealing with evidence approximation in mixed effect models. (Incidentally, I would be interested in the story behind the Lecture Note as I found no explanation on the back cover or in the preface. Some chapters but not all refer to a SAMSI workshop on model uncertainty…) The final chapter, written by Joyee Ghosh and David Dunson (similar to a corresponding paper in JCGS), contains in particular the interesting identity that the Bayes factor opposing model h to model h-1 can be unbiasedly approximated by (the average of the terms)

$\dfrac{f(x|\theta_{i,h},\mathfrak{M}=h-1)}{f(x|\theta_{i,h},\mathfrak{M}=h)}$

when

• $\mathfrak{M}$ is the model index,
• the $\theta_{i,h}$'s are simulated from the posterior under model h,
• the model $\mathfrak{M}=h-1$ only considers the h-1 first components of $\theta_{i,h}$,
• the prior under model h-1 is the projection of the prior under model h. (Note that this marginalisation is not the projection used in Bayesian Core.)
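
The unbiasedness follows in three lines from the conditions listed above. The notation $m_h(x)$ for the marginal likelihood and $\pi_h$ for the prior under model h is introduced here for the derivation and is not taken from the chapter:

```latex
\begin{align*}
\mathbb{E}\left[\frac{f(x|\theta_h,\mathfrak{M}=h-1)}{f(x|\theta_h,\mathfrak{M}=h)}\,\Big|\,x,\mathfrak{M}=h\right]
&= \int \frac{f(x|\theta_h,\mathfrak{M}=h-1)}{f(x|\theta_h,\mathfrak{M}=h)}\,
   \frac{f(x|\theta_h,\mathfrak{M}=h)\,\pi_h(\theta_h)}{m_h(x)}\,\text{d}\theta_h \\
&= \frac{1}{m_h(x)}\int f(x|\theta_h,\mathfrak{M}=h-1)\,\pi_h(\theta_h)\,\text{d}\theta_h \\
&= \frac{m_{h-1}(x)}{m_h(x)}
\end{align*}
```

The last step uses both remaining conditions: the likelihood under model h-1 only involves the first h-1 components, and integrating out the last component of $\pi_h$ returns the prior under model h-1. The average therefore targets the ratio of the evidence of the smaller model to that of the larger one.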

## València 9 snapshot [5]

Posted in pictures, Running, Statistics, University life on June 9, 2010 by xi'an

For the final day of the meeting, after a good one-hour run to the end of the Benidorm bay (for me at least!), we got treated to great talks, culminating with the fitting conclusion given by the conference originator, José Bernardo. The first talk of the day was Guido Consonni’s, who introduced a new class of non-local priors to deal with variable selection. From my understanding, those priors avoid a neighbourhood of zero by placing a polynomial prior on the regression coefficients in order to discriminate better between the null and the alternative,

$\pi(\mathbf{\beta}) = \prod_i \beta_i^h$

but the influence of the power h seems to be drastic, judging from the example shown by Guido where a move from h=0 to h=1 modified the posterior probability from 0.091 to 0.99 for the same dataset. The discussion by Jim Smith was a perfect finale to the Valencia meetings, Jim being much more abrasive than the usual discussant (while always giving the impression of being near a heart attack!) The talk from Sylvia Frühwirth-Schnatter purposely borrowed Nick Polson’s title Shrink globally, act locally, and was also dealing with the Bayesian (re)interpretation of Lasso. (I was again left with the impression of hyperparameters that needed to be calibrated but this impression may change after I read the paper!) The talk by Xiao-Li Meng was as efficient as ever with Xiao-Li! Despite the penalising fact of being based on a discussion he wrote for Statistical Science, he managed to convey a global and convincing picture of likelihood inference in latent variable models, while having the audience laugh for most of the talk, a feat repeated by his discussant, Ed George. The basic issue of treating latent variables as parameters offers no particular difficulty in Bayesian inference but this is not true for likelihood models, as shown by both Xiao-Li and Ed. The last talk of the València series managed to make a unifying theory out of the major achievements of José Bernardo and, while I have some criticisms about the outcome, this journey back to decision theory, intrinsic losses and reference priors was nonetheless a very appropriate supplementary contribution of José to this wonderful series of meetings… Luis Pericchi discussed the paper in a very opinionated manner, defending the role of the Bayes factor, and the debate could have gone on forever… Hopefully, I will find time to post my comments on José’s paper.

I am quite sorry I had to leave before the Savage prize session where the four finalists for the prize gave a lecture. Those finalists are of the highest quality as the prize is not given in years when the quality of the theses is not deemed high enough. I will also miss the final evening during which the DeGroot Prize is awarded. (When I received the prize for Bayesian Core in 2004, I had also left Valparaiso in the morning, just before the banquet!)