## top model choice week (#3)

Posted in Statistics, University life on June 19, 2013 by xi'an

To conclude this exciting week, there will be a final seminar by Veronika Ročková (Erasmus University) on Friday, June 21, at 11am at ENSAE in Room 14. Here is her abstract:

11am: Fast Dynamic Posterior Exploration for Factor Augmented Multivariate Regression by Veronika Ročková

Advancements in high-throughput experimental techniques have facilitated the availability of diverse genomic data, which provide complementary information regarding the function and organization of gene regulatory mechanisms. The massive accumulation of data has increased demands for more elaborate modeling approaches that combine the multiple data platforms. We consider a sparse factor regression model, which augments the multivariate regression approach by adding a latent factor structure, thereby allowing for dependent patterns of marginal covariance between the responses. In order to enable the identification of parsimonious structure, we impose spike and slab priors on the individual entries in the factor loading and regression matrices. The continuous relaxation of the point mass spike and slab enables the implementation of a rapid EM inferential procedure for dynamic posterior model exploration. This is accomplished by considering a nested sequence of spike and slab priors and various factor space cardinalities. Identified candidate models are evaluated by a conditional posterior model probability criterion, permitting trans-dimensional comparisons. Patterned sparsity manifestations such as an orthogonal allocation of zeros in factor loadings are facilitated by structured priors on the binary inclusion matrix. The model is applied to a problem of integrating two genomic datasets, where expression of microRNAs is related to the expression of genes with an underlying connectivity pathway network.

## top model choice week (#2)

Posted in Statistics, University life on June 18, 2013 by xi'an

Following the talks by Ed George (Wharton) and Feng Liang (University of Illinois at Urbana-Champaign) today in Dauphine, Natalia Bochkina (University of Edinburgh) will give a talk on Thursday, June 20, at 2pm in Room 18 at ENSAE (Malakoff) [not Dauphine!]. Here is her abstract:

2pm: Simultaneous local and global adaptivity of Bayesian wavelet estimators in nonparametric regression by Natalia Bochkina

We consider wavelet estimators in the context of nonparametric regression, with the aim of finding estimators that simultaneously achieve the local and global adaptive minimax rate of convergence. It is known that one estimator, the James-Stein block thresholding estimator of T. Cai (2008), achieves both optimal rates of convergence simultaneously, but only over a limited set of Besov spaces; in particular, over the sets of spatially inhomogeneous functions (with 1 ≤ p < 2) the upper bound on the global rate of this estimator is slower than the optimal minimax rate.

Another possible candidate to achieve both rates of convergence simultaneously is the Empirical Bayes estimator of Johnstone and Silverman (2005), which is an adaptive estimator that achieves the global minimax rate over a wide range of Besov spaces and Besov balls. The maximum marginal likelihood approach is used to estimate the hyperparameters, and it can be interpreted as a Bayesian estimator with a uniform prior. We show that it also achieves the adaptive local minimax rate over all Besov spaces, and hence it does indeed achieve both local and global rates of convergence simultaneously over Besov spaces. We also give an example of how it works in practice.

## top model choice week

Posted in Statistics, University life on June 13, 2013 by xi'an

Next week, we are having a special Bayesian [top] model choice week in Dauphine, thanks to the simultaneous visits of Ed George (Wharton), Feng Liang (University of Illinois at Urbana-Champaign), and Veronika Ročková (Erasmus University). To start the week and get to know the local actors (!), Ed and Feng will both give a talk on Tuesday, June 18, at 11am and 1pm in Room C108. Here are the abstracts:

11am: Prediction and Model Selection for Multi-task Learning by Feng Liang

In multi-task learning one simultaneously fits multiple regression models. We are interested in inference problems like model selection and prediction when there are a large number of tasks. A simple version of such models is a one-way ANOVA model where the number of replicates is fixed but the number of groups goes to infinity. We examine the consistency of Bayesian procedures using Zellner's (1986) g-prior and its variants (such as mixed g-priors and Empirical Bayes), and compare their prediction accuracy with other procedures, such as the ones based on AIC/BIC and group Lasso. Our results indicate that the Empirical Bayes procedure (with some modification for the large p small n setting) can achieve model selection consistency, and also have better estimation accuracy than other procedures being considered. During my talk, I’ll focus on the analysis of the one-way ANOVA model, but will also give a summary of our findings for multi-task learning involving a more general regression setting. This is based on joint work with my PhD student Bin Li from University of Illinois at Urbana-Champaign.

1pm: EMVS: The EM Approach to Bayesian Variable Selection by Edward George

Despite rapid developments in stochastic search algorithms, the practicality of Bayesian variable selection methods has continued to pose challenges. High-dimensional data are now routinely analyzed, typically with many more covariates than observations. To broaden the applicability of Bayesian variable selection for such high-dimensional linear regression contexts, we propose EMVS, a deterministic alternative to stochastic search based on an EM algorithm which exploits a conjugate mixture prior formulation to quickly find posterior modes. Combining a spike-and-slab regularization diagram for the discovery of active predictor sets with subsequent rigorous evaluation of posterior model probabilities, EMVS rapidly identifies promising sparse high posterior probability submodels. External structural information such as likely covariate groupings or network topologies is easily incorporated into the EMVS framework. Deterministic annealing variants are seen to improve the effectiveness of our algorithms by mitigating the posterior multi-modality associated with variable selection priors. The usefulness of the EMVS approach is demonstrated on real high-dimensional data, where computational complexity renders stochastic search less practical. This is joint work with Veronika Ročková of Erasmus University.
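To make the idea of an E-step over variable indicators concrete, here is a minimal sketch, not code from the EMVS authors. It assumes a continuous spike-and-slab prior in which a coefficient is N(0, v0) under the spike and N(0, v1) under the slab, with prior inclusion weight theta and the error variance set to 1 for simplicity. The function names and numerical values are hypothetical.

```python
import math


def normal_pdf(x, variance):
    """Density of a centred normal with the given variance, evaluated at x."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)


def inclusion_probability(beta, theta, v0, v1):
    """Conditional probability that beta comes from the slab rather than the spike.

    Assumed prior: beta ~ (1 - theta) * N(0, v0) + theta * N(0, v1), with v0 << v1.
    """
    slab = theta * normal_pdf(beta, v1)
    spike = (1.0 - theta) * normal_pdf(beta, v0)
    return slab / (slab + spike)


def expected_precision(p_star, v0, v1):
    """Expected prior precision of a coefficient given its inclusion probability.

    In an EM scheme this quantity would act as a coefficient-specific ridge
    penalty in the M-step (the M-step itself is not shown here).
    """
    return (1.0 - p_star) / v0 + p_star / v1


# Hypothetical current coefficient values: one negligible, two sizeable.
betas = [0.01, 0.5, 2.0]
probs = [inclusion_probability(b, theta=0.5, v0=0.01, v1=10.0) for b in betas]
penalties = [expected_precision(p, v0=0.01, v1=10.0) for p in probs]

for b, p, d in zip(betas, probs, penalties):
    print(f"beta={b:5.2f}  P(slab)={p:.4f}  expected precision={d:8.3f}")
```

Small coefficients get an inclusion probability near zero and hence a heavy penalty, while large ones are almost certainly assigned to the slab and are barely shrunk.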

Posted in Books, Statistics on March 23, 2013 by xi'an

This morning session at the workshop Recent Advances in statistical inference: theory and case studies was a true blessing for anyone working in Bayesian model choice! And it did give me ideas to complete my current paper on the Jeffreys-Lindley paradox, and more. Attending the talks in the historical Gioachino Rossini room of the fabulous Café Pedrocchi with the Italian spring blue sky as a background surely helped! (It is only beaten by this room of Ca’Foscari overlooking the Gran Canale where we had a workshop last Fall…)

First, Phil Dawid gave a talk on his current work with Monica Musio (who gave a preliminary talk on this in Venezia last fall) on the use of new score functions to compare statistical models. While the regular Bayes factor is based on the log score, comparing the logs of the predictives at the observed data, different functions of the predictive q can be used, like the Hyvärinen score

$S(x,q)=\Delta\sqrt{q(x)}\big/\sqrt{q(x)}$

which offers the immense advantage of being independent of the normalising constant and hence can also be used for improper priors. This is a very deep finding that could at last allow for the comparison of models based on improper priors without requiring convoluted constructions (see below) to make the “constants meet”. I first thought the technique was suffering from the same shortcoming as Murray Aitkin’s integrated likelihood, but I eventually figured out (where) I was wrong!
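The independence from the normalising constant is easy to check numerically. The sketch below is an illustration only, not Dawid and Musio's code: it evaluates the score exactly as written above, in one dimension, with the Laplacian replaced by a central finite difference, on an unnormalised Gaussian density. The function names, the evaluation point and the step size h are all assumptions made for the example.

```python
import math


def hyvarinen_score(q, x, h=1e-4):
    """Score S(x, q) = (sqrt(q))''(x) / sqrt(q)(x) for a one-dimensional density q.

    The second derivative is approximated by a central finite difference
    with step h, so the result is accurate only up to discretisation error.
    """
    def root(t):
        return math.sqrt(q(t))

    second_derivative = (root(x + h) - 2.0 * root(x) + root(x - h)) / (h * h)
    return second_derivative / root(x)


def unnormalised_gaussian(scale, mu=0.0, sigma=1.0):
    """Gaussian kernel multiplied by an arbitrary positive constant `scale`."""
    return lambda t: scale * math.exp(-0.5 * ((t - mu) / sigma) ** 2)


x = 1.3
score_a = hyvarinen_score(unnormalised_gaussian(1.0), x)
score_b = hyvarinen_score(unnormalised_gaussian(1e6), x)

# Closed form for a standard Gaussian kernel: x^2 / 4 - 1 / 2.
exact = x * x / 4.0 - 0.5

print(score_a, score_b, exact)
```

Both calls return the same value up to numerical error, whatever the multiplicative constant in front of the density, since that constant cancels in the ratio.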

The second talk was given by Ed George, who spoke on his recent research with Veronika Ročková dealing with variable selection via an EM algorithm that proceeds much faster to the optimal collection of variables, when compared with the SSVS solution of George and McCulloch (JASA, 1993). (I remember discussing this paper with Ed in Laramie during the IMS meeting in the summer of 1993.) This resurgence of the EM algorithm in this framework is both surprising (as the missing data structure represented by the variable indicators could have been exploited much earlier) and exciting, because it opens a new way to explore the most likely models in this variable selection setting and to eventually produce the median model of Barbieri and Berger (Annals of Statistics, 2004). In addition, this approach allows for a fast comparison of prior modellings on the missing variable indicators, showing in some examples a definite improvement brought by a Markov random field structure. Given that it also produces a marginal posterior density on the indicators, values of hyperparameters can be assessed, escaping the Jeffreys-Lindley paradox (which was clearly a central piece of today’s talks and discussions). I would like to see more details on the MRF part, as I wonder which structure is part of the input and which one is part of the inference.
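For readers unfamiliar with the median model, it is made of the variables whose posterior inclusion probability is at least one half, and it need not coincide with the most probable model. A minimal sketch follows, with made-up posterior model probabilities and hypothetical function names (nothing here comes from the talk):

```python
def inclusion_probabilities(model_probs, n_vars):
    """Sum the posterior probabilities of all models containing each variable."""
    incl = [0.0] * n_vars
    for model, prob in model_probs.items():
        for j in model:
            incl[j] += prob
    return incl


def median_probability_model(model_probs, n_vars):
    """Variables whose posterior inclusion probability is at least 1/2."""
    incl = inclusion_probabilities(model_probs, n_vars)
    return tuple(j for j in range(n_vars) if incl[j] >= 0.5)


# Made-up posterior over models, each model being a tuple of variable indices.
model_probs = {
    (0,): 0.40,
    (0, 1): 0.30,
    (1,): 0.15,
    (1, 2): 0.15,
}

incl = inclusion_probabilities(model_probs, n_vars=3)
median_model = median_probability_model(model_probs, n_vars=3)
map_model = max(model_probs, key=model_probs.get)

print("inclusion probabilities:", incl)
print("median probability model:", median_model)
print("highest probability model:", map_model)
```

In this toy example the single most probable model keeps only the first variable, while the median model also keeps the second one, because the latter appears in enough of the remaining models.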

The third talk of the morning was Susie Bayarri’s, about a collection of desiderata or criteria for building an objective prior in model comparison and achieving a manageable closed-form solution in the case of the normal linear model. I somehow disagree with the information criterion, which states that the divergence of the likelihood ratio should imply a corresponding divergence of the Bayes factor. And while I definitely agree with the invariance argument leading to using the same (improper) prior over parameters common to the models under comparison, this may sound too much of a trick to outsiders, especially when accounting for the score solution of Dawid and Musio. Overall, though, I liked the outcome of a coherent reference solution for linear models that could clearly be used as a default in this setting, esp. given the availability of an R package called BayesVarSel. (Even if I also like our simpler solution developed in the forthcoming edition of Bayesian Core, also available in the bayess R package!) In his discussion, Guido Consonni highlighted the philosophical problem of considering “common parameters”, a perspective I completely subscribe to, even though I think all that matters is the justification of having a common prior over formally equivalent parameters, although this may sound like a pedantic distinction to many!
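The information criterion can be illustrated on a standard textbook case, chosen here as an assumption and not taken from Susie Bayarri's talk: Zellner's g-prior with a fixed g. The Bayes factor of a linear model with p covariates against the intercept-only model then has the closed form (1+g)^((n-1-p)/2) / (1+g(1-R²))^((n-1)/2), which stays bounded when R² goes to 1 even though the likelihood ratio diverges. The function name and the values of n, p and g are hypothetical.

```python
import math


def log_bayes_factor_gprior(r2, n, p, g):
    """Log Bayes factor of a p-covariate linear model against the null model
    under Zellner's g-prior with fixed g, as a function of R^2."""
    return 0.5 * (n - 1 - p) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))


n, p, g = 20, 2, 20.0

# Upper bound reached in the limit R^2 -> 1, however overwhelming the data.
bound = 0.5 * (n - 1 - p) * math.log1p(g)

r2_grid = [0.0, 0.5, 0.9, 0.99, 0.999999]
log_bfs = [log_bayes_factor_gprior(r2, n, p, g) for r2 in r2_grid]

for r2, lbf in zip(r2_grid, log_bfs):
    print(f"R^2={r2:<9} log BF={lbf:8.3f}  (bound {bound:.3f})")
```

A fixed g thus fails the criterion, which is one motivation for priors that put a distribution on g instead.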

Due to a meeting of the scientific committee of the forthcoming O’Bayes 2013 meeting (at Duke, in December, more about this soon!), most of whose members were attending this workshop, I missed the beginning of Alan Agresti’s talk and could not catch up with the central problem he was addressing (the pianist on the street outside started pounding on his instrument as if intent on breaking it apart!). A pity, as problems with contingency tables are certainly of interest to me… By the end of Alan’s talk, I wished someone would shoot the pianist playing outside (even though he was reasonably gifted), as I had gotten a major headache from his background noise. Following Noel Cressie’s talk proved just as difficult, although I could see his point in comparing very diverse predictors for Big Data problems without much of a model structure and even less of a […], and I decided to call the day off, despite wishing to stay for Eduardo Gutiérrez-Peña’s talk on conjugate predictives and entropies, which definitely interested me… Too bad really (blame the pianist!)