dynamic mixtures [at NBBC15]

Posted in R, Statistics on June 18, 2015 by xi'an

A funny coincidence: as I was sitting next to Arnoldo Frigessi at the NBBC15 conference, I came upon a new question on Cross Validated about a dynamic mixture model he had developed in 2002 with Olga Haug and Håvard Rue [whom I also saw last week in València]. The dynamic mixture model they proposed replaces the standard constant weights in the mixture with cumulative distribution functions, hence the term dynamic. Here is the version used in their paper (x>0)

$(1-w_{\mu,\tau}(x))f_{\beta,\lambda}(x)+w_{\mu,\tau}(x)g_{\epsilon,\sigma}(x)$

where f is a Weibull density, g a generalised Pareto density, and w the cdf of a Cauchy distribution [all distributions being endowed with standard parameters]. While the above object is not a mixture of a generalised Pareto and a Weibull distribution (instead, it is a mixture of two non-standard distributions with unknown weights), it is close to the Weibull when x is near zero and ends up with the Pareto tail when x is large. The question was about simulating from this distribution and, while an answer was in the paper, I replied on Cross Validated with an alternative accept-reject proposal and with a somewhat (if mildly) non-standard MCMC implementation enjoying a much higher acceptance rate and the same fit.
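The accept-reject idea can be sketched as follows. This is my own minimal sketch, not the authors' implementation nor my actual Cross Validated answer, and all parameter values are hypothetical. It exploits the pointwise bound (1-w)f + wg ≤ f + g, so the equal-weight (static) mixture of f and g dominates the target with envelope constant 2; note that the accept-reject step does not require the normalising constant of the dynamic mixture.

```python
import numpy as np
from scipy.stats import weibull_min, genpareto, cauchy

# Hypothetical parameter values, for illustration only
beta, lam = 1.5, 1.0   # Weibull shape and scale
eps, sig = 0.3, 2.0    # generalised Pareto shape and scale
mu, tau = 1.0, 0.5     # Cauchy location and scale for the weight cdf

def dyn_density(x):
    """Unnormalised dynamic mixture: (1-w(x)) f(x) + w(x) g(x), x > 0."""
    w = cauchy.cdf(x, loc=mu, scale=tau)
    f = weibull_min.pdf(x, beta, scale=lam)
    g = genpareto.pdf(x, eps, scale=sig)
    return (1 - w) * f + w * g

def rdyn(n, seed=0):
    """Accept-reject sampler: propose from the 50/50 static mixture of f
    and g and accept with probability d(x)/(f(x)+g(x)), which is <= 1
    since (1-w)f + wg <= f + g pointwise."""
    rng = np.random.default_rng(seed)
    out = np.empty(n)
    filled = 0
    while filled < n:
        m = 2 * (n - filled)
        comp = rng.random(m) < 0.5
        x = np.where(comp,
                     weibull_min.rvs(beta, scale=lam, size=m, random_state=rng),
                     genpareto.rvs(eps, scale=sig, size=m, random_state=rng))
        f = weibull_min.pdf(x, beta, scale=lam)
        g = genpareto.pdf(x, eps, scale=sig)
        keep = x[rng.random(m) * (f + g) < dyn_density(x)]
        take = min(len(keep), n - filled)
        out[filled:filled + take] = keep[:take]
        filled += take
    return out
```

The envelope is crude (the overall acceptance rate is at best one half) but entirely generic, in that it works for any pair (f,g) and any weight cdf w.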

An objective prior that unifies objective Bayes and information-based inference

Posted in Books, pictures, Statistics, Travel, University life on June 8, 2015 by xi'an

During the Valencia O’Bayes 2015 meeting, Colin LaMont and Paul Wiggins arXived a paper entitled “An objective prior that unifies objective Bayes and information-based inference”. It would have been interesting to have the authors in Valencia, as they make bold claims about their w-prior being uniformly and maximally uninformative. Plus achieving the unification advertised in the title of the paper, meaning that the free energy (the log transform of the inverse evidence) is the Akaike information criterion.

The paper starts by defining a true prior distribution (presumably in analogy with the true value of the parameter?) and generalised posterior distributions as associated with any arbitrary prior. (Some notations are imprecise, check (3) with the wrong denominator, or the predictive that is supposed to cover N new observations on p.2…) It then introduces a discretisation by considering all models within a certain Kullback divergence δ to be indistinguishable. (A definition that does not account for the asymmetry of the Kullback divergence.) From there, it most surprisingly [given the above discretisation] derives a density on the whole parameter space

$\pi(\theta) \propto \det I(\theta)^{1/2}\,(N/2\pi\delta)^{K/2}$

where N is the number of observations and K the dimension of θ. Dimension which may vary. The dependence on N of the above results from using the predictive on N points instead of one. The w-prior is however defined differently: “as the density of indistinguishable models such that the multiplicity is unity for all true models”. Where the log transform of the multiplicity is the expected log marginal likelihood minus the expected log predictive [all expectations under the sampling distributions, conditional on θ]. Rather puzzling in that it involves the “true” value of the parameter—another notational imprecision, since it has to hold for all θ’s—as well as possibly improper priors. When the prior is improper, the log-multiplicity is a difference of two terms such that the first term depends on the constant used with the improper prior, while the second one does not… Unless the multiplicity constraint also determines the normalising constant?! But this does not seem to be the case when considering the following section on normalising the w-prior, which mentions a “cutoff” for the integration that seems to pop out of nowhere. Curiouser and curiouser. Due to this unclear handling of infinite mass priors, since the claimed properties of uniform and maximal uninformativeness are not established in any formal way, and since the existence of a non-asymptotic solution to the multiplicity equation is not demonstrated either, I quickly lost interest in the paper. Which does not contain any worked-out example. Read at your own risk!
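As a quick sanity check of my own (not taken from the paper), the above density collapses to the usual flat prior in the simplest possible case:

```latex
% Worked case (mine, not from the paper): x_1,\dots,x_N \sim N(\theta,\sigma^2),
% with \sigma known, so that I(\theta) = \sigma^{-2} and K = 1. Then
\pi(\theta) \;\propto\; \det I(\theta)^{1/2}\,(N/2\pi\delta)^{K/2}
           \;=\; \sigma^{-1}\,(N/2\pi\delta)^{1/2}
% which is constant in \theta: the flat (Jeffreys) prior on the location,
% with the entire \delta and N dependence absorbed into the (improper)
% normalising constant.
```

In other words, for this model the discretisation parameter δ only rescales an already improper prior, which makes the subsequent handling of normalising constants all the more critical.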

O-Bayes15 [day #1]

Posted in Books, pictures, Running, Statistics, Travel, University life, Wines on June 3, 2015 by xi'an

So here we are, back together to talk about objective Bayes methods, and in the City of València as well! A move back to the city where the 1998 O’Bayes took place. In contrast with my introductory tutorial, the morning tutorials by Luis Pericchi and Judith Rousseau covered fairly technical and advanced material, Judith looking at the tools used in the frequentist (Bernstein-von Mises) analysis of priors, with forays into empirical Bayes, giving insights into a wide range of recent papers in the field. And Luis covering works on Bayesian robustness in the sense of resisting over-influential observations. Following works of his and of Tony O’Hagan and coauthors. Which means characterising the tails of the prior versus the sampling distribution to allow for the posterior reverting to the prior in case of over-influential datapoints. Funny enough, after a great opening by Carmen and Ed remembering Susie, Chris Holmes also covered Bayesian robust analysis. More in the sense of incompletely or mis-specified models. (On the side, rekindling a comment of Susie’s on the need to embed robust Bayesian analysis within decision theory.) Which was also much Chris’ point, in line with the recent Watson and Holmes paper. Dan Simpson in his usual kick-the-anthill-real-hard-and-set-fire-to-it discussion pointed out the possible discrepancy between objective and robust Bayesian analyses. (With lines like “modern statistics has proven disruptive to objective Bayes”.) Which is not that obvious, because the robust approach simply reincorporates decision theory within the objective framework. (Dan also concluded with the comic strip below, whose message can be interpreted in many ways…! Or not.)

The second talk of the afternoon was given by Veronika Ročková on a novel type of spike-and-slab prior to handle sparse regression, bringing an alternative to the standard Lasso. The prior is a mixture of two Laplace priors whose scales are constrained in connection with the actual number of non-zero coefficients. I had not heard of this approach before (although Veronika and Ed have an earlier paper on a spike-and-slab prior to handle multicollinearity that Veronika presented in Boston last year) and I was quite impressed by the combination of minimax properties and practical determination of the scales. As well as by the performances of this spike-and-slab Lasso. I am looking forward to the incoming paper!
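For intuition, the mixture-of-two-Laplace prior can be sketched as follows. This is a toy illustration with hypothetical scale values, not Veronika's actual procedure, which ties the scales to the number of non-zero coefficients; the spike is a concentrated Laplace, the slab a diffuse one.

```python
import numpy as np

def laplace_pdf(b, lam):
    """Double-exponential (Laplace) density with rate lam."""
    return 0.5 * lam * np.exp(-lam * np.abs(b))

def ss_lasso_prior(b, w=0.1, lam_spike=20.0, lam_slab=1.0):
    """Spike-and-slab Lasso prior: a two-component Laplace mixture,
    with prior weight w on the diffuse 'slab' component.
    All parameter values here are hypothetical."""
    return (1 - w) * laplace_pdf(b, lam_spike) + w * laplace_pdf(b, lam_slab)

def slab_probability(b, w=0.1, lam_spike=20.0, lam_slab=1.0):
    """Conditional probability that a coefficient of size b came from
    the slab rather than the spike, driving (soft) variable selection."""
    slab = w * laplace_pdf(b, lam_slab)
    return slab / (slab + (1 - w) * laplace_pdf(b, lam_spike))
```

Near zero the spike dominates and the slab probability is tiny, while for coefficients a few slab scales away from zero it is essentially one, which is the mechanism separating noise from signal coefficients.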

The day ended most nicely in the botanical gardens of the University of València, with an outdoor reception surrounded by palm trees and parakeet cries…

O’Bayes 2015: back in València

Posted in pictures, Statistics, Travel, University life on September 11, 2014 by xi'an

The next O’Bayes meeting (more precisely the International Workshop on Objective Bayes Methodology, O-Bayes15), will take place in València, Spain, on June 1-4, 2015. This is the second time an O’Bayes conference takes place in València, after the one José Miguel Bernardo organised in 1998 there.  The principal objectives of O-Bayes15 will be to facilitate the exchange of recent research developments in objective Bayes theory, methodology and applications, and related topics (like limited information Bayesian statistics), to provide opportunities for new researchers, and to establish new collaborations and partnerships. Most importantly, O-Bayes15 will be dedicated to our friend Susie Bayarri, to celebrate her life and contributions to Bayesian Statistics. Check the webpage of O-Bayes15 for the program (under construction) and the practical details. Looking forward to the meeting and hopeful for a broadening of the basis of the O’Bayes community and of its scope!

Cancun, ISBA 2014 [½ day #2]

Posted in pictures, Running, Statistics, Travel, University life on July 19, 2014 by xi'an

Half-day #2 indeed at ISBA 2014, as the Wednesday afternoon kept to the Valencia tradition of free time and potential cultural excursions, so there were only talks in the morning. And still the core poster session at (late) night. In which my student Kaniav Kamari presented a poster on a current project we are running with Kerrie Mengersen and Judith Rousseau on the replacement of the standard Bayesian testing setting with a mixture representation. Being half-asleep by the time the session started, I did not stay long enough to collect data on the reactions to this proposal, but the paper should be arXived pretty soon. And Kate Lee gave a poster on our importance sampler for evidence approximation in mixtures (soon to be revised!). There was also an interesting poster about reparameterisation towards higher efficiency of MCMC algorithms, intersecting with my long-standing interest in the matter, although I cannot find a mention of it in the abstracts. And I had a nice talk with Eduardo Gutierrez-Pena about inferring on credible intervals through loss functions. There were also a couple of appealing posters on g-priors. Except I was sleepwalking by the time I spotted them… (My conference sleeping pattern does not work that well for ISBA meetings! Thankfully, both next editions will be in Europe.)

Great talk by Steve McEachern that linked to our ABC work on Bayesian model choice with insufficient statistics, arguing towards a robustification of Bayesian inference by only using summary statistics. Despite this being “against the hubris of Bayes”… Obviously, the talk just gave a flavour of Steve’s perspective on the topic and I hope I can read more to see how we agree (or not!) on this notion of using insufficient summaries to conduct inference rather than trying to model “the whole world”, given the mistrust we must preserve about models and likelihoods. And another great talk by Ioanna Manolopoulou on another of my pet topics, capture-recapture, although she phrased it as a partly identified model (as in Kline’s talk yesterday). This related to capture-recapture in that, when estimating a capture-recapture model with covariates, sampling and inference are biased as well. I appreciated particularly the use of BART to analyse the bias in the modelling. And the talk provided a nice counterpoint to Kline’s rather pessimistic approach.

Terrific plenary sessions as well, from Wilke’s spatio-temporal models (in the spirit of his superb book with Noel Cressie) to Igor Prunster’s great entry on Gibbs process priors. With the highly significant conclusion that those processes are best suited for (in the sense that they are only consistent for) discrete support distributions. Alternatives are to be used for continuous support distributions, the special case of a Dirichlet prior constituting a sort of unique counter-example. Quite an inspiring talk (even though I had a few micro-naps throughout it!).

I shared my afternoon free time between discussing the next O’Bayes meeting (2015 is getting very close!) with friends from the Objective Bayes section, getting a quick look at the Museo Maya de Cancún (terrific building!), and getting some work done (thanks to the lack of wireless…)

Cancún, ISBA 2014 [day #0]

Posted in Statistics, Travel, University life on July 17, 2014 by xi'an

Day zero at ISBA 2014! The relentless heat outside (making running an ordeal, even at 5:30am…) made the (air-conditioned) conference centre all the more attractive. Jean-Michel Marin and I had a great morning teaching our ABC short course and we do hope the ABC class audience had one as well. Teaching as a pair is much more enjoyable than teaching solo, as we can interact with one another as well as with the audience. And realising unsuspected difficulties with the material is much easier this way, as the (mostly) passive instructor can spot the class’ reactions. This reminded me of the course we taught together in Oulu, northern Finland, in 2004, and that ended up as Bayesian Core. We did not cover the entire material we had prepared for this short course, but I think the pace was the right one. (Just tell me otherwise if you were there!) This was also the only time I have given a course wearing sunglasses, thanks to yesterday’s incident!

Waiting for a Spanish-speaking friend to kindly drive with me to downtown Cancún to check whether or not an optician could make me new prescription glasses, I attended Jim Berger’s foundational lecture on frequentist properties of Bayesian procedures but could only listen, as the slides were impossible for me to read, with or without glasses. The partial overlap with the Varanasi lecture helped. I alas had to skip both Gareth Roberts’ and Sylvia Frühwirth-Schnatter’s lectures, apologies to both of them!, but the reward was to get a new pair of prescription glasses within a few hours. Perfectly suited to my vision! And to get back just in time to read the slides during Peter Müller’s lecture from the back row! Thanks to my friend Sophie for her negotiating skills! Actually, I am still amazed at getting glasses that quickly, given the time it would have taken in, e.g., France. All set for another 15 years with the same pair?! Only if I do not go swimming with them in anything but a quiet swimming pool!

The starting dinner happened to coincide with the (second) ISBA Fellow Award ceremony. Jim acted as the grand master of ceremonies and did great at adding life and side stories to the written nominations for each and every one of the new Fellows. The Fellowships honoured Bayesian statisticians who have contributed to the field as researchers and to the society since its creation. I thus feel very honoured (and absolutely undeserving) to be included in this prestigious list, along with many friends. (But I would have loved to see two more former ISBA presidents included, esp. for their massive contribution to Bayesian theory and methodology…) And also glad to wear regular glasses instead of my morning sunglasses.

[My Internet connection during the meeting being abysmally poor, the posts will appear with some major delay! In particular, I cannot include new pictures at times I get a connection… Hence a picture of northern Finland instead of Cancún at the top of this post!]

Jeffreys prior with improper posterior

Posted in Books, Statistics, University life on May 12, 2014 by xi'an

In a complete coincidence with my visit to Warwick this week, I became aware of the paper “Inference in two-piece location-scale models with Jeffreys priors” recently published in Bayesian Analysis by Francisco Rubio and Mark Steel, both from Warwick. Paper where they exhibit a closed-form Jeffreys prior for the skewed distribution

$\dfrac{2\epsilon}{\sigma_1}f(\{x-\mu\}/\sigma_1)\mathbb{I}_{x<\mu}+\dfrac{2(1-\epsilon)}{\sigma_2}f(\{x-\mu\}/\sigma_2) \mathbb{I}_{x>\mu}$

where f is a symmetric density, namely

$\pi(\mu,\sigma_1,\sigma_2) \propto 1 \big/ \sigma_1\sigma_2\{\sigma_1+\sigma_2\}\,,$

where

$\epsilon=\sigma_1/\{\sigma_1+\sigma_2\}\,.$

only to show immediately after that this prior does not allow for a proper posterior, no matter what the sample size is. While the above skewed distribution can always be interpreted as a mixture, being a weighted sum of two terms, it is not strictly speaking a mixture, if only because the “component” can be identified from the observation (depending on which side of μ it stands). The likelihood is therefore a product of simple terms rather than a product of sums of two terms.
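To make the non-mixture point concrete, here is a quick sketch of my own (with a Gaussian f and hypothetical parameter values, not Rubio and Steel's code): each observation identifies its branch through the indicator, so the log-likelihood is a sum of logs of single terms, and the choice ε = σ₁/(σ₁+σ₂) makes both branches equal 2f(0)/(σ₁+σ₂) at μ, hence the density continuous there.

```python
import numpy as np
from scipy.stats import norm

def two_piece_pdf(x, mu, s1, s2, f=norm.pdf):
    """Two-piece density with scale s1 below mu and s2 above.
    With eps = s1/(s1+s2), both prefactors reduce to 2/(s1+s2),
    so the density is continuous at mu."""
    eps = s1 / (s1 + s2)
    left = (2 * eps / s1) * f((x - mu) / s1)
    right = (2 * (1 - eps) / s2) * f((x - mu) / s2)
    return np.where(x < mu, left, right)

def loglik(sample, mu, s1, s2):
    """Log-likelihood: a sum over observations of the log of a single
    branch term (the indicator picks the branch), not a sum of log-sums
    as it would be for a genuine mixture with latent allocations."""
    return np.sum(np.log(two_piece_pdf(sample, mu, s1, s2)))
```

The discontinuity of each branch's indicator in μ is exactly what makes the Fisher-information computation delicate, as discussed below.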

As a solution to this conundrum, the authors consider the alternative of the “independent Jeffreys priors”, which are made of a product of conditional Jeffreys priors, i.e., by computing the Jeffreys prior one parameter at a time with all other parameters considered to be fixed. Which differs from the reference prior, of course, but would have been my second choice as well. Despite criticisms expressed by José Bernardo in the discussion of the paper… The difficulty (in my opinion) resides in the choice (and difficulty) of the parameterisation of the model, since those priors are not parameterisation-invariant. (Xinyi Xu makes the important comment that even those priors incorporate strong if hidden information. Which relates to our earlier discussion with Kaniav Kamari on the “dangers” of prior modelling.)

Although the outcome is puzzling, I remain just slightly sceptical of the input, namely the Jeffreys prior and the corresponding Fisher information: the fact that the density involves an indicator function and is thus discontinuous in the location μ at the observation x makes the likelihood function non-differentiable and hence the derivation of the Fisher information not strictly valid, since the indicator part cannot be differentiated. Not that I see the Jeffreys prior as the ultimate grail of non-informative priors, far from it, but there is definitely something specific about the discontinuity in the density. (In connection with the latter point, Weiss and Suchard deliver a highly critical commentary on the non-need for reference priors and the preference given to a non-parametric Bayes primary analysis. Maybe making a point towards a greater convergence of the two perspectives, objective Bayes and non-parametric Bayes.)

This paper and the ensuing discussion about the properness of the Jeffreys posterior reminded me of our earliest paper on the topic with Jean Diebolt. Where we used improper priors on location and scale parameters but prohibited allocations (in the Gibbs sampler) that would lead to fewer than two observations per component, thereby ensuring that the (truncated) posterior was well-defined. (This feature also remained in the Series B paper, submitted at the same time, namely mid-1990, but only published in 1994!) Larry Wasserman proved ten years later that this truncation led to consistent estimators, but I had not thought about it in a very long while. I still like this notion of forcing some (enough) datapoints into each component for an allocation (of the latent indicator variables) to be an acceptable Gibbs move. This is obviously not compatible with the iid representation of a mixture model, but it expresses the requirement that components all have a meaning in terms of the data, namely that all components contributed to generating a part of the data. This translates as a form of weak prior information on how much we trust the model and how meaningful each component is (in opposition to adding meaningless extra components with almost zero weights or almost identical parameters).
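The truncated allocation move can be sketched as follows; this is my own minimal illustration for a Gaussian mixture with fixed (hypothetical) component parameters, not the original code. When moving observation i out of its current component would drop that component below the threshold, the full conditional truncated to allowed allocations is degenerate at the current value, so z_i simply stays put.

```python
import numpy as np
from scipy.stats import norm

def constrained_allocation_pass(x, means, sds, weights, z,
                                min_per_comp=2, seed=0):
    """One Gibbs sweep over the latent allocations z of a Gaussian
    mixture, sampling each z_i from its full conditional truncated to
    allocations leaving at least min_per_comp observations in every
    component (in the spirit of the Diebolt-Robert truncation)."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.bincount(z, minlength=K)
    for i in range(len(x)):
        old = z[i]
        if counts[old] <= min_per_comp:
            # removing x_i would empty its component below the threshold:
            # the truncated conditional puts all mass on the current value
            continue
        probs = weights * norm.pdf(x[i], loc=means, scale=sds)
        probs /= probs.sum()
        new = rng.choice(K, p=probs)
        if new != old:
            counts[old] -= 1
            counts[new] += 1
            z[i] = new
    return z
```

By construction, every sweep preserves the constraint that each component keeps at least min_per_comp observations, which is what guarantees the truncated posterior stays well-defined under the improper prior.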

As a marginalia, the insistence in Rubio and Steel’s paper that all observations in the sample be distinct also reminded me of a discussion I wrote for one of the Valencia proceedings (Valencia 6 in 1998) where Mark presented a paper with Carmen Fernández on this issue of handling duplicated observations modelled by absolutely continuous distributions. (I am afraid my discussion is not worth the $250 price tag given by amazon!)