Archive for Read paper

No review this summer

Posted in Books, Statistics, University life with tags , , , , , , , , on September 19, 2019 by xi'an

A recent editorial in Nature was a declaration by a biologist from UCL on her refusal to accept refereeing requests during the summer (or was it the summer break), which was motivated by a need to reconnect with her son. Which is a good enough reason (!), but reflects sadly on the increasing pressure on one’s schedule to juggle teaching, research, administration, grant hunting, society service, along with a balanced enough family life. (Although I have been rather privileged in this regard!) Given that refereeing or journal editing is neither visible nor rewarded, it comes as the first task to be postponed or abandoned, even though most of us realise it is essential to keep science working as a whole and to make our own papers published. I have actually noticed an increasing difficulty in the past decade to get (good) referees to accept new reviews, often asking for deadlines that are hurting the authors, like six months. Making them practically unavailable. As I mentioned earlier on this blog, it could be that publishing referees’ reports as discussions would help, since they would become recognised as (unreviewed!) publications, but it is unclear this is the solution. If judging from the similar difficulty in getting discussions for discussed papers. (As an aside, there are two exciting papers coming up for discussion in Series B, ‘Unbiased Markov chain Monte Carlo methods with couplings’ by  Pierre E. Jacob, John O’Leary and Yves F. Atchadé and in Bayesian Analysis, Latent nested nonparametric priors by Frederico Camerlenghi, David Dunson, Antonio Lijoi, Igor Prünster, and Abel Rodríguez). Which is surprising when considering the willingness of a part of the community to engage into forii discussions, sometimes of a considerable length as illustrated on Andrew’s blog.

Another entry in Nature mentioned the case of two University of København tenured professors in geology who were fired for either using a private email address (?!) or being away on field work during an exam and at a conference without permission from the administration. Which does not even remotely sound like a faulty behaviour to me or else I would have been fired eons ago..!

O’Bayes 19/2

Posted in Books, pictures, Running, Travel, University life with tags , , , , , , , , , , , , , , , , , on July 1, 2019 by xi'an

One talk on Day 2 of O’Bayes 2019 was by Ryan Martin on data dependent priors (or “priors”). Which I have already discussed in this blog. Including the notion of a Gibbs posterior about quantities that “are not always defined through a model” [which is debatable if one sees it like part of a semi-parametric model]. Gibbs posterior that is built through a pseudo-likelihood constructed from the empirical risk, which reminds me of Bissiri, Holmes and Walker. Although requiring a prior on this quantity that is  not part of a model. And is not necessarily a true posterior and not necessarily with the same concentration rate as a true posterior. Constructing a data-dependent distribution on the parameter does not necessarily mean an interesting inference and to keep up with the theme of the conference has no automated claim to [more] “objectivity”.

And after calling a prior both Beauty and The Beast!, Erlis Ruli argued about a “bias-reduction” prior where the prior is solution to a differential equation related with some cumulants, connected with an earlier work of David Firth (Warwick).  An interesting conundrum is how to create an MCMC algorithm when the prior is that intractable, with a possible help from PDMP techniques like the Zig-Zag sampler.

While Peter Orbanz’ talk was centred on a central limit theorem under group invariance, further penalised by being the last of the (sun) day, Peter did a magnificent job of presenting the result and motivating each term. It reminded me of the work Jim Bondar was doing in Ottawa in the 1980’s on Haar measures for Bayesian inference. Including the notion of amenability [a term due to von Neumann] I had not met since then. (Neither have I met Jim since the last summer I spent in Carleton.) The CLT and associated LLN are remarkable in that the average is not over observations but over shifts of the same observation under elements of a sub-group of transformations. I wondered as well at the potential connection with the Read Paper of Kong et al. in 2003 on the use of group averaging for Monte Carlo integration [connection apart from the fact that both discussants, Michael Evans and myself, are present at this conference].

mixture modelling for testing hypotheses

Posted in Books, Statistics, University life with tags , , , , , , , , , , on January 4, 2019 by xi'an

After a fairly long delay (since the first version was posted and submitted in December 2014), we eventually revised and resubmitted our paper with Kaniav Kamary [who has now graduated], Kerrie Mengersen, and Judith Rousseau on the final day of 2018. The main reason for this massive delay is mine’s, as I got fairly depressed by the general tone of the dozen of reviews we received after submitting the paper as a Read Paper in the Journal of the Royal Statistical Society. Despite a rather opposite reaction from the community (an admittedly biased sample!) including two dozens of citations in other papers. (There seems to be a pattern in my submissions of Read Papers, witness our earlier and unsuccessful attempt with Christophe Andrieu in the early 2000’s with the paper on controlled MCMC, leading to 121 citations so far according to G scholar.) Anyway, thanks to my co-authors keeping up the fight!, we started working on a revision including stronger convergence results, managing to show that the approach leads to an optimal separation rate, contrary to the Bayes factor which has an extra √log(n) factor. This may sound paradoxical since, while the Bayes factor  converges to 0 under the alternative model exponentially quickly, the convergence rate of the mixture weight α to 1 is of order 1/√n, but this does not mean that the separation rate of the procedure based on the mixture model is worse than that of the Bayes factor. On the contrary, while it is well known that the Bayes factor leads to a separation rate of order √log(n) in parametric models, we show that our approach can lead to a testing procedure with a better separation rate of order 1/√n. We also studied a non-parametric setting where the null is a specified family of distributions (e.g., Gaussians) and the alternative is a Dirichlet process mixture. Establishing that the posterior distribution concentrates around the null at the rate √log(n)/√n. We thus resubmitted the paper for publication, although not as a Read Paper, with hopefully more luck this time!

London snapshot [jatp]

Posted in pictures, Running, Statistics, Travel with tags , , , , , , on April 13, 2017 by xi'an

beyond objectivity, subjectivity, and other ‘bjectivities

Posted in Statistics with tags , , , , , , , , , , , , , on April 12, 2017 by xi'an

Here is my discussion of Gelman and Hennig at the Royal Statistical Society, which I am about to deliver!

objective and subjective RSS Read Paper next week

Posted in Books, pictures, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , on April 5, 2017 by xi'an

Andrew Gelman and Christian Hennig will give a Read Paper presentation next Wednesday, April 12, 5pm, at the Royal Statistical Society, London, on their paper “Beyond subjective and objective in statistics“. Which I hope to attend and else to write a discussion. Since the discussion (to published in Series A) is open to everyone, I strongly encourage ‘Og’s readers to take a look at the paper and the “radical” views therein to hopefully contribute to this discussion. Either as a written discussion or as comments on this very post.

a Bayesian criterion for singular models [discussion]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , on October 10, 2016 by xi'an

London Docks 12/02/09[Here is the discussion Judith Rousseau and I wrote about the paper by Mathias Drton and Martyn Plummer, a Bayesian criterion for singular models, which was discussed last week at the Royal Statistical Society. There is still time to send a written discussion! Note: This post was written using the latex2wp converter.]

It is a well-known fact that the BIC approximation of the marginal likelihood in a given irregular model {\mathcal M_k} fails or may fail. The BIC approximation has the form

\displaystyle BIC_k = \log p(\mathbf Y_n| \hat \pi_k, \mathcal M_k) - d_k \log n /2

where {d_k } corresponds on the number of parameters to be estimated in model {\mathcal M_k}. In irregular models the dimension {d_k} typically does not provide a good measure of complexity for model {\mathcal M_k}, at least in the sense that it does not lead to an approximation of

\displaystyle \log m(\mathbf Y_n |\mathcal M_k) = \log \left( \int_{\mathcal M_k} p(\mathbf Y_n| \pi_k, \mathcal M_k) dP(\pi_k|k )\right) \,.

A way to understand the behaviour of {\log m(\mathbf Y_n |\mathcal M_k) } is through the effective dimension

\displaystyle \tilde d_k = -\lim_n \frac{ \log P( \{ KL(p(\mathbf Y_n| \pi_0, \mathcal M_k) , p(\mathbf Y_n| \pi_k, \mathcal M_k) ) \leq 1/n | k ) }{ \log n}

when it exists, see for instance the discussions in Chambaz and Rousseau (2008) and Rousseau (2007). Watanabe (2009} provided a more precise formula, which is the starting point of the approach of Drton and Plummer:

\displaystyle \log m(\mathbf Y_n |\mathcal M_k) = \log p(\mathbf Y_n| \hat \pi_k, \mathcal M_k) - \lambda_k(\pi_0) \log n + [m_k(\pi_0) - 1] \log \log n + O_p(1)

where {\pi_0} is the true parameter. The authors propose a clever algorithm to approximate of the marginal likelihood. Given the popularity of the BIC criterion for model choice, obtaining a relevant penalized likelihood when the models are singular is an important issue and we congratulate the authors for it. Indeed a major advantage of the BIC formula is that it is an off-the-shelf crierion which is implemented in many softwares, thus can be used easily by non statisticians. In the context of singular models, a more refined approach needs to be considered and although the algorithm proposed by the authors remains quite simple, it requires that the functions { \lambda_k(\pi)} and {m_k(\pi)} need be known in advance, which so far limitates the number of problems that can be thus processed. In this regard their equation (3.2) is both puzzling and attractive. Attractive because it invokes nonparametric principles to estimate the underlying distribution; puzzling because why should we engage into deriving an approximation like (3.1) and call for Bayesian principles when (3.1) is at best an approximation. In this case why not just use a true marginal likelihood?

1. Why do we want to use a BIC type formula?

The BIC formula can be viewed from a purely frequentist perspective, as an example of penalised likelihood. The difficulty then stands into choosing the penalty and a common view on these approaches is to choose the smallest possible penalty that still leads to consistency of the model choice procedure, since it then enjoys better separation rates. In this case a {\log \log n} penalty is sufficient, as proved in Gassiat et al. (2013). Now whether or not this is a desirable property is entirely debatable, and one might advocate that for a given sample size, if the data fits the smallest model (almost) equally well, then this model should be chosen. But unless one is specifying what equally well means, it does not add much to the debate. This also explains the popularity of the BIC formula (in regular models), since it approximates the marginal likelihood and thus benefits from the Bayesian justification of the measure of fit of a model for a given data set, often qualified of being a Bayesian Ockham’s razor. But then why should we not compute instead the marginal likelihood? Typical answers to this question that are in favour of BIC-type formula include: (1) BIC is supposingly easier to compute and (2) BIC does not call for a specification of the prior on the parameters within each model. Given that the latter is a difficult task and that the prior can be highly influential in non-regular models, this may sound like a good argument. However, it is only apparently so, since the only justification of BIC is purely asymptotic, namely, in such a regime the difficulties linked to the choice of the prior disappear. This is even more the case for the sBIC criterion, since it is only valid if the parameter space is compact. Then the impact of the prior becomes less of an issue as non informative priors can typically be used. With all due respect, the solution proposed by the authors, namely to use the posterior mean or the posterior mode to allow for non compact parameter spaces, does not seem to make sense in this regard since they depend on the prior. The same comments apply to the author’s discussion on Prior’s matter for sBIC. Indeed variations of the sBIC could be obtained by penalizing for bigger models via the prior on the weights, for instance as in Mengersen and Rousseau (2011) or by, considering repulsive priors as in Petralia et al. (20120, but then it becomes more meaningful to (again) directly compute the marginal likelihood. Remains (as an argument in its favour) the relative computational ease of use of sBIC, when compared with the marginal likelihood. This simplification is however achieved at the expense of requiring a deeper knowledge on the behaviour of the models and it therefore looses the off-the-shelf appeal of the BIC formula and the range of applications of the method, at least so far. Although the dependence of the approximation of {\log m(\mathbf Y_n |\mathcal M_k)} on {\mathcal M_j }, $latex {j \leq k} is strange, this does not seem crucial, since marginal likelihoods in themselves bring little information and they are only meaningful when compared to other marginal likelihoods. It becomes much more of an issue in the context of a large number of models.

2. Should we care so much about penalized or marginal likelihoods ?

Marginal or penalized likelihoods are exploratory tools in a statistical analysis, as one is trying to define a reasonable model to fit the data. An unpleasant feature of these tools is that they provide numbers which in themselves do not have much meaning and can only be used in comparison with others and without any notion of uncertainty attached to them. A somewhat richer approach of exploratory analysis is to interrogate the posterior distributions by either varying the priors or by varying the loss functions. The former has been proposed in van Havre et l. (2016) in mixture models using the prior tempering algorithm. The latter has been used for instance by Yau and Holmes (2013) for segmentation based on Hidden Markov models. Introducing a decision-analytic perspective in the construction of information criteria sounds to us like a reasonable requirement, especially when accounting for the current surge in studies of such aspects.

[Posted as arXiv:1610.02503]