## Archive for objective Bayes

Posted in Books, pictures, Statistics, Travel, University life, Wines on April 5, 2017 by xi'an

Andrew Gelman and Christian Hennig will give a Read Paper presentation next Wednesday, April 12, 5pm, at the Royal Statistical Society, London, on their paper “Beyond subjective and objective in statistics”. Which I hope to attend, or else to write a discussion. Since the discussion (to be published in Series A) is open to everyone, I strongly encourage ‘Og’s readers to take a look at the paper and the “radical” views therein, and hopefully to contribute to this discussion. Either as a written discussion or as comments on this very post.

## Greek variations on power-expected-posterior priors

Posted in Books, Statistics, University life on October 5, 2016 by xi'an

Dimitris Fouskakis, Ioannis Ntzoufras and Konstantinos Perrakis, from Athens, have just arXived a paper on power-expected-posterior priors. Just like the power prior and the expected-posterior prior, this approach aims at avoiding improper priors by the use of imaginary data, whose distribution is itself the marginal against another prior. (In the papers I wrote on that topic with Juan Antonio Cano and Diego Salmerón, we used MCMC to figure out a fixed point for such priors.)

The current paper (which I only perused) studies properties of two versions of power-expected-posterior priors proposed in an earlier paper by the same authors. For the normal linear model. Using a posterior derived from an unnormalised powered likelihood, either (DR) integrated in the imaginary data against the prior predictive distribution of the reference model based on the powered likelihood, or (CR) integrated in the imaginary data against the prior predictive distribution of the reference model based on the actual likelihood. The baseline model being the G-prior with g=n². Both versions lead to a marginal likelihood that is similar to BIC and hence consistent. The DR version coincides with the original power-expected-posterior prior in the linear case. The CR version involves a change of covariance matrix. All in all, the CR version tends to favour less complex models, but is less parsimonious as a variable selection tool, which sounds a wee bit contradictory. Overall, I thus feel (possibly incorrectly) that the paper is more an appendix to the earlier paper than a paper in itself, as I do not get in the end a clear impression of which method should be preferred.

## O’Bayes 2017 in Austin, Texas

Posted in pictures, Statistics, Travel, University life on March 30, 2016 by xi'an

The next edition of the O’Bayes conference, O’Bayes 2017, will take place at the University of Texas in Austin, with the tentative dates of Dec. 10-13. Somehow making the connection with the previous O’Bayes in Valencià thanks to its Spanish history (even though, technically, Texas was French from 1684 till 1689!!!). With a local committee made of Lizhen Lin, Tom Shively, Carlos Carvalho & Peter Müller. Further details should emerge in the coming months, but keep this objective date in your calendars! (Note that NIPS 2017 will take place in Long Beach, CA, the week before.)

## objectivity in prior distributions for the multinomial model

Posted in Statistics, University life on March 17, 2016 by xi'an

Today, Danilo Alvares, visiting from the Universitat de Valencià, gave a talk at CREST about choosing a prior for the Multinomial distribution. Comparing different Dirichlet priors. In a sense this is a hopeless task, first because there is no reason to pick a particular prior unless one picks a very specific and a-Bayesian criterion to discriminate between priors, second because the multinomial is a weird distribution, hardly a distribution at all in that it results from grouping observations into classes, often based on the observations themselves. A construction that should be included within the choice of the prior maybe? But there lurks a danger of ending up with a data-dependent prior. My other remark about this problem is that, among the token priors, Perks’ prior using 1/k as its hyper-parameter [where k is the number of categories] is rather difficult to justify compared with 1/k² or 1/k³, except for aggregation consistency to some extent. And Laplace’s prior gets highly concentrated as the number of categories grows.
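A minimal sketch of how these symmetric Dirichlet choices differ, relying only on standard Dirichlet moment and aggregation properties rather than on anything presented in the talk. The function names and the "relative spread" summary (the expected squared distance between θ and the uniform vector, divided by its maximum over the simplex) are choices made here for illustration, and are only one possible way of reading "concentrated".

```python
def relative_spread(alpha, k):
    """Expected squared distance between theta and the uniform vector (1/k, ..., 1/k),
    divided by its maximum over the simplex, under a symmetric
    Dirichlet(alpha, ..., alpha) prior on k categories.
    Summing the k marginal variances gives (1 - 1/k) / (k * alpha + 1), and the
    maximum squared distance (reached at a vertex) is 1 - 1/k, hence the ratio."""
    return 1.0 / (k * alpha + 1.0)


def merged_hyperparameter(alpha, k, groups):
    """Hyperparameter of each pooled cell when the k categories are merged into
    `groups` groups of equal size: Dirichlet hyperparameters add up when
    categories are aggregated."""
    if k % groups != 0:
        raise ValueError("k must be a multiple of groups")
    return alpha * (k // groups)


# Laplace (alpha = 1), Perks (alpha = 1/k) and the 1/k^2 alternative.
for k in (2, 10, 100, 1000):
    print(k,
          relative_spread(1.0, k),
          relative_spread(1.0 / k, k),
          relative_spread(1.0 / k**2, k))
```

Under this summary, the spread of Laplace's prior is 1/(k+1) and vanishes as k grows, Perks' prior stays at 1/2 whatever k, and the 1/k² choice drifts towards the vertices of the simplex. As for aggregation, pooling 12 categories into 3 equal groups turns Perks' hyper-parameter 1/12 into 1/3, which is again Perks' choice for k=3, whereas 1/144 becomes 1/36 rather than 1/9. This only holds for equal-size pooling, which may be what "to some extent" covers.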

## beyond subjective and objective in Statistics

Posted in Books, Statistics, University life on August 28, 2015 by xi'an

“At the level of discourse, we would like to move beyond a subjective vs. objective shouting match.” (p.30)

This paper by Andrew Gelman and Christian Hennig calls for the abandonment of the terms objective and subjective in (not solely Bayesian) statistics. And argues that there is more than mere prior information and data to the construction of a statistical analysis. The paper is articulated as the authors’ proposal, followed by four application examples, then a survey of the philosophy of science perspectives on objectivity and subjectivity in statistics and other sciences, next to a study of the subjective and objective aspects of the mainstream statistical streams, concluding with a discussion on the implementation of the proposed move.

## probabilistic numerics and uncertainty in computations

Posted in Books, pictures, Statistics, University life on June 10, 2015 by xi'an

“We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations.” (p.1)

“(…) formulating quadrature as probabilistic regression precisely captures a trade-off between prior assumptions inherent in a computation and the computational effort required in that computation to achieve a certain precision. Computational rules arising from a strongly constrained hypothesis class can perform much better than less restrictive rules if the prior assumptions are valid.” (p.7)

Another general worry [repeating myself] about setting a prior in those functional spaces is that the posterior may then mostly reflect the choice of the prior rather than the information contained in the “data”. The above quote mentions prior assumptions that seem hard to build from prior opinion about the functional of interest. And even less about the function itself. Coming back from a gathering of “objective Bayesians”, it seems equally hard to agree upon a reference prior. However, since I like the alternative notion of using decision theory in conjunction with probabilistic numerics, it seems hard to object to the use of priors, given the “invariance” of prior x loss… But I would like to understand better how it is possible to check for prior assumptions (p.7) without using the data. Or maybe it does not matter so much in this setting? Unlikely, as indicated in the remarks about the bias resulting from the active design (p.13).
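To make the quadrature-as-regression quote concrete, here is a minimal sketch that is not taken from the paper. It assumes a Brownian-motion prior on the integrand over [0,1] (covariance min(s,t), hence f(0)=0) and noise-free evaluations at a few hypothetical nodes; the names `solve` and `bayes_quadrature` are illustrative only. Under these assumptions the posterior mean of the integral is the trapezoidal rule, while the posterior variance depends on the nodes and the prior alone, not on the observed values of f.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (adequate for the small dense systems used here)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def bayes_quadrature(f, nodes):
    """Posterior mean and variance of the integral of f over [0, 1], given
    f(nodes), under a Brownian-motion prior on f (assumption of this sketch).
    Nodes must be distinct and strictly positive."""
    K = [[min(s, t) for t in nodes] for s in nodes]   # prior covariance at the nodes
    z = [t - t * t / 2 for t in nodes]                # integral of min(t, y) over y in [0, 1]
    w = solve(K, z)                                   # quadrature weights K^{-1} z
    mean = sum(wi * f(t) for wi, t in zip(w, nodes))
    var = 1.0 / 3.0 - sum(wi * zi for wi, zi in zip(w, z))  # 1/3 is the prior variance
    return mean, var


nodes = [0.25, 0.5, 0.75, 1.0]
print(bayes_quadrature(math.sin, nodes))
print(bayes_quadrature(lambda x: x ** 3, nodes))
```

Both calls return the same variance, since the function values never enter it: in this fixed-kernel setting the reported uncertainty is entirely a product of the prior and the design, which is one way of seeing the worry above.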

A last issue I find related to the exploratory side of the paper is the “big world versus small worlds” debate, namely whether we can use the Bayesian approach to solve a sequence of small problems rather than trying to solve the big problem all at once. Which forces us to model the entirety of unknowns. And almost certainly fail. (This was the point of the Robbins-Wasserman counterexample.) Adopting a sequence of solutions may be construed as incoherent in that the prior distribution is adapted to the problem rather than encompassing all problems. Although this would not shock the proponents of reference priors.

## An objective prior that unifies objective Bayes and information-based inference

Posted in Books, pictures, Statistics, Travel, University life on June 8, 2015 by xi'an

During the Valencia O’Bayes 2015 meeting, Colin LaMont and Paul Wiggins arxived a paper entitled “An objective prior that unifies objective Bayes and information-based inference”. It would have been interesting to have the authors in Valencia, as they make bold claims about their w-prior as being uniformly and maximally uninformative. Plus achieving this unification advertised in the title of the paper. Meaning that the free energy (log transform of the inverse evidence) is the Akaike information criterion.

The paper starts by defining a true prior distribution (presumably in analogy with the true value of the parameter?) and generalised posterior distributions as associated with any arbitrary prior. (Some notations are imprecise, check (3) with the wrong denominator or the predictivity that is supposed to cover N new observations on p.2…) It then introduces a discretisation by considering all models within a certain Kullback divergence δ to be indistinguishable. (A definition that does not account for the asymmetry of the Kullback divergence.) From there, it most surprisingly [given the above discretisation] derives a density on the whole parameter space

$\pi(\theta) \propto \text{det} I(\theta)^{1/2} (N/2\pi \delta)^{K/2}$

where N is the number of observations and K the dimension of θ. Dimension which may vary. The dependence on N of the above is a result of using the predictive on N points instead of one. The w-prior is however defined differently: “as the density of indistinguishable models such that the multiplicity is unity for all true models”. Where the log transform of the multiplicity is the expected log marginal likelihood minus the expected log predictive [all expectations under the sampling distributions, conditional on θ]. Rather puzzling in that it involves the “true” value of the parameter (another notational imprecision, since it has to hold for all θ’s) as well as possibly improper priors. When the prior is improper, the log-multiplicity is a difference of two terms such that the first term depends on the constant used with the improper prior, while the second one does not… Unless the multiplicity constraint also determines the normalising constant?! But this does not seem to be the case when considering the following section on normalising the w-prior. Mentioning a “cutoff” for the integration that seems to pop out of nowhere. Curiouser and curiouser. Due to this unclear handling of infinite mass priors, and since the claimed properties of uniform and maximal uninformativeness are not established in any formal way, and since the existence of a non-asymptotic solution to the multiplicity equation is not demonstrated either, I quickly lost interest in the paper. Which does not contain any worked-out example. Read at your own risk!
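The remark on the arbitrary constant of an improper prior can be checked on a toy case that is not in the paper: a N(θ,1) sample with the flat prior π(θ)=c. This is a minimal sketch under that assumption, with illustrative function names. The log marginal likelihood shifts by log c, while the log posterior predictive, written as a ratio of two marginals, does not depend on c.

```python
import math


def log_marginal(xs, c=1.0):
    """Log of the integral over theta of c * prod_i N(x_i; theta, 1), i.e. the
    'marginal likelihood' under the improper flat prior pi(theta) = c.
    Closed form: log c - (n-1)/2 log(2 pi) - 1/2 log n - S/2, where S is the
    sum of squared deviations from the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    s = sum((x - xbar) ** 2 for x in xs)
    return (math.log(c) - 0.5 * (n - 1) * math.log(2 * math.pi)
            - 0.5 * math.log(n) - 0.5 * s)


def log_predictive(xs, y, c=1.0):
    """Log posterior predictive density of a new observation y given xs,
    computed as a ratio of marginals so that the constant c cancels."""
    return log_marginal(list(xs) + [y], c) - log_marginal(xs, c)


xs = [0.3, -1.2, 0.8, 2.1]   # made-up observations, for illustration only
for c in (1.0, 10.0):
    print(c, log_marginal(xs, c), log_predictive(xs, 0.5, c))
```

Any criterion built as the difference between these two quantities therefore inherits the log c term from the first one, unless some additional constraint pins c down.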