Archive for objective Bayes

probabilistic numerics and uncertainty in computations

Posted in Books, pictures, Statistics, University life on June 10, 2015 by xi'an

“We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations.” (p.1)

Philipp Hennig, Michael Osborne and Mark Girolami (Warwick) posted on arXiv a paper to appear in Proceedings of the Royal Society A that relates to the probabilistic numerics workshop they organised in Warwick with Chris Oates two months ago. The paper is both a survey and a tribune about the related questions the authors find of most interest. The overall perspective is proceeding along Persi Diaconis’ call for a principled Bayesian approach to numerical problems. One interesting argument made from the start of the paper is that numerical methods can be seen as inferential rules, in that a numerical approximation of a deterministic quantity like an integral can be interpreted as an estimate, even as a Bayes estimate if a prior is used on the space of integrals. I am always uncertain about this perspective, as for instance illustrated in the post about the missing constant in Larry Wasserman’s paradox. The approximation may look formally the same as an estimate, but there is a design aspect that is almost always attached to numerical approximations and rarely analysed as such. Not to mention the somewhat philosophical issue that the integral itself is a constant with no uncertainty (while a statistical model should always entertain the notion that a model can be mis-specified). The distinction explains why there is a zero variance importance sampling estimator, while there is no uniformly zero variance estimator in most parametric models. At a possibly deeper level, the debate that still invades the use of Bayesian inference to solve statistical problems would most likely resurface in numerics, in that the significance of a probability statement surrounding a mathematical quantity can only be epistemic and relate to the knowledge (or lack thereof) about this quantity rather than to the quantity itself.
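To make the zero variance remark concrete, here is a minimal sketch on a toy integral that is not taken from the paper: the integrand x² on [0,1], whose integral is 1/3. The function names and the choice of integrand are assumptions made for illustration only. When the proposal is proportional to the integrand, every importance weight equals the integral itself, so the estimator has zero variance, but building that proposal requires knowing the normalising constant, that is, the very integral being computed. This is the design aspect that has no counterpart in a genuine statistical model.

```python
import numpy as np


def integrand(x):
    """Toy integrand on [0, 1]; its exact integral is 1/3 (illustrative choice)."""
    return x ** 2


def plain_mc_terms(n, rng):
    """Terms of the plain Monte Carlo estimator with uniform draws."""
    x = 1.0 - rng.random(n)  # uniform on (0, 1], avoids x == 0
    return integrand(x)


def optimal_is_terms(n, rng):
    """Terms of the importance sampling estimator with proposal q(x) = 3 x^2.

    q is proportional to the integrand, so each weight f(x) / q(x) is 1/3.
    Note that the constant 3 in q is the inverse of the integral itself.
    """
    u = 1.0 - rng.random(n)   # uniform on (0, 1]
    x = np.cbrt(u)            # inverse CDF of q, since Q(x) = x^3
    q = 3.0 * x ** 2
    return integrand(x) / q


rng = np.random.default_rng(0)
mc_terms = plain_mc_terms(10_000, rng)
is_terms = optimal_is_terms(10_000, rng)

print("plain MC   : mean", mc_terms.mean(), "variance", mc_terms.var())
print("optimal IS : mean", is_terms.mean(), "variance", is_terms.var())
```

The plain Monte Carlo terms have variance close to 4/45, while the importance sampling terms are constant up to floating point rounding.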

“(…) formulating quadrature as probabilistic regression precisely captures a trade-off between prior assumptions inherent in a computation and the computational effort required in that computation to achieve a certain precision. Computational rules arising from a strongly constrained hypothesis class can perform much better than less restrictive rules if the prior assumptions are valid.” (p.7)

Another general worry [repeating myself] about setting a prior in those functional spaces is that the posterior may then mostly reflect the choice of the prior rather than the information contained in the “data”. The above quote mentions prior assumptions that seem hard to build from prior opinion about the functional of interest. And even less about the function itself. Coming back from a gathering of “objective Bayesians”, it seems equally hard to agree upon a reference prior. However, since I like the alternative notion of using decision theory in conjunction with probabilistic numerics, it seems hard to object to the use of priors, given the “invariance” of prior × loss… But I would like to understand better how it is possible to check prior assumptions (p.7) without using the data. Or maybe it does not matter so much in this setting? Unlikely, as indicated in the remarks about the bias resulting from the active design (p.13).

A last issue I find related to the exploratory side of the paper is the “big world versus small worlds” debate, namely whether we can use the Bayesian approach to solve a sequence of small problems rather than trying to solve the big problem all at once. Which forces us to model the entirety of unknowns. And almost certainly fail. (This was the point of the Robbins-Wasserman counterexample.) Adopting a sequence of solutions may be construed as incoherent in that the prior distribution is adapted to the problem rather than encompassing all problems. Although this would not shock the proponents of reference priors.

An objective prior that unifies objective Bayes and information-based inference

Posted in Books, pictures, Statistics, Travel, University life on June 8, 2015 by xi'an

During the Valencia O’Bayes 2015 meeting, Colin LaMont and Paul Wiggins arxived a paper entitled “An objective prior that unifies objective Bayes and information-based inference”. It would have been interesting to have the authors in Valencia, as they make bold claims about their w-prior as being uniformly and maximally uninformative. Plus achieving this unification advertised in the title of the paper. Meaning that the free energy (log transform of the inverse evidence) is the Akaike information criterion.

The paper starts by defining a true prior distribution (presumably in analogy with the true value of the parameter?) and generalised posterior distributions as associated with any arbitrary prior. (Some notations are imprecise, check (3) with the wrong denominator or the predictivity that is supposed to cover N new observations on p.2…) It then introduces a discretisation by considering all models within a certain Kullback divergence δ to be indistinguishable. (A definition that does not account for the asymmetry of the Kullback divergence.) From there, it most surprisingly [given the above discretisation] derives a density on the whole parameter space

\pi(\theta) \propto \text{det} I(\theta)^{1/2} (N/2\pi \delta)^{K/2}

where N is the number of observations and K the dimension of θ. Dimension which may vary. The dependence on N of the above is a result of using the predictive on N points instead of one. The w-prior is however defined differently: “as the density of indistinguishable models such that the multiplicity is unity for all true models”. Where the log transform of the multiplicity is the expected log marginal likelihood minus the expected log predictive [all expectations under the sampling distributions, conditional on θ]. Rather puzzling in that it involves the “true” value of the parameter (another notational imprecision, since it has to hold for all θ’s) as well as possibly improper priors. When the prior is improper, the log-multiplicity is a difference of two terms such that the first term depends on the constant used with the improper prior, while the second one does not… Unless the multiplicity constraint also determines the normalising constant?! But this does not seem to be the case when considering the following section on normalising the w-prior. Mentioning a “cutoff” for the integration that seems to pop out of nowhere. Curiouser and curiouser. Due to this unclear handling of infinite mass priors, and since the claimed properties of uniform and maximal uninformativeness are not established in any formal way, and since the existence of a non-asymptotic solution to the multiplicity equation is not demonstrated either, I quickly lost interest in the paper. Which does not contain any worked out example. Read at your own risk!

O-Bayes15 [day #1]

Posted in Books, pictures, Running, Statistics, Travel, University life, Wines on June 3, 2015 by xi'an

So here we are back together to talk about objective Bayes methods, and in the City of València as well! A move back to a city where the 1998 O’Bayes took place. In contrast with my introductory tutorial, the morning tutorials by Luis Pericchi and Judith Rousseau were fairly technical and advanced, Judith looking at the tools used in the frequentist (Bernstein-von Mises) analysis of priors, with forays into empirical Bayes, giving insights into a wide range of recent papers in the field. And Luis covering works on Bayesian robustness in the sense of resisting over-influential observations. Following works of his and of Tony O’Hagan and coauthors. Which means characterising tails of prior versus sampling distribution to allow for the posterior reverting to the prior in case of over-influential datapoints. Funnily enough, after a great opening by Carmen and Ed remembering Susie, Chris Holmes also covered Bayesian robust analysis. More in the sense of incompletely or mis-specified models. (On the side, rekindling one comment by Susie and the need to embed robust Bayesian analysis within decision theory.) Which was also much Chris’ point, in line with the recent Watson and Holmes’ paper. Dan Simpson in his usual kick-the-anthill-real-hard-and-set-fire-to-it discussion pointed out the possible discrepancy between objective and robust Bayesian analysis. (With lines like “modern statistics has proven disruptive to objective Bayes”.) Which is not that obvious because the robust approach simply reincorporates the decision theory within the objective framework. (Dan also concluded with the comic strip below, whose message can be interpreted in many ways…! Or not.)

The second talk of the afternoon was given by Veronika Ročková on a novel type of spike-and-slab prior to handle sparse regression, bringing an alternative to the standard Lasso. The prior is a mixture of two Laplace priors whose scales are constrained in connection with the actual number of non-zero coefficients. I had not heard of this approach before (although Veronika and Ed have an earlier paper on a spike-and-slab prior to handle multicollinearity that Veronika presented in Boston last year) and I was quite impressed by the combination of minimax properties and practical determination of the scales. As well as by the performance of this spike-and-slab Lasso. I am looking forward to the forthcoming paper!
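As a rough illustration of what a mixture of two Laplace priors looks like, the sketch below codes the density of a sharply peaked "spike" component and a diffuse "slab" component, along with the conditional probability that a given coefficient value comes from the slab. The function names, the mixing weight 0.1 and the rates 20 and 0.5 are hypothetical values picked for the example. The constraint tying the scales to the number of non-zero coefficients, which is the core of the talk, is not reproduced here.

```python
import numpy as np


def laplace_density(beta, rate):
    """Laplace (double exponential) density with the given rate parameter."""
    return 0.5 * rate * np.exp(-rate * np.abs(beta))


def spike_and_slab_density(beta, slab_weight, spike_rate, slab_rate):
    """Two-component Laplace mixture: large rate = spike, small rate = slab."""
    spike = (1.0 - slab_weight) * laplace_density(beta, spike_rate)
    slab = slab_weight * laplace_density(beta, slab_rate)
    return spike + slab


def slab_probability(beta, slab_weight, spike_rate, slab_rate):
    """Conditional probability that the value beta was drawn from the slab."""
    slab = slab_weight * laplace_density(beta, slab_rate)
    return slab / spike_and_slab_density(beta, slab_weight, spike_rate, slab_rate)


# Hypothetical settings, chosen only to make the two regimes visible.
settings = dict(slab_weight=0.1, spike_rate=20.0, slab_rate=0.5)

for value in (0.0, 0.1, 1.0, 5.0):
    print(value, slab_probability(value, **settings))
```

Values near zero are attributed almost entirely to the spike, while values a few units away from zero are attributed to the slab, which is what allows the prior to separate negligible from active coefficients.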

The day ended most nicely in the botanical gardens of the University of València, with an outdoor reception surrounded by palm trees and parakeet cries…

Bayesian propaganda?

Posted in Books, Kids, pictures, Statistics, University life on April 20, 2015 by xi'an

“The question is about frequentist approach. Bayesian is admissable [sic] only by wrong definition as it starts with the assumption that the prior is the correct pre-information. James-Stein beats OLS without assumptions. If there is an admissable [sic] frequentist estimator then it will correspond to a true objective prior.”

I had a wee bit of a (minor, very minor!) communication problem on X validated, about a question on the existence of admissible estimators of the linear regression coefficient in multiple dimensions, under squared error loss. When I first replied that all Bayes estimators with finite risk were de facto admissible, I got the above reply, which clearly misses the point, and as I had edited the OP question to include more tags, the edited version was reverted with a comment about Bayesian propaganda! This is rather funny, if not hilarious, as (a) Bayes estimators are indeed admissible in the classical or frequentist sense (I actually fail to see a definition of admissibility in the Bayesian sense) and (b) the complete class theorems of Wald, Stein, and others (like Jack Kiefer, Larry Brown, and Jim Berger) come from the frequentist quest for best estimator(s). To make my point clearer, I also reproduced in my answer Stein’s necessary and sufficient condition for admissibility from my book but it did not help, as the theorem was “too complex for [the OP] to understand”, which shows in fine the point of reading textbooks!
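Since the quoted reply invokes James-Stein, here is a small simulation sketch of what domination means in the frequentist sense. It uses the simplest normal means setting, X ~ N(θ, I) in dimension 10, rather than the regression problem of the original question, and the function names, the dimension and the two values of θ are assumptions made for the example. The risk of the least squares estimator X is the dimension, whatever θ, while the James-Stein risk is smaller for every θ when the dimension is at least 3.

```python
import numpy as np


def james_stein(x):
    """James-Stein shrinkage of each row of x towards the origin (unit variance)."""
    p = x.shape[-1]
    squared_norm = np.sum(x ** 2, axis=-1, keepdims=True)
    return (1.0 - (p - 2) / squared_norm) * x


def simulated_risks(theta, n_rep=20_000, seed=0):
    """Monte Carlo estimates of the squared error risks of X and of James-Stein."""
    rng = np.random.default_rng(seed)
    x = theta + rng.standard_normal((n_rep, theta.size))
    loss_ls = np.sum((x - theta) ** 2, axis=1)
    loss_js = np.sum((james_stein(x) - theta) ** 2, axis=1)
    return loss_ls.mean(), loss_js.mean()


# Two illustrative parameter values in dimension 10.
risk_ls_far, risk_js_far = simulated_risks(np.ones(10))
risk_ls_zero, risk_js_zero = simulated_risks(np.zeros(10))

print("theta = (1,...,1): least squares", risk_ls_far, "James-Stein", risk_js_far)
print("theta = 0        : least squares", risk_ls_zero, "James-Stein", risk_js_zero)
```

At θ = 0 the exact James-Stein risk is 2 in dimension 10, against 10 for least squares. Note that domination is not admissibility: the James-Stein estimator is itself dominated by its positive-part version.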

a Nice talk

Posted in Books, Statistics, Travel, University life on February 20, 2015 by xi'an

Today, I give a talk on our testing paper in Nice, in a workshop run in connection with our Calibration ANR grant.

The slides are directly extracted from the paper but it still took me quite a while to translate the paper into those, during the early hours of our Czech break this week.

One added perk of travelling to Nice is the flight there, as it parallels the entire French Alps, a terrific view in nice weather!

O’Bayes 2015: back in València

Posted in pictures, Statistics, Travel, University life on September 11, 2014 by xi'an

The next O’Bayes meeting (more precisely the International Workshop on Objective Bayes Methodology, O-Bayes15), will take place in València, Spain, on June 1-4, 2015. This is the second time an O’Bayes conference takes place in València, after the one José Miguel Bernardo organised in 1998 there. The principal objectives of O-Bayes15 will be to facilitate the exchange of recent research developments in objective Bayes theory, methodology and applications, and related topics (like limited information Bayesian statistics), to provide opportunities for new researchers, and to establish new collaborations and partnerships. Most importantly, O-Bayes15 will be dedicated to our friend Susie Bayarri, to celebrate her life and contributions to Bayesian Statistics. Check the webpage of O-Bayes15 for the program (under construction) and the practical details. Looking forward to the meeting and hopeful for a broadening of the basis of the O’Bayes community and of its scope!

JSM 2014, Boston [#3]

Posted in Statistics, University life on August 8, 2014 by xi'an

Today I gave a talk in the Advances in model selection session. Organised by Veronika Rockova and Ed George. (A bit of pre-talk stress: I actually attempted to change my slides at 5am and only managed to erase the current version! I thus left early enough to stop by the presentation room…) Here are the final slides, which have much in common with earlier versions, but also borrowed from Jean-Michel Marin’s talk in Cambridge. A posteriori, I think the talk missed one slide on the practical run of the ABC random forest algorithm, since later questions showed miscomprehension from the audience.

The other talks in this session were by Andreas Buja [whom I last met in Budapest last year] on valid post-modelling inference. A very relevant reflection on the fundamental bias in statistical modelling. Then by Nick Polson, about efficient ways to compute MAP for objective functions that are irregular. Great entry into optimisation methods I had never heard of earlier! (The abstract is unrelated.) And last but not least by Veronika Rockova, on mixing Indian buffet processes with spike-and-slab priors for factor analysis with unknown numbers of factors. A definitely advanced contribution to factor analysis, with a very nice idea of introducing a non-identifiable rotation to align on orthogonal designs. (Here too the abstract is unrelated, a side effect of the ASA requiring abstracts sent very long in advance.)

Although discussions lasted well into the following Bayesian Inference: Theory and Foundations session, I managed to listen to a few talks there. In particular, a talk by Keli Liu on constructing non-informative priors. A question of direct relevance. The notion of objectivity is to achieve a frequentist distribution of the Bayes factor associated with the point null that is constant. Or has a constant quantile at a given level. The second talk by Alexandra Bolotskikh related to older interests of mine, namely the construction of improved confidence regions in the spirit of Stein. (Not that surprising, given that a coauthor is Marty Wells, who worked with George and me on the topic.) A third talk by Abhishek Pal Majumder (jointly with Jan Hannig) dealt with a new type of fiducial distributions, with matching prior properties. This sentence popped up a lot over the past days, but this is yet another area where I remain puzzled by the very notion. I mean the notion of fiducial distribution. Esp. in this case where the matching prior gets even closer to being plain Bayesian.

