Archive for Bayesian nonparametrics

noninformative Bayesian prior with a finite support

Posted in Statistics, University life with tags , , , , , , on December 4, 2018 by xi'an

A few days ago, Pierre Jacob pointed me to a PNAS paper published earlier this year on a form of noninformative Bayesian analysis by Henri Mattingly and coauthors. They consider a prior that “maximizes the mutual information between parameters and predictions”, which sounds very much like José Bernardo’s notion of reference priors. With the rather strange twist of having the prior depending on the data size m even they work under an iid assumption. Here information is defined as the difference between the entropy of the prior and the conditional entropy which is not precisely defined in the paper but looks like the expected [in the data x] Kullback-Leibler divergence between prior and posterior. (I have general issues with the paper in that I often find it hard to read for a lack of precision and of definition of the main notions.)

One highly specific (and puzzling to me) feature of the proposed priors is that they are supported by a finite number of atoms, which reminds me very much of the (minimax) least favourable priors over compact parameter spaces, as for instance in the iconic paper by Casella and Strawderman (1984). For the same mathematical reason that non-constant analytic functions must have separated maxima. This is conducted under the assumption and restriction of a compact parameter space, which must be chosen in most cases. somewhat arbitrarily and not without consequences. I can somehow relate to the notion that a finite support prior translates the limited precision in the estimation brought by a finite sample. In other words, given a sample size of m, there is a maximal precision one can hope for, producing further decimals being silly. Still, the fact that the support of the prior is fixed a priori, completely independently of the data, is both unavoidable (for the prior to be prior!) and very dependent on the choice of the compact set. I would certainly prefer to see a maximal degree of precision expressed a posteriori, meaning that the support would then depend on the data. And handling finite support posteriors is rather awkward in that many notions like confidence intervals do not make much sense in that setup. (Similarly, one could argue that Bayesian non-parametric procedures lead to estimates with a finite number of support points but these are determined based on the data, not a priori.)

Interestingly, the derivation of the “optimal” prior is operated by iterations where the next prior is the renormalised version of the current prior times the exponentiated Kullback-Leibler divergence, which is “guaranteed to converge to the global maximum” for a discretised parameter space. The authors acknowledge that the resolution is poorly suited to multidimensional settings and hence to complex models, and indeed the paper only covers a few toy examples of moderate and even humble dimensions.

Another difficulty with the paper is the absence of temporal consistency: since the prior depends on the sample size, the posterior for n i.i.d. observations is no longer the prior for the (n+1)th observation.

“Because it weights the irrelevant parameter volume, the Jeffreys prior has strong dependence on microscopic effects invisible to experiment”

I simply do not understand the above sentence that apparently counts as a criticism of Jeffreys (1939). And would appreciate anyone enlightening me! The paper goes into comparing priors through Bayes factors, which ignores the main difficulty of an automated solution such as Jeffreys priors in its inability to handle infinite parameter spaces by being almost invariably improper.

BNP12

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on October 9, 2018 by xi'an

The next BNP (Bayesian nonparametric) conference is taking place in Oxford (UK), prior to the O’Bayes 2019 conference in Warwick, in June 24-28 and June 29-July 2, respectively. At this stage, the Scientific Committee of BNP12 invites submissions for possible contributed talks. The deadline for submitting a title/abstract is 15th December 2018. And the submission of applications for travel support closes on 15th December 2018. Currently, there are 35 awards that could be either travel awards or accommodation awards. The support is for junior researchers (students currently enrolled in a Dphil (PhD) programme or having graduated after 1st October 2015). The applicant agrees to present her/his work at the conference as a poster or oraly if awarded the travel support.

As for O’Bayes 2019, we are currently composing the programme, following the 20 years tradition of these O’Bayes meetings of having the Scientific Committee (Marilena Barbieri, Ed George, Brunero Liseo, Luis Pericchi, Judith Rousseau and myself) inviting about 25 speakers to present their recent work and 25 discussants to… discuss these works. With a first day of introductory tutorials to Bayes, O’Bayes and beyond. I (successfully) proposed this date and location to the O’Bayes board to take advantage of the nonparametric Bayes community present in the vicinity so that they could attend both meetings at limited cost and carbon impact.

yes, another potential satellite to ISBA 2018!

Posted in Statistics with tags , , , , , , , , , , on May 22, 2018 by xi'an

On July 2-4, 2018, there will be an ISBA sponsored workshop on Bayesian non-parametrics for signal and image processing, in Bordeaux, France. This is a wee bit further than Warwick (BAYsm) or Rennes (MCqMC), but still manageable from Edinburgh with direct flights (if on Ryanair). Deadline for free (yes, free!) registration is May 31.

BimPressioNs [BNP11]

Posted in Books, pictures, Statistics, Travel, University life, Wines with tags , , , , , , , , , on June 29, 2017 by xi'an

While my participation to BNP 11 has so far been more at the janitor level [although not gaining George Casella’s reputation on NPR!] than at the scientific one, since we had decided in favour of the least expensive and unstaffed option for coffee breaks, to keep the registration fees at a minimum [although I would have gladly gone all the way to removing all coffee breaks!, if only because such breaks produce much garbage], I had fairly good chats at the second poster session, in particular around empirical likelihoods and HMC for discrete parameters, the first one based on the general Cressie-Read formulation and the second around the recently arXived paper of Nishimura et al., which I wanted to read. Plus many other good chats full stop, around terrific cheese platters!

View this post on Instagram

Best conference spread ever

A post shared by Shane Jensen (@tastierkakes) on

This morning, the coffee breaks were much more under control and I managed to enjoy [and chair] the entire session on empirical likelihood, with absolutely fantastic talks from Nils Hjort and Art Owen (the third speaker having gone AWOL, possibly a direct consequence of Trump’s travel ban).

exciting week[s]

Posted in Mountains, pictures, Running, Statistics with tags , , , , , , , , , , , , , , on June 27, 2017 by xi'an

The past week was quite exciting, despite the heat wave that hit Paris and kept me from sleeping and running! First, I made a two-day visit to Jean-Michel Marin in Montpellier, where we discussed the potential Peer Community In Computational Statistics (PCI Comput Stats) with the people behind PCI Evol Biol at INRA, Hopefully taking shape in the coming months! And went one evening through a few vineyards in Saint Christol with Jean-Michel and Arnaud. Including a long chat with the owner of Domaine Coste Moynier. [Whose domain includes the above parcel with views of Pic Saint-Loup.] And last but not least! some work planning about approximate MCMC.

On top of this, we submitted our paper on ABC with Wasserstein distances [to be arXived in an extended version in the coming weeks], our revised paper on ABC consistency thanks to highly constructive and comments from the editorial board, which induced a much improved version in my opinion, and we received a very positive return from JCGS for our paper on weak priors for mixtures! Next week should be exciting as well, with BNP 11 taking place in downtown Paris, at École Normale!!!

Moment conditions and Bayesian nonparametrics

Posted in R, Statistics, University life with tags , , , , , , , , , , on August 6, 2015 by xi'an

Luke Bornn, Neil Shephard, and Reza Solgi (all from Harvard) have arXived a pretty interesting paper on simulating targets on a zero measure set. Although it is not initially presented this way, but rather in non-parametric terms as moment conditions

\mathbb{E}_\theta[g(X,\beta)]=0

where θ is the parameter of the sampling distribution, constrained by the value of β. (Which also contains quantile regression.) The very problem of simulating under a hard constraint has been bugging me for years and it is hence very exciting to see them come up with a proposal towards solving this difficulty! Even though it is restricted here to observations with a finite support (hence allowing for the use of a parametric Dirichlet prior). One interesting extension (Section 3.6) processed in the paper is the case when the support is unknown, but finite, with some points in the support being unobserved. Maybe connecting with non-parametrics if a prior is added on the number of unobserved points.

The setting of constricting θ via a parameterised moment condition relates to moment defined econometrics models, in a similar spirit to Gallant’s paper I recently discussed, but equally to empirical likelihood, which would then benefit from a fully Bayesian treatment thanks to the approach advocated by the authors.

bornnshepardDespite the zero-measure difficulty, or more exactly the non-linear manifold structure of the parameter space, for instance

β = log {θ/(1-θ)}

the authors manage to define a “projected” [my words] measure on the set of admissible pairs (β,θ). In a sense this is related with the choice of a certain metric, but the so-called Hausdorff reference measure allows for an automated definition of the original prior. It took me a (wee) while to spot (p.7) that the starting point was not a (unconstrained) prior on that (unconstrained) pair (β,θ) but directly on the manifold

\mathbb{E}_\theta[g(X,\beta)]=0.

Which makes its construction a difficulty. Even though, as noted in Section 4, all that we need is a prior over θ since the Hausdorff-Jacobian identity defines the “joint”, in a sort of backward way. (This is a wee bit confusing in that β being a transform of θ, all we need is a prior over θ, but we nonetheless end up with a different density on the joint distribution on the pair (β,θ). Any connection with incompatible priors merged together into a consensus prior?) Another question extending the scope of the paper would be to define Jeffreys’ or reference priors in this manifold sense.

The authors also discuss (Section 4.3) the problem I originally thought they were processing, namely starting from an unconstrained pair (β,θ) and it corresponding prior. The projected prior can then be defined based on a version of the original density on the constrained space, but it is definitely arbitrary. In that sense the paper does not address the general problem.

bornnshepard1“…traditional simulation algorithms will fail because the prior and the posterior of the model are supported on a zero Lebesgue measure set…” (p.10)

I somewhat resist this presentation through the measure zero set: once the prior is defined on a manifold, the fact that it is a measure zero set in a larger space is moot. Provided one can simulate a proposal over that manifold, e.g., a random walk, absolutely continuous wrt the same dominating measure, and compute or estimate a Metropolis-Hastings ratio of densities against a common measure, one can formally run MCMC on manifolds as well as regular Euclidean spaces. A first and theoretically straightforward (?) solution is to solve the constraint

\mathbb{E}_\theta[g(X,\beta)]=0

in β=β(θ). Then the joint prior p(β,θ) can be projected by the Hausdorff projection into p(θ). For instance, in the case of the above logit transform, the projected density is

p(θ)=p(β,θ) {1+1/θ²(1-θ)²}½

In practice, the inversion may be too costly and Bornn et al. directly simulate the pair (β,θ) within the manifold capitalising on the fact that the constraint is linear in θ given β. Indeed, in this setting, β is unconstrained and θ can be simulated from a proposal restricted to the hyperplane. Gibbs-like.

Conditional love [guest post]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , on August 4, 2015 by xi'an

[When Dan Simpson told me he was reading Terenin’s and Draper’s latest arXival in a nice Bath pub—and not a nice bath tub!—, I asked him for a blog entry and he agreed. Here is his piece, read at your own risk! If you remember to skip the part about Céline Dion, you should enjoy it very much!!!]

Probability has traditionally been described, as per Kolmogorov and his ardent follower Katy Perry, unconditionally. This is, of course, excellent for those of us who really like measure theory, as the maths is identical. Unfortunately mathematical convenience is not necessarily enough and a large part of the applied statistical community is working with Bayesian methods. These are unavoidably conditional and, as such, it is natural to ask if there is a fundamentally conditional basis for probability.

Bruno de Finetti—and later Richard Cox and Edwin Jaynes—considered conditional bases for Bayesian probability that are, unfortunately, incomplete. The critical problem is that they mainly consider finite state spaces and construct finitely additive systems of conditional probability. For a variety of reasons, neither of these restrictions hold much truck in the modern world of statistics.

In a recently arXiv’d paper, Alexander Terenin and David Draper devise a set of axioms that make the Cox-Jaynes system of conditional probability rigorous. Furthermore, they show that the complete set of Kolmogorov axioms (including countable additivity) can be derived as theorems from their axioms by conditioning on the entire sample space.

This is a deep and fundamental paper, which unfortunately means that I most probably do not grasp it’s complexities (especially as, for some reason, I keep reading it in pubs!). However I’m going to have a shot at having some thoughts on it, because I feel like it’s the sort of paper one should have thoughts on. Continue reading