Archive for Gibbs posterior

day four at ISBA 22

Posted in Mountains, pictures, Running, Statistics, Travel, University life on July 3, 2022 by xi'an

Woke up an hour later today! Which left me time to work on [shortening] my slides for tomorrow, run to Mon(t) Royal, and bike to St-Viateur Bagels for freshly baked bagels. (Which seemed to be missing salt, despite my low tolerance for salt in general.)

Terrific plenary lecture by Pierre Jacob in his Susie Bayarri’s Lecture about cut models! Offering a very complete picture of the reasons for seeking modularisation, the theoretical and practical difficulties with the approach, and some asymptotics as well. Followed by a great discussion by Judith on cut posteriors separating interest parameters from nuisance parameters, especially in semi-parametric models. Even introducing two priors on the same parameters! And by Jim Berger, who coauthored with Susie the major cut paper inspiring this work, and illustrated the concept on computer experiments (not falling into the fallacy pointed out by Martin Plummer at MCMski(v) in Chamonix!).

Speaking of which, the Scientific Committee for the upcoming BayesComp²³ in Levi, Finland, had a working meeting, in which I took part, towards building the programme as the conference is getting near. Those interested in building a session should make preparations and take advantage of being together in Mon(t)réal, as the call is coming out pretty soon!

Attended a session on divide-and-conquer methods for dependent data, with Sanvesh Srivastava considering the case of hidden Markov models and block processing the observed sequence. Which is sort of justified by the forgettability of long-past observations. I wonder if better performance could be achieved otherwise, as the data on a given time interval essentially gives information on the hidden chain at other time periods.

I was informed this morn that Jackie Wong, one speaker in our session tomorrow, could not make it to Mon(t)réal for visa reasons. Which is unfortunate for him, the audience and everyone involved in the organisation. This reinforces my call for all-time hybrid conferences that avoid penalising (or even discriminating against) participants who cannot physically attend for ethical, political (visa), travel, health, financial, parental, or any other reasons… The drawbacks of lower attendance, risk of a deficit, and dilution of the community are often raised against me, but there are answers to those, existing or to be invented, and the huge audience at ISBA demonstrates a need for “real” meetings that could be made more inclusive by mirror (low-key low-cost) meetings.

Finished the day at Isle de Garde with a Pu-erh flavoured beer, in a particularly lively (if not jazzy) part of the city…

day three at ISBA 22

Posted in Mountains, pictures, Running, Statistics, Travel, University life on July 1, 2022 by xi'an

Still woke up too early [to remain operational for the poster session], finalised the selection of our MASH 2022/3 students, then returned to the Jean-Drapeau pool, which was even more enjoyable in a crisp bright blue morning (and hardly anyone in my lane).

Attended a talk by Li Ma, who reviewed complexifying stick-breaking priors on the weights and introduced a balanced tree stick mechanism (why same depth?) (with links to Jara & Hanson 2010 and Stefanucci & Canale 2021). Then I listened to Giovanni Rebaudo creating clustering Gibbs-type processes along graphs, but I sort of dozed and missed the point, as it felt as if the graph turned from a conceptual connection into a physical one! Catherine Forbes talked about a sequential version of stochastic variational approximation (published in St&Co) exploiting the update-one-at-a-time feature of Bayesian construction, except that each step relies on the previous approximation, meaning that the final (if fin there is!) approximation can end up far away from the optimal stochastic variational approximation. Assessing the divergence away from the target (in real time and on a tight budget) would be nice.

After a quick lunch where I tasted seaweed-shell gyozas (!), I went to the generalised Bayesian inference session on Gibbs posteriors, [sort of] making up for the missed SAVI workshop! With Alice Kirichenko (Warwick) deriving information complexity bounds under misspecification, plus deriving an optimal value for the [vexing] coefficient η [in the Gibbs posterior], and Jack Jewson (ex-Warwick), raising the issue of improper models within Gibbs posteriors, although the reference or dominating measure is a priori arbitrary in these settings. But missing the third talk, about Gibbs posteriors again, and Chris Holmes’ discussion, to attend part of the Savage (thesis) Award, with finalists Marta Catalano (Warwick faculty), Aditi Shenvi (Warwick student), and John O’Leary (an academic grandchild of mine, as Pierre Jacob was his advisor). What a disappointment to have to wait for Friday night to hear the outcome!

I must confess to some (French-speaker) énervement at hearing Mon(t)-réal massacred as Mon-t-real..! A very minor hindrance though, when put in perspective with my friend and Warwick colleague Gareth Roberts being forced to evacuate his hotel last night due to a fire in the basement, fortunately unscathed but with Day 3 ruined for him… (Making me realise the conference hotel itself underwent a similar event 14 years ago.)

approximate Bayesian inference [survey]

Posted in Statistics on May 3, 2021 by xi'an

In connection with the special issue of Entropy I mentioned a while ago, Pierre Alquier (formerly of CREST) has written an introduction to the topic of approximate Bayesian inference that is worth advertising (and freely available as well). Its reference list is particularly relevant. (The deadline for submissions is 21 June.)

O’Bayes 19/2

Posted in Books, pictures, Running, Travel, University life on July 1, 2019 by xi'an

One talk on Day 2 of O’Bayes 2019 was by Ryan Martin on data dependent priors (or “priors”). Which I have already discussed in this blog. Including the notion of a Gibbs posterior about quantities that “are not always defined through a model” [which is debatable if one sees it as part of a semi-parametric model]. A Gibbs posterior that is built through a pseudo-likelihood constructed from the empirical risk, which reminds me of Bissiri, Holmes and Walker. Although requiring a prior on this quantity that is not part of a model. And is not necessarily a true posterior and not necessarily with the same concentration rate as a true posterior. Constructing a data-dependent distribution on the parameter does not necessarily mean an interesting inference and, to keep up with the theme of the conference, has no automatic claim to [more] “objectivity”.

And after calling a prior both Beauty and The Beast!, Erlis Ruli argued about a “bias-reduction” prior where the prior is the solution to a differential equation related to some cumulants, connected with an earlier work of David Firth (Warwick). An interesting conundrum is how to create an MCMC algorithm when the prior is that intractable, with possible help from PDMP techniques like the Zig-Zag sampler.

While Peter Orbanz’ talk was centred on a central limit theorem under group invariance, further penalised by being the last of the (sun) day, Peter did a magnificent job of presenting the result and motivating each term. It reminded me of the work Jim Bondar was doing in Ottawa in the 1980s on Haar measures for Bayesian inference. Including the notion of amenability [a term due to von Neumann] I had not met since then. (Neither have I met Jim since the last summer I spent in Carleton.) The CLT and associated LLN are remarkable in that the average is not over observations but over shifts of the same observation under elements of a sub-group of transformations. I wondered as well about the potential connection with the Read Paper of Kong et al. in 2003 on the use of group averaging for Monte Carlo integration [connection apart from the fact that both discussants, Michael Evans and myself, are present at this conference].

scaling the Gibbs posterior credible regions

Posted in Books, Statistics, University life on September 11, 2015 by xi'an

“The challenge in implementation of the Gibbs posterior is that it depends on an unspecified scale (or inverse temperature) parameter.”

A new paper by Nick Syring and Ryan Martin was arXived today on the same topic as the one I discussed last January. The setting is the same as with empirical likelihood, namely that the distribution of the data is not specified, while parameters of interest are defined via moments or, more generally, by minimising a loss function. A pseudo-likelihood can then be constructed as a substitute to the likelihood, in the spirit of Bissiri et al. (2013). It is called a “Gibbs posterior” distribution in this paper. So the “Gibbs” in the title has no link with the “Gibbs” in Gibbs sampler, since inference is conducted with respect to this pseudo-posterior. Somewhat logically (!), as n grows to infinity, the pseudo-posterior concentrates upon the pseudo-true value of θ minimising the expected loss, hence asymptotically resembles the M-estimator associated with this criterion. As I pointed out in the discussion of Bissiri et al. (2013), one major hurdle when turning a loss into a log-likelihood is that it is at best defined up to a scale factor ω. The authors choose ω so that the Gibbs posterior

\exp\{-\omega n l_n(\theta,x) \}\pi(\theta)

is well-calibrated, where l_n is the empirical averaged loss. So the Gibbs posterior is part of the matching prior collection. In practice the authors calibrate ω by a stochastic optimisation iterative process, with bootstrap on the side to evaluate coverage. They briefly consider empirical likelihood as an alternative, on a median regression example, where they show that their “Gibbs confidence intervals (…) are clearly the best” (p.12). Apart from the relevance of being “well-calibrated”, the asymptotic nature of the results, and the dependence on the parameterisation via the loss function, one may also question the possibility of using this approach in large dimensional cases where all of or none of the parameters are of interest.
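
To make the role of the scale ω more concrete, here is a minimal sketch of the Gibbs posterior above evaluated on a grid, under assumptions of my own choosing rather than the paper's: the loss is the absolute loss (so that the target is a median), the prior is flat, and ω is fixed by hand instead of being calibrated by the authors' stochastic optimisation and bootstrap scheme. The function names `gibbs_posterior_grid` and `credible_interval` are hypothetical.

```python
import numpy as np

def gibbs_posterior_grid(x, theta_grid, omega, log_prior):
    """Gibbs posterior exp{-omega * n * l_n(theta, x)} * prior(theta),
    normalised on an equally spaced grid (illustrative helper)."""
    n = len(x)
    # Empirical averaged loss l_n(theta, x); the absolute loss is an assumed
    # choice, whose minimiser is the sample median.
    loss = np.abs(x[:, None] - theta_grid[None, :]).mean(axis=0)
    log_post = -omega * n * loss + log_prior(theta_grid)
    log_post -= log_post.max()            # guard against underflow
    dens = np.exp(log_post)
    step = theta_grid[1] - theta_grid[0]
    return dens / (dens.sum() * step)     # Riemann-sum normalisation

def credible_interval(theta_grid, dens, level=0.95):
    """Equal-tailed credible interval read off the grid-based cdf."""
    step = theta_grid[1] - theta_grid[0]
    cdf = np.cumsum(dens) * step
    lower = theta_grid[np.searchsorted(cdf, (1 - level) / 2)]
    upper = theta_grid[np.searchsorted(cdf, (1 + level) / 2)]
    return lower, upper

# Toy data and a flat prior (both assumptions made for the illustration).
rng = np.random.default_rng(0)
x = rng.standard_normal(201) + 1.0
theta_grid = np.linspace(-2.0, 4.0, 2001)
flat_log_prior = np.zeros_like

for omega in (0.5, 2.0):
    dens = gibbs_posterior_grid(x, theta_grid, omega, flat_log_prior)
    print(omega, credible_interval(theta_grid, dens))
```

Running the loop with several values of ω shows the credible interval shrinking as ω increases while its centre stays near the sample median, which is why the choice of ω drives the coverage and why it needs calibrating.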
