Archive for misspecified model

ISBA 18 tidbits

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , on July 2, 2018 by xi'an

Among a continuous sequence of appealing sessions at this ISBA 2018 meeting [says a member of the scientific committee!], I happened to attend two talks [with a wee bit of overlap] by Sid Chib in two consecutive sessions, because his co-author Ana Simoni (CREST) was unfortunately sick. Their work was about models defined by a collection of moment conditions, as often happens in econometrics, developed in a recent JASA paper by Chib, Shin, and Simoni (2017). With an extension about moving to defining conditional expectations by use of a functional basis. The main approach relies on exponentially tilted empirical likelihoods, which reminded me of the empirical likelihood [BCel] implementation we ran with Kerrie Mengersen and Pierre Pudlo a few years ago. As a substitute to ABC. This problematic made me wonder on how much Bayesian the estimating equation concept is, as it should somewhat involve a nonparametric prior under the moment constraints.

Note that Sid’s [talks and] papers are disconnected from ABC, as everything comes in closed form, apart from the empirical likelihood derivation, as we actually found in our own work!, but this could become a substitute model for ABC uses. For instance, identifying the parameter θ of the model by identifying equations. Would that impose too much input from the modeller? I figure I came with this notion mostly because of the emphasis on proxy models the previous day at ABC in ‘burgh! Another connected item of interest in the work is the possibility of accounting for misspecification of these moment conditions by introducing a vector of errors with a spike & slab distribution, although I am not sure this is 100% necessary without getting further into the paper(s) [blame conference pressure on my time].

Another highlight was attending a fantastic poster session Monday night on computational methods except I would have needed four more hours to get through every and all posters. This new version of ISBA has split the posters between two sites (great) and themes (not so great!), while I would have preferred more sites covering all themes over all nights, to lower the noise (still bearable this year) and to increase the possibility to check all posters of interest in a particular theme…

Mentioning as well a great talk by Dan Roy about assessing deep learning performances by what he calls non-vacuous error bounds. Namely, through PAC-Bayesian bounds. One major comment of his was about deep learning models being much more non-parametric (number of parameters rising with number of observations) than parametric models, meaning that generative adversarial constructs as the one I discussed a few days ago may face a fundamental difficulty as models are taken at face value there.

On closed-form solutions, a closed-form Bayes factor for component selection in mixture models by Fũqene, Steel and Rossell that resemble the Savage-Dickey version, without the measure theoretic difficulties. But with non-local priors. And closed-form conjugate priors for the probit regression model, using unified skew-normal priors, as exhibited by Daniele Durante. Which are product of Normal cdfs and pdfs, and which allow for closed form marginal likelihoods and marginal posteriors as well. (The approach is not exactly conjugate as the prior and the posterior are not in the same family.)

And on the final session I attended, there were two talks on scalable MCMC, one on coresets, which will require some time and effort to assimilate, by Trevor Campbell and Tamara Broderick, and another one using Poisson subsampling. By Matias Quiroz and co-authors. Which did not completely convinced me (but this was the end of a long day…)

All in all, this has been a great edition of the ISBA meetings, if quite intense due to a non-stop schedule, with a very efficient organisation that made parallel sessions manageable and poster sessions back to a reasonable scale [although I did not once manage to cross the street to the other session]. Being in unreasonably sunny Edinburgh helped a lot obviously! I am a wee bit disappointed that no one else follows my call to wear a kilt, but I had low expectations to start with… And too bad I missed the Ironman 70.3 Edinburgh by one day!

Better together in Kolkata [slides]

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on January 4, 2018 by xi'an

Here are the slides of the talk on modularisation I am giving today at the PC Mahalanobis 125 Conference in Kolkata, mostly borrowed from Pierre’s talk at O’Bayes 2018 last month:

[which made me realise Slideshare has discontinued the option to update one’s presentation, forcing users to create a new presentation for each update!] Incidentally, the amphitheatre at ISI is located right on top of a geological exhibit room with a reconstituted Barapasaurus tagorei so I will figuratively ride a dinosaur during my talk!

Au’Bayes 17

Posted in Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , on December 14, 2017 by xi'an

Some notes scribbled during the O’Bayes 17 conference in Austin, not reflecting on the highly diverse range of talks. And many new faces and topics, meaning O’Bayes is alive and evolving. With all possible objectivity, a fantastic conference! (Not even mentioning the bars where Peter Müller hosted the poster sessions, a feat I would have loved to see duplicated for the posters of ISBA 2018… Or the Ethiopian restaurant just around the corner with the right amount of fierce spices!)

The wiki on objective, reference, vague, neutral [or whichever label one favours] priors that was suggested at the previous O’Bayes meeting in Valencià, was introduced as Wikiprevia by Gonzalo Garcia-Donato. It aims at classifying recommended priors in most of the classical models, along with discussion panels, and it should soon get an official launch, when contributors will be welcome to include articles in a wiki principle. I wish the best to this venture which, I hope, will induce O’Bayesians to contribute actively.

In a brilliant talk that quickly reverted my jetlag doziness, Peter Grünwald returned to the topic he presented last year in Sardinia, namely safe Bayes or powered-down likelihoods to handle some degree of misspecification, with a further twist of introducing an impossible value `o’ that captures missing mass (to be called Peter’s demon?!), which absolute necessity I did not perceive. Food for thoughts, definitely. (But I feel that the only safe Bayes is the dead Bayes, as protecting against all kinds of mispecifications means no action is possible.)

I also appreciated Cristiano Villa’s approach to constructing prior weights in model comparison from a principled and decision-theoretic perspective even though I felt that the notion of ranking parameter importance required too much input to be practically feasible. (Unless I missed that point.)

Laura Ventura gave her talk on using for ABC various scores or estimating equations as summary statistics, rather than the corresponding M-estimators, which offers the appealing feature of reducing computation while being asymptotically equivalent. (A feature we also exploited for the regular score function in our ABC paper with Gael, David, Brendan, and Wonapree.) She mentioned the Hyvärinen score [of which I first heard in Padova!] as a way to bypass issues related to doubly intractable likelihoods. Which is a most interesting proposal that bypasses (ABC) simulations from such complex targets by exploiting a pseudo-posterior.

Veronika Rockova presented a recent work on concentration rates for regression tree methods that produce a rigorous analysis of these methods. Showing that the spike & slab priors plus BART [equals spike & tree] achieve sparsity and optimal concentration. In an oracle sense. With a side entry on assembling partition trees towards creating a new form of BART. Which made me wonder whether or not this was also applicable to random forests. Although they are not exactly Bayes. Demanding work in terms of the theory behind but with impressive consequences!

Just before I left O’Bayes 17 for Houston airport, Nick Polson, along with Peter McCullach, proposed an intriguing notion of sparse Bayes factors, which corresponds to the limit of a Bayes factor when the prior probability υ of the null goes to zero. When the limiting prior is replaced with an exceedance measure that can be normalised into a distribution, but does it make the limit a special prior? Linking  υ with the prior under the null is not an issue (this was the basis of my 1992 Lindley paradox paper) but the sequence of priors indexed by υ need be chosen. And reading from the paper at Houston airport, I could not spot a construction principle that would lead to a reference prior of sorts. One thing that Nick mentioned during his talk was that we observed directly realisations of the data marginal, but this is generally not the case as the observations are associated with a given value of the parameter, not one for each observation.The next edition of the O’Bayes conference will be in… Warwick on June 29-July 2, as I volunteered to organise this edition (16 years after O’Bayes 03 in Aussois!) just after the BNP meeting in Oxford on June 23-28, hopefully creating the environment for fruitful interactions between both communities! (And jumping from Au’Bayes to Wa’Bayes.)

Bayesian synthetic likelihood

Posted in Statistics with tags , , , , , , , on December 13, 2017 by xi'an

Leah Price, Chris Drovandi, Anthony Lee and David Nott published earlier this year a paper in JCGS on Bayesian synthetic likelihood, using Simon Wood’s synthetic likelihood as a substitute to the exact likelihood within a Bayesian approach. While not investigating the theoretical properties of this approximate approach, the paper compares it with ABC on some examples. In particular with respect to the number n of Monte Carlo replications used to approximate the mean and variance of the Gaussian synthetic likelihood.

Since this approach is most naturally associated with an MCMC implementation, it requires new simulations of the summary statistics at each iteration, without a clear possibility to involve parallel runs, in contrast to ABC. However in the final example of the paper, the authors reach values of n of several thousands, making use of multiple cores relevant, if requiring synchronicity and checks at every MCMC iteration.

The authors mention that “ABC can be viewed as a pseudo-marginal method”, but this has a limited appeal since the pseudo-marginal is a Monte Carlo substitute for the ABC target, not the original target. Similarly, there exists an unbiased estimator of the Gaussian density due to Ghurye and Olkin (1969) that allows to perceive the estimated synthetic likelihood version as a pseudo-marginal, once again wrt a target that differs from the original one. And the bias reappears under mis-specification, that is when the summary statistics are not normally distributed. It seems difficult to assess this normality or absence thereof in realistic situations.

“However, when the distribution of the summary statistic is highly irregular, the output of BSL cannot be trusted, while ABC represents a robust alternative in such cases.”

To make synthetic likelihood and ABC algorithms compatible, the authors chose a Normal kernel for ABC. Still, the equivalence is imperfect in that the covariance matrix need be chosen in the ABC case and is estimated in the synthetic one. I am also lost to the argument that the synthetic version is more efficient than ABC, in general (page 8). As for the examples, the first one uses a toy Poisson posterior with a single sufficient summary statistic, which is not very representative of complex situations where summary statistics are extremes or discrete. As acknowledged by the authors this is a case when the Normality assumption applies. For an integer support hidden process like the Ricker model, normality vanishes and the outcomes of ANC and synthetic likelihood differ, which makes it difficult to compare the inferential properties of both versions (rather than the acceptance rates), while using a 13-dimension statistic for estimating a 3-dimension parameter is not recommended for ABC, as discussed by Li and Fearnhead (2017). The same issue appears in the realistic cell motility example, with 145 summaries versus two parameters. (In the philogenies studied by DIYABC, the number of summary statistics is about the same but we now advocate a projection to the parameter dimension by the medium of random forests.)

Given the similarity between both approaches, I wonder at a confluence between them, where synthetic likelihood could maybe be used to devise PCA on the summary statistics and facilitate their projection on a space with much smaller dimensions. Or estimating the mean and variance functions in the synthetic likelihood towards producing directly simulations of the summary statistics.

probably ABC [and provably robust]

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , on August 8, 2017 by xi'an

Two weeks ago, James Ridgway (formerly CREST) arXived a paper on misspecification and ABC, a topic on which David Frazier, Judith Rousseau and I have been working for a while now [and soon to be arXived as well].  Paper that I re-read on a flight to Amsterdam [hence the above picture], written as a continuation of our earlier paper with David, Gael, and Judith. One specificity of the paper is to use an exponential distribution on the distance between the observed and simulated sample within the ABC distribution. Which reminds me of the resolution by Bissiri, Holmes, and Walker (2016) of the intractability of the likelihood function. James’ paper contains oracle inequalities between the ABC approximation and the genuine distribution of the summary statistics, like a bound on the distance between the expectations of the summary statistics under both models. Which writes down as a sum of a model bias, of two divergences between empirical and theoretical averages, on smoothness penalties, and on a prior impact term. And a similar bound on the distance between the expected distance to the oracle estimator of θ under the ABC distribution [and a Lipschitz type assumption also found in our paper]. Which first sounded weird [to me] as I would have expected the true posterior, until it dawned on me that the ABC distribution is the one used for the estimation [a passing strike of over-Bayesianism!]. While the oracle bound could have been used directly to discuss the rate of convergence of the exponential rate λ to zero [with the sample size n], James goes into the interesting alternative direction of setting a prior on λ, an idea that dates back to Olivier Catoni and Peter Grünwald. Or rather a pseudo-posterior on λ, a common occurrence in the PAC-Bayesian literature. In one of his results, James obtains a dependence of λ on the dimension m of the summary [as well as the root dependence on the sample size n], which seems to contradict our earlier independence result, until one realises this scale parameter is associated with a distance variable, itself scaled in m.

The paper also contains a non-parametric part, where the parameter θ is the unknown distribution of the data and the summary the data itself. Which is quite surprising as I did not deem it possible to handle non-parametrics with ABC. Especially in a misspecified setting (although I have trouble perceiving what this really means).

“We can use most of the Monte Carlo toolbox available in this context.”

The theoretical parts are a bit heavy on notations and hard to read [as a vacation morning read at least!]. They are followed by a Monte Carlo implementation using SMC-ABC.  And pseudo-marginals [at least formally as I do not see how the specific features of pseudo-marginals are more that an augmented representation here]. And adaptive multiple pseudo-samples that reminded me of the Biometrika paper of Anthony Lee and Krys Latuszynski (Warwick). Therefore using indeed most of the toolbox!

Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on May 9, 2017 by xi'an

Guillaume Dehaene posted the above question on X validated last Friday. Here is an except from it:

However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.

My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.

Indeed, according to Kleijn, v.d Vaart (2012), in the misspecified case, the posterior distribution converges as nto a Dirac distribution centred at the MLE but does not have the correct variance (unless two values just happen to be same) in order to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)

A few thoughts about that meme that all models are wrong: (resonating from last week discussion):

  1. While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
  2. There exist Bayesian approaches that can do without the model, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
  3. In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
  4. And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a down-graded version expressed as a power of the original likelihood.
  5. The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a circumvoluted manner (and I added some comments on my blog).
  6. In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.

never mind the big data here’s the big models [workshop]

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on December 22, 2015 by xi'an

Maybe the last occurrence this year of the pastiche of the iconic LP of the Sex Pistols!, made by Tamara Polajnar. The last workshop as well of the big data year in Warwick, organised by the Warwick Data Science Institute. I appreciated the different talks this afternoon, but enjoyed particularly Dan Simpson’s and Rob Scheichl’s. The presentation by Dan was so hilarious that I could not resist asking him for permission to post the slides here:

Not only hilarious [and I have certainly missed 67% of the jokes], but quite deep about the meaning(s) of modelling and his views about getting around the most blatant issues. Ron presented a more computational talk on the ways to reach petaflops on current supercomputers, in connection with weather prediction models used (or soon to be used) by the Met office. For a prediction area of 1 km². Along with significant improvements resulting from multiscale Monte Carlo and quasi-Monte Carlo. Definitely impressive! And a brilliant conclusion to the Year of Big Data (and big models).