Archive for summary statistics

conditioning on insufficient statistics in Bayesian regression

Posted in Books, Statistics, University life on October 23, 2021 by xi'an

“…the prior distribution, the loss function, and the likelihood or sampling density (…) a healthy skepticism encourages us to question each of them”

A paper by John Lewis, Steven MacEachern, and Yoonkyung Lee has recently appeared in Bayesian Analysis. It starts with the great motivation of a misspecified model requiring the use of a (thus necessarily) insufficient statistic, and moves to their central concern of simulating the posterior based on that statistic.

Model misspecification remains understudied from a Bayesian perspective and this paper is thus most welcome in addressing the issue. However, when reading through, one of my criticisms is in defining misspecification as equivalent to outliers in the sample. An outlier model is an easy case of misspecification, in the end, since the original model remains meaningful. (Why should there be “good” versus “bad” data?) Furthermore, adding a non-parametric component for the unspecified part of the data would sound like a “more Bayesian” alternative. Unrelated, I also idly wondered whether normalising flows could be used in this instance…

The problem of selecting a statistic T (Darjeeling, of course!) is not really discussed there, while each choice of T gives a different meaning to what misspecification means, and suggests a comparison with Bayesian empirical likelihood.

“Acceptance rates of this [ABC] algorithm can be intolerably low”

Erm, this is not really the issue with ABC, is it?! Especially when the tolerance is induced by the simulations themselves.

When I reached the MCMC (Gibbs?) part of the paper, I first wondered at its relevance for the misspecification issues before realising it had become the focus of the paper. Now, simulating the observations conditional on a value of the summary statistic T is a true challenge. I remember for instance George Casella mentioning it in association with a Student’s t sample in the 1990’s, and Kerrie and I making an unsuccessful attempt at it in the same period. Persi Diaconis has written several papers on the problem and I am thus surprised at the dearth of references here, like the rather recent Byrne and Girolami (2013), Florens and Simoni (2015), or Bornn et al. (2019). In the present case, the linear model assumed as the true model has the exceptional feature that it leads to a feasible transform of an unconstrained simulation into a simulation with fixed statistics, with no measure-theoretic worries, if not free from the considerable effort needed to establish that the operation is truly valid… And, while simulating (θ,y) makes perfect sense in an insufficient setting, the cost is then precisely the same as when running a vanilla ABC. Which brings us to the natural comparison with ABC. While taking ε=0 may sound optimal for being “exact”, it is not from an ABC perspective, since the convergence rate of the (summary) statistic should be roughly the one of the tolerance (Li & Fearnhead, 2018; Frazier et al., 2018).
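As a toy illustration of the kind of transform alluded to above (not the paper’s actual construction, which targets an insufficient statistic), here is a minimal sketch for a Gaussian linear model where the fixed statistics are the classical sufficient ones, namely the OLS estimate and the residual sum of squares:

```python
# A minimal sketch, assuming a Gaussian linear model and the classical sufficient
# statistics (OLS estimate, residual sum of squares); the paper's construction for
# an insufficient statistic is more involved.
import numpy as np

def simulate_given_stats(X, beta_hat0, rss0, rng=None):
    """Return y such that OLS(y) = beta_hat0 and RSS(y) = rss0 exactly."""
    rng = np.random.default_rng() if rng is None else rng
    n, _ = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat (projection) matrix
    z = rng.standard_normal(n)              # unconstrained simulation
    e = z - H @ z                           # project onto the residual space
    e *= np.sqrt(rss0 / (e @ e))            # rescale to the target RSS
    return X @ beta_hat0 + e                # fitted part plus constrained residuals
```

Any draw produced this way has exactly the prescribed OLS fit and residual sum of squares, since the rescaled residuals remain orthogonal to the column space of X.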

“[The Borel Paradox] shows that the concept of a conditional probability with regard to an isolated given hypothesis whose probability equals 0 is inadmissible.” A. Колмого́ров (1933)

As a side note for measure-theoretic purists, the derivation of the conditional of y given T(y)=T⁰ is arbitrary since the event has probability zero (i.e., the conditioning set is of measure zero); see the Borel-Kolmogorov paradox. The computations in the paper are undoubtedly correct, but they correspond to only one arbitrary choice of a transform (or conditioning σ-algebra).
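To make the arbitrariness concrete, here is a textbook illustration (my own, not taken from the paper): for X and Y independent standard exponentials, the event {X=Y} can be reached either as {X−Y=0} or as {X/Y=1}, yet the two limiting conditionals for X differ,

\[
f_{X\mid X-Y=0}(x)\;\propto\; e^{-2x},
\qquad
f_{X\mid X/Y=1}(x)\;\propto\; x\,e^{-2x},
\]

so conditioning on T(y)=T⁰ is only defined once a specific family of shrinking events (or σ-algebra) has been fixed.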

Metropolis-Hastings via Classification [One World ABC seminar]

Posted in Statistics, University life on May 27, 2021 by xi'an

Today, Veronika Rockova is giving a webinar on her paper with Tetsuya Kaji, Metropolis-Hastings via classification, at the One World ABC seminar, at 11:30am UK time. (The paper was also presented at the Oxford Stats seminar last February.) Please register if you are not already a member of the 1W ABC mailing list.

ABC on brain networks

Posted in Books, pictures, Statistics, University life on April 16, 2021 by xi'an

ResearchGate sent me an automated email pointing out a recent paper citing some of our ABC papers. The paper is written by Timothy West et al., neuroscientists in the UK, comparing models of Parkinsonian circuit dynamics using SMC-ABC. One novelty is the update of the tolerance by a fixed difference, unless the acceptance rate becomes too low, in which case the tolerance is reinitialised to its starting value.

“(…) the proposal density P(θ|D⁰) is formed from the accepted parameters sets. We use a density approximation to the marginals and a copula for the joint (…) [i.e.] a nonparametric estimation of the marginal densities over each parameter [and] the t-copula (…) Data are transformed to the copula scale (unit-square) using the kernel density estimator of the cumulative distribution function of each parameter and then transformed to the joint space with the t-copula.”

The construct of the proposal is quite involved, as described in the above quote. The model choice approach is standard (à la Grelaud et al.) but uses the median distance as a tolerance.
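For concreteness, here is a rough sketch of how such a proposal could be assembled from the accepted parameter sets, using rank-based pseudo-observations and empirical quantiles in place of the kernel CDF estimates described in the quote; the degrees of freedom ν and all function names are my own placeholders, not the authors’ implementation.

```python
# A rough sketch, not the authors' code: KDE-type marginals are replaced here by
# rank-based pseudo-observations and empirical quantiles; nu is a placeholder.
import numpy as np
from scipy import stats

def t_copula_proposal(accepted, n_samples, nu=5, rng=None):
    """accepted: (n, d) array of previously accepted parameter sets."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = accepted.shape
    # 1. transform each margin to (0,1) via its empirical (rank-based) CDF
    u = (stats.rankdata(accepted, axis=0) - 0.5) / n
    # 2. map to t scores and estimate the copula correlation matrix
    z = stats.t.ppf(u, df=nu)
    corr = np.corrcoef(z, rowvar=False)
    # 3. sample from the multivariate t and push back to the unit cube
    g = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    chi = rng.chisquare(nu, size=n_samples) / nu
    u_new = stats.t.cdf(g / np.sqrt(chi)[:, None], df=nu)
    # 4. invert each marginal (crudely, via the empirical quantile function)
    return np.column_stack(
        [np.quantile(accepted[:, j], u_new[:, j]) for j in range(d)]
    )
```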

“(…) test whether the ABC estimator will: a) yield parameter estimates that are unique to the data from which they have been optimized; and b) yield consistent estimation of parameters across multiple instances (…) test the face validity of the model comparison framework (…) [and] demonstrate the scalability of the optimization and model comparison framework.”

The paper runs a fairly extensive test of the above features, concluding that “the ABC optimized posteriors are consistent across multiple initializations and that the output is determined by differences in the underlying model generating the given data.” Concerning model comparison, the authors mix the ABC Bayes factor with a post-hoc analysis of divergence to discriminate against overfitting. They mention the potential impact of the summary statistics in the conclusion section, albeit briefly, and the remark that the statistics were “sufficient to recover known parameters” does not support their use for model comparison. The additional criticism of sampling strategies for approximating Bayes factors is somewhat irrelevant, the main issue with ABC model choice being a change of magnitude in the evidence.
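As a reminder of what the ABC Bayes factor à la Grelaud et al. amounts to, here is a bare-bones rejection sketch (all simulator and prior names are hypothetical placeholders): posterior model probabilities are estimated from the frequencies of the model indices among accepted draws, which is precisely where a non-discriminating summary statistic can wreck the comparison.

```python
# A bare-bones sketch of ABC model choice by rejection; simulate_m, priors and
# summary are hypothetical placeholders, not taken from the paper.
import numpy as np

def abc_model_choice(y_obs, simulate_m, priors, summary, eps, n_prop=100_000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t_obs = summary(y_obs)
    counts = np.zeros(len(priors), dtype=int)
    for _ in range(n_prop):
        m = rng.integers(len(priors))       # uniform prior on the model index
        theta = priors[m]()                 # draw theta from model m's prior
        y_sim = simulate_m[m](theta)        # simulate pseudo-data under model m
        if np.linalg.norm(summary(y_sim) - t_obs) <= eps:
            counts[m] += 1
    total = counts.sum()
    return counts / total if total else counts.astype(float)   # P(m | T(y_obs)) estimate
```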

“ABC has established itself as a key tool for parameter estimation in systems biology (…) but is yet to see wide adoption in systems neuroscience. It is known that ABC will not perform well under certain conditions (Sunnåker et al., 2013). Specifically, it has been shown that the simplest form of ABC algorithm based upon a rejection-sampling approach is inefficient in the case where the prior densities lie far from the true posterior (…) This motivates the use of neurobiologically grounded models over phenomenological models where often the ranges of potential parameter values are unknown.”

likelihood-free and summary-free?

Posted in Books, Mountains, pictures, Statistics, Travel on March 30, 2021 by xi'an

My friends and coauthors Chris Drovandi and David Frazier have recently arXived a paper entitled A comparison of likelihood-free methods with and without summary statistics, in which they indeed compare these two perspectives on approximate Bayesian methods like ABC and Bayesian synthetic likelihood.

“A criticism of summary statistic based approaches is that their choice is often ad hoc and there will generally be an inherent loss of information.”

In ABC methods, the recourse to a summary statistic is often advocated as a “necessary evil” against the greater evil of the curse of dimension, paradoxically providing a faster convergence of the ABC approximation (Li & Fearnhead, 2018). The authors propose a somewhat generic selection of summary statistics based on [my undergrad mentors!] Gouriéroux’s and Monfort’s indirect inference, using a mixture of Gaussians as their auxiliary model. Summary-free solutions, as in our Wasserstein papers, rely on distances between distributions, hence are functional distances, that can be seen as dimension-free as well (or criticised as infinite dimensional). Chris and David consider energy distances (which sound very much like standard distances, except for averaging over all permutations), maximum mean discrepancy as in Gretton et al. (2012), Cramér-von Mises distances, and Kullback-Leibler divergences estimated via one-nearest-neighbour formulas, for a univariate sample. I am not aware of any theoretical exploration of these functional approaches towards the precise speed of convergence of the resulting ABC approximation…
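For the univariate case considered in the paper, these discrepancies are easy to write down; the sketch below follows the textbook definitions (biased V-statistic versions, fixed kernel bandwidth, no handling of ties), which may differ in normalisation from the authors’ implementation.

```python
# Standard definitions of the four discrepancies, for 1-d numpy arrays x and y;
# a sketch only, normalisations may differ from the paper's.
import numpy as np

def energy_distance(x, y):
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2 * xy - xx - yy

def mmd2(x, y, bandwidth=1.0):
    # squared maximum mean discrepancy with a Gaussian kernel (biased version)
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def cramer_von_mises(x, y):
    # average squared gap between the two empirical CDFs over the pooled sample
    grid = np.sort(np.concatenate([x, y]))
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.mean((Fx - Fy) ** 2)

def kl_1nn(x, y):
    # one-nearest-neighbour estimate of KL(p_x || p_y), assuming no exact ties
    n, m = len(x), len(y)
    d_xx = np.array([np.min(np.abs(xi - np.delete(x, i))) for i, xi in enumerate(x)])
    d_xy = np.array([np.min(np.abs(xi - y)) for xi in x])
    return np.mean(np.log(d_xy / d_xx)) + np.log(m / (n - 1))
```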

“We found that at least one of the full data approaches was competitive with or outperforms ABC with summary statistics across all examples.”

The main part of the paper, besides a survey of the existing solutions, is to compare the performances of these over a few chosen (univariate) examples, with the exact posterior as the gold standard. In the g-and-k model (the Pima Indian benchmark of ABC studies!), Cramér-von Mises does somewhat better, while it does much worse in an M/G/1 example (where Wasserstein does better, and similarly for a stereological extremes example of Bortot et al., 2007). The ordering is inverted again for a toad movement model I had not seen before. While the usual proviso applies, namely that this is a simulation study on unidimensional data and a small number of parameters, the design of the four comparison experiments is very careful, eliminating versions that are either too costly or too divergent, although this could potentially be criticised for being unrealistic (i.e., when the true posterior is unknown). The computing time is roughly the same across methods, which essentially rules out kernel-based approximations of the likelihood. Another point of interest is that the distance methods are significantly impacted by transforms on the data, which should not be the case for intrinsic distances! Demonstrating that these distances are not intrinsic…
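Since the g-and-k distribution has no closed-form density but an explicit quantile function, which is what makes it a standard likelihood-free benchmark, a minimal simulator reads as follows (with the conventional c = 0.8; a, b, g, k are the usual location, scale, skewness, and kurtosis parameters):

```python
# A minimal g-and-k simulator via the quantile function, with the usual c = 0.8.
import numpy as np

def rgk(n, a, b, g, k, c=0.8, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)   # standard normal scores, z = Phi^{-1}(u)
    # (1 - exp(-g z)) / (1 + exp(-g z)) rewritten as tanh(g z / 2)
    return a + b * (1 + c * np.tanh(g * z / 2)) * (1 + z ** 2) ** k * z
```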

Metropolis-Hastings via classification

Posted in pictures, Statistics, Travel, University life on February 23, 2021 by xi'an

Veronika Rockova (from Chicago Booth) gave a talk on this theme at the Oxford Stats seminar this afternoon. She started with a survey of ABC, synthetic likelihoods, and pseudo-marginals to motivate her approach via GANs, learning an approximation of the likelihood from the GAN discriminator. Her explanation of the GAN-type estimate was crystal clear and made me wonder at the connection with Geyer’s 1994 logistic estimator of the likelihood (a form of discriminator with a fixed generator). She also expressed the ABC approximation hence created as the actual posterior times an exponential tilt, which she proved to be of order 1/n, and showed that a random variant of the algorithm (where the shift is averaged) is unbiased. Most interestingly, the approach requires no calibration and no tolerance, except indirectly when building the discriminator, and no summary statistic. There is a noteworthy tension between correct shape and correct location.
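To make the connection with Geyer’s logistic trick concrete, here is a minimal sketch (my own illustration, not the authors’ GAN construction): a classifier trained to separate two samples estimates, through its logit, the log ratio of their densities, which is the quantity a discriminator-based likelihood approximation builds on. A plain logistic regression stands in for the flexible discriminator.

```python
# A minimal sketch of the classification idea, assuming equal-sized 1-d samples;
# a linear logistic regression stands in for the flexible (GAN) discriminator.
import numpy as np
from sklearn.linear_model import LogisticRegression

def log_density_ratio(x_obs, x_sim, x_eval):
    """Estimate log p_obs(x) - log p_sim(x) at the points x_eval."""
    X = np.concatenate([x_obs, x_sim]).reshape(-1, 1)
    labels = np.concatenate([np.ones(len(x_obs)), np.zeros(len(x_sim))])
    clf = LogisticRegression().fit(X, labels)
    # the fitted logit approximates log p_obs/p_sim (up to the log prior odds of
    # the two classes, which vanish when the samples have equal size)
    return clf.decision_function(np.asarray(x_eval, dtype=float).reshape(-1, 1))
```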
