Archive for Monash University

down-under ABC paper accepted in JCGS!

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , on October 25, 2018 by xi'an

Great news!, the ABC paper we had originally started in 2012 in Melbourne with Gael Martin and Brendan MacCabe, before joining forces with David Frazier and Worapree Maneesoothorn, in expanding its scope to using auxiliary likelihoods to run ABC in state-space models, just got accepted in the Journal of Computational and Graphical Statistics. A reason to celebrate with a Mornington Peninsula Pinot Gris wine next time I visit Monash!

ABC forecasts

Posted in Books, pictures, Statistics with tags , , , , , , , , on January 9, 2018 by xi'an

My friends and co-authors David Frazier, Gael Martin, Brendan McCabe, and Worapree Maneesoonthorn arXived a paper on ABC forecasting at the turn of the year. ABC prediction is a natural extension of ABC inference in that, provided the full conditional of a future observation given past data and parameters is available but the posterior is not, ABC simulations of the parameters induce an approximation of the predictive. The paper thus considers the impact of this extension on the precision of the predictions. And argues that it is possible that this approximation is preferable to running MCMC in some settings. A first interesting result is that using ABC and hence conditioning on an insufficient summary statistic has no asymptotic impact on the resulting prediction, provided Bayesian concentration of the corresponding posterior takes place as in our convergence paper under revision.

“…conditioning inference about θ on η(y) rather than y makes no difference to the probabilistic statements made about [future observations]”

The above result holds both in terms of convergence in total variation and for proper scoring rules. Even though there is always a loss in accuracy in using ABC. Now, one may think this is a direct consequence of our (and others) earlier convergence results, but numerical experiments on standard time series show the distinct feature that, while the [MCMC] posterior and ABC posterior distributions on the parameters clearly differ, the predictives are more or less identical! With a potential speed gain in using ABC, although comparing parallel ABC versus non-parallel MCMC is rather delicate. For instance, a preliminary parallel ABC could be run as a burnin’ step for parallel MCMC, since all chains would then be roughly in the stationary regime. Another interesting outcome of these experiments is a case when the summary statistics produces a non-consistent ABC posterior, but still leads to a very similar predictive, as shown on this graph.This unexpected accuracy in prediction may further be exploited in state space models, towards producing particle algorithms that are greatly accelerated. Of course, an easy objection to this acceleration is that the impact of the approximation is unknown and un-assessed. However, such an acceleration leaves room for multiple implementations, possibly with different sets of summaries, to check for consistency over replicates.

Xi’an cuisine [Xi’an series]

Posted in Statistics with tags , , , , , , , , , , , on August 26, 2017 by xi'an

David Frazier sent me a picture of another Xi’an restaurant he found near the campus of Monash University. If this CNN webpage on the ten best dishes in Xi’an is to be believed, this will be a must-go restaurant for my next visit to Melbourne! Especially when reading there that Xi’an claims to have xiaolongbao (soup dumplings) that are superior to those in Shanghai!!! (And when considering that I once went on a xiaolongbao rampage in downtown Melbourne.

model misspecification in ABC

Posted in Statistics with tags , , , , , , , , on August 21, 2017 by xi'an

With David Frazier and Judith Rousseau, we just arXived a paper studying the impact of a misspecified model on the outcome of an ABC run. This is a question that naturally arises when using ABC, but that has been not directly covered in the literature apart from a recently arXived paper by James Ridgway [that was earlier this month commented on the ‘Og]. On the one hand, ABC can be seen as a robust method in that it focus on the aspects of the assumed model that are translated by the [insufficient] summary statistics and their expectation. And nothing else. It is thus tolerant of departures from the hypothetical model that [almost] preserve those moments. On the other hand, ABC involves a degree of non-parametric estimation of the intractable likelihood, which may sound even more robust, except that the likelihood is estimated from pseudo-data simulated from the “wrong” model in case of misspecification.

In the paper, we examine how the pseudo-true value of the parameter [that is, the value of the parameter of the misspecified model that comes closest to the generating model in terms of Kullback-Leibler divergence] is asymptotically reached by some ABC algorithms like the ABC accept/reject approach and not by others like the popular linear regression [post-simulation] adjustment. Which suprisingly concentrates posterior mass on a completely different pseudo-true value. Exploiting our recent assessment of ABC convergence for well-specified models, we show the above convergence result for a tolerance sequence that decreases to the minimum possible distance [between the true expectation and the misspecified expectation] at a slow enough rate. Or that the sequence of acceptance probabilities goes to zero at the proper speed. In the case of the regression correction, the pseudo-true value is shifted by a quantity that does not converge to zero, because of the misspecification in the expectation of the summary statistics. This is not immensely surprising but we hence get a very different picture when compared with the well-specified case, when regression corrections bring improvement to the asymptotic behaviour of the ABC estimators. This discrepancy between two versions of ABC can be exploited to seek misspecification diagnoses, e.g. through the acceptance rate versus the tolerance level, or via a comparison of the ABC approximations to the posterior expectations of quantities of interest which should diverge at rate Vn. In both cases, ABC reference tables/learning bases can be exploited to draw and calibrate a comparison with the well-specified case.

two ABC postdocs at Monash

Posted in Statistics with tags , , , , , , on April 4, 2017 by xi'an

For students, postdocs and faculty working on approximate inference, ABC algorithms,  and likelihood-free methods, this announcement of two postdoc positions at Monash University, Melbourne, Australia, to work with Gael Martin, David Frazier and Catherine Forbes should be of strong relevance and particular interest:

The Department of Econometrics and Business Statistics at Monash is looking to fill two postdoc positions in – one for 12 months and the other for 2 years. The positions will be funded (respectively) by the following ARC Discovery grants:

1. DP150101728: “Approximate Bayesian Computation in State Space Models”. (Chief Investigators: Professor Gael Martin and Associate Professor Catherine Forbes; International Partner Investigators: Professor Brendan McCabe and Professor Christian Robert).

2. DP170100729: “The Validation of Approximate Bayesian Computation: Theory and Practice“. (Chief Investigators: Professor Gael Martin and Dr David Frazier; International Partner Investigators: Professor Christian Robert and Professor Eric Renault).

The deadline for applications is April 28th, 2017, and the nominal starting date is July, 2017 (although there is some degree of flexibility on that front).

Albert’s block

Posted in pictures, Travel, Wines with tags , , , , , , , on February 20, 2017 by xi'an

warp-U bridge sampling

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , on October 12, 2016 by xi'an

[I wrote this set of comments right after MCqMC 2016 on a preliminary version of the paper so mileage may vary in terms of the adequation to the current version!]

In warp-U bridge sampling, newly arXived and first presented at MCqMC 16, Xiao-Li Meng continues (in collaboration with Lahzi Wang) his exploration of bridge sampling techniques towards improving the estimation of normalising constants and ratios thereof. The bridge sampling estimator of Meng and Wong (1996) is an harmonic mean importance sampler that requires iterations as it depends on the ratio of interest. Given that the normalising constant of a density does not depend on the chosen parameterisation in the sense that the Jacobian transform preserves this constant, a degree of freedom is in the choice of the parameterisation. This is the idea behind warp transformations. The initial version of Meng and Schilling (2002) used location-scale transforms, while the warp-U solution goes for a multiple location-scale transform that can be seen as based on a location-scale mixture representation of the target. With K components. This approach can also be seen as a sort of artificial reversible jump algorithm when one model is fully known. A strategy Nicolas and I also proposed in our nested sampling Biometrika paper.

Once such a mixture approximation is obtained. each and every component of the mixture can be turned into the standard version of the location-scale family by the appropriate location-scale transform. Since the component index k is unknown for a given X, they call this transform a random transform, which I find somewhat more confusing that helpful. The conditional distribution of the index given the observable x is well-known for mixtures and it is used here to weight the component-wise location-scale transforms of the original distribution p into something that looks rather similar to the standard version of the location-scale family. If no mode has been forgotten by the mixture. The simulations from the original p are then rescaled by one of those transforms, which index k is picked according to the conditional distribution. As explained later to me by XL, the random[ness] in the picture is due to the inclusion of a random ± sign. Still, in the notation introduced in (13), I do not get how the distribution Þ [sorry for using different symbols, I cannot render a tilde on a p] is defined since both ψ and W are random. Is it the marginal? In which case it would read as a weighted average of rescaled versions of p. I have the same problem with Theorem 1 in that I do not understand how one equates Þ with the joint distribution.

Equation (21) is much more illuminating (I find) than the previous explanation in that it exposes the fact that the principle is one of aiming at a new distribution for both the target and the importance function, with hopes that the fit will get better. It could have been better to avoid the notion of random transform, then, but this is mostly a matter of conveying the notion.

On more specifics points (or minutiae), the unboundedness of the likelihood is rarely if ever a problem when using EM. An alternative to the multiple start EM proposal would then be to get sequential and estimate the mixture in a sequential manner, only adding a component when it seems worth it. See eg Chopin and Pelgrin (2004) and Chopin (2007). This could also help with the bias mentioned therein since only a (tiny?) fraction of the data would be used. And the number of components K has an impact on the accuracy of the approximation, as in not missing a mode, and on the computing time. However my suggestion would be to avoid estimating K as this must be immensely costly.

Section 6 obviously relates to my folded Markov interests. If I understand correctly, the paper argues that the transformed density Þ does not need to be computed when considering the folding-move-unfolding step as a single step rather than three steps. I fear the description between equations (30) and (31) is missing the move step over the transformed space. Also on a personal basis I still do not see how to add this approach to our folding methodology, even though the different transforms act as as many replicas of the original Markov chain.