## One World ABC seminar [24.2.22]

Posted in Statistics, University life on February 22, 2022 by xi'an

The next One World ABC seminar is on Thursday 24 Feb, with Rafael Izbicki talking on Likelihood-Free Frequentist Inference – Constructing Confidence Sets with Correct Conditional Coverage. It will take place at 14:30 CET (GMT+1).

Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce reliable measures of uncertainty. We present a statistical framework for LFI that unifies classical statistics with modern machine learning to: (1) efficiently construct frequentist confidence sets and hypothesis tests with finite-sample guarantees of nominal coverage (type I error control) and power; (2) provide practical diagnostics for assessing empirical coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that estimates a test statistic, like the likelihood ratio, can be plugged into our framework to create valid confidence sets and compute diagnostics, without costly Monte Carlo samples at fixed parameter settings. In this work, we specifically study the power of two test statistics (ACORE and BFF), which, respectively, maximize versus integrate an odds function over the parameter space. Our study offers multifaceted perspectives on the challenges in LF2I. This is joint work with Niccolo Dalmasso, David Zhao and Ann B. Lee.

## a neat (theoretical) Monte Carlo result

Posted in Books, Statistics, University life on December 19, 2014 by xi'an

Mark Huber just arXived a short paper where he develops a Monte Carlo approach that bounds the probability of large errors

$\mathbb{P}(|\hat\mu_t-\mu|>\epsilon\mu) < 1/\delta$

by computing a lower bound on the sample size r. I wondered at the presence of μ in the bound, as it indicates that the approach is not translation invariant. One reason is that the standard deviation of the simulated random variables is bounded by cμ. Another reason is that Mark uses as his estimator the median

$\text{med}(S_1R_1,\ldots,S_tR_t)$

where the S’s are partial averages of sufficient length and the R’s are independent uniforms over (1-ε,1+ε): using those uniforms may improve the coverage of given intervals but it also means that the absolute scale of the error is multiplied by the scale of S, namely μ. I first thought that some a posteriori recentering could improve the bound but since this does not impact the variance of the simulated random variables, I doubt it is possible.
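To make the scale argument concrete, here is a minimal sketch of a median of randomly rescaled block averages, as I read the description above. The function name, the block count `t=21`, the value `eps=0.1` and the exponential test sample are illustrative assumptions of mine, not the constants or the exact construction in Mark's paper.

```python
import numpy as np

def median_of_smoothed_means(x, t, eps, rng):
    """Median of S_1*R_1, ..., S_t*R_t (hypothetical helper, not the paper's code).

    S_i is the average of the i-th block of x, and R_i is an independent
    uniform draw over (1 - eps, 1 + eps).
    """
    x = np.asarray(x, dtype=float)
    block = len(x) // t
    if block == 0:
        raise ValueError("need at least t observations")
    # Partial averages over t consecutive blocks of equal length
    s = x[: block * t].reshape(t, block).mean(axis=1)
    # Multiplicative uniform perturbations
    r = rng.uniform(1.0 - eps, 1.0 + eps, size=t)
    return float(np.median(s * r))

def estimate(x, seed=1):
    # Fresh generator with a fixed seed, so the same R's are reused across calls
    return median_of_smoothed_means(x, t=21, eps=0.1, rng=np.random.default_rng(seed))

mu = 3.0
x = np.random.default_rng(0).exponential(scale=mu, size=20_000)  # sd = mu, i.e. c = 1

base = estimate(x)
rel_err = abs(base - mu) / mu

# Rescaling the data rescales the estimate (same R's, same relative error)...
scaled = estimate(10.0 * x)
# ...but shifting the data does not simply shift the estimate,
# since the R's now multiply block averages close to mu + 100.
shifted_back = estimate(x + 100.0) - 100.0

print(base, scaled / 10.0, shifted_back)
```

Because the uniforms act multiplicatively, the perturbation they add is proportional to the size of the block averages, which is why the estimate behaves well under rescaling but not under translation.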

## checking ABC convergence via coverage

Posted in pictures, Statistics, Travel, University life on January 24, 2013 by xi'an

Dennis Prangle, Michael Blum, G. Popovic and Scott Sisson just arXived a paper on coverage-based diagnostics for ABC validation. Getting valid approximation diagnostics for ABC is clearly and badly needed and this was the last slide of my talk yesterday at the Winter Workshop in Gainesville. When simulation time is not an issue (!), our DIYABC software does implement a limited coverage assessment by computing the type I error, i.e. by simulating data under the null model and evaluating the number of times it is rejected at the 5% level (see sections 2.11.3 and 3.8 in the documentation). The current paper builds on a similar perspective.

The idea in the paper is that a (Bayesian) credible interval at a given credible level α should have a similar confidence level (at least asymptotically and even more for matching priors) and that simulating pseudo-data with a known parameter value allows for a Monte Carlo evaluation of the “true” coverage of the credible interval, hence for a calibration of the tolerance. The delicate issue is about the generation of those “known” parameters. For instance, if the pair (θ, y) is generated from the joint distribution prior × likelihood, and if the credible region is also based on the true posterior, the average coverage is the nominal one. On the other hand, if the credible interval is based on a poor (ABC) approximation to the posterior, the average coverage should differ from the nominal one. Given that ABC is always wrong, however, this may fail to be a powerful diagnostic. In particular, when using insufficient (summary) statistics, the discrepancy should make testing for uniformity harder, shouldn’t it?
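The averaging argument can be checked on a toy example. The sketch below is only an illustration under assumptions of my own (a conjugate normal model with a N(0,1) prior, five observations, and a deliberately mis-scaled posterior standing in for a poor approximation); it is not the procedure of the paper and involves no actual ABC step.

```python
import numpy as np
from statistics import NormalDist

def average_coverage(post_sd_scale, level=0.9, n_obs=5, n_rep=20_000, seed=0):
    """Monte Carlo average coverage of a central credible interval.

    Toy model (an assumption for illustration): theta ~ N(0, 1) and
    y_1, ..., y_n | theta ~ N(theta, 1), so that the exact posterior is
    N(n * ybar / (n + 1), 1 / (n + 1)). Setting post_sd_scale != 1 mimics
    a poor approximation to the posterior.
    """
    rng = np.random.default_rng(seed)
    # Pairs (theta, y) drawn from the joint distribution prior x likelihood
    theta = rng.normal(0.0, 1.0, size=n_rep)
    ybar = rng.normal(theta, 1.0 / np.sqrt(n_obs))  # sample mean of the n_obs observations
    post_mean = n_obs * ybar / (n_obs + 1.0)
    post_sd = post_sd_scale / np.sqrt(n_obs + 1.0)
    # Central interval at the requested credible level
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = np.abs(theta - post_mean) <= z * post_sd
    return float(inside.mean())

exact = average_coverage(1.0)       # interval from the true posterior
too_narrow = average_coverage(0.5)  # interval from an over-concentrated approximation

print(exact, too_narrow)
```

With the exact posterior the average coverage should sit close to the nominal 0.9, up to Monte Carlo error, while the over-concentrated version falls well below it. How visible the gap is for a realistic ABC approximation is exactly the question of power raised above.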