About finding a tolerance in discrete setups: the issue also occurs with the probit/logit model, where the data is made of 0's and 1's. In that case, you can use discrepancies like the Hamming distance used in error-correcting codes…
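As a minimal sketch of that idea (the toy probit-style model, the prior, and the tolerance value below are my own illustrative assumptions, not from the comment):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Assumed toy setup: y_i ~ Bernoulli(Phi(theta)), data of 0's and 1's
n = 100
theta_true = 0.5
y_obs = rng.random(n) < phi(theta_true)

def hamming(a, b):
    # number of positions where two binary vectors disagree
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def abc_rejection(y_obs, eps, n_draws=20_000):
    # plain ABC rejection: keep theta whenever the pseudo-data it
    # generates falls within eps of the observations in Hamming distance
    accepted = []
    for _ in range(n_draws):
        theta = rng.normal(0.0, 2.0)                  # prior draw
        y_sim = rng.random(len(y_obs)) < phi(theta)   # pseudo-data
        if hamming(y_obs, y_sim) <= eps:
            accepted.append(theta)
    return np.array(accepted)

post = abc_rejection(y_obs, eps=40)
```

Note that for i.i.d. binary data the raw Hamming distance is a noisy discrepancy (the count of 1's is the natural summary), so this illustrates the mechanics rather than an accurate posterior.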

Mark: The link to Murray, Ghahramani and MacKay's 2006 paper is quite relevant. First, because those doubly intractable distributions are a perfect setting for ABC. Second, because the solution of Møller, Pettitt, Berthelsen and Reeves (2006, Biometrika) is a close alternative to ABC. Indeed, the core of the Møller et al. method is to simulate pseudo-data, as in ABC, in order to cancel the intractable part of the likelihood. If one uses as target density on the auxiliary pseudo-data the indicator function used in ABC (assuming this results in a proper density on the pseudo-data), then we get rather close to ABC-MCMC! Of course, there still are differences, in that

(a) the auxiliary variable method of Møller et al. still requires the functional part of the likelihood (i.e., everything but the normalising constant) to be available;

(b) the A in ABC-MCMC stands for approximate;

(c) the connection only works when considering a distance between the data and the pseudo-data, not when using summary statistics.

It would nonetheless be interesting to see a comparison between both approaches, for instance in a Potts model.
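To fix ideas on the ABC-MCMC side of such a comparison, here is a minimal Marjoram-style kernel, where the indicator "pseudo-data within tolerance" stands in for the intractable likelihood ratio. The toy normal model, prior, and all tuning constants are my own assumptions (a Potts comparison would swap in a Potts sampler for `simulate`):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: y ~ N(theta, 1), with a N(0, 3^2) prior on theta
n = 30
theta_true = 2.0
y_obs = rng.normal(theta_true, 1.0, n)

def simulate(theta):
    return rng.normal(theta, 1.0, n)

def dist(y_sim):
    # distance on a summary (the sample mean)
    return abs(y_sim.mean() - y_obs.mean())

def log_prior(theta):
    return -0.5 * theta**2 / 9.0      # N(0, 3^2), up to a constant

eps = 0.2
theta = y_obs.mean()                  # start from a rough estimate
chain = []
for _ in range(5000):
    prop = theta + rng.normal(0.0, 0.5)    # symmetric random walk
    # accept only if the proposal's pseudo-data hits the tolerance
    # region; the hit indicator replaces the likelihood ratio in the
    # usual Metropolis-Hastings step
    if dist(simulate(prop)) <= eps:
        if np.log(rng.random()) < log_prior(prop) - log_prior(theta):
            theta = prop
    chain.append(theta)
```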

I also have a related question (if I may). It seems that ABC solves the "doubly intractable" sampling problem that Murray, Ghahramani and MacKay raised in their 2006 paper. If so, this would be amazingly useful!

I'd like to use ABC on my problems in computational linguistics. Here \theta would be the parameters of the grammar I would like to estimate, and the data might be a sentence (a string of words). The problem is: there are a lot of sentences! Even given the "true" grammar parameters \theta, the probability of generating any particular sentence is astronomically small.

In the case of discrete data I don't see how to define a useful tolerance region, and as far as I can tell, none of the methods you describe in your slides would help much either.

But even if it can't solve my problems, ABC is still amazing. I'd be very pleased to have a way to solve problems that Murray, Ghahramani and MacKay described as doubly intractable!

Thanks,

Mark

PS. For Probabilistic Context-Free Grammars we have MCMC algorithms (e.g., my paper), but of course real languages aren’t context-free! We have better models, but the partition functions become intractable as the models become more realistic.

Mark: Thank you for your comments. The size of the data space does not much impact the implementation of the ABC method, except that it slows execution down. The tolerance region is defined as an empirical quantile of the distance distribution, so the acceptance rate is fixed in advance. Of course, the larger the dataset, the harder it gets to discriminate between datasets. This is one reason why geneticists introduced summary statistics: by drastically reducing the dimension of the problem, they had a clear impact on the quality of the approximation.
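The quantile-based tolerance can be sketched as follows (the toy normal model, prior, and acceptance rate alpha are illustrative assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model: y ~ N(theta, 1), prior theta ~ N(0, 3^2);
# the distance is computed on the sample mean as a summary statistic
n = 50
theta_true = 1.0
y_obs = rng.normal(theta_true, 1.0, n)

N = 10_000
thetas = rng.normal(0.0, 3.0, N)                  # prior draws
sims = rng.normal(thetas[:, None], 1.0, (N, n))   # one pseudo-dataset each
dists = np.abs(sims.mean(axis=1) - y_obs.mean())

# The tolerance is the empirical alpha-quantile of the simulated
# distances, so an alpha fraction of the draws is accepted by design.
alpha = 0.01
eps = np.quantile(dists, alpha)
accepted = thetas[dists <= eps]
```

Choosing eps this way means the computational budget (acceptance rate) is set in advance, at the cost of a tolerance that depends on the simulated distances rather than being fixed beforehand.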
