## selecting summary statistics [a tale of two distances]

Posted in Books, Statistics on May 23, 2019 by xi'an

As Jonathan Harrison came to give a seminar in Warwick [which I could not attend], it made me aware of his paper with Ruth Baker on the selection of summaries in ABC. The setting is an ABC-SMC algorithm, and it relates to Fearnhead and Prangle (2012), Barnes et al. (2012), our own random forest approach, the neural network version of Papamakarios and Murray (2016), and others. The notion here is to seek the optimal weights of the different summary statistics in the tolerance distance, towards maximising a distance (Hellinger) between the prior and the ABC posterior (Wasserstein also comes to mind!). A sort of dual of the least informative prior. The Hellinger distance is estimated by a k-nearest-neighbour version [based on samples from the prior and from the ABC posterior] I had never seen before. At first I did not see how this k-nearest-neighbour distance could be optimised in the weights, since the posterior sample was already generated and (SMC) weighted, but the ABC sample can be modified by changing the [tolerance] distance weights, and the resulting Hellinger distance optimised this way. (There are two distances involved, in case the above description is too murky!)
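For intuition on the kind of estimator at play [a generic sketch, not the authors' actual construction: the function names, the univariate setting, and the choice of k are all mine], the Hellinger distance between two samples can be estimated by writing $d_H(p,q)^2=1-\mathbb{E}_p[\sqrt{q(X)/p(X)}]$ and plugging in k-nearest-neighbour density estimates:

```python
import numpy as np

def knn_density(query, sample, k, exclude_self=False):
    """1-d k-NN density estimate: p(x) ~ k / (n * 2 * r_k(x)),
    with r_k(x) the distance from x to its k-th nearest neighbour."""
    dist = np.abs(query[:, None] - sample[None, :])
    dist.sort(axis=1)
    # when the query points belong to the sample, skip the zero self-distance
    r = dist[:, k] if exclude_self else dist[:, k - 1]
    return k / (len(sample) * 2.0 * r)

def hellinger_knn(p_sample, q_sample, k=20):
    """Estimate d_H(p,q) via H^2 = 1 - E_p[sqrt(q(X)/p(X))],
    a Monte Carlo average over the sample from p."""
    p_hat = knn_density(p_sample, p_sample, k, exclude_self=True)
    q_hat = knn_density(p_sample, q_sample, k)
    h2 = 1.0 - np.mean(np.sqrt(q_hat / p_hat))
    return np.sqrt(max(h2, 0.0))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)   # stand-in for a prior sample
y = rng.normal(1.0, 1.0, 2000)   # stand-in for an ABC posterior sample
print(hellinger_knn(x, y))       # exact value for these two Gaussians is about 0.34
```

Using the same k in both density estimates cancels part of the bias of the ratio $\hat q/\hat p$, and the self-distance must be excluded when evaluating $\hat p$ at its own sample points.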

“We successfully obtain an informative unbiased posterior.”

The paper spends a significant while demonstrating that the k-nearest-neighbour estimator converges and much less on the optimisation procedure itself, which seems like a real challenge to me when facing a large number of particles and a high enough dimension (in the number of statistics). (In the examples, the sizes of the summary are 1 (where does the weight matter?), 32, 96, and 64, with 5·10⁴, 5·10⁴, 5·10³ and…10 particles, respectively.) The authors address the issue, though, albeit briefly, by mentioning that, for the same overall computation time, the adaptive-weight ABC is indeed further from the prior than a regular ABC with uniform weights [rather than weighted by the precisions]. They also argue that down-weighting some components is akin to selecting a subset of summaries, but I beg to disagree with this statement, as the weights are never exactly zero, as far as I can see, hence failing to fight the curse of dimensionality. Some LASSO version could implement this feature.
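To make the last point concrete [a toy sketch in my own notation, not from the paper]: in a weighted tolerance distance, a summary statistic only disappears when its weight is exactly zero, which a LASSO-type soft-thresholding step can produce while mere down-weighting cannot:

```python
import numpy as np

def weighted_distance(s_obs, s_sim, w):
    # tolerance distance with one weight per summary statistic;
    # a component with w_j = 0 drops out of the comparison entirely
    return np.sqrt(np.sum(w * (s_obs - s_sim) ** 2))

def soft_threshold(w, lam):
    # LASSO-style shrinkage: weights below lam become exactly zero
    return np.maximum(w - lam, 0.0)

w = np.array([1.0, 0.05, 0.8, 0.02])
w_sparse = soft_threshold(w, 0.1)        # [0.9, 0.0, 0.7, 0.0]
s_obs = np.array([1.0, 2.0, 3.0, 4.0])
s_sim = np.array([1.0, 9.0, 3.0, -5.0])
# summaries 2 and 4 are now ignored, however discrepant they are
print(weighted_distance(s_obs, s_sim, w_sparse))
```

With merely small (but nonzero) weights, the discrepant components would still contribute, and the effective dimension of the comparison would remain the full number of summaries.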

## Bhattacharyya distance versus Kullback-Leibler divergence

Posted in Books, Kids, Statistics on January 10, 2015 by xi'an

Another question I picked up on Cross Validated during the Yule break is about the connection between the Bhattacharyya distance and the Kullback-Leibler divergence, i.e.,

$d_B(p,q)=-\log\left\{\int\sqrt{p(x)q(x)}\,\text{d}x\right\}$

and

$d_{KL}(p\|q)=\int\log\left\{{p(x)}\big/{q(x)}\right\}\,p(x)\,\text{d}x$

Although this Bhattacharyya distance sounds close to the Hellinger distance,

$d_H(p,q)=\left\{1-\int\sqrt{p(x)q(x)}\,\text{d}x\right\}^{1/2}$

the ordering I got by a simple Jensen inequality [applied to the concave logarithm, after writing $\int\sqrt{p(x)q(x)}\,\text{d}x$ as $\int\sqrt{q(x)/p(x)}\,p(x)\,\text{d}x$, the second inequality following from $-\log(u)\ge 1-u$] is

$d_{KL}(p\|q)\ge2d_B(p,q)\ge2d_H(p,q)^2$

and I wonder how useful this ordering could be…
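One modest use is as a sanity check: for two Gaussian densities all three quantities have closed forms [with $d_H(p,q)^2=1-e^{-d_B(p,q)}$ following directly from the definitions above], so the inequalities can be verified numerically; a minimal sketch:

```python
import numpy as np

def kl(m1, s1, m2, s2):
    # d_KL(p||q) for p = N(m1, s1^2), q = N(m2, s2^2)
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def bhattacharyya(m1, s1, m2, s2):
    # d_B = -log BC, with BC = int sqrt(p q) the Bhattacharyya coefficient
    return (0.5 * np.log((s1**2 + s2**2) / (2 * s1 * s2))
            + (m1 - m2)**2 / (4 * (s1**2 + s2**2)))

def hellinger_sq(m1, s1, m2, s2):
    # d_H^2 = 1 - BC, matching the definition of d_H above
    return 1.0 - np.exp(-bhattacharyya(m1, s1, m2, s2))

for m1, s1, m2, s2 in [(0, 1, 1, 1), (0, 1, 0, 3), (-2, 0.5, 1, 2)]:
    dkl = kl(m1, s1, m2, s2)
    db = bhattacharyya(m1, s1, m2, s2)
    dh2 = hellinger_sq(m1, s1, m2, s2)
    assert dkl >= 2 * db >= 2 * dh2
print("d_KL >= 2 d_B >= 2 d_H^2 holds on all test pairs")
```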