## information loss from the median

Posted in Books, Kids, Statistics with tags , , , , , , on April 19, 2022 by xi'an

An interesting side item from a X validated question about calculating the Fisher information for the Normal median (as an estimator of the mean). While this information is not available in closed form, it has a “nice” expression

$1+n\mathbb E[Z_{n/2:n}\varphi(Z_{n/2:n})]-n\mathbb E[Z_{n/2:n-1}\varphi(Z_{n/2:n-1})]+$
$\frac{n(n-1)}{n/2-2}\varphi(Z_{n/2-2:n-2})^2+\frac{n(n-1)}{n-n/2-1}\varphi(Z_{n/2:n-2})^2$

which can easily be approximated by simulation (much faster than by estimating the variance of said median). This shows that the median is about 1.57 less informative than the empirical mean. Bonus points for computing the information brought by the MAD statistic! (The information loss against the MLE is 2.69,  since the Monte Carlo ratio of their variances is 0.37.)

## likelihood-free inference in high-dimensional models

Posted in Books, R, Statistics, University life with tags , , , , , , , , , on September 1, 2015 by xi'an

“…for a general linear model (GLM), a single linear function is a sufficient statistic for each associated parameter…”

The recently arXived paper “Likelihood-free inference in high-dimensional models“, by Kousathanas et al. (July 2015), proposes an ABC resolution of the dimensionality curse [when the dimension of the parameter and of the corresponding summary statistics] by turning Gibbs-like and by using a component-by-component ABC-MCMC update that allows for low dimensional statistics. In the (rare) event there exists a conditional sufficient statistic for each component of the parameter vector, the approach is just as justified as when using a generic ABC-Gibbs method based on the whole data. Otherwise, that is, when using a non-sufficient estimator of the corresponding component (as, e.g., in a generalised [not general!] linear model), the approach is less coherent as there is no joint target associated with the Gibbs moves. One may therefore wonder at the convergence properties of the resulting algorithm. The only safe case [in dimension 2] is when one of the restricted conditionals does not depend on the other parameter. Note also that each Gibbs step a priori requires the simulation of a new pseudo-dataset, which may be a major imposition on computing time. And that setting the tolerance for each parameter is a delicate calibration issue because in principle the tolerance should depend on the other component values. Continue reading

## an ABC experiment

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , on November 24, 2014 by xi'an

In a cross-validated forum exchange, I used the code below to illustrate the working of an ABC algorithm:

#normal data with 100 observations
n=100
x=rnorm(n)
#observed summaries
sumx=c(median(x),mad(x))

#normal x gamma prior
priori=function(N){
return(cbind(rnorm(N,sd=10),
1/sqrt(rgamma(N,shape=2,scale=5))))
}

ABC=function(N,alpha=.05){

prior=priori(N) #reference table

#pseudo-data
summ=matrix(0,N,2)
for (i in 1:N){
xi=rnorm(n)*prior[i,2]+prior[i,1]
summ[i,]=c(median(xi),mad(xi)) #summaries
}

#normalisation factor for the distance
mads=c(mad(summ[,1]),mad(summ[,2]))
#distance
dist=(abs(sumx[1]-summ[,1])/mads[1])+
(abs(sumx[2]-summ[,2])/mads[2])
#selection
posterior=prior[dist<quantile(dist,alpha),]}


Hence I used the median and the mad as my summary statistics. And the outcome is rather surprising, for two reasons: the first one is that the posterior on the mean μ is much wider than when using the mean and the variance as summary statistics. This is not completely surprising in that the latter are sufficient, while the former are not. Still, the (-10,10) range on the mean is way larger… The second reason for surprise is that the true posterior distribution cannot be derived since the joint density of med and mad is unavailable.

After thinking about this for a while, I went back to my workbench to check the difference with using mean and variance. To my greater surprise, I found hardly any difference! Using the almost exact ABC with 10⁶ simulations and a 5% subsampling rate returns exactly the same outcome. (The first row above is for the sufficient statistics (mean,standard deviation) while the second row is for the (median,mad) pair.) Playing with the distance does not help. The genuine posterior output is quite different, as exposed on the last row of the above, using a basic Gibbs sampler since the posterior is not truly conjugate.

## Selecting statistics for [ABC] Bayesian model choice

Posted in Statistics, University life with tags , , , , , , , , , on October 25, 2011 by xi'an

At last, we have completed, arXived, and submitted our paper on the evaluation of summary statistics for Bayesian model choice! (I had presented preliminary versions at the recent workshops in New York and Zürich.) While broader in scope, the results obtained by Judith Rousseau, Jean-Michel Marin, Natesh Pillai, and myself bring an answer to the question raised by our PNAS paper on ABC model choice. Almost as soon as we realised the problem, that is, during MCMC’Ski in Utah, I talked with Judith about a possible classification of statistics in terms of their Bayes factor performances and we started working on that… While the idea of separating the mean behaviour of the statistics under both model came rather early, establishing a complete theoretical framework that validated this intuition took quite a while and the assumptions changed a few times around the summer. The simulations associated with the paper were straightforward in that (a) the setup had been suggested to us by a referee of our PNAS paper: compare normal and Laplace distributions with different summary statistics (inc. the median absolute deviation), (b) the theoretical results told us what to look for, and (c) they did very clearly exhibit the consistency and inconsistency of the Bayes factor/posterior probability predicted by the theory. Both boxplots shown here exhibit this agreement: when using (empirical) mean, median, and variance to compare normal and Laplace models, the posterior probabilities do not select the “true” model but instead aggregate near a fixed value. When using instead the median absolute deviation as summary statistic, the posterior probabilities concentrate near one or zero depending on whether or not the normal model is the true model.

The main result states that, under some “heavy-duty” assumptions, (a) if the “true” mean of the summary statistic can be recovered for both models under comparison, then the Bayes factor has the same asymptotic behaviour as n to the power -(d1 – d2)/2, irrespective of which one is the true model. (The dimensions d1 and  d2 are the effective dimensions of the asymptotic means of the summary statistic under both models.) Therefore, the Bayes factor always asymptotically selects the model having the smallest effective dimension and cannot be consistent. (b) if, instead, the “true” mean of the summary statistic cannot be represented in the other model, then the Bayes factor  is consistent. This means that, somehow, the best statistics to be used in an ABC approximation to a Bayes factor are ancillary statistics with different mean values under both models. Else, the summary statistic must have enough components to prohibit a parameter under the “wrong” model to meet the “true” mean of the summary statistic.

(As a striking coincidence, Hélene Massam and Géard Letac [re]posted today on arXiv a paper about the behaviour of the Bayes factor for contingency tables when the hyperparameter goes to zero, where they establish the consistency of the said Bayes factor under the sparser model. No Jeffreys-Lindley paradox in that case.)