## likelihood-free inference by ratio estimation

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on September 9, 2019 by xi'an

“This approach for posterior estimation with generative models mirrors the approach of Gutmann and Hyvärinen (2012) for the estimation of unnormalised models. The main difference is that here we classify between two simulated data sets while Gutmann and Hyvärinen (2012) classified between the observed data and simulated reference data.”

A 2018 arXiv posting by Owen Thomas et al. (including my colleague at Warwick, Rito Dutta, CoI warning!) about estimating the likelihood (and the posterior) when it is intractable. Likelihood-free but not ABC, since the ratio of the likelihood to the marginal is estimated in a non- or semi-parametric (and biased) way. Following Geyer's 1994 fabulous estimate of an unknown normalising constant via logistic regression, the current paper, which I read in preparation for my discussion in the ABC optimal design session in Salzburg, uses probabilistic classification and an exponential family representation of the ratio. It opposes data from the density and data from the marginal, assuming both can be readily produced. The logistic regression minimising the asymptotic classification error is the logistic transform of the log-ratio. For a finite (double) sample, this minimisation thus leads to an empirical version of the ratio. Or to a smooth version if the log-ratio is represented as a convex combination of summary statistics, turning the approximation into an exponential family, which is a clever way to bridge the gap towards ABC notions. And towards synthetic likelihood, although with a difference in estimating the exponential family parameters β(θ) by minimising the classification error, parameters that are indeed conditional on the parameter θ. Actually the paper introduces a further penalisation or regularisation term on those parameters β(θ), which could have been processed by Bayesian Lasso instead. This step is essentially driving the selection of the summaries, except that it operates for each value of the parameter θ, at the expense of a cross-validation step. This is quite an original approach, as far as I can tell, but I wonder at the link with more standard density estimation methods, in particular in terms of the precision of the resulting estimate (and the speed of convergence with the sample size, if convergence there is).
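As a toy check of the logistic-regression trick behind the paper (my own minimal sketch, not the authors' code), one can classify draws from a model density against draws from a reference density and read the estimated log-ratio off the fitted logit. With two Gaussians the log-ratio is exactly quadratic, so the summary statistics ψ(x)=(1,x,x²) make the exponential-family representation exact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (mine, not the paper's): model density p(x|θ) = N(θ,1),
# reference "marginal" m(x) = N(0,2²). The log-ratio log p/m is quadratic
# in x, so ψ(x) = (1, x, x²) makes the exponential-family form exact.
theta, n = 1.0, 50_000
x1 = rng.normal(theta, 1.0, n)          # draws from the model density
x0 = rng.normal(0.0, 2.0, n)            # draws from the reference
x = np.concatenate([x1, x0])
y = np.concatenate([np.ones(n), np.zeros(n)])   # class labels

psi = np.column_stack([np.ones_like(x), x, x ** 2])

# Plain logistic regression by Newton's method (IRLS): the fitted logit
# β'ψ(x) estimates log p(x|θ) − log m(x).
beta = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-psi @ beta))
    grad = psi.T @ (y - p)                       # score of the log-likelihood
    hess = -(psi * (p * (1.0 - p))[:, None]).T @ psi
    beta -= np.linalg.solve(hess, grad)          # Newton step

def true_logratio(z):
    # log N(z; θ, 1) − log N(z; 0, 4), constants included
    return (-0.5 * (z - theta) ** 2) - (-0.5 * (z / 2.0) ** 2 - np.log(2.0))

xs = np.array([-1.0, 0.0, 1.0, 2.0])
est = np.column_stack([np.ones_like(xs), xs, xs ** 2]) @ beta
print(np.round(est, 2), np.round(true_logratio(xs), 2))
```

With 50,000 draws on each side, the two printed rows agree to within Monte Carlo noise, which is all the representation promises.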

## vector quantile regression

Posted in pictures, Statistics, University life on July 4, 2014 by xi'an

My Paris-Dauphine colleague Guillaume Carlier recently arXived a statistics paper entitled Vector quantile regression, co-written with Chernozhukov and Galichon. I was most curious to read the paper as Guillaume is primarily a mathematical analyst working on optimisation problems like optimal transport. And also because I find quantile regression difficult to fathom as a statistical problem. (As it happens, both his co-authors are from econometrics.) The results in the paper are (i) to show that a d-dimensional (Lebesgue) absolutely continuous random variable Y can always be represented as the deterministic transform Y=Q(U), where U is uniform on the d-dimensional hypercube [0,1]^d (the paper expresses this transform as conditional on a set of regressors Z, but those essentially play no role) and Q is monotone in the sense of being the gradient of a convex function,

$Q(u) = \nabla q(u)$ and $\{Q(u)-Q(v)\}^\text{T}(u-v)\ge 0;$

(ii) to deduce from this representation a unique notion of multivariate quantile function; and (iii) to consider the special case when the quantile function Q can be written in the linear form

$\beta(U)^\text{T}Z$

where β(U) is a matrix. Hence leading to an estimation problem.

While unsurprising from a measure theoretic viewpoint, the representation theorem (i) is most interesting for both statistical and simulation reasons. Provided the function Q can be easily estimated and derived, respectively. The paper however does not provide a constructive tool for this derivation, besides indicating several characterisations as solutions of optimisation problems. From a statistical perspective, a non-parametric estimation of β(.) would have useful implications in multivariate regression, although the paper only considers the specific linear case above. Which solution is obtained by a discretisation of all variables and linear programming.
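A hypothetical discretised illustration of point (i), under my own simplifying assumptions rather than the paper's algorithm: match n uniform "ranks" u on [0,1]² with n draws y of a 2-d target so as to maximise Σ uᵢ·yᵢ. The resulting optimal assignment is a discrete analogue of the Brenier map Q = ∇q and is therefore monotone in the sense of the displayed inequality:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

# Discrete optimal transport between uniform ranks and target draws
# (a sketch of the discretisation idea, not the authors' linear programme).
n = 200
u = rng.uniform(size=(n, 2))                                   # ranks in [0,1]²
y = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]], size=n)

cost = -u @ y.T                      # minimising −Σ u·y maximises correlation
row, col = linear_sum_assignment(cost)
Q = y[col]                           # Q(u_i) is the y-point assigned to u_i

# Pairwise monotonicity {Q(u)−Q(v)}ᵀ(u−v) ≥ 0 over random pairs,
# a direct consequence of optimality (the exchange argument)
i = rng.integers(n, size=500)
j = rng.integers(n, size=500)
mono = np.einsum('kd,kd->k', Q[i] - Q[j], u[i] - u[j])
print(mono.min() >= -1e-9)
```

The exchange argument makes the check tautological for an exact solver, but it shows how the multivariate quantile map can be produced, and inspected, from samples alone.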

Posted in R, Statistics, University life on January 5, 2011 by xi'an

Yves Atchadé presented very recent work on the fundamental issue of estimating the asymptotic variance for adaptive MCMC algorithms, with the intriguing experimental observation that a non-converging bandwidth with rate 1/n provided better coverage than the converging rate. (I always found estimating the asymptotic variance both a tough problem and an important item in convergence assessment.) Galin Jones showed new regeneration results for componentwise MCMC samplers, with applications to quantile estimation. The iid structure produced by the regeneration mechanism rather naturally allows the introduction of an adaptive improvement in those algorithms, if regeneration occurs often enough. (From the days of my Stat'Sci' paper on convergence assessment, I love regeneration techniques for both theoretical and methodological reasons, even though they are often difficult to implement efficiently in practice.) Matti Vihola summarised several of his recent papers on the stability and convergence of adaptive MCMC algorithms, pursuing the Finnish tradition of leadership in adaptive algorithms! One point I found particularly interesting was the possibility of separating ergodicity from the Law of Large Numbers, thus reducing the constraints imposed by the containment condition. In the afternoon, Dawn Woodard discussed the convergence rate of the Gibbs sampler used for genomic motif discovery by Liu, Lawrence and Neuwald (1995). Scott Schmidler concluded the workshop with a far-ranging talk distinguishing between exploration and exploitation in adaptive MCMC algorithms, i.e., mixing vs. burning, with illustrations using the Wang-Landau algorithm.
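To fix ideas on why estimating the asymptotic variance is delicate, here is the generic batch-means estimator (my textbook illustration, not Atchadé's adaptive version) on a chain where the truth is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)

# Batch-means estimate of the asymptotic variance of an ergodic average.
# Test chain: AR(1), x_t = ρ x_{t−1} + ε_t, whose asymptotic variance for
# the mean is known exactly, so the estimator can be checked.
rho, n = 0.7, 2 ** 16
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

# Split into b batches of length m ≈ √n; σ²_asym ≈ m · Var(batch means)
m = int(n ** 0.5)
b = n // m
means = x[: b * m].reshape(b, m).mean(axis=1)
sigma2_bm = m * means.var(ddof=1)

# Closed form: Var(x_t) = 1/(1−ρ²) and σ²_asym = Var(x_t)·(1+ρ)/(1−ρ) ≈ 11.1
sigma2_true = (1.0 / (1.0 - rho ** 2)) * (1.0 + rho) / (1.0 - rho)
print(round(sigma2_bm, 1), round(sigma2_true, 1))
```

The bandwidth question in the talk is precisely the choice of the batch length m as a function of n: too short and the estimator is badly biased downwards, too long and it is too noisy.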

Thus, as in the previous editions of Adap'ski, we enjoyed a uniformly high quality of talks on current research in the area of adaptive algorithms (and a wee further). This shows the field is very much alive and expanding, aiming at reaching a wider audience by providing verifiable convergence conditions and semi-automated software (like Jeff Rosenthal's amcmc R code we used in Introducing Monte Carlo Methods with R). Looking forward to Adap'ski 4 (Adap'skiV?!), hopefully in Europe, and why not in Chamonix?! Which could then lead us to call the next meeting Adap'skiX…

## Back to Philly

Posted in Statistics, Travel, University life on December 15, 2010 by xi'an

Today and tomorrow, I am attending a conference at Wharton in honour of Larry Brown for his 70th birthday. I met Larry in 1988 when visiting Cornell for the year—even using his office in the Math department while he was away on sabbatical leave—and it really does not feel like that long ago, nor does it feel like Larry is anywhere close to 70, as he looks essentially the same as 22 years ago! The conference reflects Larry's broad range of research, from decision theory and nonparametrics to data analysis. I am thus very glad to celebrate Larry's birthday with a whole crowd of old and more recent friends. (My talk on Rao-Blackwellisation will be quite similar to the seminar I gave at Stanford last summer [except that I have to talk twice as fast!])

## València 9 snapshot [3]

Posted in Statistics, University life on June 7, 2010 by xi'an

Today was somewhat of a low-key day for me in terms of talks, as I was preparing a climb in the Benidorm backcountry (thanks to the advice of Alicia Quiròs) and trying to copy routes over the (low, oh so low!) bandwidth of the hotel wireless. The session I attended in the morning was on Bayesian non-parametrics, with David Dunson giving a talk on non-parametric classification, a talk whose contents were so dense in information that it felt like three talks rather than one, especially when there was no paper to back it up! Katja Ickstadt modelled graphical dependence structures using non-parametrics but also mixtures of normals across different graph structures, an innovation I found interesting if difficult to interpret. Tom Loredo concluded the session with a broad and exciting picture of the statistical challenges found in spectral astronomy (even though I often struggle to make sense of the frequency data astronomers favour).

The evening talk by Ioanna Manolopoulou was a superbly rendered study on cell dynamics, with incredible 3D animations of those cell systems, representing the Langevin diffusion on the force fields in those systems as evolving vector fields. And then I gave my poster on the Savage-Dickey paradox, hence missing all the other posters in this session… The main difficulty in presenting the result was not the measure-theoretic subtlety, but rather explaining the Savage-Dickey representation itself, since it was unknown to most passers-by.
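For readers who also have not met the representation: in a conjugate normal model (my stock example, not the one on the poster), the Bayes factor for a point null equals the posterior-to-prior density ratio at the null value, which is the Savage-Dickey identity:

```python
import numpy as np

def norm_pdf(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Textbook setting: x ~ N(θ, σ²), prior θ ~ N(0, τ²), point null H0: θ = 0.
# Savage-Dickey: BF01 = p(θ=0 | x) / π(θ=0).
sigma2, tau2, x = 1.0, 4.0, 1.3

# Direct Bayes factor: ratio of the marginal likelihoods under H0 and H1
bf_direct = norm_pdf(x, 0.0, sigma2) / norm_pdf(x, 0.0, sigma2 + tau2)

# Savage-Dickey ratio, via the conjugate posterior N(τ²x/(σ²+τ²), σ²τ²/(σ²+τ²))
mu_post = tau2 * x / (sigma2 + tau2)
v_post = sigma2 * tau2 / (sigma2 + tau2)
bf_sd = norm_pdf(0.0, mu_post, v_post) / norm_pdf(0.0, 0.0, tau2)

print(round(bf_direct, 6), round(bf_sd, 6))
```

The two numbers agree to machine precision here; the paradox discussed on the poster is that this agreement hinges on the choice of versions of the conditional densities, a measure-theoretic point invisible in such a nice conjugate case.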