## vector quantile regression

Posted in pictures, Statistics, University life on July 4, 2014 by xi'an

My Paris-Dauphine colleague Guillaume Carlier recently arXived a statistics paper entitled Vector quantile regression, co-written with Chernozhukov and Galichon. I was most curious to read the paper as Guillaume is primarily a mathematical analyst working on optimisation problems like optimal transport. And also because I find quantile regression difficult to fathom as a statistical problem. (As it happens, both his co-authors are from econometrics.) The results in the paper are (i) to show that a d-dimensional (Lebesgue) absolutely continuous random variable Y can always be represented as the deterministic transform Y=Q(U), where U is uniform over the d-dimensional hypercube [0,1]^d (the paper expresses this transform as conditional on a set of regressors Z, but those essentially play no role) and Q is monotone in the sense of being the gradient of a convex function,

$Q(u) = \nabla q(u)$ and $\{Q(u)-Q(v)\}^\text{T}(u-v)\ge 0;$

(ii) to deduce from this representation a unique notion of multivariate quantile function; and (iii) to consider the special case when the quantile function Q can be written as the linear

$\beta(U)^\text{T}Z$

where β(U) is a matrix. Hence leading to an estimation problem.

While unsurprising from a measure-theoretic viewpoint, the representation theorem (i) is most interesting for both statistical and simulation reasons, provided the function Q can be easily estimated and derived, respectively. The paper however does not provide a constructive tool for this derivation, besides indicating several characterisations as solutions of optimisation problems. From a statistical perspective, a non-parametric estimation of β(.) would have useful implications in multivariate regression, although the paper only considers the specific linear case above, whose solution is obtained by a discretisation of all variables and linear programming.
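In dimension one, the representation (i) reduces to the familiar inverse-cdf construction, which makes for a quick sanity check. Here is a minimal sketch of my own (not from the paper), taking Y ~ Exp(1), for which Q(u) = -log(1-u) is the derivative of the convex function q(u) = (1-u)log(1-u) + u:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1D special case: the monotone map Q is the usual quantile function.
# For Y ~ Exp(1), Q(u) = -log(1 - u), the derivative (gradient) of the
# convex function q(u) = (1 - u) * log(1 - u) + u.
def Q(u):
    return -np.log1p(-u)

u = rng.uniform(size=100_000)
y = Q(u)  # Y = Q(U) should then follow the Exp(1) distribution

# Exp(1) has mean 1 and variance 1: check the representation empirically
assert abs(y.mean() - 1.0) < 0.02
assert abs(y.var() - 1.0) < 0.05

# check the monotonicity property {Q(u) - Q(v)}(u - v) >= 0 on random pairs
v = rng.uniform(size=100_000)
assert np.all((Q(u) - Q(v)) * (u - v) >= 0)
```

In higher dimensions no such closed form is available, which is precisely where the optimisation characterisations of the paper come in.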

Posted in R, Statistics, University life on January 5, 2011 by xi'an

Yves Atchadé presented a very recent work on the fundamental issue of estimating the asymptotic variance for adaptive MCMC algorithms, with an intriguing experimental observation that a non-converging bandwidth with rate 1/n was providing better coverage than the converging rate. (I always found estimating the asymptotic variance both a tough problem and an important item in convergence assessment.) Galin Jones showed new regeneration results for componentwise MCMC samplers, with applications to quantile estimation. The iid structure produced by the regeneration mechanism allows rather naturally to introduce an adaptive improvement in those algorithms, if regeneration occurs often enough. (From the days of my Stat’Sci’ paper on convergence assessment, I love regeneration techniques for both theoretical and methodological reasons, even though they are often difficult to implement efficiently in practice.) Matti Vihola summarised several of his recent papers on the stability and convergence of adaptive MCMC algorithms, pursuing the Finnish tradition of leadership in adaptive algorithms! One point I found particularly interesting was the possibility of separating ergodicity from the Law of Large Numbers, thus reducing the constraints imposed by the containment condition. In the afternoon, Dawn Woodard discussed the convergence rate of the Gibbs sampler used for genomic motif discovery by Liu, Lawrence and Neuwald (1995). Scott Schmidler concluded the workshop with a far-ranging talk distinguishing between exploration and exploitation in adaptive MCMC algorithms, ie mixing vs burning, with illustrations using the Wang-Landau algorithm.

Thus, as in the previous editions of Adap’ski, we have had a uniformly high quality of talks about the current research in the area of adaptive algorithms (and a wee further). This shows the field is very much active and expanding, aiming at reaching a wider audience by providing verifiable convergence conditions and semi-automated software (like Jeff Rosenthal’s amcmc R code we used in Introducing Monte Carlo Methods with R). Looking forward to Adap’ski 4 (Adap’skiV?!), hopefully in Europe and why not in Chamonix?! Which could then lead us to call the next meeting Adap’skiX…

## Back to Philly

Posted in Statistics, Travel, University life on December 15, 2010 by xi'an

Today and tomorrow, I am attending a conference in Wharton in honour of Larry Brown for his 70th birthday. I met Larry in 1988 when visiting Cornell for the year—even using his office in the Math department while he was away on a sabbatical leave—and it really does not feel like that long ago, nor does it feel like Larry is anywhere close to 70, as he looks essentially the same as 22 years ago! The conference reflects Larry’s broad range of research, from decision theory and nonparametrics to data analysis. I am thus very glad to celebrate Larry’s birthday with a whole crowd of old and more recent friends. (My talk on Rao-Blackwellisation will be quite similar to the seminar I gave in Stanford last summer [except that I have to talk twice as fast!])

## València 9 snapshot [3]

Posted in Statistics, University life on June 7, 2010 by xi'an

Today was somehow a low-key day for me in terms of talks, as I was preparing a climb in the Benidorm backcountry (thanks to the advice of Alicia Quiròs) and trying to copy routes over the (low, oh so low!) bandwidth wireless at the hotel. The session I attended in the morning was on Bayesian non-parametrics, with David Dunson giving a talk on non-parametric classification, a talk whose contents were so dense in information that it felt like three talks rather than one, especially when there was no paper to back it up! Katja Ickstadt modelled graphical dependence structures using non-parametrics but also mixtures of normals across different graph structures, an innovation I found interesting if difficult to interpret. Tom Loredo concluded the session with a broad and exciting picture of the statistical challenges found in spectral astronomy (even though I often struggle to make sense of the frequency data astronomers favour).

The evening talk by Ioanna Manolopoulou was a superbly rendered study on cell dynamics with incredible 3D animations of those cell systems, representing the Langevin diffusion on the force fields in those systems as evolving vector fields. And then I gave my poster on the Savage-Dickey paradox, hence missing all the other posters in this session… The main difficulty in presenting the result was not the measure-theoretic subtlety, but rather explaining the Savage-Dickey representation itself, since it was unknown to most passers-by.

## Another handbook chapter

Posted in Books, Statistics on February 11, 2010 by xi'an

As I have received over the past semester half a dozen requests for contributing chapters to different handbooks, I wrote several rather similar introductions to Bayesian statistics and/or to computational statistics. Here is one for a Handbook of Statistical Systems Biology edited by D. Balding, M. Stumpf, and M. Girolami, to be published by Wiley. It is mostly inspired from the second chapter of Bayesian Core, so it is not particularly novel. If I find some extra time within the coming months, I will also include a section on nonparametric Bayes… Before that, I also have to write a revised edition of my chapter Bayesian Computational Methods in the Handbook of Computational Statistics (selling at an outrageous price, like most handbooks!), edited by J. Gentle, W. Härdle and Y. Mori.

## ABC with neural nets

Posted in Statistics on December 12, 2008 by xi'an

Blum and François proposed on arXiv last September a generalisation of Beaumont et al.’s (Genetics, 2002) ABC where the local linear regression of the parameter θ on the sufficient (or summary) statistics s is replaced by a nonlinear regression with heteroskedasticity. The nonlinear mean and variance are estimated by a neural net with one hidden layer, using the R package nnet. The result is interesting in that it seems to allow for the inclusion of more or even all the simulated pairs (θ,s), compared with Beaumont et al.’s (2002). This is somehow to be expected since the nonlinear fit adapts differently to different parts of the space. Therefore, weighting simulated s’ by a kernel Kδ(s-s’) is not very relevant and it is thus not surprising that the window δ is not influential, in contrast with the basic ABC and even Beaumont et al.’s (2002), where δ has a different meaning. I do like the nonparametric perspective adopted in the paper, even though the choice of neural nets is not the only possibility, since more generic (or statistical) estimation techniques could be used instead.
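For the curious, here is a rough Python sketch of this kind of adjustment on a toy normal model of my own devising (the authors work in R with nnet; I am substituting scikit-learn’s MLPRegressor, and all tuning choices and tolerances below are mine, not theirs): fit a one-hidden-layer net for the conditional mean of θ given s, a second net on the log squared residuals to capture heteroskedasticity, and adjust all simulated draws towards the observed summary.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# toy model: theta ~ N(0,1) prior, summary s = theta + 0.3 * noise,
# so the exact posterior given s_obs is N(s_obs/1.09, 0.09/1.09)
n = 5000
theta = rng.normal(size=n)
s = theta + 0.3 * rng.normal(size=n)
s_obs = 0.8  # pretend observed summary
X = s.reshape(-1, 1)

# conditional mean m(s) via a one-hidden-layer net
mean_net = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                        max_iter=2000, random_state=0)
mean_net.fit(X, theta)
m = mean_net.predict(X)

# heteroskedastic variance: second net fitted on log squared residuals
var_net = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                       max_iter=2000, random_state=0)
var_net.fit(X, np.log((theta - m) ** 2 + 1e-12))
sigma = np.exp(0.5 * var_net.predict(X))

# nonlinear, heteroskedastic adjustment of ALL simulated draws
m_obs = mean_net.predict([[s_obs]])[0]
sigma_obs = np.exp(0.5 * var_net.predict([[s_obs]])[0])
theta_adj = m_obs + (theta - m) * sigma_obs / sigma
```

The point of the sketch is that every pair (θ,s) contributes to the adjusted sample, with no kernel window δ in sight, which matches the observation above that δ loses its influence in the nonlinear version.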

The part of the paper I understand less is the adaptive feature. While we did advertise adaptivity in our ABC-PMC paper, the adaptive stage here is restricted to one step and seems to only consider a restriction on the support of s. This is rather surprising in that importance sampling usually cannot operate on restricted samples, because the fundamental importance identity is then lost.