Archive for instrumental variables; non-response

Au Luxembourg

Posted in pictures, Statistics, Travel, University life on December 3, 2013 by xi'an

In a “crazy travelling week” (as my daughter put it), I gave a talk at an IYS 2013 conference organised by Stephen Senn (formerly at Glasgow) and colleagues in the city of Luxembourg, Grand Duché du Luxembourg. I very much enjoyed the morning train trip there, as it was a misty morning, with the sun rising over the frosted-white countryside. (I cannot say much about the city of Luxembourg itself, though, as I only walked the kilometre from the station to the conference hotel and the same way back. There was a huge gap in the plateau due to a river running through the middle, which would have been a nice place to run, I presume…)

One of the few talks I attended there was about an econometric model with instrumental variables. In general, and this dates back to my student years at ENSAE, I do not get the motivation for the distinction between endogenous and exogenous variables in econometric models. Especially in non-parametric models: if we do not want to make parametric assumptions, how can we make correlation hypotheses instead?… My bent would be to parametrise everything, under the suspicion that everything is correlated with everything else. The instrumental variables econometricians seem so fond of appear to me like magical beings, since we have to know they are instrumental. And because they seem to always allow a return to a linear setting, by eliminating the non-linear parts. Sounds like a “more for less” free-lunch deal. (Any pointer would be appreciated.) The speaker there actually acknowledged (verbatim) that they are indeed magical and that they cannot be justified by mathematics or statistics. A voodoo part of econometrics, then?!
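
For what it is worth, the textbook mechanics behind the magic can be sketched with a minimal two-stage least squares simulation (entirely my own toy setup, not the speaker's model): an unobserved confounder u makes ordinary least squares biased, while an instrument z, correlated with x but not with u, recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 2.0                       # true causal effect of x on y

z = rng.normal(size=n)           # instrument: drives x, independent of u
u = rng.normal(size=n)           # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)
y = beta * x + u + rng.normal(size=n)

# naive OLS (no intercept, all variables mean-zero by construction):
# biased because x is correlated with the error term through u
ols = np.sum(x * y) / np.sum(x * x)

# two-stage least squares: project x on z, then regress y on the projection
x_hat = z * (np.sum(z * x) / np.sum(z * z))       # first stage
tsls = np.sum(x_hat * y) / np.sum(x_hat * x_hat)  # second stage

print(f"OLS:  {ols:.3f}")    # overshoots beta
print(f"2SLS: {tsls:.3f}")   # close to beta
```

The “magical” part, of course, is the untestable assumption that z is uncorrelated with u — the simulation simply builds it in.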

A second talk that left me perplexed was about a generalised finite mixture model. The model sounded like a mixture over time of individuals, i.e., a sort of clustering of longitudinal data. It looked like it should be easier to estimate than the usual mixtures of regressions, because an individual contributes to the same regression line for all the times at which it is observed. The talk was uninspiring in that it missed connections to EM and to Bayesian solutions, focussing instead on a gradient method that sounded inappropriate for a multimodal likelihood. (Funnily enough, the choice of the number of regressions was made by BIC.)
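
To see why the longitudinal structure helps, here is a hedged EM sketch (my own toy construction, not the speaker's model) for a two-component mixture of regressions in which the responsibilities are computed per individual, pooling all of that individual's time points:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy longitudinal data: m individuals observed at T times, each individual
# following a single one of two regression lines at every time point
m, T = 200, 5
z_true = rng.integers(0, 2, size=m)               # latent component label
true_slopes, sigma = np.array([1.0, -1.0]), 0.5
x = rng.uniform(-1.0, 1.0, size=(m, T))
y = true_slopes[z_true][:, None] * x + rng.normal(scale=sigma, size=(m, T))

# EM for a two-component mixture of regressions through the origin,
# with a common noise variance
w = np.array([0.5, 0.5])         # mixture weights
b = np.array([0.5, -0.5])        # component slopes (starting values)
s2 = 1.0                         # common noise variance
for _ in range(100):
    # E-step: pool each individual's T residuals before allocating it,
    # which is what makes the clustering much sharper than point-by-point
    resid = y[:, :, None] - x[:, :, None] * b[None, None, :]    # (m, T, 2)
    loglik = -0.5 * (resid ** 2).sum(axis=1) / s2 + np.log(w)   # (m, 2)
    r = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)                           # responsibilities
    # M-step: closed-form weighted least-squares updates per component
    w = r.mean(axis=0)
    sxy, sxx = (x * y).sum(axis=1), (x * x).sum(axis=1)
    b = (r * sxy[:, None]).sum(axis=0) / (r * sxx[:, None]).sum(axis=0)
    resid = y[:, :, None] - x[:, :, None] * b[None, None, :]
    s2 = (r * (resid ** 2).sum(axis=1)).sum() / (m * T)

print("slopes:", np.round(np.sort(b), 2), " noise variance:", round(s2, 3))
```

With T observations per individual, the log-likelihood ratio between components grows with T, so the responsibilities are close to 0 or 1 and the components separate cleanly — which is also why the multimodality worry applies to the slopes rather than to the allocations here.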

mostly nuisance, little interest

Posted in Statistics, University life on February 7, 2013 by xi'an

Sorry for the misleading if catchy (?) title: I mean mostly nuisance parameters, very few parameters of interest! This morning I attended a talk by Eric Lesage from CREST-ENSAI on non-responses in surveys and their modelling through instrumental variables. The weighting formula used to compensate for the missing values was exactly the one at the core of the Robins-Wasserman paradox, discussed a few weeks ago by Jamie in Varanasi, namely the one with the estimated probability of response in the denominator. The solution adopted in the talk was obviously different, with linear estimators used at most steps to evaluate the bias of the procedure (since researchers in survey sampling seem particularly obsessed with bias!).
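
For readers unfamiliar with that weighting scheme, here is a toy version (my own simulation, not Eric Lesage's setting): respondents are re-weighted by the inverse of an estimated response probability, which removes the non-response bias of the naive mean over respondents.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.integers(0, 2, size=n)             # observed covariate (binary stratum)
y = 1.0 + 2.0 * x + rng.normal(size=n)     # outcome; true mean is 2.0
pi = np.where(x == 1, 0.9, 0.3)            # response probability depends on x
r = rng.random(n) < pi                     # response indicator

# naive mean over respondents: biased, since the x = 1 stratum responds more
naive = y[r].mean()

# estimate the response probabilities per stratum, then apply the weighting
# formula with the estimated probability of response in the denominator
pi_hat = np.where(x == 1, r[x == 1].mean(), r[x == 0].mean())
ipw = np.mean(r * y / pi_hat)

print(f"naive: {naive:.3f}  IPW: {ipw:.3f}")   # naive ~ 2.5, IPW ~ 2.0
```

With two strata the estimated probabilities are just group frequencies; the Robins-Wasserman difficulty arises when the covariate is continuous and high-dimensional, so that estimating the denominator is itself hard.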

On a somehow related topic, Aris Spanos arXived a short note (that I read yesterday) about the Neyman-Scott paradox. The problem is similar to the Robins-Wasserman paradox in that there is an infinity of nuisance parameters (the means of the successive pairs of observations) and that a convergent estimator of the parameter of interest, namely the variance common to all observations, is available. While there exist Bayesian solutions to this problem (see, e.g., this paper by Brunero Liseo), they require some preliminary steps to bypass the difficulty of this infinite number of parameters and, in this respect, involve some ad-hockery, because the prior is then designed purposefully so. In other words, missing the direct solution based on the differences of the pairs is a wee bit frustrating, even though this statistic is not sufficient! The above paper by Brunero also contains my favourite example in this area: when considering a normal mean in large dimension, if the parameter of interest is the squared norm of this mean, the MLE ||x||² (and the Bayes estimator associated with Jeffreys’ prior) is (are) very poor: the bias is constant and of the order of the dimension of the mean, p. On the other hand, if one starts from ||x||² as the observation (definitely in-sufficient!), the resulting MLE (and the Bayes estimator associated with Jeffreys’ prior) has (have) much nicer properties. (I mentioned this example in my review of Chang’s book as it is paradoxical, gaining in efficiency by throwing away “information”! Of course, the part we throw away does not contain true information about the norm, but the likelihood does not factorise and hence the Bayesian answers differ…)
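
The Neyman-Scott phenomenon is easy to reproduce by simulation (a quick sketch of mine): with pairs y_i1, y_i2 drawn from N(mu_i, sigma²), the MLE of sigma² converges to sigma²/2 as the number of pairs grows, while the estimator based on the pair differences is consistent.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 50_000, 4.0                    # n pairs, common variance
mu = rng.normal(scale=10, size=n)          # one nuisance mean per pair
y = mu[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(n, 2))

# MLE: profile out each mu_i with the pair mean; the estimator is
# inconsistent and converges to sigma2 / 2 as n grows
ybar = y.mean(axis=1, keepdims=True)
mle = ((y - ybar) ** 2).mean()

# difference-based estimator: (y_i1 - y_i2) / sqrt(2) ~ N(0, sigma2),
# free of the nuisance means, hence consistent for sigma2
diff = (y[:, 0] - y[:, 1]) / np.sqrt(2)
consistent = (diff ** 2).mean()

print(f"MLE: {mle:.2f}  difference-based: {consistent:.2f}")
```

The run makes the paradox tangible: the MLE sits near sigma²/2 = 2 however large n gets, because each new pair brings a new nuisance parameter along with its two observations.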

I showed the paper to Andrew Gelman and here are his comments:

Spanos writes, “The answer is surprisingly straightforward.” I would change that to, “The answer is unsurprisingly straightforward.” He should’ve just asked me the answer first rather than wasting his time writing a paper!

The way it works is as follows. In Bayesian inference, everything unknown is unknown: all unknowns have a joint prior and a joint posterior distribution. In frequentist inference, each unknown quantity is either a parameter or a predictive quantity. Parameters do not have probability distributions (hence the discomfort that frequentists have with notation such as N(y|m,s); they prefer something like N(y;m,s) or f_N(y;m,s)), while predictions do have probability distributions. In frequentist statistics, you estimate parameters and you predict predictive quantities. In this world, estimation and prediction are different. Estimates are evaluated conditional on the parameter. Predictions are evaluated conditional on model parameters but unconditional on the predictive quantities. Hence, mle can work well in many high-dimensional problems, as long as you consider many of the uncertain quantities as predictive. (But mle is still not perfect, because of the problem of boundary estimates, e.g., here.)

