The control variate writes

where π is the target density and φ is a free function to be optimised. (Under the constraint that πφ is integrable. Then the expectation of ψ_{φ} is indeed zero.) The “explanation” for the sub-root-n behaviour is that ψ_{φ} is chosen as an L_{2} regression. When looking at the sub-root-n convergence proof, the explanation is more of a Rao-Blackwellisation type, assuming a first level convergent (or *presistent*) approximation to the integrand [of the above form ψ_{φ} can be found. The optimal φ is the solution of a differential equation that needs estimating and the paper concentrates on approximating strategies. This connects with Antonietta Mira’s zero variance control variates, but in a non-parametric manner, adopting a Gaussian process as the prior on the unknown φ. And this is where the huge innovation in the paper resides, I think, i.e. in assuming a Gaussian process prior on the control functional *and* in managing to preserve unbiasedness. As in many of its implementations, modelling by Gaussian processes offers nice features, like ψ_{φ} being itself a Gaussian process. Except that it cannot be shown to lead to presistency on a theoretical basis. Even though it appears to hold in the examples of the paper. Apart from this theoretical difficulty, the potential hardship with the method seems to be in the implementation, as there are several parameters and functionals to be calibrated, hence calling for cross-validation which may often be time-consuming. The gains are humongous, so the method should be adopted whenever the added cost in implementing it is reasonable, cost which evaluation is not clearly provided by the paper. In the toy Gaussian example where everything can be computed, I am surprised at the relatively poor performance of a Riemann sum approximation to the integral, wondering at the level of quadrature involved therein. The paper also interestingly connects with O’Hagan’s (1991) Bayes-Hermite [polynomials] quadrature and quasi-Monte Carlo [obviously!].

Filed under: Books, Statistics, University life Tagged: control variate, convergence rate, Gaussian processes, Monte Carlo Statistical Methods, simulation, University of Warwick ]]>

Filed under: Travel, Wines Tagged: carignan, Faugères, grenache, Languedoc, mourvèdre, red wine, Syrah ]]>

Using Bayesian Linear Mixed Models in Psycholinguistics: Some open issues

With the arrival of the probabilistic programming language Stan (and JAGS), it has become relatively easy to fit fairly complex Bayesian linear mixed models. Until now, the main tool that was available in R was lme4. I will talk about how we have fit these models in recently published work (Husain et al 2014, Hofmeister and Vasishth 2014). We are trying to develop a standard approach for fitting these models so that graduate students with minimal training in statistics can fit such models using Stan.

I will discuss some open issues that arose in the course of fitting linear mixed models. In particular, one issue is: should one assume a full variance-covariance matrix for random effects even when there is not enough data to estimate all parameters? In lme4, one often gets convergence failure or degenerate variance-covariance matrices in such cases and so one has to back off to a simpler model. But in Stan it is possible to assume vague priors on each parameter, and fit a full variance-covariance matrix for random effects. The advantage of doing this is that we faithfully express in the model how the data were generated—if there is not enough data to estimate the parameters, the posterior distribution will be dominated by the prior, and if there is enough data, we should get reasonable estimates for each parameter. Currently we fit full variance-covariance matrices, but we have been criticized for doing this. The criticism is that one should not try to fit such models when there is not enough data to estimate parameters. This position is very reasonable when using lme4; but in the Bayesian setting it does not seem to matter.

Filed under: Books, Statistics, University life Tagged: Bayesian linear mixed models., Bayesian modelling, JAGS, linear mixed models, lme4, prior domination, psycholinguistics, STAN, Universität Potsdam ]]>

Filed under: pictures, Running Tagged: geese, Parc de Sceaux ]]>

Filed under: Books, Kids, Running, Statistics, University life Tagged: Birmingham, control variate, Coventry, English train, goose, London Midlands, Mark Girolami, Nicolas Chopin, particle MCMC, simulation model, taxi-driver, Tony O'Hagan, University of Warwick ]]>

Filed under: Kids, pictures, Travel Tagged: Barbamama, Barbapapa, bicycle, mountain bike, Paris ]]>

“Franchement j’ai pas lu encore vos papiers en détails, mais j’apprécie vos axes de recherche et j’aimerai bien en faire autant avec votre collaboration, bien sûr. Actuellement, je suis à la recherche d’un sujet de thèse et c’est pour cela que je vous écris. Je suis prêt à négocier sur tout point et de tout coté.”

[Frankly I have not yet read your papers in detail , but I appreciate your research areas and I would love to do the same with your help , of course. Currently, I am looking for a thesis topic and this is why I write to you. I am willing to negotiate on any point and any side.]

Filed under: Kids, Statistics, University life Tagged: foreign students, PhD s, PhD topic ]]>

**A**pproximate Bayesian computation techniques are 2000’s successors of MCMC methods as handling new models where MCMC algorithms are at a loss, in the same way the latter were able in the 1990’s to cover models that regular Monte Carlo approaches could not reach. While they first sounded like “quick-and-dirty” solutions, only to be considered until more elaborate solutions could (not) be found, they have been progressively incorporated within the statistican’s toolbox as a novel form of non-parametric inference handling partly defined models. A statistically relevant feature of those ACB methods is that they require replacing the data with smaller dimension summaries or statistics, because of the complexity of the former. In almost every case when calling ABC is the unique solution, those summaries are not sufficient and the method thus implies a loss of statistical information, at least at a formal level since relying on the raw data is out of question. This forced reduction of statistical information raises many relevant questions, from the choice of summary statistics to the consistency of the ensuing inference.

In this paper of the special MCMSki 4 issue of *Statistics and Computing*, Stoehr et al. attack the recurrent problem of selecting summary statistics for ABC in a hidden Markov random field, since there is no fixed dimension sufficient statistics in that case. The paper provides a very broad overview of the issues and difficulties related with ABC model choice, which has been the focus of some advanced research only for a few years. Most interestingly, the authors define a novel, local, and somewhat Bayesian misclassification rate, an error that is conditional on the observed value and derived from the ABC reference table. It is the posterior predictive error rate

integrating in both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation. Or rather given the transform of the observation by the summary statistic S. The authors even go further to define the error rate of a classification rule based on a first (collection of) statistic, conditional on a second (collection of) statistic (see Definition 1). A notion rather delicate to validate on a fully Bayesian basis. And they advocate the substitution of the unreliable (estimates of the) posterior probabilities by this local error rate, estimated by traditional non-parametric kernel methods. Methods that are calibrated by cross-validation. Given a reference summary statistic, this perspective leads (at least in theory) to select the optimal summary statistic as the one leading to the minimal local error rate. Besides its application to hidden Markov random fields, which is of interest *per se*, this paper thus opens a new vista on calibrating ABC methods and evaluating their true performances conditional on the actual data. (The advocated abandonment of the posterior probabilities could almost justify the denomination of a *paradigm shift*. This is also the approach advocated in our random forest paper.)

Filed under: Books, Kids, Statistics, University life Tagged: ABC, arXiv, Cross Validation, Gibbs random field, hidden Markov models, Markov random field, Monte Carlo Statistical Methods, paradigm shift, Pierre Pudlo, predictive loss, simulation, summary statistics ]]>

“…in contrast with the empirical likelihood method, the bootstrap likelihood doesn’t require any set of subjective constrains taking advantage from the bootstrap methodology. This makes the algorithm an automatic and reliable procedure where only a few parameters need to be specified.”

The spirit is indeed quite similar to ours in that a non-parametric substitute plays the role of the actual likelihood, with no correction for the substitution. Both approaches are convergent, with similar or identical convergence speeds. While the empirical likelihood relies on a choice of parameter identifying constraints, the bootstrap version starts directly from the [subjectively] chosen estimator of θ. For it indeed needs to be *chosen*. And computed.

“Another benefit of using the bootstrap likelihood (…) is that the construction of bootstrap likelihood could be done once and not at every iteration as the empirical likelihood. This leads to significant improvement in the computing time when different priors are compared.”

This is an improvement that could apply to the empirical likelihood approach, as well, once a large enough collection of likelihood values has been gathered. But only in small enough dimensions where smooth curve-fitting algorithms can operate. The same criticism applying to the derivation of a non-parametric density estimate for the distribution of the estimator of θ. Critically, the paper only processes examples with a few parameters.

In the comparisons between BC_{el} and BC_{bl} that are produced in the paper, the gain is indeed towards BC_{bl}. Since this paper is mostly based on examples and illustrations, not unlike ours, I would like to see more details on the calibration of the non-parametric methods and of regular ABC, as well as on the computing time. And the variability of both methods on more than a *single* Monte Carlo experiment.

I am however uncertain as to how the authors process the population genetic example. They refer to the composite likelihood used in our paper to set the moment equations. Since this is not the true likelihood, how do the authors select their parameter estimates in the double-bootstrap experiment? The inclusion of Crakel’s and Flegal’s (2013) bivariate Beta, is somewhat superfluous as this example sounds to me like an artificial setting.

In the case of the Ising model, maybe the pre-processing step in our paper with Matt Moores could be compared with the other algorithms. In terms of BC_{bl}, how does the bootstrap operate on an Ising model, i.e. (a) how does one subsample pixels and (b)what are the validity guarantees?

A test that would be of interest is to start from a standard ABC solution and use this solution as the reference estimator of θ, then proceeding to apply BC_{bl} for that estimator. Given that the reference table would have to be produced only once, this would not necessarily increase the computational cost by a large amount…

Filed under: Books, R, Statistics, University life Tagged: ABC, ABCel, bivariate Beta distribution, bootstrap, bootstrap likelihood, double bootstrap, empirical likelihood, Ising model, population genetics ]]>

On December 4-5, Université Paris-Dauphine will host the 6th French Econometric Conference, which celebrates Christian Gouriéroux and his contributions to econometrics. (Christian was my statistics professor during my graduate years at ENSAE and then Head of CREST when I joined this research unit, first as a PhD student and later as Head of the statistics group. And he has always been a tremendous support for me.)

Not only is the program quite impressive, with co-authors of Christian Gouriéroux and a few Nobel laureates (if not the latest, Jean Tirole, who taught economics at ENSAE when I was a student there), but registration is free. I will most definitely attend the talks, as I am in Paris-Dauphine at this time of year (the week before NIPS). In particular, looking forward to Gallant’s views on Bayesian statistics.

Filed under: Books, Kids, pictures, Statistics, University life Tagged: Bayesian econometrics, Christian Gouriéroux, CREST, econometrics, ENSAE, Jean Tirole, Nobel Prize, Université Paris Dauphine ]]>