## off to New York

Posted in Books, pictures, Statistics, Travel, University life on March 29, 2015 by xi'an

I am off to New York City for two days, giving a seminar at Columbia tomorrow and visiting Andrew Gelman there. My talk will be about testing as mixture estimation, with slides similar to the Nice ones below if slightly upgraded and augmented during the flight to JFK. Looking at the past seminar speakers, I noticed we were three speakers from Paris in the last fortnight, with Ismael Castillo and Paul Doukhan (in the Applied Probability seminar) preceding me. Is there a significant bias there?!

## ABC of simulation estimation with auxiliary statistics

Posted in Statistics, University life on March 10, 2015 by xi'an

“In the ABC literature, an estimator that uses a general kernel is known as a noisy ABC estimator.”

Another arXival relating M-estimation econometrics techniques with ABC. Written by Jean-Jacques Forneron and Serena Ng from the Department of Economics at Columbia University, the paper tries to draw links between indirect inference and ABC, following the tracks of Drovandi and Pettitt [not quoted there] and proposes a reverse ABC sampler by

1. given a realisation ε of the randomness, creating a one-to-one transform of the parameter θ that corresponds to a realisation of the summary statistic;
2. determining the value of the parameter θ that minimises the distance between this summary statistic and the observed summary statistic;
3. weighting the above value of the parameter θ by π(θ) J(θ), where J is the Jacobian of the one-to-one transform.
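As a minimal sketch of the three steps above, assume a toy location model y_i = θ + σ e_i summarised by the sample mean, with a N(0,1) prior. The model, the function names (`reverse_sampler`, `weighted_mean`) and the parameter values are all illustrative assumptions, not taken from the paper. In this toy case the minimum distance is exactly zero and the Jacobian is constant.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a N(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def reverse_sampler(s_obs, n_obs, sigma, prior_pdf, n_draws, rng):
    """Toy reverse sampler for y_i = theta + sigma * e_i, summarised by the sample mean."""
    thetas, weights = [], []
    for _ in range(n_draws):
        # Step 1: fix the randomness; the summary is then theta + sigma * e_bar,
        # which is a one-to-one map of theta.
        e_bar = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        # Step 2: the theta minimising |theta + sigma * e_bar - s_obs| (distance zero here).
        theta = s_obs - sigma * e_bar
        # Step 3: weight by prior times Jacobian; the Jacobian of this map is 1.
        jacobian = 1.0
        thetas.append(theta)
        weights.append(prior_pdf(theta) * jacobian)
    return thetas, weights


def weighted_mean(values, weights):
    """Self-normalised weighted average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(0)
thetas, weights = reverse_sampler(
    s_obs=1.0, n_obs=10, sigma=1.0,
    prior_pdf=lambda t: normal_pdf(t, 0.0, 1.0),
    n_draws=20000, rng=rng,
)
estimate = weighted_mean(thetas, weights)
# Conjugate posterior mean under the N(0, 1) prior: s_obs / (1 + sigma^2 / n_obs).
exact = 1.0 / (1.0 + 1.0 / 10)
```

In this conjugate setting the weighted draws do target the posterior given the summary, because the summary can be inverted exactly. The sketch says nothing about the general case, where the minimum distance is positive.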

I have difficulty seeing why this sequence produces a weighted sample associated with the posterior. Unless perhaps when the minimum of the distance is zero, in which case this amounts to some inversion of the summary statistic (function). And even then, the role of the random bit ε is unclear, since there is no rejection. The inversion of the summary statistic seems hard to promote in practice since the transform of the parameter θ into a (random) summary is most likely highly complex.

“The posterior mean of θ constructed from the reverse sampler is the same as the posterior mean of θ computed under the original ABC sampler.”

The authors also state (p.16) that the estimators derived by their reverse method are the same as those of the original ABC approach, but this only happens to hold asymptotically in the sample size. And I am not even sure of this weaker statement as the tolerance does not seem to play a role then. And also because the authors later contrast ABC with their reverse sampler, as the latter produces iid draws from the posterior (p.25).

“The prior can be potentially used to further reduce bias, which is a feature of the ABC.”

As an aside, while the paper extensively reviews the literature on minimum distance estimators (called M-estimators in the statistics literature) and on ABC, the first quote misses the meaning of noisy ABC, which consists in a randomised version of ABC where the observed summary statistic is randomised at the same level as the simulated statistics. And the last quote does not sound right either, as it should be seen as a feature of the Bayesian approach rather than of the ABC algorithm. The paper also attributes the paternity of ABC to Don Rubin’s 1984 paper, “who suggested that computational methods can be used to estimate the posterior distribution of interest even when a model is analytically intractable” (pp.7-8). This is incorrect in that Rubin uses ABC to explain the nature of Bayesian reasoning, but does not in the least address computational issues.
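To make the distinction concrete, here is a minimal sketch contrasting plain ABC rejection with a noisy version, assuming a uniform kernel of width ε and a one-dimensional summary. The toy model and the function names (`abc_rejection`, `noisy_abc_rejection`) are illustrative assumptions, not code from the paper or from any ABC package.

```python
import random


def abc_rejection(s_obs, sample_prior, simulate_summary, eps, n_accept, rng):
    """Plain ABC rejection: keep theta when the simulated summary is within eps of s_obs."""
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior(rng)
        if abs(simulate_summary(theta, rng) - s_obs) <= eps:
            accepted.append(theta)
    return accepted


def noisy_abc_rejection(s_obs, sample_prior, simulate_summary, eps, n_accept, rng):
    """Noisy ABC: perturb the observed summary with the same uniform kernel, then run ABC."""
    s_noisy = s_obs + rng.uniform(-eps, eps)
    return abc_rejection(s_noisy, sample_prior, simulate_summary, eps, n_accept, rng)


# Hypothetical toy model: theta ~ U(-5, 5), summary = theta + N(0, 0.3^2) noise.
def sample_prior(rng):
    return rng.uniform(-5.0, 5.0)


def simulate_summary(theta, rng):
    return theta + rng.gauss(0.0, 0.3)


rng = random.Random(1)
plain = abc_rejection(1.0, sample_prior, simulate_summary, 0.2, 500, rng)
noisy = noisy_abc_rejection(1.0, sample_prior, simulate_summary, 0.2, 500, rng)
```

The only difference between the two functions is the single randomisation of the observed summary, which is the sense of "noisy" described above, as opposed to merely using a general kernel.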

## WSC [2]011

Posted in Mountains, pictures, Running, Statistics, Travel, University life on December 15, 2011 by xi'an

Last day at WSC 2011: as it was again raining, I could not run a second time in the South Mountain Preserve park. (I had a swim at 5am instead and ended up having a nice chat with an old man in the pool under the rain!) My first morning session was rather disappointing, with two talks that remained at such a high level of generality as to be useless and a mathematical talk about new forms of stochastic approximation that included proofs but no indication on the calibration of its many parameters. During the coffee break, I tried to have a chat with a vendor of simulation software, but we were using such different vocabularies that I soon gave up. (A lot of the software on display was a-statistical in that users would build a network, specify all parameters, incl. the distributions at the different nodes, and start calibrating those parameters towards a behaviour that suited them.) The second session was much more in my area of interest/expertise, with Paul Dupuis giving a talk in the same spirit as the one he gave in New York last September, using large deviations and importance sampling on diffusions. Both following talks were about specially designed importance sampling techniques for rare events and about approximating the zero variance optimal importance function: Yixi Shin gave a talk on cross-entropy based selection of mixtures for the simulation of tail events, connecting somehow with the talk on mixtures of importance sampling distributions I attended yesterday. Although I am afraid I dozed a while during the talk, it was an interesting mix, with the determination of the weights by cross-entropy arguments reminding me of what we did for the population Monte Carlo approach (since it also involved some adaptive entropy optimisation). Zdravko Botev gave a talk on approximating the ideal zero variance importance function by MCMC and a sort of Rao-Blackwell estimator that gives an unbiased estimator of this density under stationarity.

Then it was time to leave for the airport (and wait in a Starbucks for the plane to Minneapolis and then London to depart, as there is no such thing as a lounge in Phoenix airport…). I had an interesting exchange with a professional magician in the first plane, The Amazing Hondo!, who knew about Persi and was a former math teacher. He explained a few tricks to me, plus showed me his indeed amazing sleight of hand in manipulating cards. In exchange, I took Persi’s book on Magic and Mathematics out of my bag so that he could have a look at it. (The trip to London was completely uneventful as I slept most of the way.)

Overall, WSC 2011 was an interesting experience in that (a) the talks I attended on zero variance importance simulation set me thinking again on potential applications of the apparently useless optimality result; (b) it showed me that most people using simulation do not, N.O.T., relate to Monte Carlo techniques (to the extent of being completely foreign to my domains of expertise); and (c) among the parallel sessions that cover military applications, health care simulation, etc., there always is a theme connecting to mine, which means that I will find sessions to attend when taking part in WSC 2012 in Berlin next year (since I have been invited for a talk). This will be the first time WSC is held outside North America. Hopefully, this will attract simulation addicts from Europe as well as elsewhere.

## workshop in Columbia [day 3]

Posted in pictures, R, Running, Statistics, Travel, University life on September 27, 2011 by xi'an

Although this was only a half-day of talks, the third day of the workshop was equally thought-challenging and diverse. (I managed to miss the first ten minutes by taking a Line 3 train to 125th Street, having overlooked the earlier split from Line 1… Crossing south Harlem on a Sunday morning is a fairly mild experience though.) Jean-Marc Azaïs gave a personal recollection on the work of Mario Wschebor, who passed away a week ago and should have attended the workshop. Nan Chen talked about the Monte Carlo approximation of quantities of the form

$\mathbb{E}[f(\mathbb{E}[Y|X])]$

which is a problem when f is nonlinear. This reminded me (and others) of the Bernoulli factory and of the similar trick we use in the vanilla Rao-Blackwellisation paper with Randal Douc. However, the approach was different in that the authors relied on a nested simulation algorithm that did not adapt to f and did not account for its shape. Peter Glynn, while also involved in the above, delivered a talk on the initial transient that showed possibilities for MCMC convergence assessment (even though this is a much less active area than earlier). And, as a fitting conclusion, the conference organiser, Jingchen Liu, gave a talk on non-compatible conditionals he and Andrew are using to handle massively-missing datasets. It reminded me of Hobert and Casella (1996, JASA) of course and also of discussions we had in Paris with Andrew and Nicolas. Looking forward to the paper (as I have missed some points about the difference between true and operational models)! Overall, this was thus a terrific workshop (I just wish I had been able to sleep one hour more each night to be more alert during all talks!) and a fantastic if intense schedule fitting the start of the semester and of teaching (even though Robin had to teach my first R class in English on Friday). I also discovered that several of the participants were attending the Winter Simulation Conference later this year, hence another opportunity to discuss simulation strategies together.
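For reference, a plain nested simulation estimate of the quantity $\mathbb{E}[f(\mathbb{E}[Y|X])]$ mentioned above can be sketched as follows. This is only a generic textbook-style version with fixed inner and outer sample sizes, not the algorithm presented in the talk, and the toy model and the function name `nested_estimate` are assumptions made for illustration.

```python
import math
import random


def nested_estimate(f, sample_x, sample_y_given_x, n_outer, n_inner, rng):
    """Plain nested Monte Carlo estimate of E[f(E[Y|X])]."""
    total = 0.0
    for _ in range(n_outer):
        x = sample_x(rng)
        # Inner loop: approximate the conditional expectation E[Y | X = x].
        inner_mean = sum(sample_y_given_x(x, rng) for _ in range(n_inner)) / n_inner
        # f is applied to the noisy inner mean, hence a bias when f is nonlinear.
        total += f(inner_mean)
    return total / n_outer


# Hypothetical toy case: X ~ N(0, 1), Y | X ~ N(X, 1), f(m) = max(m, 0),
# for which E[f(E[Y|X])] = E[max(X, 0)] = 1 / sqrt(2 * pi).
rng = random.Random(2)
estimate = nested_estimate(
    f=lambda m: max(m, 0.0),
    sample_x=lambda r: r.gauss(0.0, 1.0),
    sample_y_given_x=lambda x, r: r.gauss(x, 1.0),
    n_outer=4000, n_inner=50, rng=rng,
)
exact = 1.0 / math.sqrt(2.0 * math.pi)
```

Note that `n_outer` and `n_inner` are set in advance, whatever f is: nothing in this plain version exploits the shape of f, for instance by spending more inner simulations near the kink of `max(m, 0)`.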

## workshop in Columbia [day 2]

Posted in Kids, pictures, Statistics, Travel, University life on September 26, 2011 by xi'an

The second day at the workshop was closer to my research topics and thus easier to follow, and just as enjoyable as yesterday: Jun Liu’s talk went over his modification of the Clifford-Fearnhead particle algorithm in great detail, Sam Kou explained how a simulated annealing algorithm could make considerable improvement in the prediction of the 3D structure of molecules, Jeff Rosenthal showed us the recent results on and applications of adaptive MCMC, Gareth Roberts detailed his new results on the exact simulation of diffusions, and Xiao-Li Meng went back to his 2002 Read Paper to explain how we should use likelihood principles in Monte Carlo as well, and to convince me I was “too young” to get the whole idea! (As I was a discussant of this paper.) All talks were thought-provoking and I very much enjoyed Gareth’s approach and description of the algorithm (as did the rest of the audience, to the point of asking too many questions during the talk!). However, the most revealing talk was Xiao-Li’s in that he did succeed in convincing me of the pertinence of his “unknown measure” approach thanks to a multiple mixture example where the actual mixture importance sampler

$\dfrac{1}{n}\sum_{i=1}^n \dfrac{q(x_i)}{\sum \pi_j p_j(x_i)}$

gets dominated by the estimated mixture version

$\dfrac{1}{n}\sum_{i=1}^n \dfrac{q(x_i)}{\sum \hat\pi_j p_j(x_i)}$
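The two estimators above can be computed side by side in a small sketch. It assumes that the $x_i$ are drawn from a two-component normal mixture with known components $p_j$, that $q$ is an unnormalised normal density, and that $\hat\pi$ is obtained by a few EM steps on the weights alone. All of these choices, including the function names, are assumptions for illustration and not necessarily the setting of the talk.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a N(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_weights(xs, components, n_iter=100):
    """Approximate MLE of the mixture weights (components known) after n_iter EM steps."""
    k = len(components)
    w = [1.0 / k] * k
    dens = [[p(x) for p in components] for x in xs]
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in dens:
            denom = sum(wj * dj for wj, dj in zip(w, row))
            for j in range(k):
                totals[j] += w[j] * row[j] / denom  # posterior membership probability
        w = [t / len(xs) for t in totals]
    return w


def mixture_is_estimate(xs, q, weights, components):
    """(1/n) * sum_i q(x_i) / sum_j weights_j * p_j(x_i)."""
    total = 0.0
    for x in xs:
        total += q(x) / sum(w * p(x) for w, p in zip(weights, components))
    return total / len(xs)


# Hypothetical setting: target q(x) = exp(-x^2 / 2), whose integral is sqrt(2 * pi).
q = lambda x: math.exp(-0.5 * x * x)
components = [lambda x: normal_pdf(x, -1.0, 1.5), lambda x: normal_pdf(x, 1.0, 1.5)]
true_weights = [0.5, 0.5]

rng = random.Random(3)
xs = [rng.gauss(-1.0 if rng.random() < true_weights[0] else 1.0, 1.5) for _ in range(2000)]

est_actual = mixture_is_estimate(xs, q, true_weights, components)
hat_weights = em_weights(xs, components)
est_estimated = mixture_is_estimate(xs, q, hat_weights, components)
```

A single run like this only shows how both versions are computed. Comparing their variances would require repeating the experiment over many seeds.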

I still remain skeptical about the group averaging perspective, though, for the same reason as earlier that the group is not acting in conjunction with the target function. Hence averaging over transforms of no relevance for the target. Nonetheless, the idea of estimating the best “importance function” based on the simulated values rather than using the genuine importance function is quite a revelation, linking with an earlier question of mine (and others) on the (lack of) exploitation of the known values of the target at the simulated points. (Maybe up to a constant.) Food for thought, certainly… In memory of this discussion, here is a picture [of an ostrich] my daughter drew at the time for my final slide in London: