## an arithmetic mean identity

Posted in Books, pictures, R, Statistics, Travel, University life on December 19, 2019 by xi'an

A 2017 paper by Ana Pajor published in Bayesian Analysis addresses my favourite problem [of computing the marginal likelihood] and which I discussed on the ‘Og, linking with another paper by Lenk published in 2009 in JCGS. That I already discussed here last year. Lenk’s (2009) paper is actually using a technique related to the harmonic mean correction based on HPD regions that Darren Wraith and myself proposed at MaxEnt 2009. And which Jean-Michel and I presented at Frontiers of statistical decision making and Bayesian analysis in 2010. As I had only vague memories about the arithmetic mean version, we discussed the paper together with graduate students in Paris Dauphine.

The arithmetic mean solution, representing the marginal likelihood as the prior average of the likelihood, is a well-known approach, also used as the basis for nested sampling. With the improvement consisting in restricting the simulation to a set Ð with sufficiently high posterior probability. I am quite uneasy about P(Ð|y) estimated by 1 as the shape of the set containing all posterior simulations is completely arbitrary, parameterisation dependent, and very random since based on the extremes of this posterior sample. Plus, the set Ð converges to the entire parameter space with the number of posterior simulations. An alternative that we advocated in our earlier paper is to take Ð as the HPD region or a variational Bayes version. But the central issue with the HPD regions is how to construct these from an MCMC output and how to compute both P(Ð) and P(Ð|y). It does not seem like a good idea to set P(Ð|y) to the intended α level for the HPD coverage. Using a non-parametric version for estimating Ð could be in the end the only reasonable solution.
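Spelled out, and in notation that is assumed here rather than taken from the paper (L(θ|y) for the likelihood, π(θ) for the prior density, m(y) for the marginal likelihood), the restricted arithmetic mean identity follows from the definition of the posterior probability of Ð:

$$
P(\text{Ð}\mid y)=\frac{\int_{\text{Ð}} L(\theta\mid y)\,\pi(\theta)\,\mathrm{d}\theta}{m(y)}
\quad\Longrightarrow\quad
m(y)=\frac{P(\text{Ð})}{P(\text{Ð}\mid y)}\;\mathbb{E}^{\pi}\!\left[\,L(\theta\mid y)\mid\theta\in\text{Ð}\,\right]
$$

Taking Ð as the whole parameter space gives back the plain arithmetic mean, m(y) = E^π[L(θ|y)], which is why both P(Ð) and P(Ð|y) need to be available for the restricted version to be usable.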

As a test, I reran the example of a conjugate normal model used in the paper, based on (exact) simulations from both the prior and the posterior, and obtained approximations that were all close to the true marginal. With Chib’s being exact in that case (of course!), and an arithmetic mean surprisingly close without an importance correction:

```
> print(c(hame,chme,came,chib))
[1] -107.6821 -106.5968 -115.5950 -115.3610
```

Both harmonic versions are of the right order but not trustworthy, the truncation to such a set Ð as the one chosen in this paper having little impact.
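For readers who want to try this kind of comparison, here is a minimal sketch in R of a conjugate normal experiment. It is not the code behind the output above: the sample size, the prior, the 95% HPD interval used as Ð, and the names `am`, `hm`, `tam` and `chib` are all assumptions made for illustration, and no correspondence with `hame`, `chme` or `came` is claimed.

```r
set.seed(1)

# Conjugate normal model (all numerical settings are illustrative assumptions):
# y_i ~ N(theta, sigma2) with sigma2 known, and theta ~ N(mu0, tau2).
n <- 20; sigma2 <- 1; mu0 <- 0; tau2 <- 10
y <- rnorm(n, mean = 1, sd = sqrt(sigma2))

# Log-likelihood, vectorised over theta through the sufficient statistics.
ybar <- mean(y); ss <- sum((y - ybar)^2)
loglik <- function(theta)
  -n / 2 * log(2 * pi * sigma2) - (ss + n * (ybar - theta)^2) / (2 * sigma2)

# Exact posterior N(mu_n, v_n).
v_n  <- 1 / (n / sigma2 + 1 / tau2)
mu_n <- v_n * (n * ybar / sigma2 + mu0 / tau2)

# Numerically stable log of a mean of exponentials.
log_mean_exp <- function(lx) { m <- max(lx); m + log(mean(exp(lx - m))) }

N <- 1e5
theta_prior <- rnorm(N, mu0, sqrt(tau2))   # exact prior draws
theta_post  <- rnorm(N, mu_n, sqrt(v_n))   # exact posterior draws

# Arithmetic mean: prior average of the likelihood.
am <- log_mean_exp(loglik(theta_prior))

# Harmonic mean: posterior average of the inverse likelihood
# (consistent, but with infinite variance in this setting).
hm <- -log_mean_exp(-loglik(theta_post))

# Truncated arithmetic mean on D, taken here as the exact 95% HPD interval,
# so that P(D|y) = alpha holds exactly in this conjugate case.
alpha <- 0.95
D <- mu_n + c(-1, 1) * qnorm((1 + alpha) / 2) * sqrt(v_n)
in_D <- theta_prior >= D[1] & theta_prior <= D[2]
tam <- log_mean_exp(ifelse(in_D, loglik(theta_prior), -Inf)) - log(alpha)

# Chib's identity log m(y) = log L + log prior - log posterior, at theta* = mu_n.
chib <- loglik(mu_n) + dnorm(mu_n, mu0, sqrt(tau2), log = TRUE) -
  dnorm(mu_n, mu_n, sqrt(v_n), log = TRUE)

print(c(am = am, hm = hm, tam = tam, chib = chib))
```

All four values are on the log scale. Setting P(Ð|y) to α is only legitimate here because the HPD interval is computed from the exact posterior; with an HPD region estimated from MCMC output, the concern raised above applies in full.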

## cheating in long-distance running

Posted in pictures, Running, Travel on April 10, 2016 by xi'an

This morning, I was reading at breakfast a NYT article on a presumed cheat by the winner of the 2015 Ironman Canada race, in the category of women ages 40-44… (This gruelling race takes place around Whistler, with a 2.4-mile swim, a 112-mile bike race with a huge differential, and a complete marathon!) This led to a reassessment of earlier victories by the same runner and ended with her being barred from running in Ironman and Triathlon Canada races (and losing her title for the 2015 race as well). The story reminded me of Darren Wraith pointing out to me an article in Runner’s World where an independent volunteer was checking times of road-runners across the US with the aim of detecting inconsistencies in split times and between races, or in pictures at alleged split times, eventually exposing a significant number of cheaters that had been undetected by the organisers. While I find the temptation to cheat less of a surprise than the article authors, even when nothing more than local and very temporary fame is at stake, and particularly so when a podium or a selection for a more prestigious race is at stake, the limited involvement of race officials is an issue, given how easy it is to spot those inconsistencies. Actually, it is ridiculously easy to cheat as well: when I ran the last Gertrude Cox scholarship race at JSM in 2009, my wife and I picked our tags together and ended up switching them by mistake. Which made my wife the female winner of the race until I pointed out the switch later that afternoon to the organisers. And spoke with the true winner who was surprised but unsuspecting at not being the winner. This may well be a reason for the phenomenon to be so widespread, namely that it does not seem to make sense to try to cheat for a middle-of-the-pack rank, so little sense that one does not bother to voice suspicions to course officials.
For instance, when I ran my most recent half-marathon in Argentan, I crossed a runner coming backward on the course route around the 11th kilometre and thought he had either given up or was acting as a pacemaker for another runner. Later, however, I spotted him during the awards ceremony among the first ten runners of the half-marathon! But I did not do anything as I was not 100% sure it was the same runner and as being on the podium was the only reward of a possible cheat… In addition, there was no split time and hence little if any hard fact to back up my story. Maybe I’ll pay more attention at the next race!

## JSM 2009 impressions [day 5]

Posted in Statistics, Travel, University life on August 7, 2009 by xi'an

Last days of conferences are often low-key… This is the case for JSM 2009. The final sessions were almost empty, as most of the 6400 attendees had already left D.C. There were two invited sessions this morning on Bayesian model choice. Jim Berger gave a talk about model multiplicity adjustment in subgroup tests that was quite in line with my own views on the choice of the weights in model comparison, while Robert Kohn’s talk was very much related to mine, although he chose to focus on Chib’s approach to evidence/marginal likelihood approximation.

So, in conclusion of a long and intense meeting, I must acknowledge I enjoyed it. Especially when considering my earlier misgivings. Not only because of the gratifying third place in the Gertrude Cox Scholarship 5k race, but more seriously for having only attended interesting and thought-challenging sessions over the five days. (I came prepared for the frustration of having to face competing sessions by having composed my sequence of sessions in advance.) Maybe by almost exclusively sticking to invited sessions with 30-minute talks and special lectures, I also avoided the feeling of wasting my time in too short and incomprehensible talks. Even though I missed some old friends, I met with many and, thanks to filling my agenda prior to the meeting, I mostly avoided the “dinner nightmare” where the party grows to the point of an impossibility theorem (of catering to everyone’s taste and of finding a place accommodating that size!). Thanks to the location in a gigantic conference center, the crowd management was fairly efficient and avoided the suffocation feeling (and hopefully swine flu contamination!) that I had in earlier meetings and that left me exhausted. At this stage, I am even considering attending JSM 2010 in beautiful Vancouver!

## JSM 2009 impressions [day 4]

Posted in Statistics, University life on August 6, 2009 by xi'an

A very full day today, where I wish I could have been ubiquitous…! I first attended the particle learning session, and thus missed both Gabor Lugosi’s Medallion lecture and the memorial session for David Freedman. The particle learning session had several interesting talks, among which Raquel Prado’s with informed priors about roots in an AR model and Christian Macaro’s on an innovative construction of mixtures of AR chains as volatilities to overcome the difficulty in handling long memory processes. I then chaired the session organised by Julien Cornebise on population Monte Carlo, a quite exciting and well-attended session, where I found the results of Mark Huber on the product estimator to offer some strong potential to study nested sampling. This means I missed Charlie Geyer’s talk, among others. The afternoon session was where I talked, along with Jun Liu and Simon Tavaré, who both gave talks full of exciting directions in connection with genomics. The planning was so horrendous that both Gareth Roberts and Judea Pearl were giving special invited lectures at the same time, not to mention four Bayesian sessions in parallel… The day ended with the COPSS awards, among which the Florence Nightingale David Award went to Nancy Reid for being a role model in the profession, a well-deserved recognition indeed!

## JSM 2009 impressions [day 3]

Posted in Books, Running, Statistics, University life on August 5, 2009 by xi'an

The day started very early with the Gertrude Cox Scholarship 5k race, since my wife and I had to leave the hotel at 5:15am to catch the first metro to the RFK stadium. We met other runners in the metro and we all managed to get to the parking lot of the stadium. There were actually fewer runners than at the previous Gertrude Cox races I ran (like the first one in 1989 in D.C.), maybe around 40 of us, and the track for the race was one loop around the huge parking lot, not inside the stadium quite obviously. We started at about 6:20am in warm, humid weather and I managed to keep up with the two leaders for about one kilometer (3:38) before settling into my own pace. I stuck to third place for the rest of the race, ending up in 18:28, about 30 seconds behind David Dunson and more than a minute behind the winner, in what felt like more than 5k.

The first session I attended was the Medallion lecture by Alistair Sinclair who talked about exact convergence speeds for MCMC algorithms in combinatorics. While the talk was beautifully organised and quite broad in reaching to the audience, I must admit I ended up being disappointed at the lack of connection with the MCMC developments found in Statistics, especially the huge corpus of work by Gareth Roberts and Jeff Rosenthal. This is another illustration of the gap between computer scientists working in combinatorics and applied probabilists, even though they are using the same tools. In the afternoon, I went to the Savage Award Finalists session, where the four finalists were presenting their PhD thesis work. Interestingly, they all have some Bayesian features in their work, albeit from different perspectives, and David Dunson managed to give a great discussion on those four theses at the same pace he ran the morning 5k! Later that day, at the SBSS (Section on Bayesian Statistical Science) mixer, the Savage Award was given to Lorenzo Trippa from Milano, now at the M.D. Anderson Cancer Center, Texas, for his extensions of Polya tree models.

I mentioned the new books in the Use R! series in the previous post. I spotted yesterday a book by Phil Spector on Data Manipulation with R that I immediately bought because Phil’s material on R available on the web has been quite helpful in writing Introducing Monte Carlo Methods with R. (Hence the free cap!) Note that he should not be confused with the music producer Phil Spector, who worked with the Ramones and is now in jail! I incidentally spotted two copies of the paperback version of The Bayesian Choice printed in hard-cover by mistake but sold at the paperback price. (This is due to the new print-on-demand strategy of publishers that eliminates inventory.)