## Dos de Mayo [book review]

Posted in Books on August 23, 2016 by xi'an

Following a discussion I had with Victor Elvira about Spanish books, I ordered a book by Arturo Pérez-Reverte called Un día de cólera (A Day of Wrath), which apparently has not been translated into English. The day of wrath is the second of May, 1808, when the city of Madrid took up arms against the occupation by Napoléon's troops. An uprising that was crushed by Murat's repression the very same day, but which led all of Spain to take up arms against the occupation. The book is written out of historical accounts of the many participants in the uprising, from both the Madrid and French sides. Because so many viewpoints are reported, some for a single paragraph before the victims die, the literary style is not particularly pleasant, but this is nonetheless a gripping book that I read within a single day while going (or trying to get) to San Francisco. And it is historically revealing of how unprepared the French troops were for an uprising by people mostly armed with navajas and a few hunting rifles. Who still managed to hold parts of the town for most of a day, with the help of a single artillery battalion, while the rest of the troops stayed in their barracks. The author actually insists very much on that aspect, that the rebellion was mostly due to the action of the people, while the leading classes, the Army, and the clergy almost uniformly condemned it. Upper estimates of the number of deaths on that day (and the following days) run around 500 for the Madrilenes and 150 for the French troops, but the many stories told in the book give the impression of many more casualties.

## bootstrap(ed) likelihood for ABC

Posted in pictures, Statistics on November 6, 2015 by xi'an

This recently arXived paper by Weixuan Zhu, Juan Miguel Marín, and Fabrizio Leisen proposes an alternative to our 2013 empirical likelihood ABC paper, or BCel. Besides the mostly personal appeal for me of reporting on a Juan Miguel Marín working [in Madrid] on ABC topics, alongside my friend Jean-Michel Marin!, this paper is another entry on ABC that connects with yet another statistical perspective, namely the bootstrap. The proposal, called BCbl, is based on a reference paper by Davison, Hinkley and Worton (1992) which defines a bootstrap likelihood, a notion that relies on a double-bootstrap step to produce a non-parametric estimate of the distribution of a given estimator of the parameter θ. This estimate includes a smooth curve-fitting step, for which little description is available in the current paper. The bootstrap non-parametric substitute then plays the role of the actual likelihood, with no correction for the substitution, just as in our BCel. Both approaches are convergent, with Monte Carlo simulations exhibiting similar or even identical convergence speeds, although [unsurprisingly!] no deep theory is available on their comparative advantages.
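As a toy illustration of the double-bootstrap construction (my own sketch, not the authors' implementation, and with a quadratic fit standing in for their unspecified smooth curve-fitting step), here is a minimal version for a Gaussian sample with the sample mean as estimator of θ:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Toy setting: theta is a location parameter, estimated by the sample mean.
x = rng.normal(loc=2.0, scale=1.0, size=100)
theta_hat = x.mean()

B1, B2 = 200, 200                      # first- and second-level bootstrap sizes
thetas, logliks = [], []
for _ in range(B1):
    # first-level bootstrap sample, carrying an estimate theta*
    xb = rng.choice(x, size=x.size, replace=True)
    # second-level bootstrap: distribution of the estimator around theta*
    t2 = np.array([rng.choice(xb, size=xb.size, replace=True).mean()
                   for _ in range(B2)])
    # kernel estimate of that distribution, evaluated at the observed
    # theta_hat, gives a likelihood value attached to theta*
    dens = gaussian_kde(t2)(theta_hat)[0]
    if dens > 0:
        thetas.append(xb.mean())
        logliks.append(np.log(dens))

# smooth curve fit of the log-likelihood in theta (a quadratic here)
a, b, c = np.polyfit(thetas, logliks, deg=2)
theta_mode = -b / (2 * a)              # mode of the fitted bootstrap likelihood
print(theta_mode)                      # close to the sample mean
```

The quadratic fit is of course the crudest possible stand-in for the curve-fitting step; in higher dimensions both the kernel estimate and the fit are exactly where the approach should struggle.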

An important issue from my perspective is that, while the empirical likelihood approach relies on a choice of identifying constraints that strongly impacts the numerical value of the likelihood approximation, the bootstrap version starts directly from a subjectively chosen estimator of θ, which may also impact the numerical value of the likelihood approximation. In some ABC settings, finding a primary estimator of θ may be a real issue or a computational burden, except when using a preliminary ABC step as in semi-automatic ABC. This would be an interesting crash-test for the BCbl proposal! (And it would not necessarily increase the computational cost by a large amount.) In addition, I am not sure the method easily extends to the larger collections of summary statistics used in ABC, in particular because it necessarily relies on non-parametric estimates, which only operate in small enough dimensions where smooth curve-fitting algorithms can be used. Critically, the paper only processes examples with a few parameters.

The comparisons between BCel and BCbl produced in the paper show some gain for BCbl. Obviously, this depends on the respective calibrations of the non-parametric methods and of regular ABC, as well as on the available computing time. I find the population genetics example somewhat puzzling: the paper refers to our composite likelihood to set the moment equations. Since this is a pseudo-likelihood, I wonder how the authors select their parameter estimates in the double-bootstrap experiment. And for the Ising model, it is not straightforward to conceive of a bootstrap algorithm on an Ising model: (a) how does one subsample pixels? and (b) what are the validity guarantees for the estimation procedure?

## model selection and multiple testing

Posted in Books, pictures, Statistics, Travel, University life on October 23, 2015 by xi'an

Ritabrata Dutta, Malgorzata Bogdan and Jayanta Ghosh recently arXived a survey paper on model selection and multiple testing. Which provides a good opportunity to reflect upon traditional Bayesian approaches to model choice. And potential alternatives. On my way back from Madrid, where I got a bit distracted when flying over the South-West French coast, from Biarritz to Bordeaux. Spotting the lake of Hourtin, where I spent my military training month, 29 years ago!

“On the basis of comparison of AIC and BIC, we suggest tentatively that model selection rules should be used for the purpose for which they were introduced. If they are used for other problems, a fresh justification is desirable. In one case, justification may take the form of a consistency theorem, in the other some sort of oracle inequality. Both may be hard to prove. Then one should have substantial numerical assessment over many different examples.”

The authors quickly replace the Bayes factor with BIC, because it is typically consistent. In the comparison between AIC and BIC they mention the conundrum of defining a prior on a nested model from the prior on the nesting model, a problem that has not been properly solved in my opinion. The above quote with its call to a large simulation study reminded me of the paper by Arnold & Loeppky about running such studies through ecdfs. That I did not see as solving the issue. The authors also discuss DIC and Lasso, without making much of a connection between those, or with the above. And then reach the parametric empirical Bayes approach to model selection exemplified by Ed George's and Don Foster's 2000 paper. Which achieves asymptotic optimality for posterior prediction loss (p.9). And which unifies a wide range of model selection approaches.
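To make the AIC versus BIC contrast of the above quote concrete, here is a toy simulation of my own (not from the survey) on the simplest nested pair, a zero mean against a free mean for Gaussian data with known unit variance, where BIC's heavier penalty makes it consistent under the null:

```python
import numpy as np

rng = np.random.default_rng(2)

def loglik(y, mu_free):
    """Gaussian log-likelihood with known variance 1; the mean is either
       fixed at 0 (null model, k=0) or estimated (alternative, k=1)."""
    mu = y.mean() if mu_free else 0.0
    ll = -0.5 * np.sum((y - mu) ** 2) - 0.5 * y.size * np.log(2 * np.pi)
    return ll, (1 if mu_free else 0)

def select(y, crit):
    """Return the index (0 = null, 1 = alternative) minimising the criterion."""
    scores = []
    for mu_free in (False, True):
        ll, k = loglik(y, mu_free)
        pen = 2 * k if crit == "AIC" else k * np.log(y.size)
        scores.append(-2 * ll + pen)
    return int(np.argmin(scores))

n, reps = 500, 300
# data truly from the null model: a consistent rule should pick model 0
aic_hits = np.mean([select(rng.normal(size=n), "AIC") == 0 for _ in range(reps)])
bic_hits = np.mean([select(rng.normal(size=n), "BIC") == 0 for _ in range(reps)])
print(aic_hits, bic_hits)   # BIC recovers the true (null) model more often
```

Under the null, AIC keeps over-selecting the larger model with roughly constant probability (the χ²₁ tail beyond 2), while BIC's log n penalty drives its error to zero, which is the consistency the authors invoke.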

A second part of the survey considers the large p setting, where BIC is not a good approximation to the Bayes factor (when testing whether or not all mean entries are zero). And recalls that there are priors ensuring consistency for the Bayes factor in this very [restrictive] case. Then, in Section 4, the authors move to what they call “cross-validatory Bayes factors”, also known as partial Bayes factors and pseudo-Bayes factors, where the data is split to (a) make the improper prior proper and (b) run the comparison or test on the remaining data. They also show the surprising result that, provided the fraction of the data used to proper-ise the prior does not converge to one, the X validated Bayes factor remains consistent [for the special case above]. The last part of the paper concentrates on multiple testing but is more tentative and conjecturing about convergence results, centring on the differences between full Bayes and empirical Bayes. Then the plane landed in Paris and I stopped my reading, not feeling differently about the topic than when the plane started from Madrid.

Posted in pictures, Statistics, Travel, University life on October 9, 2015 by xi'an

I am in Madrid for the day, discussing with friends here the details of a collaboration to a Spanish Antarctica project on wildlife. Which is of course a most exciting prospect!

Posted in Statistics, University life on March 25, 2013 by xi'an

Here are the slides of my talk in Padova for the workshop Recent Advances in statistical inference: theory and case studies (very similar to the slides for the Varanasi and Gainesville meetings, obviously!, with Peter Müller commenting [at last!] that I had picked the wrong photos from Khajuraho!)

and Francesco Pauli, from Trieste, whose slides are:

These were kind and rich discussions with many interesting openings: Stefano's idea of estimating the pivotal function h opens new directions, obviously, as it indicates an additional degree of freedom in calibrating the method. Especially when considering the high variability of the empirical likelihood fit depending on the function h. For instance, one could start with a large collection of candidate functions and build a regression or a principal component reparameterisation from this collection… (Actually I did not get point #1 about ignoring f: the empirical likelihood is by essence ignoring anything outside the identifying equation, so long as the equation is valid…) Point #2: opposing sample-free and simulation-free techniques is another interesting avenue, although I would not say ABC is “sample free”. As to point #3, I will certainly have a look at Monahan and Boos (1992) to see if this can drive the choice of a specific type of pseudo-likelihood. I like the idea of checking the “coverage of posterior sets” and even more that “the likelihood must be the density of a statistic, not necessarily sufficient”, as it obviously relates to our current ABC model comparison work… Especially when the very same paper is mentioned by Francesco as well. Grazie, Stefano! I also appreciate the survey made by Francesco of the consistency conditions, because I think this is an important issue that should be taken into consideration when designing ABC algorithms. (Just pointing out again that, in the theorem of Fearnhead and Prangle (2012) quoting Bernardo and Smith (1992), some conditions are missing for the mathematical consistency to apply.) I also like the agreement we seem to reach about ABC being evaluated per se rather than as a poor man's Bayesian method. Francesco's analysis of Monahan and Boos (1992) as validating or not empirical likelihood points out a possible link with the recent coverage analysis of Prangle et al., discussed on the 'Og a few weeks ago.
And an unsuspected link with Larry Wasserman! Grazie, Francesco!

## generalised ratio of uniforms

Posted in R, Statistics, University life on May 15, 2012 by xi'an

A recent arXiv posting of the paper “On the Generalized Ratio of Uniforms as a Combination of Transformed Rejection and Extended Inverse of Density Sampling” by Martino, Luengo, and Míguez from Madrid rekindled my interest in this rather peculiar simulation method. The ratio-of-uniforms method samples uniformly on the subgraph

$\mathcal{A}=\{(v,u);\,0\le u\le\sqrt{p(v/u)}\}$

to produce simulations from p as the ratio v/u. The proof is straightforward first-year calculus, but I do not find the method as intuitive as, say, accept/reject… The paper gives a very detailed background on those methods, as well as on the “inverse of density method”, which amounts to looking at the uniform simulation over the subgraph with both axes inverted (slice sampling is the same either way). (A minor point of contention, or at least of misunderstanding: when using the inverse of density method, the authors claim that using the unnormalised and the normalised versions of the target leads to the same outcome. While this is true for the direct method, I have trouble seeing the equivalent in the inverse case…) The paper also stresses that the optimal case for accept-reject is when the target is bounded, because the uniform can then be used as a proposal. I agree this is a simpler solution but fail to see any optimality in the matter. The authors then study ways of transforming unbounded subgraphs into bounded domains (i.e. bounded pdfs and supports). This imposes conditions on the transform f, for which p(x)/f′(x) or p⁻¹(x)/f′(x) must have finite limits at the boundaries. (An optimal choice is when f is the cdf of p, since the transform is then uniform.)
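As a minimal sketch of the basic (non-generalised) method, assuming a standard normal target for which the bounding box of the region 𝒜 is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):                         # unnormalised target: standard normal kernel
    return np.exp(-x**2 / 2)

def rou_sample(n):
    """Ratio-of-uniforms: sample (v,u) uniformly on
       A = {(v,u): 0 <= u <= sqrt(p(v/u))} by rejection from the
       bounding box [0, u_max] x [-v_max, v_max], then return x = v/u."""
    u_max = 1.0                   # sup sqrt(p) = 1, attained at x = 0
    v_max = np.sqrt(2 / np.e)     # sup |x| sqrt(p(x)), attained at |x| = sqrt(2)
    out = []
    while len(out) < n:
        u = rng.uniform(0, u_max, 4 * n)
        v = rng.uniform(-v_max, v_max, 4 * n)
        keep = u <= np.sqrt(p(v / u))          # uniform point falls inside A
        out.extend((v[keep] / u[keep]).tolist())
    return np.array(out[:n])

x = rou_sample(10_000)
print(x.mean(), x.std())
```

The acceptance rate over the box is about 73% here; the transformations f discussed in the paper are precisely what makes such a bounding box exist when the target has unbounded support or an unbounded density.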

The remaining (and more innovative) part of the paper is less clear, in that I do not get a generic feeling of what it is about! The generalisation of the above is to consider uniform sampling from

$\mathcal{A}_g=\big\{(v,u);\,0\le u\le g^{-1}\{c p[v/g'(u)]\}\big\}$

for a generic increasing function g such that g(0)=0, and c a positive constant. (Any positive constant?!) But this is from a 1991 paper by Jon Wakefield, Alan Gelfand, and Adrian Smith. The extension thus lies in finding g such that the above region is bounded and can be explored by uniform sampling over a box… And in noticing that “the generalized Ratio-of-Uniform method is a combination of the transformed rejection method applied to the inverse density with the extended inverse-of-density method” (p.27).

I wonder at the applicability of the approach for costly target functions p. And at the extension to larger dimensions. And wish I had more time (or more graduate students) to look at possible adaptive constructions of the transform g. An interesting and fruitful read, nonetheless!

## multiple try/point Metropolis algorithm

Posted in Statistics, Travel on January 23, 2012 by xi'an

Among the arXiv documents I printed at the turn of the year in order to get a better look at them (in the métro if nowhere else!), there were two papers by Luca Martino and co-authors from Universidad Carlos III, Madrid, A multi-point Metropolis scheme with generic weight functions and Different acceptance functions for multiple try Metropolis schemes. The multiple-try algorithm sounds like another version of the delayed rejection algorithm of Tierney and Mira (1999) and Green and Mira (2001). I somehow missed it, even though it was introduced in Liu et al. (2000) and Qin and Liu (2001). Multiple-try Metropolis builds upon the idea that, instead of making one proposal at a time, it is feasible to build a sequence of proposals and to pick one among those, presumably rather likely and hence more open to being accepted. The sequence of proposals may depend upon past proposals as well as on the current value, lending some degree of adaptability to the scheme. In the current implementation, the algorithm remains rather clumsy [in my opinion] in that (a) a horizon N needs to be fixed in advance and (b) an additional series of backward simulations needs to be produced simply to keep the balance equation happy… Hence a total number of O(N) simulations for one possible acceptance. The first note slightly extends Qin and Liu (2001) by using a fairly general weighting scheme. The second paper studies some particular choices of weights in a much less adaptive scheme (where parallelisation would be an appropriate alternative, since each proposal in the multiple try only depends on the current value of the chain). But it does not demonstrate more efficient behaviour than a cycle or a mixture of Metropolis-Hastings algorithms.
The method seems to regain popularity, though, as Roberto Casarin, Radu Craiu and Fabrizio Leisen (also from Carlos III) arXived a paper on a multiple-try algorithm, connected with population Monte Carlo, and more recently published it in Statistics and Computing.
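For concreteness, here is a minimal sketch of one multiple-try Metropolis step, a toy version of my own with independent symmetric Gaussian proposals and weights w(y|x) ∝ π(y) (a standard choice for symmetric kernels, not the weighting schemes of the papers), including the backward simulations required to preserve detailed balance:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):                 # toy target: standard normal, up to a constant
    return -0.5 * x**2

def mtm_step(x, n_try=5, scale=2.0):
    """One multiple-try Metropolis move with a symmetric Gaussian proposal."""
    # forward: N candidate proposals from the current value
    ys = x + scale * rng.normal(size=n_try)
    wy = np.exp(log_target(ys))
    y = rng.choice(ys, p=wy / wy.sum())        # pick one candidate by weight
    # backward: N-1 shadow draws from the candidate, plus the current value,
    # the extra simulations needed to keep the balance equation happy
    xs = np.append(y + scale * rng.normal(size=n_try - 1), x)
    wx = np.exp(log_target(xs))
    # accept with probability min(1, sum of forward / sum of backward weights)
    return y if rng.uniform() < wy.sum() / wx.sum() else x

x, chain = 0.0, []
for _ in range(20_000):
    x = mtm_step(x)
    chain.append(x)
chain = np.asarray(chain)
print(chain.mean(), chain.std())
```

The O(N) cost per accepted point is plain in the code: each step simulates n_try forward and n_try−1 backward values, and the backward block only depends on the selected candidate, which is the feature making the naive scheme parallelisable.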