## Computational Methods for Bayesian Model Choice

Posted in Statistics, University life on February 26, 2009 by xi'an

Next Monday, I am starting a series of four (advanced graduate) lectures on computational methods for Bayesian model choice, which try to summarise what we know about Bayes factors and evidence from a computational viewpoint. They are therefore quite related to the seminars I gave in Montréal and in Montpellier.

The textbook I use as support is Chen, Shao and Ibrahim's Monte Carlo Methods in Bayesian Computation, because it is more focussed on this specific issue than Monte Carlo Statistical Methods or Bayesian Core. The courses will take place at CREST-ENSAE, Monday 3 from 11am to 1pm, Thursday 5 from 10am to 1pm, Monday 9 from 10am to 1pm and Thursday 12 from 11am to 1pm.
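As a taste of what the lectures cover, here is a minimal sketch (my own toy construction, not taken from the course material) of evidence estimation by importance sampling, on a conjugate Gaussian model where the marginal likelihood is available in closed form for checking; all settings and names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy setting: y_i ~ N(theta, 1), theta ~ N(0, 1), so the evidence
# m(y) = int prod_i N(y_i; theta, 1) N(theta; 0, 1) dtheta is known
n = 10
y = rng.normal(0.3, 1.0, size=n)          # hypothetical data
s, ss = y.sum(), (y**2).sum()

# closed-form log evidence (complete the square in theta)
log_m = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1)
         - 0.5 * (ss - s**2 / (n + 1)))

# importance sampling: draw from a Gaussian proposal slightly wider
# than the posterior N(s/(n+1), 1/(n+1)) and average the weights
mu, sd = s / (n + 1), 1.5 / np.sqrt(n + 1)
N = 100_000
th = rng.normal(mu, sd, size=N)
log_lik = (-0.5 * n * np.log(2 * np.pi)
           - 0.5 * ((y[None, :] - th[:, None])**2).sum(axis=1))
log_prior = -0.5 * np.log(2 * np.pi) - 0.5 * th**2
log_prop = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((th - mu) / sd)**2
log_w = log_lik + log_prior - log_w_ref if False else log_lik + log_prior - log_prop
log_m_hat = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
```

Because the proposal covers the posterior, the weights are well-behaved and the estimate agrees with the closed form to a few decimal places; with a poorly chosen proposal (e.g. the harmonic mean construction) the same estimator can be arbitrarily unstable.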

## Good size swans and turkeys

Posted in Books, Statistics on February 24, 2009 by xi'an

In connection with The Black Swan, Nassim Taleb wrote a short essay called The Fourth Quadrant on The Edge. I found it much more pleasant to read than the book because (a) it focuses directly on the difficulty of dealing with fat-tailed distributions and the prediction of extreme events, and (b) it is delivered in a much more serene tone than the book (imagine, just a single remark about the French!). The text contains insights on loss functions and inverse problems which, even though they are a bit vague, mostly make sense. As for The Black Swan, I deplore (a) the underlying determinism of the author, who still seems to believe in an unknown (and possibly unachievable) model that would rule the phenomenon under study, and (b) the lack of temporal perspective and of the possibility of modelling jumps as changepoints, i.e. model shifts. Time series have no reason to be stationary, all the less so when they depend on all kinds of exogenous factors. I actually agree with Taleb that, if there is no information about the form of the tails of the distribution corresponding to the phenomenon under study—assuming such a distribution does exist—estimating the shape of this tail from the raw data is impossible.

The essay is followed by a technical appendix that expands on fat tails, but not so deeply as to be extremely interesting. A surprising side note is that Taleb seems to associate stochastic volatility with mixtures of Gaussians. In my personal book of models, stochastic volatility is a noisy observation of the exponential of a random walk, something like $\nu_t=\exp(ax_{t-1}+b\epsilon_t)$, thus with much higher variation (and possibly no moments). To state that Student's t distributions are more variable than stochastic volatility models is therefore unusual… There is also an analysis over a bazillion datasets of the insanity of computing kurtosis when the underlying distribution may not even have a second moment. I could not agree more: trying to summarise fat-tailed distributions by their first four moments does not make sense, even though it may sell well. The last part of the appendix shows the equal lack of stability of estimates of the tail index $\alpha$, which again is not a surprising phenomenon: if the tail bound K is too low, the power law may not yet have kicked in, while, if it is too large, we always end up with too few data points. The picture shows how widely the estimate varies with K around its theoretical value for the log-normal and three Pareto distributions, based on a million simulations. (And this is under the same assumption of stationarity as above.) So I am not sure what the message is there. (As an aside, there seems to be a mistake in the tail expectation: it should be

$\dfrac{\int_K^\infty x x^{-\alpha} dx}{\int_K^\infty x^{-\alpha} dx} = \dfrac{K(\alpha-1)}{(\alpha-2)}$

if the density decreases as $x^{-\alpha}$… It is correct when $\alpha$ is the tail power of the cdf.)
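The disputed identity is easy to check by simulation. Here is a quick sketch (my own, not from Taleb's appendix), simulating a Pareto sample whose density is proportional to $x^{-\alpha}$, so that the conditional tail mean above K should equal $K(\alpha-1)/(\alpha-2)$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Pareto sample with density proportional to x^(-alpha) on [1, inf),
# i.e. cdf tail x^(-(alpha-1)); simulated by inverting the cdf
alpha = 4.0
n = 1_000_000
u = rng.random(n)
x = u ** (-1.0 / (alpha - 1.0))

# conditional tail mean above K versus the closed form K(alpha-1)/(alpha-2)
K = 2.0
empirical = x[x > K].mean()
theoretical = K * (alpha - 1.0) / (alpha - 2.0)
```

Replacing the Pareto draws with log-normal ones and scanning a grid of K values reproduces the instability of tail-index estimates that the figure in the appendix illustrates.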

## Blade Runner

Posted in Books, Kids on February 23, 2009 by xi'an

Over the past weekend, I watched Blade Runner with my kids, as I was forced to inactivity by the demise of my mailbox! I had not watched the movie for twenty years, since the time I was a postdoc at Cornell enjoying the student movie club, so it was almost like watching Blade Runner for the first time. (In particular, except for the cut of the final scene, I could not spot changes from the 1982 version.)

The atmosphere of the movie has not changed, though, in its oppressiveness. The play on lights is a major factor in this feeling, with no natural light ever used but instead side glares that periodically enter buildings, including the apartment of the detective, Deckard (which makes it appear less private, in a Big Brother kind of way), or wax candles for the magnate Tyrell. The sci-fi touch is somewhat light, except for the obligatory flying cars (in 2019?!), which is just as well because such gadgetry does not age well (the computer screens already appear antiquated, and the phones are fixed lines, not cell phones). The themes are highly reminiscent of Philip K. Dick's universe, with an asianised LA, including origami (just as in The Man in the High Castle), a permanent ambiguity/paranoia about the status/feelings of the characters (it is never clear in the movie that Deckard is not a replicant), the dubious nature of humanity, and a pessimistic view of future civilisation. I did not remember, though, the strong connections with the films noirs of the 50's, from the light—and the omnipresent cigarette smoke diffracting this light—to the costumes, and obviously to the hard-boiled attitude of Deckard. Even though I found Harrison Ford's interpretation somewhat lacking in depth (but this may be part of the ambiguity about his true nature, human versus replicant), I still agree with my former impression of Blade Runner being truly a cult film. (Unsurprisingly, my kids found the movie terrible, if only for the "poor" special effects!)

## Model choice by Kullback projection (2)

Posted in Statistics on February 20, 2009 by xi'an

Yesterday I talked about the paper of Nott and Cheng at the Bayesian model choice group and [with the help of the group] realised that my earlier comment on the paper

There is however one point with which I disagree, namely that the predictive on the submodel is obtained in the current paper by projecting a Monte Carlo or an MCMC sample from the predictive on the full model, while I think this is incorrect because the likelihood is then computed using the parameter for the full model. Using a projection of such a sample means at least reweighting by the ratio of the likelihoods…

was not completely accurate. The point is [I think] correct when considering the posterior distribution of the projected parameters: using a projection of an MCMC sample corresponding to the full model will not result in a sample from the posterior distribution of the projected parameters. On the other hand, projecting the MCMC sample in order to get the posterior distribution of the Kullback-Leibler distance, as done in the applications of Section 7 of the paper, is completely kosher, since this is a quantity that depends only on the full-model parameters. Since Nott and Cheng do not consider the projected model at any point (even though Section 3 is slightly unclear, using a posterior on the projected parameter), there is nothing wrong in their paper, and I find quite interesting the idea that the lasso penalty allows for a simultaneous exploration of the most likely submodels without recourse to a more advanced technique like reversible jump. (The comparison is obviously biased, as the method does not provide a true posterior on the most likely submodels, only an approximation of their probability. Simulating from the constrained projected posterior would require extra steps.)
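To make the reweighting point concrete, here is a toy numerical sketch (entirely my own construction, not from Nott and Cheng's paper): a Gaussian linear model with a flat prior, where draws of the projected parameter taken from the full-model posterior are corrected by importance weights so that they represent the submodel posterior, whose mean is known in closed form here.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy full model: y_i = a + b x_i + N(0,1) noise, flat prior on (a, b);
# the "projected" submodel sets b = 0, with parameter a alone
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# full-model posterior is Gaussian around the least-squares estimate
X = np.column_stack([np.ones(n), x])
V = np.linalg.inv(X.T @ X)
beta_hat = V @ X.T @ y

# draws of a from the full-model posterior marginal
M = 20_000
a = rng.normal(beta_hat[0], np.sqrt(V[0, 0]), size=M)

# naively keeping these draws does NOT sample the submodel posterior;
# reweighting by submodel likelihood over sampling density corrects this
log_sub = -0.5 * ((y[None, :] - a[:, None])**2).sum(axis=1)
log_q = -0.5 * (a - beta_hat[0])**2 / V[0, 0]
log_w = log_sub - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# under the submodel with flat prior, the posterior of a is N(mean(y), 1/n)
reweighted_mean = np.sum(w * a)
```

In this conjugate toy the sampling density of the projected parameter is available exactly, so the correction is straightforward; in general only a likelihood-ratio correction is available, and it may behave poorly when the full posterior puts little mass near the submodel.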

## ABC methods for model choice in Gibbs random fields

Posted in Statistics on February 19, 2009 by xi'an

We have resubmitted to Bayesian Analysis a revised version of our paper "ABC methods for model choice in Gibbs random fields", available on arXiv. The only major change is the addition of a second protein example in the biophysical illustration. The core idea of the paper is that, for Gibbs random fields, and in particular for Ising models, when comparing several neighbourhood structures, the computation of the posterior probabilities of the models/structures under competition can be carried out by likelihood-free simulation techniques akin to the Approximate Bayesian Computation (ABC) algorithm often discussed here. The key to this resolution is that, due to the specific structure of Gibbs random field distributions, there exists a sufficient statistic across models, which allows for an exact (rather than approximate) simulation from the posterior probabilities of the models. Obviously, when the structures grow more complex, it becomes necessary to introduce a true ABC step with a tolerance threshold $\epsilon$ in order to avoid running the algorithm for too long. Our toy example shows that the accuracy of the approximation of the Bayes factor can be greatly improved by resorting to the original ABC approach, since it allows for the inclusion of many more simulations. In the biophysical application to the choice of a folding structure for two proteins, we also demonstrate that we can implement the ABC solution on realistic datasets and, in the examples processed there, that the Bayes factors allow for a ranking that more standard methods (FROST, TM-score) do not provide.
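The mechanism behind the exact simulation can be illustrated on a much simpler toy than Gibbs random fields (my own stand-in example, not the one in the paper): choosing between two Poisson models through their common sufficient statistic, the sample sum. Accepting a simulated model index whenever the simulated statistic matches the observed one exactly yields exact draws from the model posterior, which can be checked against the closed form.

```python
import numpy as np
from math import exp, lgamma, log

rng = np.random.default_rng(42)

# two candidate models for n counts: Poisson(2) versus Poisson(5),
# with a uniform prior over models; the sum is sufficient for both
n, lam1, lam2 = 10, 2.0, 5.0
s_obs = 35  # hypothetical observed sufficient statistic (sum of the data)

def log_poisson_pmf(k, mu):
    return k * log(mu) - mu - lgamma(k + 1)

# exact posterior probability of model 1: the sum of n Poisson(lam)
# draws is Poisson(n * lam)
w1 = exp(log_poisson_pmf(s_obs, n * lam1))
w2 = exp(log_poisson_pmf(s_obs, n * lam2))
exact_p1 = w1 / (w1 + w2)

# likelihood-free estimate with tolerance zero: keep a simulated model
# index whenever the simulated statistic matches the observed one --
# exact here because the statistic is sufficient across both models
N = 200_000
models = rng.integers(1, 3, size=N)        # uniform prior on {1, 2}
lams = np.where(models == 1, lam1, lam2)
sims = rng.poisson(n * lams)               # statistic of a simulated dataset
accepted = models[sims == s_obs]
abc_p1 = np.mean(accepted == 1)
```

With a continuous or higher-dimensional statistic, exact matching becomes hopeless and the tolerance threshold $\epsilon$ takes over, which is where the approximation (and the gain in the number of usable simulations) enters.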

## Just another strike?

Posted in University life on February 18, 2009 by xi'an

As noted in a previous post, most French universities are currently facing strike actions. While from abroad this may sound like nothing unusual, France being famous for its Spring strikes…, this protest runs much deeper than the usual strikes. It was started by a greater-autonomy project from the (French) government to delegate pay rises, promotions and teaching duties to university presidents, rather than the present use of a national standard for teaching loads and of a national committee for research evaluation and promotions. This may sound weird to US and UK academics, for whom local bargaining is the rule, but most French academics fear that the complete delegation of those powers to the sole president of their university, without any counter-power, will leave them unprotected, because of the small size of most universities and of the danger of nepotism and of field bias. (The presidents of French universities being elected from the faculty, they are necessarily from one field…) While I do not think strikes or demonstrations are of much use, and while an increased autonomy of our universities is a good thing, I tend to agree with the analysis that university presidents suddenly got far too much power in their hands, with no independent body to balance this power. The proposed changes are both too narrow (no autonomy for degrees, fees, salaries) and too broad (reduced budgets for more expenses, and fewer positions for more duties on both administrators and faculty). Being imposed from the top, in a traditional French way, does not make them any more palatable or intelligible to the community.

An aggravating factor in this crisis is a speech given by President Sarkozy on January 22, discussed here in Nature. The tone of this speech was so scathingly dismissive of researchers and of their "petty conservatism", and so misrepresentative of a complex reality, that it set most of the academic community against the speaker and obviously against the on-going project…. The most recent French Fields medalist, Wendelin Werner, just published an open letter to the President in Le Monde which, given its moderate and non-aggressive tone, shows how deep a rift President Sarkozy's speech has opened between his government and the researchers' community. I am afraid that the withdrawal of the current project will not, by itself, suffice to reconcile the community.

## The R Companion to MCSM (6)

Posted in Books, Statistics on February 17, 2009 by xi'an

The chapter on the Metropolis algorithm is now completed and we are thus tantalisingly close to the end! (The end of the complete draft…) The completed chapters are

1. Introduction to R programming
2. Random variable generation
3. Monte Carlo methods
4. Controlling and accelerating convergence
5. Monte Carlo optimization
6. Metropolis-Hastings algorithms
7. [=8] Convergence monitoring for MCMC algorithms

We have now reached 226 pages, 72K words (whatever that means when running wc on a pdf file), and 69 figures… There still seems to be a possibility that the final chapter on Gibbs sampling could be ready by the end of the month, if both George and I rush for it. There is no major surprise in the current chapter, with independent Metropolis and random walk Metropolis algorithms being the central heroes, but there still was work to be done in terms of coding new examples. We also included a nice Bayesian model choice illustration inspired by Bayesian Core, where the Metropolis-Hastings algorithm only moves over model indices, the parameters being analytically integrated out. Once the first draft is over, the next stepping stone will be the design of the corresponding package, collecting and classifying our R codes and a few datasets…
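The random walk Metropolis algorithm at the heart of the chapter fits in a dozen lines; here is a sketch in Python rather than R (not our book code, and the standard normal target and the 2.4 proposal scale are choices made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    # standard normal target, up to an additive constant
    return -0.5 * x**2

def rwm(log_target, x0, scale, n_iter):
    # random walk Metropolis: propose x' = x + scale * N(0,1) and
    # accept with probability min(1, pi(x') / pi(x))
    chain = np.empty(n_iter)
    x, lp = x0, log_target(x0)
    for t in range(n_iter):
        prop = x + scale * rng.normal()
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[t] = x   # the current state is recorded even on rejection
    return chain

chain = rwm(log_target, x0=0.0, scale=2.4, n_iter=50_000)
```

The scale of the proposal is the one tuning parameter that matters: too small and the chain diffuses slowly, too large and almost every proposal is rejected, with an intermediate acceptance rate being the usual target.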

PS—The title preferred by those who voted on the poll seems to be Monte Carlo Methods with R.